Hello Andre!
I found this article very useful. I like the graphs you made for the distribution of weights throughout the layers (testing many parameters).
Do you have reproducible code to share?
Thanks, Louis
Thanks Louis! I've put the code used to generate the plots in a folder of our article-materials GitHub repository.
Hello Andre, I really appreciate your post, it has been very useful.
I have a question about the post if you don't mind. Do you think this technique will be useful for initializing weights in Conv2D layers? Should I assume that the number of inputs into each neuron is the kernel size, e.g. if the kernel size is 3x3, then sigma = sqrt(2/9)?
Thanks! Things should work fine for Conv2D layers. It sounds like you'd be using n_i = 9 here, but keep in mind that most libraries do this for you automatically depending on the initializer you use. Check out Keras' glorot_uniform, for example.
I also posted some code you can use as a reference on our intoli-article-materials repo.
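To make the fan-in point concrete, here's a minimal NumPy sketch (not the code from the repo; `he_normal_conv2d` is a hypothetical helper). Note that for a conv layer the fan-in is kernel_h * kernel_w * in_channels, so sigma = sqrt(2/9) only holds for a 3x3 kernel with a single input channel:

```python
import numpy as np

def he_normal_conv2d(kernel_h, kernel_w, in_channels, out_channels, rng=None):
    """Sample a Conv2D weight tensor from N(0, 2 / fan_in) (He normal init)."""
    rng = np.random.default_rng() if rng is None else rng
    # For a conv layer, each output neuron sees every kernel position
    # across every input channel, so fan_in = k_h * k_w * in_channels.
    fan_in = kernel_h * kernel_w * in_channels
    sigma = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, sigma, size=(kernel_h, kernel_w, in_channels, out_channels))

# A 3x3 kernel with 1 input channel gives fan_in = 9, i.e. sigma = sqrt(2/9).
weights = he_normal_conv2d(3, 3, 1, 32)
print(weights.std())
```

With more than one input channel the fan-in (and hence sigma) changes accordingly, which is exactly what the built-in Keras initializers handle for you.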
Hello Andre! Thank you for this nice article. I was curious about the He initialization. In your article you write that we essentially have to double the variance from the Xavier init (in short). This makes sense to me. But how come we also ignore the number of outgoing connections (the fan-out)?
Greetings, Daniel
Hi Daniel, thanks for the question. This is actually explained briefly in the He initialization paper. Check out the comparison between equations (10) and (14) towards the end of page 4. The important part is:
if the initialization properly scales the backward signal, then this is also the case for the forward signal
So you could use either one.
Hi, your article is quite helpful for my current research and the issues I've run into.
Have you by any chance read the paper "Dying ReLU and Initialization: Theory and Numerical Examples"?
It seems to largely reduce the chance of getting a dying ReLU network, though the procedure looks quite complicated.