Topic: https://intoli.com/blog/neural-network-initialization/

unverified 7y, 345d ago

Hello Andre!

I found this article very useful. I like the graphs you made for the distribution of weights throughout the layers (testing many parameters).

Do you have reproducible code to share?

Thanks, Louis

andre 7y, 336d ago

Thanks Louis! I put the code used to generate the plots up in a folder of our article materials GitHub repository.

unverified 7y, 133d ago

Hello Andre! Thanks for sharing your knowledge; it was really helpful.

unverified 7y, 131d ago

Hello Andre, very good explanation with cool graphs :)

unverified 7y, 110d ago

Hello Andre, I really appreciate your post, it has been very useful.

I have a question about the post if you don't mind. Do you think this technique would be useful for initializing the weights in Conv2D layers? Should I assume that the number of inputs into each neuron is the kernel size, so that, for example, a 3x3 kernel gives sigma = sqrt(2/9)?

andre 7y, 102d ago [edited]

Thanks! Things should work fine for Conv2D layers. It sounds like you'd be using n_i = 9 here, but do keep in mind that most libraries do this for you automatically depending on the initializer you use. Check out Keras' glorot_uniform, for example.

I also posted some code you can use as a reference on our intoli-article-materials repo.

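For anyone reading along, here is a rough NumPy sketch of the idea; the kernel and channel sizes below are made up purely for illustration:

```python
import numpy as np

# Hypothetical shapes, chosen only to illustrate the fan-in calculation.
kernel_height, kernel_width = 3, 3
in_channels, out_channels = 1, 16

# Fan-in of a Conv2D neuron: receptive field size times number of input channels.
# With a single input channel this reduces to the n_i = 9 discussed above.
fan_in = kernel_height * kernel_width * in_channels

# He-style initialization: zero-mean normal with sigma = sqrt(2 / fan_in).
sigma = np.sqrt(2.0 / fan_in)
kernel = np.random.normal(
    loc=0.0,
    scale=sigma,
    size=(kernel_height, kernel_width, in_channels, out_channels),
)
```

In Keras you would get essentially the same behavior by passing kernel_initializer="he_normal" (or "glorot_uniform" for the Xavier variant) to the Conv2D layer rather than building the kernel by hand.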

unverified 7y, 96d ago

Hello Andre! Thank you for this nice article. I was curious about the He initialization. In your article you write that we essentially have to double the formula from the Xavier initialization. This makes sense to me. But how come we can also ignore the number of outgoing connections?

Greetings, Daniel

andre 7y, 95d ago [edited]

Hi Daniel, thanks for the question. This is actually explained a bit in the He initialization paper. Check out the comparison between equations (10) and (14) towards the end of page 4. The important part is:

if the initialization properly scales the backward signal, then this is also the case for the forward signal

So you could use either one.

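To make the comparison concrete, the two conditions are roughly the following (in the paper's notation, where n_l is the fan-in of layer l and n̂_l is its fan-out):

```latex
\tfrac{1}{2}\, n_l       \operatorname{Var}[w_l] = 1  % forward condition, eq. (10)
\tfrac{1}{2}\, \hat{n}_l \operatorname{Var}[w_l] = 1  % backward condition, eq. (14)
```

As I read the paper, satisfying either condition is enough to keep the signal from progressively exploding or vanishing across layers, which is why only one of the two needs to be imposed.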

3gtsqPKl 7y, 24d ago [edited]

Hello Andre, I found this article very useful; thanks for sharing your knowledge.

I have a question: for forward part, why , not ?

Same for the backward part: why could we assume that the derivative of the activation function f is 0?

Cheers!

andre 6y, 355d ago

This is based on the simplifying assumption that the activation function behaves like f(X) = X in which case the derivative is 1. So definitely not applicable to all activation functions, but it still seems to give useful results regardless.

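To spell the simplification out a bit: if the activation behaves like f(x) = x, then for a single pre-activation y = w_1 x_1 + ... + w_{n_i} x_{n_i}, with the weights and inputs independent and zero-mean, the variance is roughly

```latex
\operatorname{Var}(y) = \sum_{k=1}^{n_i} \operatorname{Var}(w_k)\,\operatorname{Var}(x_k)
                      = n_i \,\operatorname{Var}(W)\,\operatorname{Var}(X)
```

Requiring Var(y) = Var(X) then gives Var(W) = 1/n_i, which is the forward half of the Xavier result; the ReLU correction from the article is what turns that into a factor of 2.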

unverified 6y, 356d ago

Hello Andre, it is not the harmonic mean, right? It is the reciprocal of the average of the sizes of two consecutive layers.

andre 6y, 355d ago

It's actually the harmonic mean, but of 1/n_i and 1/n_{i+1} which are inverted in the denominator.

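Spelled out, with n_i and n_{i+1} being the sizes of the two consecutive layers:

```latex
\operatorname{HM}\!\left(\tfrac{1}{n_i}, \tfrac{1}{n_{i+1}}\right)
  = \frac{2}{\frac{1}{1/n_i} + \frac{1}{1/n_{i+1}}}
  = \frac{2}{n_i + n_{i+1}}
```

which is the same 2/(n_i + n_{i+1}) variance used in the article, so the "reciprocal of the average of the layer sizes" reading lands on the same number.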

unverified 6y, 268d ago

great article

unverified 6y, 236d ago

Hi, your article is quite helpful for my current research and the issues I've run into.

Have you by any chance read this paper - Dying ReLU and Initialization: Theory and Numerical Examples?

It seems to largely reduce the chance of getting a dying ReLU network, though the procedure seems quite complicated.
