Neural Network Weight Initialization

For the network to learn well, we want it to start out approximating only in the near-linear part of the activation function (tanh) and then slowly move towards the saturating ends to add nonlinearities into the mix.
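A few sample values make this concrete; the snippet below is a minimal illustration (assuming NumPy, not part of the derivation itself) that evaluates tanh and its derivative at small and large inputs:

```python
import numpy as np

# tanh is roughly the identity near zero (derivative ~ 1) and
# saturates towards +-1 for large inputs (derivative ~ 0).
for x in [0.01, 0.1, 0.5, 2.0, 5.0]:
    y = np.tanh(x)
    print(f"x = {x:5.2f}   tanh(x) = {y:7.4f}   tanh'(x) = {1 - y**2:6.4f}")
```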

A simple way to achieve this is to draw each initial weight uniformly from

$$w_i \sim \mathcal{U}\!\left(-\frac{1}{\sqrt{n}},\ +\frac{1}{\sqrt{n}}\right),$$

where $n$ is the number of weights feeding into the neuron (its fan-in).
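A minimal sketch of this initialization for a fully connected layer, assuming NumPy (the function name `init_weights` and the shapes are illustrative, not prescribed here):

```python
import numpy as np

def init_weights(n_in, n_out, seed=0):
    """Draw a weight matrix uniformly from [-1/sqrt(n_in), +1/sqrt(n_in)],
    where n_in is the number of incoming weights per unit (the n above)."""
    rng = np.random.default_rng(seed)
    a = 1.0 / np.sqrt(n_in)
    return rng.uniform(-a, a, size=(n_in, n_out))

W = init_weights(256, 128)
print(W.min(), W.max())   # every entry lies in [-1/16, +1/16]
```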

Proof

We want to keep the weights relatively small, so that the net input of a neuron, $y = \sum_{i=1}^{n} w_i x_i$, has a mean around zero and a standard deviation of at most one. We can easily calculate the variance and standard deviation of the weights with respect to the overall domain $[-a, a]$ of the uniform distribution:

$$\sigma_w^2 = \frac{(2a)^2}{12} = \frac{a^2}{3}, \qquad \sigma_w = \frac{a}{\sqrt{3}}.$$

We then use the Strong Law of Large Numbers, which gives $\sum_{i=1}^{n} w_i^2 \approx n\,\sigma_w^2$ for large $n$, to calculate the spread of the net input: with independent, zero-mean, unit-variance inputs $x_i$,

$$\sigma_y^2 = \sigma_x^2 \sum_{i=1}^{n} w_i^2 \approx n\,\sigma_w^2 = \frac{n\,a^2}{3}, \qquad \sigma_y \approx \sqrt{\frac{n}{3}}\; a.$$

We should thus choose values of $w_i$ in $\left[-\frac{1}{\sqrt{n}}, +\frac{1}{\sqrt{n}}\right]$ to get good learning: with $a = \frac{1}{\sqrt{n}}$ the net input has $\sigma_y \approx \frac{1}{\sqrt{3}} \leq 1$, so the neurons start out in the near-linear part of tanh.
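A small simulation, assuming zero-mean, unit-variance inputs as in the derivation above (the sample sizes are arbitrary), illustrates the argument: the weight variance comes out near $a^2/3$ and the net-input variance near $n \cdot a^2/3 = 1/3$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                      # number of incoming weights per neuron
a = 1.0 / np.sqrt(n)          # bound of the uniform interval [-a, a]

w = rng.uniform(-a, a, size=n)
x = rng.standard_normal(size=(10_000, n))   # zero-mean, unit-variance inputs
y = x @ w                                   # net inputs for 10,000 samples

print(np.var(w), a**2 / 3)        # weight variance      ~ a^2 / 3
print(np.var(y), n * a**2 / 3)    # net-input variance   ~ n * a^2 / 3 = 1/3
print(np.std(y))                  # ~ 1/sqrt(3), well inside tanh's near-linear part
```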

More Resources