Neural Network Weight Initialization

For the network to learn well, we want it to start out approximating only in the near-linear part of the activation function (tanh) and then slowly move towards the saturating ends to add nonlinearities into the mix.
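A few sample values make this concrete; the snippet below is a minimal illustration (assuming NumPy, not part of the derivation itself) that evaluates tanh and its derivative at small and large inputs:

```python
import numpy as np

# tanh is roughly the identity near zero (derivative ~ 1) and
# saturates towards +-1 for large inputs (derivative ~ 0).
for x in [0.01, 0.1, 0.5, 2.0, 5.0]:
    y = np.tanh(x)
    print(f"x = {x:5.2f}   tanh(x) = {y:7.4f}   tanh'(x) = {1 - y**2:6.4f}")
```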

A simple way to achieve this is to draw each initial weight uniformly from

$$w_i \sim \mathcal{U}\!\left(-\frac{1}{\sqrt{n}},\ +\frac{1}{\sqrt{n}}\right),$$

where $n$ is the number of weights feeding into the neuron (its fan-in).
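A minimal sketch of this initialization for a fully connected layer, assuming NumPy (the function name `init_weights` and the shapes are illustrative, not prescribed here):

```python
import numpy as np

def init_weights(n_in, n_out, seed=0):
    """Draw a weight matrix uniformly from [-1/sqrt(n_in), +1/sqrt(n_in)],
    where n_in is the number of incoming weights per unit (the n above)."""
    rng = np.random.default_rng(seed)
    a = 1.0 / np.sqrt(n_in)
    return rng.uniform(-a, a, size=(n_in, n_out))

W = init_weights(256, 128)
print(W.min(), W.max())   # every entry lies in [-1/16, +1/16]
```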

Proof

We want to keep the weights relatively small, so that the net input of a neuron, $y = \sum_{i=1}^{n} w_i x_i$, has a mean around zero and a standard deviation of at most one. We can easily calculate the variance and standard deviation of the weights with respect to the overall domain $[-a, a]$ of the uniform distribution:

$$\sigma_w^2 = \frac{(2a)^2}{12} = \frac{a^2}{3}, \qquad \sigma_w = \frac{a}{\sqrt{3}}.$$

We then use the Strong Law of Large Numbers, which gives $\sum_{i=1}^{n} w_i^2 \approx n\,\sigma_w^2$ for large $n$, to calculate the spread of the net input: with independent, zero-mean, unit-variance inputs $x_i$,

$$\sigma_y^2 = \sigma_x^2 \sum_{i=1}^{n} w_i^2 \approx n\,\sigma_w^2 = \frac{n\,a^2}{3}, \qquad \sigma_y \approx \sqrt{\frac{n}{3}}\; a.$$

We should thus choose values of $w_i$ in $\left[-\frac{1}{\sqrt{n}}, +\frac{1}{\sqrt{n}}\right]$ to get good learning: with $a = \frac{1}{\sqrt{n}}$ the net input has $\sigma_y \approx \frac{1}{\sqrt{3}} \leq 1$, so the neurons start out in the near-linear part of tanh.
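A small simulation, assuming zero-mean, unit-variance inputs as in the derivation above (the sample sizes are arbitrary), illustrates the argument: the weight variance comes out near $a^2/3$ and the net-input variance near $n \cdot a^2/3 = 1/3$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                      # number of incoming weights per neuron
a = 1.0 / np.sqrt(n)          # bound of the uniform interval [-a, a]

w = rng.uniform(-a, a, size=n)
x = rng.standard_normal(size=(10_000, n))   # zero-mean, unit-variance inputs
y = x @ w                                   # net inputs for 10,000 samples

print(np.var(w), a**2 / 3)        # weight variance      ~ a^2 / 3
print(np.var(y), n * a**2 / 3)    # net-input variance   ~ n * a^2 / 3 = 1/3
print(np.std(y))                  # ~ 1/sqrt(3), well inside tanh's near-linear part
```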

More Resources