Momentum Backpropagation
Momentum Backpropagation dynamically changes the effective step size of the weight updates in a neural network by adding a memory term to the update rule: each update accumulates a discounted sum of the previous updates. This leads to larger steps when successive gradients point in similar directions and smaller steps when the gradients oscillate or zigzag. The discounting (momentum) factor should lie between zero and one.
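Written out (the symbols below are one common choice, used here only for illustration: v for the accumulated step, \beta for the discounting factor, \eta for the learning rate, L for the loss), one way to state the update rule is

v_t = \beta \, v_{t-1} - \eta \, \nabla_w L(w_{t-1}), \qquad w_t = w_{t-1} + v_t .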
Via the geometric series, we see that for similar (roughly constant) gradients the accumulated term speeds up the steps, while for oscillating gradients it slows them down.
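To make the geometric-series argument concrete (using the same illustrative symbols as above): if the gradient stays at a constant value g, unrolling the update gives

v_t = -\eta g \, (1 + \beta + \beta^2 + \dots) \;\to\; -\frac{\eta g}{1 - \beta},

so the step is amplified by a factor of 1/(1-\beta) > 1. If instead the gradient alternates between +g and -g, the terms form an alternating geometric series and |v_t| converges to \eta g / (1 + \beta) < \eta g, so the oscillating component is damped.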
This method is especially helpful when the loss surface has flat regions (plateaus), where the gradients are small but consistent in direction, so the accumulated momentum keeps the updates moving.
A momentum-like term, an exponentially weighted average of past gradients, is also used as the first-moment estimate in the Adam optimizer.
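The following minimal Python sketch illustrates the update under the assumptions above; the function name momentum_step, the parameters lr and beta, and the toy quadratic loss are chosen here for illustration and are not from the text.

def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    # Memory term: discounted sum of past gradient steps (beta between 0 and 1).
    v = beta * v - lr * grad
    # Apply the accumulated step to the weight.
    w = w + v
    return w, v

# Toy demo on a 1-D quadratic loss L(w) = 0.5 * w**2, whose gradient is w.
w, v = 5.0, 0.0
for _ in range(100):
    w, v = momentum_step(w, v, grad=w)
print(w)  # w has decayed close to the minimum at 0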