Stochastic Gradient Descent
A Gradient Descent algorithm that performs a weight update for every single training sample. The weight update rule is
$$w_{t+1} = w_t - \eta \, \nabla_w E_i(w_t)$$
where $E_i$ is the error on the (randomly chosen) $i$-th training sample and $\eta$ is the learning rate. Accumulated over one epoch, the SGD updates approximately correspond to one Full-Batch Gradient Descent step.
We can rewrite the update rule as
$$w_{t+1} = w_t - \eta \, \nabla_w E(w_t) - \eta \big( \nabla_w E_i(w_t) - \nabla_w E(w_t) \big)$$
with $E = \frac{1}{N}\sum_{i=1}^{N} E_i$ the full-batch error. Here we can see that we essentially perform Full-Batch Gradient Descent but subtract a stochastic, zero-mean noise term in each iteration.
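To make the relation between the per-sample updates and a full-batch step concrete, here is a minimal numerical sketch (the least-squares error $E_i(w) = \tfrac{1}{2}(x_i^\top w - y_i)^2$, the synthetic data, and all variable names are illustrative assumptions, not part of these notes): over one epoch, the per-sample SGD steps roughly add up to one full-batch step with the summed gradient.

```python
import numpy as np

# Minimal sketch: per-sample SGD vs. full-batch gradient descent on a
# least-squares problem with error E_i(w) = 0.5 * (x_i @ w - y_i)**2 (illustrative setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 training samples, 3 features
w_true = np.array([1.0, -2.0, 0.5])      # ground-truth weights (illustrative)
y = X @ w_true + 0.1 * rng.normal(size=100)

eta = 0.01                               # learning rate

def grad_sample(w, x_i, y_i):
    """Gradient of the per-sample error E_i(w) = 0.5 * (x_i @ w - y_i)**2."""
    return (x_i @ w - y_i) * x_i

# --- SGD: one weight update per training sample ---
w_sgd = np.zeros(3)
for epoch in range(50):
    for i in rng.permutation(len(X)):    # visit the samples in random order
        w_sgd -= eta * grad_sample(w_sgd, X[i], y[i])

# --- Full-Batch Gradient Descent: one update per epoch with the summed gradient,
# --- roughly what one SGD epoch accumulates to for a small learning rate.
w_batch = np.zeros(3)
for epoch in range(50):
    g = sum(grad_sample(w_batch, X[i], y[i]) for i in range(len(X)))
    w_batch -= eta * g

print("SGD estimate:       ", w_sgd)
print("Full-batch estimate:", w_batch)
print("True weights:       ", w_true)
```

Both runs end up close to the same solution; SGD retains a small residual noise floor because of the stochastic term in each update.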
An approximation of SGD is called Vario-Eta Learning.
Advantages
- It converges faster because of the more frequent weight updates, especially on redundant data (many samples carrying similar gradient information)
- SGD can escape shallow local minima thanks to the stochasticity of its updates
- The method lends itself to Online Learning: the weights are partially updated whenever a new training sample arrives (see the sketch after this list)
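As a sketch of the online-learning use case (the simulated data stream, the squared-error model, and all names are illustrative assumptions): each incoming sample triggers exactly one partial weight update, so the full training set never has to be stored or revisited.

```python
import numpy as np

# Minimal online-learning sketch: one SGD step per incoming sample
# (model, error function, and data stream are illustrative assumptions).
rng = np.random.default_rng(1)
w = np.zeros(3)      # current weight estimate
eta = 0.05           # learning rate

def sample_stream(n_samples):
    """Simulated stream of (x, y) pairs arriving one at a time."""
    w_true = np.array([1.0, -2.0, 0.5])
    for _ in range(n_samples):
        x = rng.normal(size=3)
        y = x @ w_true + 0.1 * rng.normal()
        yield x, y

for x_new, y_new in sample_stream(2000):
    # One partial weight update per new training sample; no stored dataset needed.
    grad = (x_new @ w - y_new) * x_new
    w -= eta * grad

print("Online estimate:", w)
```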