Stochastic Gradient Descent

A Gradient Descent algorithm that performs a weight update for each individual training sample. The weight update rule is

$$w_{t+1} = w_t - \eta \, \nabla_w E_i(w_t),$$

where $E_i$ is the error on the training sample $i$ drawn at step $t$ and $\eta$ is the learning rate. Accumulated over one epoch (one pass through all $N$ training samples), these updates approximate a step of Full-Batch Gradient Descent, which uses the full gradient $\nabla_w E(w) = \frac{1}{N} \sum_{i=1}^{N} \nabla_w E_i(w)$.
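
The following is a minimal sketch of this per-sample update on a linear least-squares model; the data, learning rate, and number of epochs are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Minimal sketch of per-sample SGD on linear least squares, E_i(w) = 0.5 * (x_i @ w - y_i)^2.
# The data, learning rate eta, and epoch count are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 training samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])         # targets generated from a known weight vector
w = np.zeros(3)
eta = 0.01

for epoch in range(20):
    for i in rng.permutation(len(X)):      # visit the samples in random order
        grad_i = (X[i] @ w - y[i]) * X[i]  # gradient of E_i at the current weights
        w -= eta * grad_i                  # one weight update per training sample
```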

We can rewrite the update rule as

$$w_{t+1} = w_t - \eta \, \nabla_w E(w_t) - \eta \big( \nabla_w E_i(w_t) - \nabla_w E(w_t) \big).$$

Here we can see that we essentially perform Full-Batch Gradient Descent but additionally subtract a stochastic term in each iteration; averaged over the whole training set, this term vanishes.
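
As a quick numerical illustration of this decomposition (reusing the illustrative linear least-squares setup from above), the per-sample gradients equal the full-batch gradient plus a perturbation whose average over one epoch is zero:

```python
import numpy as np

# Decomposition check: grad E_i(w) = grad E(w) + (grad E_i(w) - grad E(w)).
# Averaged over all samples, the stochastic part cancels out.
# Data and weights are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = rng.normal(size=3)

per_sample_grads = (X @ w - y)[:, None] * X       # grad E_i(w) for every sample i
full_batch_grad = per_sample_grads.mean(axis=0)   # grad E(w), the full-batch gradient
noise = per_sample_grads - full_batch_grad        # stochastic part of each SGD step

print(np.allclose(noise.mean(axis=0), 0.0))       # True: zero mean over the training set
```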

An approximation of SGD is called Vario-Eta Learning.
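
As far as I understand it, Vario-Eta learning gives each weight its own learning rate, scaled by the inverse standard deviation of that weight's per-sample gradients. The sketch below illustrates this idea on the same illustrative linear model; it is an assumption about the scheme, not its exact formulation, and the epsilon term is added only for numerical stability.

```python
import numpy as np

# Hedged sketch of the Vario-Eta idea: rescale each weight's step by the inverse
# standard deviation of its per-sample gradients. Model, data, eta, and eps are
# assumptions for illustration, not the original formulation.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
eta, eps = 0.1, 1e-8

for epoch in range(50):
    per_sample_grads = (X @ w - y)[:, None] * X   # grad E_i(w) for every sample
    mean_grad = per_sample_grads.mean(axis=0)     # full-batch gradient
    std_grad = per_sample_grads.std(axis=0)       # per-weight gradient spread
    w -= eta / (std_grad + eps) * mean_grad       # per-weight rescaled update
```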

Advantages

  • It often converges faster because the weights are updated more frequently
    • especially on redundant data, where many samples carry similar gradient information
  • The stochasticity of the updates can help SGD escape shallow local minima
  • The method can be used for Online Learning by updating the weights incrementally for each newly arriving training sample (see the sketch below)
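
Below is a minimal sketch of this online-learning use, assuming samples arrive one at a time from a stream and reusing the same illustrative linear model as above.

```python
import numpy as np

# Minimal online-learning sketch: apply one per-sample SGD step as soon as a new
# sample arrives, without revisiting earlier data. Stream, model, and eta are
# illustrative assumptions.

def online_update(w, x_new, y_new, eta=0.01):
    """One SGD step for a single newly observed sample (x_new, y_new)."""
    grad = (x_new @ w - y_new) * x_new     # gradient of the per-sample squared error
    return w - eta * grad

rng = np.random.default_rng(1)
w = np.zeros(3)
for _ in range(1000):                      # samples arriving one by one
    x_new = rng.normal(size=3)
    y_new = x_new @ np.array([1.0, -2.0, 0.5])
    w = online_update(w, x_new, y_new)
```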