Stochastic Gradient Descent

A Gradient Descent algorithm that performs a weight update for each individual training sample. The weight update rule is

$$w_{t+1} = w_t - \eta \, \nabla_w E_i(w_t),$$

where $E_i$ is the error on the training sample $i$ drawn at step $t$ and $\eta$ is the learning rate. Accumulated over one epoch (one pass through all $N$ training samples), these updates approximate a step of Full-Batch Gradient Descent, which uses the full gradient $\nabla_w E(w) = \frac{1}{N} \sum_{i=1}^{N} \nabla_w E_i(w)$.
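
The following is a minimal sketch of this per-sample update on a linear least-squares model; the data, learning rate, and number of epochs are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Minimal sketch of per-sample SGD on linear least squares, E_i(w) = 0.5 * (x_i @ w - y_i)^2.
# The data, learning rate eta, and epoch count are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 training samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])         # targets generated from a known weight vector
w = np.zeros(3)
eta = 0.01

for epoch in range(20):
    for i in rng.permutation(len(X)):      # visit the samples in random order
        grad_i = (X[i] @ w - y[i]) * X[i]  # gradient of E_i at the current weights
        w -= eta * grad_i                  # one weight update per training sample
```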

We can rewrite the update rule as

$$w_{t+1} = w_t - \eta \, \nabla_w E(w_t) - \eta \big( \nabla_w E_i(w_t) - \nabla_w E(w_t) \big).$$

Here we can see that we essentially perform Full-Batch Gradient Descent but additionally subtract a stochastic term in each iteration; averaged over the whole training set, this term vanishes.
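
As a quick numerical illustration of this decomposition (reusing the illustrative linear least-squares setup from above), the per-sample gradients equal the full-batch gradient plus a perturbation whose average over one epoch is zero:

```python
import numpy as np

# Decomposition check: grad E_i(w) = grad E(w) + (grad E_i(w) - grad E(w)).
# Averaged over all samples, the stochastic part cancels out.
# Data and weights are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = rng.normal(size=3)

per_sample_grads = (X @ w - y)[:, None] * X       # grad E_i(w) for every sample i
full_batch_grad = per_sample_grads.mean(axis=0)   # grad E(w), the full-batch gradient
noise = per_sample_grads - full_batch_grad        # stochastic part of each SGD step

print(np.allclose(noise.mean(axis=0), 0.0))       # True: zero mean over the training set
```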

An approximation of SGD is called Vario-Eta Learning.
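
As far as I understand it, Vario-Eta learning gives each weight its own learning rate, scaled by the inverse standard deviation of that weight's per-sample gradients. The sketch below illustrates this idea on the same illustrative linear model; it is an assumption about the scheme, not its exact formulation, and the epsilon term is added only for numerical stability.

```python
import numpy as np

# Hedged sketch of the Vario-Eta idea: rescale each weight's step by the inverse
# standard deviation of its per-sample gradients. Model, data, eta, and eps are
# assumptions for illustration, not the original formulation.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
eta, eps = 0.1, 1e-8

for epoch in range(50):
    per_sample_grads = (X @ w - y)[:, None] * X   # grad E_i(w) for every sample
    mean_grad = per_sample_grads.mean(axis=0)     # full-batch gradient
    std_grad = per_sample_grads.std(axis=0)       # per-weight gradient spread
    w -= eta / (std_grad + eps) * mean_grad       # per-weight rescaled update
```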

Advantages

  • It often converges faster because the weights are updated more frequently
    • especially on redundant data, where many samples carry similar gradient information
  • The stochasticity of the updates can help SGD escape shallow local minima
  • The method can be used for Online Learning by updating the weights incrementally for each newly arriving training sample (see the sketch below)
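
Below is a minimal sketch of this online-learning use, assuming samples arrive one at a time from a stream and reusing the same illustrative linear model as above.

```python
import numpy as np

# Minimal online-learning sketch: apply one per-sample SGD step as soon as a new
# sample arrives, without revisiting earlier data. Stream, model, and eta are
# illustrative assumptions.

def online_update(w, x_new, y_new, eta=0.01):
    """One SGD step for a single newly observed sample (x_new, y_new)."""
    grad = (x_new @ w - y_new) * x_new     # gradient of the per-sample squared error
    return w - eta * grad

rng = np.random.default_rng(1)
w = np.zeros(3)
for _ in range(1000):                      # samples arriving one by one
    x_new = rng.normal(size=3)
    y_new = x_new @ np.array([1.0, -2.0, 0.5])
    w = online_update(w, x_new, y_new)
```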