Mini-Batch Gradient Descent

A widely used gradient descent variant that is essentially SGD on batches. Instead of updating the weights after every individual sample, we split the data into batches and accumulate (average) the gradients over a batch before doing a single weight update.

This gives a further speedup, since the batched computation vectorizes well, and it smooths out some of the stochasticity introduced by SGD.

The update rule is

$$w \leftarrow w - \frac{\eta}{m} \sum_{i=1}^{m} \nabla_w L_i(w)$$

where $m$ is the batch size, $\eta$ is the learning rate, and $L_i$ is the loss on the $i$-th sample of the batch.
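A minimal NumPy sketch of this update rule, fitting a simple linear model with mean-squared-error loss. The synthetic data, learning rate, batch size, and epoch count are illustrative assumptions, not values from the text; the point is that one weight update is made per batch using the averaged gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + 2 plus noise (assumed for this example).
X = rng.uniform(-1, 1, size=(1000, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.standard_normal(1000)

w, b = 0.0, 0.0   # model parameters
eta = 0.1         # learning rate (eta in the update rule)
m = 32            # batch size (m in the update rule)
epochs = 20

for _ in range(epochs):
    # Shuffle once per epoch, then walk through the data in batches of size m.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), m):
        batch = idx[start:start + m]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb
        # Gradients of the MSE loss, averaged over the batch
        # (the 1/m factor in the update rule).
        grad_w = 2.0 * np.mean(err * xb)
        grad_b = 2.0 * np.mean(err)
        # One weight update per batch, not per sample.
        w -= eta * grad_w
        b -= eta * grad_b

print(f"w ~ {w:.2f}, b ~ {b:.2f}")  # should approach 3 and 2
```

With batch size 1 this reduces to plain SGD, and with the batch size equal to the dataset size it becomes full-batch gradient descent.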