Mini-Batch Gradient Descent

A widely used gradient descent variant that is essentially SGD on batches. Instead of updating the weights after every individual sample, we split the data into batches and accumulate (average) the gradients over a batch before doing a single weight update.

This gives a further speedup, since the batched computation vectorizes well, and it smooths out some of the stochasticity introduced by SGD.

The update rule is

$$w \leftarrow w - \frac{\eta}{m} \sum_{i=1}^{m} \nabla_w L_i(w)$$

where $m$ is the batch size, $\eta$ is the learning rate, and $L_i$ is the loss on the $i$-th sample of the batch.
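A minimal NumPy sketch of this update rule, fitting a simple linear model with mean-squared-error loss. The synthetic data, learning rate, batch size, and epoch count are illustrative assumptions, not values from the text; the point is that one weight update is made per batch using the averaged gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + 2 plus noise (assumed for this example).
X = rng.uniform(-1, 1, size=(1000, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.standard_normal(1000)

w, b = 0.0, 0.0   # model parameters
eta = 0.1         # learning rate (eta in the update rule)
m = 32            # batch size (m in the update rule)
epochs = 20

for _ in range(epochs):
    # Shuffle once per epoch, then walk through the data in batches of size m.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), m):
        batch = idx[start:start + m]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb
        # Gradients of the MSE loss, averaged over the batch
        # (the 1/m factor in the update rule).
        grad_w = 2.0 * np.mean(err * xb)
        grad_b = 2.0 * np.mean(err)
        # One weight update per batch, not per sample.
        w -= eta * grad_w
        b -= eta * grad_b

print(f"w ~ {w:.2f}, b ~ {b:.2f}")  # should approach 3 and 2
```

With batch size 1 this reduces to plain SGD, and with the batch size equal to the dataset size it becomes full-batch gradient descent.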