Full-Batch Gradient Descent
A Gradient Descent algorithm that uses the entire dataset to calculate one weight update. The update is
$$\Delta w = -\eta g \quad \text{with} \quad g = \nabla E(w),$$
where $E(w)$ is the error summed over all training samples and $\eta$ is the learning rate.
This becomes computationally expensive as the dataset grows: every single weight update requires computing the gradient over all training samples.
Faster methods are Mini-Batch Gradient Descent and Stochastic Gradient Descent, which both perform more frequent weight updates on subsets of the data instead.
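A minimal sketch of what Full-Batch Gradient Descent looks like in code, assuming a least-squares linear model as the example loss (the function name and the loss are illustrative, not from the original note):

```python
import numpy as np

def full_batch_gradient_descent(X, y, eta=0.01, n_steps=100):
    """One weight update per full pass over the dataset: w <- w - eta * g."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_steps):
        residual = X @ w - y              # predictions minus targets for ALL samples
        g = X.T @ residual / n_samples    # full-batch gradient of E(w) = 1/(2N) * ||Xw - y||^2
        w -= eta * g                      # a single update uses the entire dataset
    return w
```

Stochastic or Mini-Batch Gradient Descent would instead compute `g` from one sample or a small batch per update, trading gradient accuracy for update frequency.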
Proof
With a Taylor expansion we can show that the error decreases with every update. Expanding $E$ around $w$ to second order, with gradient $g$ and Hessian $G$, and substituting $\Delta w = -\eta g$:
$$\begin{aligned} E(w+\Delta w) & = E(w) + g^T \Delta w + \frac{1}{2} \Delta w^T G \Delta w \\ & = E(w) - \eta g^T g + \frac{\eta^2}{2} g^T G g < E(w) \quad \text{for } \eta \text{ small.} \end{aligned}$$

So, the error decreases with every weight update: for a sufficiently small $\eta$, the negative first-order term $-\eta g^T g$ dominates the second-order term.
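As a quick numerical sanity check of this result (a sketch on a small synthetic least-squares problem, which is an assumption and not part of the original note), the error printed after each full-batch update decreases monotonically for a small $\eta$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

def error(w):
    return 0.5 * np.mean((X @ w - y) ** 2)   # E(w), the full-batch squared error

w, eta = np.zeros(3), 0.05
for step in range(5):
    g = X.T @ (X @ w - y) / len(y)           # g = full-batch gradient of E at w
    w -= eta * g                             # Delta w = -eta * g
    print(step, error(w))                    # E(w + Delta w) < E(w) at every step for small eta
```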