Stable Models
Since we work with overparameterized models, there is a whole continuous manifold of minima. To single out a unique minimum, we focus on stable models.
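To make the manifold concrete, here is a minimal NumPy sketch (my illustration, not from the lecture): a linear model with more weights than training points fits the data exactly along a whole continuous family of solutions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear model: 3 weights but only 2 training points.
X = rng.normal(size=(2, 3))
y = rng.normal(size=2)

w0 = np.linalg.pinv(X) @ y        # one exact solution (minimum-norm)

# Every direction in the null space of X keeps the loss at zero, so the
# minima form a continuous (here one-dimensional) manifold.
null_dir = np.linalg.svd(X)[2][-1]
for alpha in [0.0, 1.0, 5.0]:
    w = w0 + alpha * null_dir
    print(np.mean((X @ w - y) ** 2))   # ~0 for every alpha: all are minima
```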
A model is stable if removing a single data point from the training set has little impact on the overall distribution of the gradients.
In a local minimum the per-sample gradients always form a balanced force field: the individual gradients cancel, so the total gradient vanishes. If the field were not balanced, we could still improve by moving along the net gradient and would therefore not be in a local minimum.
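This picture can be checked numerically. A sketch (the toy least-squares model and all names are my own assumptions): at the minimum the per-sample gradients sum to zero although each is individually nonzero, and dropping one sample leaves a net force, which connects the force-field picture to the stability definition above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy least squares: 50 samples, 3 weights, so the minimum does not
# fit every sample exactly and the per-sample gradients are nonzero.
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=50)
w_star = np.linalg.lstsq(X, y, rcond=None)[0]

residuals = X @ w_star - y
grads = residuals[:, None] * X                 # per-sample gradients, shape (50, 3)

print(np.linalg.norm(grads.mean(axis=0)))      # ~0: the force field is balanced
print(np.linalg.norm(grads, axis=1).mean())    # the individual forces are not zero

# Removing one sample unbalances the field: the remaining gradients no
# longer cancel, so the minimum of the reduced training set moves.
print(np.linalg.norm(np.delete(grads, 0, axis=0).mean(axis=0)))
```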
Doing a Taylor expansion around such a stable point, pattern-by-pattern learning is structurally equivalent to minimizing the extended error

$$\tilde{E}_{\text{pbp}} = \frac{1}{T}\sum_{t=1}^{T} E_t(w) + \frac{\eta^2}{2}\sum_{k}\operatorname{Var}_t\!\left(g_{t,k}\right)\frac{\partial^2 E}{\partial w_k^2}, \qquad g_{t,k} = \frac{\partial E_t}{\partial w_k}.$$

Here we have a smoothness penalty, the curvature $\partial^2 E / \partial w_k^2$, weighted by the variance of the gradient $g_{t,k}$ for sample $t$ and weight $w_k$. So with a high-variance gradient we enforce smoother (flatter) weights, such that the next gradients will have less variance.
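As a sketch of what the two ingredients of this penalty measure (my own illustration on the toy problem above, not the lecture's), one can evaluate the per-weight gradient variance and the diagonal curvature directly:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=50)
w_star = np.linalg.lstsq(X, y, rcond=None)[0]   # a stable minimum of the mean error

eta = 0.05
residuals = X @ w_star - y
grads = residuals[:, None] * X        # g_{t,k}: gradient for sample t, weight k

grad_var = grads.var(axis=0)          # Var_t(g_{t,k}) per weight
curvature = (X**2).mean(axis=0)       # diagonal curvature d^2E/dw_k^2 of the mean error

E = 0.5 * np.mean(residuals**2)
penalty = 0.5 * eta**2 * np.sum(grad_var * curvature)
print(E, penalty)                     # mean error and its implicit smoothness penalty
```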
The same can be done for Vario-Eta learning, where each weight gets the learning rate $\eta_k = \eta / \sqrt{\operatorname{Var}_t(g_{t,k})}$. This normalization cancels the variance weighting, leaving

$$\tilde{E}_{\text{vario}} = \frac{1}{T}\sum_{t=1}^{T} E_t(w) + \frac{\eta^2}{2}\sum_{k}\frac{\partial^2 E}{\partial w_k^2},$$

where we have a global, unconditional penalty: the smoothness of every weight is penalized equally, independent of its gradient variance.
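A minimal sketch of a Vario-Eta update (assuming the standard per-weight normalization; the exponential smoothing of the variance estimate, the constants, and the toy problem are my choices, not the lecture's):

```python
import numpy as np

def vario_eta_step(w, grad, mean_est, var_est, eta=0.01, beta=0.9, eps=1e-8):
    # Running estimates of the per-weight gradient mean and variance.
    mean_est = beta * mean_est + (1 - beta) * grad
    var_est = beta * var_est + (1 - beta) * (grad - mean_est) ** 2
    # Vario-eta: per-weight learning rate eta normalized by the gradient's std.
    w = w - eta * grad / (np.sqrt(var_est) + eps)
    return w, mean_est, var_est

# Usage on a small noisy least-squares problem:
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=50)

w = np.zeros(3)
mean_est, var_est = np.zeros(3), np.ones(3)
for epoch in range(100):
    for t in rng.permutation(len(X)):       # pattern-by-pattern ordering
        grad = (X[t] @ w - y[t]) * X[t]     # gradient of 0.5 * (x_t @ w - y_t)^2
        w, mean_est, var_est = vario_eta_step(w, grad, mean_est, var_est)
print(w)                                     # hovers near the least-squares solution
```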