Overfitting and Underfitting
How to detect them
Figure: underfitting vs. overfitting (model complexity against generalization error)
Train Validation Split
Splitting the dataset into a training set and a validation set lets you run a hyperparameter search on the validation set (not the test set!). If the dataset is big enough this works well; with a small dataset, however, it is difficult to find a good split ratio.
A single static split can also be unrepresentative (some parts of the data may not appear in the validation split at all). → K-Fold Cross Validation
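A minimal sketch of this workflow, assuming scikit-learn, a toy dataset (load_iris), and a hypothetical hyperparameter grid over the regularization strength C of a logistic regression:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a test set first -- it is never touched during tuning.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Split the remainder into a training and a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Hyperparameter search on the validation set (not the test set!).
best_score, best_C = -1.0, None
for C in [0.01, 0.1, 1.0, 10.0]:  # hypothetical parameter grid
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_score, best_C = score, C

# Only the final chosen model is evaluated on the test set.
final = LogisticRegression(C=best_C, max_iter=1000).fit(X_trainval, y_trainval)
print(f"best C = {best_C}, test accuracy = {final.score(X_test, y_test):.3f}")
```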
K-Fold Cross Validation
- shuffle the data
- split the data into k folds (each having approximately the same size)
- for each fold (so k times):
    - use the fold as the validation set
    - use the remaining k−1 folds as the training set
    - train the model
    - evaluate the model on the validation set → Performance Evaluation Metrics
- average the metric over all k folds
This procedure ensures that the model is validated on every sample of the dataset, not only on the samples of a single 70/30 split.
A good value for k is 10 (based on empirical studies).
For very small datasets, use Leave-One-Out Cross Validation, the special case where k equals the number of samples.
Example
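A minimal sketch of the procedure above, assuming scikit-learn's KFold, accuracy as the evaluation metric, and the same toy dataset and model as before:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

k = 10
kf = KFold(n_splits=k, shuffle=True, random_state=0)  # shuffle the data

scores = []
for train_idx, val_idx in kf.split(X):           # for each fold (so k times)
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])        # train on the remaining k-1 folds
    scores.append(model.score(X[val_idx], y[val_idx]))  # evaluate on the held-out fold

# Average the metric over all k folds.
print(f"mean accuracy over {k} folds: {np.mean(scores):.3f}")

# Leave-One-Out is the special case k = number of samples,
# e.g. KFold(n_splits=len(X)) or sklearn.model_selection.LeaveOneOut.
```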