Overfitting and Underfitting

How to detect them

[Figure: generalization error vs. model complexity, showing the underfitting and overfitting regimes]

Train Validation Split

By splitting the dataset into a training set and a validation set, you can run hyperparameter search on the validation set (not the test set!!!). If the dataset is big enough this works well; with a small dataset, however, it is difficult to find a good split ratio.
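
A minimal sketch of such a split, assuming scikit-learn is available; the dataset below is made up purely for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 100 samples, 5 features, binary labels.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# Hold out 20% of the data as the validation set (an 80/20 split).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)

# Tune hyperparameters against (X_val, y_val);
# the test set stays untouched until the very end.
```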

A static split can be problematic when the validation split is not representative (some patterns in the data never appear in it). This motivates K-Fold Cross Validation.


K-Fold Cross Validation

  1. shuffle the data
  2. split the data into k folds (each of approximately the same size)
  3. for each fold (so k times):
    1. use the fold as the validation set
    2. use the remaining folds as the training set
    3. train the model
    4. evaluate the model on the validation set → Performance Evaluation Metrics
  4. take the average of the scores over all folds

This procedure ensures that your model has been validated on every sample of your dataset, not only on the validation portion of a single 30-70 split.
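
A sketch of the procedure above, assuming scikit-learn; the dataset and the logistic-regression model are hypothetical stand-ins:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical dataset: 100 samples, 5 features, binary labels.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# Steps 1-2: shuffle the data and split it into k folds of roughly equal size.
kf = KFold(n_splits=10, shuffle=True, random_state=42)

scores = []
for train_idx, val_idx in kf.split(X):
    # Step 3: the current fold is the validation set, the rest is the training set.
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])      # train the model
    preds = model.predict(X[val_idx])          # evaluate on the validation fold
    scores.append(accuracy_score(y[val_idx], preds))

# Step 4: take the average over all folds.
print(f"mean accuracy: {np.mean(scores):.3f}")
```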

A good value for k is 10 (based on empirical analysis).

For very small datasets, use Leave-One-Out Cross Validation (K-fold with k equal to the number of samples).
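
Assuming scikit-learn, the leave-one-out variant is available as LeaveOneOut; a short sketch on a made-up tiny dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical tiny dataset: 20 samples, 3 features, binary labels.
X = np.random.rand(20, 3)
y = np.random.randint(0, 2, size=20)

# Leave-One-Out: each fold validates on exactly one held-out sample.
scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut())
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```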

Example
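
As a stand-in example, assuming scikit-learn: cross_val_score wraps the whole procedure (splitting into folds, training, evaluating, collecting per-fold scores) in a single call; dataset and model are again hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical dataset: 100 samples, 5 features, binary labels.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# cv=10 runs 10-fold cross validation (stratified for classifiers)
# and returns one score per fold.
scores = cross_val_score(LogisticRegression(), X, y, cv=10)
print(f"per-fold accuracy: {scores.round(3)}")
print(f"mean accuracy:     {scores.mean():.3f}")
```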
