Overfitting and Underfitting
How to detect them
Figure: underfitting vs. overfitting (model complexity against generalization error)
Train Validation Split
Splitting the dataset into a training set and a validation set lets you run a hyperparameter search on the validation set (not the test set!). If the dataset is big enough this works well; with a small dataset, however, it is difficult to find a good split ratio.
A single static split can also be unrepresentative (some parts of the data may not appear in the validation split at all). → K-Fold Cross Validation
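A minimal sketch of this workflow, assuming scikit-learn, a toy dataset (load_iris), and a hypothetical hyperparameter grid over the regularization strength C of a logistic regression:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a test set first -- it is never touched during tuning.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Split the remainder into a training and a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Hyperparameter search on the validation set (not the test set!).
best_score, best_C = -1.0, None
for C in [0.01, 0.1, 1.0, 10.0]:  # hypothetical parameter grid
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_score, best_C = score, C

# Only the final chosen model is evaluated on the test set.
final = LogisticRegression(C=best_C, max_iter=1000).fit(X_trainval, y_trainval)
print(f"best C = {best_C}, test accuracy = {final.score(X_test, y_test):.3f}")
```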
K-Fold Cross Validation
- shuffle the data
- split the data into k folds (each having approximately the same size)
- for each fold (so k times):
    - use the fold as the validation set
    - use the remaining k−1 folds as the training set
    - train the model
    - evaluate the model on the validation set → Performance Evaluation Metrics
- average the metric over all k folds
This procedure ensures that the model is validated on every sample of the dataset, not only on the samples of a single 70/30 split.
A good value for k is 10 (based on empirical studies).
For very small datasets, use Leave-One-Out Cross Validation, the special case where k equals the number of samples.
Example
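A minimal sketch of the procedure above, assuming scikit-learn's KFold, accuracy as the evaluation metric, and the same toy dataset and model as before:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

k = 10
kf = KFold(n_splits=k, shuffle=True, random_state=0)  # shuffle the data

scores = []
for train_idx, val_idx in kf.split(X):           # for each fold (so k times)
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])        # train on the remaining k-1 folds
    scores.append(model.score(X[val_idx], y[val_idx]))  # evaluate on the held-out fold

# Average the metric over all k folds.
print(f"mean accuracy over {k} folds: {np.mean(scores):.3f}")

# Leave-One-Out is the special case k = number of samples,
# e.g. KFold(n_splits=len(X)) or sklearn.model_selection.LeaveOneOut.
```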