Sampling
Obtain a subset of samples from the whole dataset that represents it well.
Working on the sample instead of the full dataset lets algorithms run faster, in some cases in time sub-linear in the size of the dataset.
Methods
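- Simple random sampling: every item has the same chance of being selected → may perform poorly in the presence of skew, since a small sample can miss rare values or groups and distort estimates of central tendency such as the mean
- Sampling without replacement (no repetition): an item that has been selected cannot be selected again
- Stratified sampling: partition the data into groups (strata) and sample from each group, so that small groups are still represented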
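
The methods above can be sketched in Python with the standard library. This is only an illustration, not a reference implementation: the function names, the `key` function that assigns an item to its stratum, and the rule of taking the same fraction (at least one item) from every stratum are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_without_replacement(data, k, seed=None):
    """Simple random sample of k distinct positions (no repetition)."""
    rng = random.Random(seed)
    return rng.sample(data, k)


def sample_with_replacement(data, k, seed=None):
    """Random sample of k items where the same item may be drawn again."""
    rng = random.Random(seed)
    return rng.choices(data, k=k)


def stratified_sample(data, key, fraction, seed=None):
    """Sample the same fraction from every stratum (hypothetical helper).

    key(item) returns the stratum an item belongs to. Each stratum
    contributes at least one item, so small groups are not lost.
    """
    rng = random.Random(seed)

    # Group the items by stratum.
    strata = defaultdict(list)
    for item in data:
        strata[key(item)].append(item)

    # Draw without replacement inside each stratum.
    sample = []
    for members in strata.values():
        n = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sample
```

For a skewed dataset with 990 items in one group and 10 in another, a simple random sample of 100 items can easily contain no item from the small group, whereas `stratified_sample(data, key, 0.1)` always takes 99 items from the large group and 1 from the small one.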