Outlier
A data object that deviates significantly from the normal data objects, as if it were generated by a different mechanism or distribution.
WARNING
Outliers are not noise, so noise should be filtered out before any outlier detection. It is also important to always justify the detection or removal of an outlier. One way to do this is to state the degree of the outlier, that is, how unlikely it is that the object was generated by the normal mechanism.
Outliers can be grouped into the following types. A single outlier is not restricted to one type:
- Global (Point Anomaly)
- Contextual (Conditional Outlier / Local Outlier)
  - Deviates significantly with respect to a selected context
  - Contextual attributes (define the context) vs. behavioral attributes (evaluated within the context)
- Collective Outlier
  - A subset of objects that collectively deviates significantly, even if the individual objects do not
  - Prior domain knowledge necessary
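The contextual case can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the notes: the function name `contextual_outliers`, the temperature data, and the choice of a median/MAD score with a threshold of 3 are all made up for the example. The context (season) is the contextual attribute, the temperature is the behavioral attribute, and the score doubles as a rough "degree" of the outlier.

```python
from collections import defaultdict
import statistics


def contextual_outliers(records, threshold=3.0):
    """Flag (context, value) pairs whose value deviates within its own context.

    Hypothetical helper: the score is |value - median| / (1.4826 * MAD),
    computed separately for each context group.
    """
    records = list(records)

    # Group the behavioral attribute by the contextual attribute.
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, value in records:
        group = by_context[context]
        med = statistics.median(group)
        mad = statistics.median([abs(v - med) for v in group])
        scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
        if scale > 0 and abs(value - med) / scale > threshold:
            flagged.append((context, value))
    return flagged


# Made-up temperatures in degrees Celsius.
readings = [
    ("summer", 28), ("summer", 30), ("summer", 29),
    ("summer", 31), ("summer", 30), ("summer", 5),
    ("winter", 2), ("winter", 4), ("winter", 3),
    ("winter", 5), ("winter", 4), ("winter", 3),
]

print(contextual_outliers(readings))
```

A reading of 5 is flagged in summer but not in winter, although the value itself lies well inside the global range of the data. A global (point anomaly) check would therefore miss it.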
Detection
Outliers can also be found as a by-product of clustering or regression, or with a dedicated method such as Isolation Forest.
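As a minimal sketch of the regression case: fit a straight line by ordinary least squares and flag the points whose residual is unusually large. The function name `regression_outliers`, the data, and the robust residual score (median/MAD, threshold 3) are assumptions chosen for illustration, not a prescribed method.

```python
import statistics


def regression_outliers(xs, ys, threshold=3.0):
    """Fit y = intercept + slope * x and flag points with large residuals.

    Hypothetical helper: residuals are scored with median and MAD so that
    the outlier itself does not inflate the scale too much.
    """
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)

    # Ordinary least squares for a single predictor.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad
    if scale == 0:
        return []

    return [
        (x, y)
        for x, y, r in zip(xs, ys, residuals)
        if abs(r - med) / scale > threshold
    ]


# Made-up data: y = 2x + 1, except for one corrupted point at x = 5.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] = 40

print(regression_outliers(xs, ys))
```

The fitted line is pulled slightly toward the corrupted point, which is why the residuals are scored with a robust scale rather than with their standard deviation.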
Simple approaches
- IQR rule: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR
- Z-score: values whose absolute Z-score is greater than 3
- Hampel
- Quantile
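The simple approaches above can be sketched with the Python standard library. The function names, the sample data, and the default thresholds are assumptions for illustration. In particular, the Hampel variant shown here is the global identifier (median and MAD over the whole sample) rather than a sliding-window filter, and the quantile rule assumes the 1st and 99th percentiles as cut-offs.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Values below Q1 - k * IQR or above Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def zscore_outliers(values, threshold=3.0):
    """Values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


def hampel_outliers(values, threshold=3.0):
    """Values more than `threshold` scaled MADs away from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # consistency factor for normally distributed data
    if scale == 0:
        return []
    return [v for v in values if abs(v - med) / scale > threshold]


def quantile_outliers(values, lower_pct=1, upper_pct=99):
    """Values outside the chosen percentile range (assumed cut-offs)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lower, upper = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [v for v in values if v < lower or v > upper]


# Made-up sample: 19 ordinary values and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
        12, 11, 10, 13, 12, 11, 12, 10, 13, 95]

for detect in (iqr_outliers, zscore_outliers, hampel_outliers, quantile_outliers):
    print(detect.__name__, detect(data))
```

Two caveats are worth keeping in mind. With a population standard deviation, the absolute Z-score of any point is bounded by sqrt(n - 1), so in samples of ten or fewer values the Z-score rule with a threshold of 3 cannot flag anything. The quantile rule, by construction, tends to flag the most extreme values of any sample, so its result still needs the justification asked for in the warning above.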