Outlier

A data object that deviates significantly from the normal data objects, as if it were generated by a different mechanism or distribution.

WARNING

Outliers are not noise, so noise has to be filtered out before any outlier detection. It is also important to always justify flagging or removing an outlier. One way to do this is to specify the degree of the outlier, that is, how unlikely it is that the object was generated by the normal mechanism.

Outliers fall into different types, and a single outlier can belong to more than one of them:

  • Global (Point Anomaly)
  • Contextual (Conditional Outlier / Local Outlier)
    • Deviates significantly based on a selected context
    • Contextual Attributes vs. Behavioral Attributes
  • Collective Outlier
    • A subset that collectively deviates significantly
    • Prior domain knowledge necessary
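
The difference between a global and a contextual outlier can be sketched in a few lines. The example below is only an illustration under stated assumptions: the season is the contextual attribute, the temperature is the behavioral attribute, and the function name `contextual_outliers`, the sample data, and the threshold of 2 standard deviations are all made up for this sketch. A reading of 25 is unremarkable across the whole data set but stands out within the winter context.

```python
from collections import defaultdict
import statistics


def contextual_outliers(records, threshold=2.0):
    """Flag (context, value) pairs whose value deviates from its own context.

    `records` is a list of (contextual attribute, behavioral attribute) pairs.
    The threshold of 2 standard deviations is an arbitrary choice for this
    small sample, not a recommended default.
    """
    # Group the behavioral values by their contextual attribute.
    groups = defaultdict(list)
    for context, value in records:
        groups[context].append(value)

    flagged = []
    for context, value in records:
        values = groups[context]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # population standard deviation
        # Compare each value only against the other values in its context.
        if std > 0 and abs(value - mean) / std > threshold:
            flagged.append((context, value))
    return flagged


# Hypothetical temperature readings in degrees Celsius.
winter = [("winter", t) for t in [1, 2, 0, 3, 2, 1, 2, 25]]
summer = [("summer", t) for t in [24, 26, 25, 27, 25, 26, 24, 25]]
records = winter + summer

flagged = contextual_outliers(records)
print(flagged)
```

Ignoring the season and scoring all readings against one global mean would not flag the winter reading of 25, because the summer values make it look ordinary.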

Detection

Outliers can also be detected as a by-product of other methods such as clustering, regression, or Isolation Forest.
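
As a rough illustration of the regression case, the sketch below fits an ordinary least-squares line and flags points whose residual is unusually large. This is one possible reading of "by-product of regression", not a prescribed method: the function name `regression_residual_outliers`, the sample points, and the threshold of 2 residual standard deviations are assumptions chosen for the example.

```python
import statistics


def regression_residual_outliers(points, threshold=2.0):
    """Flag (x, y) points that lie far from a fitted least-squares line.

    Assumes at least two distinct x values; the threshold is an
    illustrative choice for this small sample.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)

    # Closed-form ordinary least squares for a single predictor.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residual = observed value minus the value predicted by the line.
    residuals = [y - (slope * x + intercept) for x, y in points]
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return []  # every point lies exactly on the line
    return [p for p, r in zip(points, residuals) if abs(r) / spread > threshold]


# Points on y = 2x + 1, except for x = 5 where y was set to 30.
points = [(x, 2 * x + 1) for x in range(10)]
points[5] = (5, 30)

flagged = regression_residual_outliers(points)
print(flagged)
```

Note that the outlier itself pulls the fitted line toward it, so in practice a robust fit may be preferable; the plain least-squares fit is used here only to keep the sketch short.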

Simple approaches

  • Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR
  • Values whose absolute Z-score is higher than 3
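
Both rules can be sketched with the Python standard library. This is a minimal example under stated assumptions: the function names `iqr_outliers` and `zscore_outliers` and the sample data are made up, the quartiles use the "inclusive" interpolation method (other quartile conventions give slightly different fences), and the Z-score uses the population standard deviation.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can deviate
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical measurements with one obviously extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
        12, 10, 11, 13, 12, 11, 12, 10, 13, 95]

iqr_flagged = iqr_outliers(data)
z_flagged = zscore_outliers(data)
print(iqr_flagged, z_flagged)
```

The Z-score rule needs a reasonably large sample: with n values and the population standard deviation, no Z-score can exceed sqrt(n - 1), so a threshold of 3 can never trigger for 10 or fewer values. The IQR rule does not have this limitation and is less affected by the extreme values themselves.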