Outlier Detection with Parametric Statistical Methods

Example for a Gaussian Distribution

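We assume that the data is generated from some distribution (e.g. a normal distribution). We learn the parameters of that distribution from the input data and identify points with low probability under the fitted distribution as outliers.

Use Maximum Likelihood Estimation (MLE) to estimate the parameters of the distribution. The likelihood function is the probability density of the observed data, viewed as a function of the parameters. We want to find the parameter values that maximize this function.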

General Procedure

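  1. Compute the partial derivative of the (log-)likelihood with respect to each parameter
  2. Set each partial derivative to zero and solve for the parameter

Working with the log-likelihood instead of the likelihood makes the derivatives easier to calculate: the logarithm turns the product over data points into a sum, and because it is monotonically increasing, both functions are maximized by the same parameters.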
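As a worked instance of the procedure, assume n independent observations x_1, ..., x_n from a univariate normal distribution with mean \mu and variance \sigma^2 (this notation is introduced here for illustration). The log-likelihood and its two partial derivatives are then:

```latex
\ell(\mu, \sigma^2)
  = \sum_{i=1}^{n} \ln \mathcal{N}(x_i \mid \mu, \sigma^2)
  = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2

\frac{\partial \ell}{\partial \mu}
  = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)

\frac{\partial \ell}{\partial \sigma^2}
  = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2
```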

Results
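For a univariate normal distribution, setting the partial derivatives of the log-likelihood with respect to the mean and the variance to zero gives the standard closed-form estimates below (x_1, ..., x_n denote the data points; the notation is introduced here for illustration):

```latex
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2
```

Note that the maximum likelihood variance divides by n, not by n - 1 as the unbiased sample variance does.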

Now we can use the property of a normal distribution that about 99.7 percent of the data lies within 3 standard deviations of the mean: points that lie more than 3 standard deviations from the estimated mean are flagged as outliers.
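A minimal sketch of the 3 standard deviation rule, using only the Python standard library. The function names, the threshold parameter k, and the sample readings are hypothetical and chosen for illustration.

```python
import math

def fit_normal_mle(data):
    """Return the MLE mean and standard deviation (divides by n, not n - 1)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, math.sqrt(var)

def three_sigma_outliers(data, k=3.0):
    """Return the points lying more than k standard deviations from the mean."""
    mu, sigma = fit_normal_mle(data)
    return [x for x in data if abs(x - mu) > k * sigma]

# Hypothetical sample: 20 readings near 10 plus one extreme value.
readings = [10.0 + 0.1 * (i % 5 - 2) for i in range(20)] + [25.0]
print(three_sigma_outliers(readings))
```

One caveat of this sketch: the outlier itself is included when estimating the mean and standard deviation, so an extreme value inflates both and can hide itself or other outliers in small samples.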

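Grubbs' Test

Grubbs' test checks the null hypothesis that the data contains no outliers against the alternative hypothesis that it contains exactly one outlier. It assumes that the data is approximately normally distributed.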
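The test can be sketched as follows. The statistic G is the largest absolute deviation from the sample mean divided by the sample standard deviation. In this sketch the critical value is passed in by the caller rather than computed, because the Python standard library has no Student's t quantile function; the function names and the sample are hypothetical.

```python
import math

def grubbs_statistic(data):
    """Return (G, index) for the point farthest from the sample mean."""
    n = len(data)
    mean = sum(data) / n
    # Grubbs' test uses the sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    idx = max(range(n), key=lambda i: abs(data[i] - mean))
    return abs(data[idx] - mean) / s, idx

def grubbs_test(data, g_crit):
    """Reject 'no outliers' when G exceeds the supplied critical value."""
    g, idx = grubbs_statistic(data)
    return g > g_crit, g, idx

# Hypothetical sample of n = 10 measurements.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.0]
# 2.290 is the tabulated two-sided critical value for n = 10, alpha = 0.05.
rejected, g, idx = grubbs_test(sample, g_crit=2.290)
print(rejected, sample[idx])
```

Because the alternative hypothesis covers exactly one outlier, the test examines only the single most extreme point per run.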

For multivariate data:

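Transform the multivariate outlier detection task into a univariate problem, for example by computing a single score per data point and checking that score for outliers.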

Methods:

For normally distributed data we can use the chi-squared test statistic, which takes a large value for a data point that is an outlier.

Mixture of multiple distributions

For example, every data point may be generated by one of two normal distributions. Use the Expectation-Maximization (EM) algorithm for Gaussian mixture models to estimate the parameters from the data.
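A minimal sketch of EM for a two-component, one-dimensional Gaussian mixture, using only the standard library. The initialisation (means at the minimum and maximum of the data), the fixed iteration count, and all names are assumptions made for illustration, not a tuned implementation.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_two_gaussians(data, iters=50):
    """EM for a two-component 1-D mixture; returns (weights, means, sigmas)."""
    # Crude initialisation (an assumption): start the means at the extremes.
    mus = [min(data), max(data)]
    spread = (max(data) - min(data)) / 4 or 1.0
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    n = len(data)
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: responsibility-weighted maximum likelihood updates.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)  # avoid a collapsed component
    return weights, mus, sigmas

def mixture_density(x, weights, mus, sigmas):
    """Density of the fitted mixture at x; low values suggest an outlier."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

# Hypothetical data: one cluster near 0 and one near 10.
data = [-0.4, -0.2, 0.0, 0.2, 0.4, 9.6, 9.8, 10.0, 10.2, 10.4]
weights, mus, sigmas = fit_two_gaussians(data)
print(mixture_density(5.0, weights, mus, sigmas) < mixture_density(0.1, weights, mus, sigmas))
```

A point such as 5.0 lies far from both fitted components, so its mixture density is very low. Where to put the density threshold for calling a point an outlier is a separate choice that this sketch leaves open.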