Outlier Detection with Parametric Statistical Methods
Example for a Gaussian (Normal) Distribution
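We assume that the data is generated from some parametric distribution (e.g. a normal distribution). We learn the parameters of that distribution from the input data and identify points with low probability under the fitted distribution as outliers.
Use Maximum Likelihood Estimation (MLE) to estimate the parameters of the distribution: we look for the parameter values that maximize the likelihood function, i.e. the probability of the observed data viewed as a function of the parameters.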
General Procedure
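Taking the logarithm of the likelihood function turns the product over data points into a sum, which makes it easier to calculate the derivatives. Because the logarithm is monotonically increasing, the log-likelihood has its maximum at the same parameter values as the likelihood.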
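As a worked sketch for the univariate normal case, the log-likelihood and the conditions obtained by setting its derivatives to zero can be written as follows. The symbols x_1, ..., x_n (data points), \mu (mean) and \sigma^2 (variance) are notation assumed here, not taken from the notes.

```latex
\begin{aligned}
L(\mu, \sigma^2) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) \\
\ln L(\mu, \sigma^2) &= -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \\
\frac{\partial \ln L}{\partial \mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0 \\
\frac{\partial \ln L}{\partial \sigma^2} &= -\frac{n}{2\sigma^2}
    + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 = 0
\end{aligned}
```

Solving the two conditions for \mu and \sigma^2 yields closed-form estimates.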
Results
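- The estimated expectation is the arithmetic mean of the data
- The estimated standard deviation is the square root of the sample variance (the maximum likelihood version divides by n, not by n - 1)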
Now we can use the property of a normal distribution that about 99.7 percent of the data lies within 3 standard deviations of the mean: points farther than 3 standard deviations from the estimated mean are flagged as outliers.
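The 3 standard deviation rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `three_sigma_outliers`, the parameter `k` and the sample `readings` are made up for this example.

```python
import math

def three_sigma_outliers(data, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    n = len(data)
    mu = sum(data) / n  # maximum likelihood estimate of the mean
    # Maximum likelihood estimate of the standard deviation (divides by n).
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return [x for x in data if abs(x - mu) > k * sigma]

# Hypothetical sample: 20 readings close to 10 and one far away.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9,
            10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.2, 25.0]
print(three_sigma_outliers(readings))  # -> [25.0]
```

Note that the outlier itself takes part in estimating the mean and standard deviation, so on very small samples an extreme point can inflate the standard deviation enough to hide itself.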
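Grubbs' Test
Grubbs' test checks the null hypothesis that the data contains no outliers against the alternative that it contains exactly one outlier. It assumes univariate, approximately normally distributed data.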
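A minimal sketch of the two-sided version of the test is shown below, assuming SciPy is available for the t distribution quantile. The function name `grubbs_test`, the significance level `alpha=0.05` and the sample values are choices made for this example. The statistic is the largest absolute deviation from the mean divided by the sample standard deviation, and it is compared with a critical value derived from the t distribution with n - 2 degrees of freedom.

```python
import math
from scipy import stats

def grubbs_test(data, alpha=0.05):
    """Two-sided Grubbs' test for a single outlier.

    Returns (g, g_crit, suspect), where suspect is the value farthest
    from the mean. "No outliers" is rejected when g > g_crit.
    """
    n = len(data)
    mean = sum(data) / n
    # Grubbs' statistic uses the sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    suspect = max(data, key=lambda x: abs(x - mean))
    g = abs(suspect - mean) / s
    # Upper critical value of the t distribution with n - 2 degrees of freedom
    # at significance level alpha / (2 n).
    t = float(stats.t.ppf(1 - alpha / (2 * n), n - 2))
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t ** 2 / (n - 2 + t ** 2))
    return g, g_crit, suspect

# Hypothetical sample with one suspicious value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 14.0]
g, g_crit, suspect = grubbs_test(sample)
print(suspect, g > g_crit)
```

Because the test only considers a single outlier, applying it to data with several extreme values can fail to reject the null hypothesis.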
For multivariate data:
Transform the multivariate task into a univariate problem by computing a single outlier score per data point.
Methods:
For normally distributed data we can use the chi-squared test statistic, which is large if there is an outlier.
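The notes do not spell out the statistic. One common way to get a single score per point, assumed here, is the squared Mahalanobis distance to the sample mean, which for multivariate normal data approximately follows a chi-squared distribution with d degrees of freedom (d being the number of dimensions). The sketch below uses NumPy and SciPy; the function name `mahalanobis_outliers`, the level `alpha=0.001` and the synthetic data are illustrative choices.

```python
import numpy as np
from scipy import stats

def mahalanobis_outliers(X, alpha=0.001):
    """Score each row by squared Mahalanobis distance and flag large scores."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # sample covariance matrix, shape (d, d)
    diff = X - mu
    # Squared Mahalanobis distance of every row: diff_i^T cov^{-1} diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    # Flag points beyond the (1 - alpha) quantile of chi-squared with d dof.
    threshold = stats.chi2.ppf(1 - alpha, df=X.shape[1])
    return d2, d2 > threshold

# Synthetic 2-D normal data with one planted outlier in row 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[0] = [8.0, -8.0]
scores, flags = mahalanobis_outliers(X)
print(flags[0], scores[0] == scores.max())
```

The multivariate problem is thereby reduced to thresholding one number per point, which is the univariate setting from before.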
Mixture of multiple distributions
For example, every data point may be generated by one of two normal distributions. Use the Expectation-Maximization (EM) algorithm for Gaussian mixture models to estimate the parameters from the data.
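A compact sketch of EM for a mixture of two univariate normal distributions follows, written with NumPy only. The function names, the initialization (means started at the minimum and maximum of the data) and the fixed number of iterations are simplifying assumptions for illustration; a production implementation would also check convergence and guard against collapsing components.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution, broadcasting over components."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def fit_two_gaussians(x, n_iter=100):
    """EM for a mixture of two univariate normal distributions."""
    x = np.asarray(x, dtype=float)
    # Simple initialization (an assumption): means at the extremes of the data.
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point, shape (n, 2).
        dens = weight * normal_pdf(x[:, None], mu, sigma)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum likelihood updates.
        nk = resp.sum(axis=0)
        weight = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return weight, mu, sigma

def mixture_density(x, weight, mu, sigma):
    """Density of the fitted mixture; low values indicate outlier candidates."""
    x = np.asarray(x, dtype=float)
    return (weight * normal_pdf(x[:, None], mu, sigma)).sum(axis=1)

# Synthetic data: two clusters around 0 and 10.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 300), rng.normal(10, 1, 300)])
weight, mu, sigma = fit_two_gaussians(data)
print(mu)
print(mixture_density([0.0, 5.0, 10.0], weight, mu, sigma))
```

A point such as 5.0, which lies between the two clusters, receives a much lower mixture density than points near either cluster center and would therefore be treated as an outlier candidate.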