Kolmogorov-Smirnov Test
The test should be used when the data come from a continuous distribution.
What can we test with this test?
- Goodness of fit (GOF) between two samples
- Whether one of two samples is stochastically larger or smaller than the other
- GOF of one sample to a given CDF
- Using the Lilliefors test of normality, whether a sample comes from the family of normal distributions
GOF
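The null hypothesis is $H_0: F = F_0$, where $F$ is the CDF of the sample and $F_0$ is the hypothesised CDF.
The test statistic is motivated by the Glivenko-Cantelli theorem: $D_n = \sup_x |F_n(x) - F_0(x)|$, where $F_n$ is the ECDF of the sample.
Given $H_0$, the test statistic should be small, since then $D_n \to 0$ almost surely.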
We can implement a supremum function in R like this:
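sup = function(x, F){
  fn = ecdf(x)                # empirical CDF of the sample
  xi = c(knots(fn), Inf)      # jump positions of the ECDF
  eta = c(-Inf, knots(fn))    # shifted by one, so fn(eta) is the left limit of fn at xi
  # compare F with the ECDF at each jump and just before it
  max(abs(fn(xi) - F(xi)), abs(fn(eta) - F(xi)))
}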
where knots() gives the jump positions of a step function, here the ECDF.
One can then use the function to get the supremum of the differences between the ECDF and a CDF like this:
sup(x, function(t) punif(t))
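As a sanity check, the hand-written supremum can be compared with the statistic reported by ks.test. The sketch below is only an illustration: the helper is renamed ks_sup, and the seed and sample size are arbitrary choices, none of which come from the notes.

```r
# Supremum distance between the ECDF of x and a hypothesised CDF
# (same logic as the function above, under the hypothetical name ks_sup)
ks_sup <- function(x, cdf) {
  fn  <- ecdf(x)
  xi  <- c(knots(fn), Inf)    # jump positions of the ECDF
  eta <- c(-Inf, knots(fn))   # fn(eta) gives the left limits of the ECDF at xi
  max(abs(fn(xi) - cdf(xi)), abs(fn(eta) - cdf(xi)))
}

set.seed(1)                   # arbitrary seed, for reproducibility only
x <- runif(50)

d_manual  <- ks_sup(x, punif)
d_builtin <- unname(ks.test(x, "punif")$statistic)

# Both values should agree up to floating point error
print(c(manual = d_manual, builtin = d_builtin))
```

If the two numbers differ, the most likely cause is ties in the data, which the KS test does not expect for continuous distributions.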
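For the rescaled test statistic (multiplied by $\sqrt{n}$) we have, under $H_0$: $\sqrt{n}\,D_n \xrightarrow{d} K$, where $D_n$ is the supremum distance between the ECDF and the hypothesised CDF and $K$ is the Kolmogorov distribution.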
We can also test the null hypothesis directly with the test statistic $D_n$. Under the null hypothesis its distribution does not depend on the underlying distribution, as long as that distribution is continuous, so it can be simulated using any continuous distribution.
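The distribution-free property can be checked by simulation. The following sketch is an assumption-laden illustration: the helper name ks_stat, the sample size, the number of replications and the three distributions are arbitrary choices. It simulates the test statistic under three different continuous distributions and compares the upper quantiles.

```r
# KS statistic for a sample x against a hypothesised CDF
ks_stat <- function(x, cdf) {
  n <- length(x)
  u <- cdf(sort(x))
  # largest deviation at the jumps and just before them
  max(seq_len(n) / n - u, u - (seq_len(n) - 1) / n)
}

set.seed(42)   # arbitrary seed
n    <- 30     # sample size (arbitrary)
reps <- 2000   # number of Monte Carlo replications (arbitrary)

d_unif <- replicate(reps, ks_stat(runif(n), punif))
d_norm <- replicate(reps, ks_stat(rnorm(n), pnorm))
d_exp  <- replicate(reps, ks_stat(rexp(n),  pexp))

# The rows should agree up to Monte Carlo error
probs <- c(0.90, 0.95, 0.99)
print(rbind(unif = quantile(d_unif, probs),
            norm = quantile(d_norm, probs),
            exp  = quantile(d_exp,  probs)))
```

Any one of the three simulated samples can then serve as the null distribution when computing critical values for this sample size.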
For large $n$ the distribution has an explicit approximate form, a variation of the Kolmogorov distribution: $P(D_n \le x) \approx K(\sqrt{n}\,x) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 n x^2}$.
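The large-sample approximation can be evaluated numerically with the standard series for the Kolmogorov CDF, $K(x) = 1 - 2 \sum_{k \ge 1} (-1)^{k-1} e^{-2 k^2 x^2}$. The sketch below truncates the series and compares it with a simulation; the function name pkolmogorov, the truncation point, the sample size and the evaluation grid are assumptions made for illustration.

```r
# Kolmogorov CDF via the alternating series, truncated after k_max terms
# (k_max = 100 is an arbitrary cut-off; the truncation is poor for x very close to 0)
pkolmogorov <- function(x, k_max = 100) {
  k <- seq_len(k_max)
  sapply(x, function(t) {
    if (t <= 0) return(0)
    1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * t^2))
  })
}

set.seed(123)  # arbitrary seed
n    <- 200
reps <- 2000

# Simulate the rescaled statistic sqrt(n) * D_n under the null with uniform data
stat <- replicate(reps, {
  u <- sort(runif(n))
  sqrt(n) * max(seq_len(n) / n - u, u - (seq_len(n) - 1) / n)
})

grid      <- c(0.8, 1.0, 1.2, 1.4)
series    <- pkolmogorov(grid)
simulated <- sapply(grid, function(t) mean(stat <= t))
print(cbind(x = grid, series = series, simulated = simulated))
```

The two columns should be close for moderately large samples; small differences reflect both Monte Carlo error and the finite sample size.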
How to perform the test
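- Formulate the null hypothesis $H_0: F = F_0$
- Fix the probability $\alpha$ of a type 1 error
- Determine the $(1-\alpha)$ quantile $k_{1-\alpha}$ of the Kolmogorov distribution
- Perform the experiment and calculate the test statistic $\sqrt{n}\,D_n$, with $D_n = \sup_x |F_n(x) - F_0(x)|$
- Accept (do not reject) $H_0$ if $\sqrt{n}\,D_n \le k_{1-\alpha}$; otherwise reject it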
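The steps above can be sketched in R as follows. This is a minimal illustration under stated assumptions: the null hypothesis is a standard normal, the quantile is found by numerically inverting the truncated Kolmogorov series, and the names pk, crit, stat and reject are hypothetical.

```r
set.seed(7)                          # arbitrary seed

# Step 1: null hypothesis, here assumed to be "the sample is standard normal"
# Step 2: fix the type 1 error probability
alpha <- 0.05

# Step 3: (1 - alpha) quantile of the Kolmogorov distribution,
# found by root-finding on the truncated series for its CDF
pk <- function(t, k_max = 100) {
  k <- seq_len(k_max)
  1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * t^2))
}
crit <- uniroot(function(t) pk(t) - (1 - alpha), interval = c(0.5, 3))$root

# Step 4: perform the experiment and compute the rescaled test statistic
x    <- rnorm(200)
d_n  <- unname(ks.test(x, "pnorm")$statistic)
stat <- sqrt(length(x)) * d_n

# Step 5: reject the null hypothesis only if the statistic exceeds the quantile
reject <- stat > crit
print(c(critical_value = crit, statistic = stat))
print(reject)
```

In practice ks.test reports a p-value directly, so the manual comparison mainly serves to make the decision rule explicit.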
Test of Equality of two continuous distributions
In R
Test if equal
x = rnorm(100)
y = rnorm(100, 0.2, 1.1)
ks.test(x=x, y=y, alternative="two.sided")
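In the two-sample case the statistic is the largest distance between the two ECDFs. The sketch below computes it by hand and compares it with ks.test; the seed and the variable names are arbitrary, and the approach assumes no ties between the samples.

```r
set.seed(3)                    # arbitrary seed
x <- rnorm(100)
y <- rnorm(100, 0.2, 1.1)

fx <- ecdf(x)                  # ECDF of the first sample
fy <- ecdf(y)                  # ECDF of the second sample

# Both ECDFs are step functions that only jump at observed values,
# so the supremum of their difference is attained at a pooled data point
pooled   <- sort(c(x, y))
d_manual <- max(abs(fx(pooled) - fy(pooled)))

d_builtin <- unname(ks.test(x, y)$statistic)
print(c(manual = d_manual, builtin = d_builtin))
```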
Test if equal to some CDF
x = rnorm(100)
ks.test(x=x, y="pnorm", alternative="two.sided")   # y names the CDF function, so "pnorm", not "norm"
Test if less or greater:
x = rnorm(100)
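y = rnorm(100, -1, 1)   # mean -1, sd 1 (the sd must be positive)
# "less": the CDF of x lies below that of y (x tends to be larger)
ks.test(x=x, y=y, alternative="less")
# "greater": the CDF of x lies above that of y (x tends to be smaller)
ks.test(x=x, y=y, alternative="greater")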