Kolmogorov-Smirnov Test

Use this test when the data come from a continuous distribution.

What can we test with this test?

  • Goodness of fit (GOF) between two samples
  • Whether one of two samples is stochastically larger or smaller than the other
  • GOF of one sample to a given CDF
  • With the Lilliefors test of normality, whether a sample comes from the family of normal distributions
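
The Lilliefors idea from the last bullet can be sketched by simulation: estimate mean and standard deviation from the sample, compute the KS distance to the fitted normal CDF, and compare it with the same statistic on simulated normal samples. The function name lilliefors_mc and the parameter B are made up for this illustration; this is a rough Monte Carlo sketch, not the implementation in the nortest package (lillie.test).

```r
# Monte Carlo sketch of a Lilliefors-type normality test (illustrative only).
lilliefors_mc = function(x, B = 2000) {
  n = length(x)
  # KS distance between a sample and the normal CDF with estimated parameters
  stat = function(z) {
    unname(ks.test(z, "pnorm", mean(z), sd(z))$statistic)
  }
  d_obs = stat(x)
  # Null distribution: the same statistic on simulated standard normal samples.
  # The statistic is location/scale invariant, so N(0, 1) is sufficient.
  d_sim = replicate(B, stat(rnorm(n)))
  list(statistic = d_obs, p.value = mean(d_sim >= d_obs))
}

set.seed(1)
res_norm = lilliefors_mc(rnorm(50))  # normal data
res_exp  = lilliefors_mc(rexp(50))   # skewed, non-normal data
res_norm$p.value
res_exp$p.value
```

The p-value reported by ks.test itself is not used here, because it would be wrong once the parameters are estimated from the same data.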

GOF

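The Null Hypothesis is

  $H_0: F = F_0$,

where $F$ is the true CDF of the sample $X_1, \dots, X_n$ and $F_0$ is the hypothesized CDF.

The Test Statistic is motivated by the theorem of Glivenko-Cantelli:

  $D_n = \sup_x |F_n(x) - F_0(x)|$,

where $F_n$ is the empirical CDF (ECDF) of the sample.

Under $H_0$, $D_n \to 0$ almost surely, so the test statistic should be small.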

We can implement a supremum function in R like this:

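sup = function(x, F){
  fn = ecdf(x)                # empirical CDF of the sample
  xi = c(knots(fn), Inf)      # jump points of the ECDF, plus +Inf
  eta = c(-Inf, knots(fn))    # shifted by one, so fn(eta) is the left limit of fn at xi
  # the supremum is attained at a jump: compare F with fn at the jump and with its left limit
  max(abs(fn(xi) - F(xi)), abs(fn(eta) - F(xi)))
}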

where knots returns the jump positions of a step function (here, the ECDF). The function can then be used to get the supremum of the differences between the ECDF and a CDF like this:

sup(x, function(t) punif(t))
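
As a sanity check, the hand-written supremum can be compared with the statistic that ks.test reports. The sketch below repeats the definition of sup so that it runs on its own; the sample x and the seed are arbitrary choices for the illustration.

```r
# Same definition as above, repeated so the snippet is self-contained
sup = function(x, F){
  fn = ecdf(x)
  xi = c(knots(fn), Inf)
  eta = c(-Inf, knots(fn))
  max(abs(fn(xi) - F(xi)), abs(fn(eta) - F(xi)))
}

set.seed(42)
x = runif(100)

d_manual = sup(x, punif)                       # supremum computed by hand
d_ks = unname(ks.test(x, "punif")$statistic)   # statistic D from ks.test

# Compare the two values up to floating point tolerance
same = isTRUE(all.equal(d_manual, d_ks))
same
```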

For the slightly changed Test Statistic (the supremum distance $D_n$ multiplied by $\sqrt{n}$) we have: $\sqrt{n} D_n \to K$ in distribution, where $K$ is the Kolmogorov Distribution.


We can also test the Null Hypothesis with the Test Statistic $D_n$ itself. Under the Null Hypothesis its distribution does not depend on the underlying distribution, as long as that distribution is continuous, so we can simulate it using any continuous distribution.
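
A minimal simulation sketch of this idea: the statistic is simulated once from uniform samples and once from exponential samples, and the resulting quantiles should nearly coincide. The helper name dn, the sample size n = 30 and the number of replications B = 5000 are arbitrary choices for the illustration.

```r
# KS distance between a sample z and a fully specified CDF (passed as a function)
dn = function(z, cdf) {
  n = length(z)
  Fz = cdf(sort(z))
  # ECDF value at each point and its left limit, compared with the CDF
  max(c((1:n) / n - Fz, Fz - (0:(n - 1)) / n))
}

set.seed(123)
n = 30
B = 5000

# Null distribution simulated from two different continuous distributions
d_unif = replicate(B, dn(runif(n), punif))
d_exp  = replicate(B, dn(rexp(n), pexp))

probs = c(0.90, 0.95, 0.99)
q_unif = quantile(d_unif, probs)
q_exp  = quantile(d_exp, probs)
rbind(uniform = q_unif, exponential = q_exp)

# Simulated p-value for an observed sample tested against the standard normal
x = rnorm(n)
p_sim = mean(d_unif >= dn(x, pnorm))
p_sim
```

Large values of the statistic speak against the Null Hypothesis, which is why the p-value is the share of simulated values at least as large as the observed one.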

For large $n$ the distribution has an explicit approximate form, a variation of the Kolmogorov Distribution.
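
One standard explicit form is the limiting CDF of $\sqrt{n} D_n$, $K(x) = 1 - 2 \sum_{k \ge 1} (-1)^{k-1} e^{-2 k^2 x^2}$ for $x > 0$. The sketch below assumes that this two-sided limit is the form meant here; the function name pkolm and the truncation at 100 terms are choices made for the illustration.

```r
# Truncated series for the limiting Kolmogorov CDF (illustrative sketch)
pkolm = function(x, terms = 100) {
  k = 1:terms
  sapply(x, function(t) {
    if (t <= 0) return(0)
    1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * t^2))
  })
}

set.seed(7)
n = 1000
x = runif(n)

# Ask ks.test for the asymptotic p-value instead of the exact one
res = ks.test(x, "punif", exact = FALSE)
d = unname(res$statistic)

# Asymptotic p-value from the series, next to the one reported by ks.test
p_series = 1 - pkolm(sqrt(n) * d)
c(series = p_series, ks.test = res$p.value)
```

The truncated series is only reliable for arguments that are not very close to zero, which is the relevant range for p-values in practice.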

How to perform the test

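Test of Equality of two continuous distributions

Test Statistic

  $D_{n,m} = \sup_x |F_n(x) - G_m(x)|$,

where $F_n$ and $G_m$ are the ECDFs of the two samples of sizes $n$ and $m$.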
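
The two-sample statistic can be sketched by hand as the largest absolute difference between the two ECDFs, evaluated at all observed points (both ECDFs only change there). The helper name ks2 and the sample parameters are made up for the illustration.

```r
# Two-sample KS distance computed directly from the two ECDFs
ks2 = function(x, y) {
  fx = ecdf(x)
  fy = ecdf(y)
  grid = sort(c(x, y))   # the ECDFs are step functions that jump only at these points
  max(abs(fx(grid) - fy(grid)))
}

set.seed(99)
x = rnorm(100)
y = rnorm(80, 0.5, 1)

d_manual = ks2(x, y)
d_ks = unname(ks.test(x, y)$statistic)
c(manual = d_manual, ks.test = d_ks)
```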

In R

Test if equal

x = rnorm(100)
y = rnorm(100, 0.2, 1.1)
 
ks.test(x=x, y=y, alternative="two.sided")

Test if equal to some CDF

x = rnorm(100)
 
ks.test(x=x, y="pnorm", alternative="two.sided")

Test if less or greater:

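x = rnorm(100)
y = rnorm(100, -1, 1)   # y is shifted to the left of x (mean -1, sd 1)

# "less": the CDF of x lies below that of y (x tends to be larger)
ks.test(x=x, y=y, alternative="less")
# "greater": the CDF of x lies above that of y (x tends to be smaller)
ks.test(x=x, y=y, alternative="greater")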