Chi-Squared GOF Test for finitely many values
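- $n$: size of the data sequence
- $n_i$: frequency of item expression $i$ (of the random variable $X$) in the data
- $p_i$: probability of item expression $i$ under the known CDF
- $x_1, \dots, x_n$: some data sequence from an unknown CDF $F$
- $F_0$: a known discrete CDF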
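For reference, the test statistic in this setting is usually Pearson's statistic, which is also what the example code further down computes. The symbols $n_i$ (observed frequency of value $i$), $p_i$ (its probability under the known CDF) and $k$ (number of possible values) are notation assumed here:

```latex
T = \sum_{i=1}^{k} \frac{(n_i - n\,p_i)^2}{n\,p_i}
```

Each summand compares an observed frequency $n_i$ with the frequency $n\,p_i$ expected under the known CDF.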
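Given $H_0$ (the data come from the known CDF), the test statistic $T$ tends to be small, so large observed values count as evidence against $H_0$. That's why the p-value is given by $P_{H_0}(T \ge t)$, where $t$ is the observed value of the test statistic.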
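We can simulate the distribution of the test statistic under the null hypothesis by drawing $n$ times with replacement from balls labelled with the possible values, where value $i$ is drawn with probability $p_i$.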
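The drawing scheme above can be sketched in R as follows. This is only a minimal illustration: the probabilities `p0`, the counts `obs`, the number of replications `B` and the helper `pearson` are hypothetical and not taken from these notes. Drawing `n` balls with replacement and counting the outcomes is the same as one multinomial draw, so `rmultinom` is used here.

```r
set.seed(1)                      # only for reproducibility of this sketch
p0  = c(0.2, 0.3, 0.5)           # hypothetical probabilities under the null hypothesis
obs = c(30, 25, 45)              # hypothetical observed frequencies
n   = sum(obs)

# Pearson statistic for a vector of counts
pearson = function(counts, p, n) sum((counts - n * p)^2 / (n * p))

t_obs = pearson(obs, p0, n)

# Each column of sim is one simulated sample of size n, already tabulated
B   = 10000
sim = rmultinom(B, size = n, prob = p0)
t_sim = apply(sim, 2, function(counts) pearson(counts, p0, n))

# Simulated p-value: share of simulated statistics at least as large as the observed one
p_sim = mean(t_sim >= t_obs)
p_sim
```

The simulated p-value does not rely on any large-sample approximation, at the cost of some Monte Carlo noise that shrinks as `B` grows.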
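Or use the fact that for $n \to \infty$ the CDF of the test statistic converges to that of the chi-squared distribution with $k - 1$ degrees of freedom, where $k$ is the number of possible values (one further degree of freedom is lost for each parameter estimated from the data). This approximation is only good if the class condition is met (a common rule of thumb is $n p_i \ge 5$ for all $i$). Otherwise use the simulation above.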
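A minimal sketch of the asymptotic variant, again with hypothetical `p0` and `obs`. The threshold of 5 for the expected frequencies is a common rule of thumb and an assumption here, not something these notes specify.

```r
p0  = c(0.2, 0.3, 0.5)           # hypothetical probabilities under the null hypothesis
obs = c(30, 25, 45)              # hypothetical observed frequencies
n   = sum(obs)
k   = length(p0)

# Expected frequencies under the null hypothesis
expected = n * p0

# Class condition (assumed rule of thumb): every expected frequency is at least 5
class_ok = all(expected >= 5)

# Pearson statistic and asymptotic p-value with k - 1 degrees of freedom
t_obs  = sum((obs - expected)^2 / expected)
p_asym = pchisq(t_obs, df = k - 1, lower.tail = FALSE)

class_ok
p_asym
```

If `class_ok` is `FALSE`, the chi-squared approximation may be poor and the simulated p-value is the safer choice.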
Example Code
Estimate parameters with Plug-In-Method.
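N = 6      # binomial size: the possible outcomes are 0, ..., N
n = 100    # sample size
# Data with unknown success probability
x = rbinom(n, N, 0.8)
# Estimate the probability (plug-in estimate, since E[X] = N * p)
pt = mean(x) / N
# Use the estimated probability to get the "real" probabilities for all N + 1 outcomes
p = dbinom(0:N, N, pt)
# Count the absolute frequencies of each outcome in our data
tab = numeric(N + 1)
for (i in 0:N){
  tab[i+1] = sum(x == i)
}
# Calculate the test statistic (more or less: difference between observed and expected frequencies)
# Named Tstat because T is R's shorthand for TRUE
Tstat = sum((tab - n*p)^2 / (n*p))
Tstat
# Use the CDF of the chi-squared distribution to determine the p-value of the test statistic
# df = (N + 1) outcomes - 1 - 1 estimated parameter = N - 1
pval = 1 - pchisq(Tstat, df=N-1)
pval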
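As a cross-check, the hand-computed statistic can be compared with R's built-in `chisq.test`. This is a sketch under the assumption that the data are binomial with size `N`. Note that `chisq.test` does not know that a parameter was estimated, so it uses one degree of freedom more than the plug-in calculation and the p-value has to be recomputed.

```r
set.seed(42)                     # only for reproducibility of this sketch
N = 6
n = 100
x = rbinom(n, N, 0.8)
pt = mean(x) / N
p = dbinom(0:N, N, pt)

# Absolute frequencies of the outcomes 0, ..., N
tab = tabulate(x + 1, nbins = N + 1)

# Statistic computed by hand
manual = sum((tab - n*p)^2 / (n*p))

# Built-in test; warnings about small expected frequencies are suppressed here
res = suppressWarnings(chisq.test(tab, p = p))

# Same statistic, but chisq.test uses df = (N + 1) - 1 = N
unname(res$statistic)
unname(res$parameter)

# p-value with the degrees of freedom corrected for the estimated parameter
pval = pchisq(unname(res$statistic), df = N - 1, lower.tail = FALSE)
pval
```

Passing `simulate.p.value = TRUE` to `chisq.test` gives a Monte Carlo p-value instead, which is useful when the class condition fails.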