Introduction to Probability and Statistics for Engineers and Scientists


*11.6 The Kolmogorov–Smirnov Goodness of Fit Test for Continuous Data


where $U_1, \ldots, U_n$ are independent uniform (0, 1) random variables: the first equality
following because $F$ is an increasing function and so $Y \le x$ is equivalent to $F(Y) \le F(x)$;
and the second because of the result (whose proof is left as an exercise) that if $Y$ has the
continuous distribution $F$ then the random variable $F(Y)$ is uniform on (0, 1).
Continuing the above, we see by letting $y = F(x)$ and noting that as $x$ ranges from
$-\infty$ to $+\infty$, $F(x)$ ranges from 0 to 1, that


$$P_F\{D \ge d\} = P\left\{ \max_{0 \le y \le 1} \left| \frac{\#\{i : U_i \le y\}}{n} - y \right| \ge d \right\}$$

which shows that the distribution of $D$, when $H_0$ is true, does not depend on the actual
distribution $F$. ■
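
As a rough numerical illustration of the proposition, the sketch below estimates $P_F\{D \ge d\}$ by simulation under two different continuous distributions and compares the results. The choice of an exponential distribution with rate 2 and a standard normal, the values $n = 10$ and $d = .3$, the seed, and all function names are assumptions made for this illustration only; the statistic is computed from the sorted values of $F(Y_1), \ldots, F(Y_n)$.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Return D = max_x |F_e(x) - F(x)| for a continuous cdf F.

    Works on the sorted transformed values F(Y_(1)) <= ... <= F(Y_(n)).
    """
    n = len(sample)
    f = sorted(cdf(y) for y in sample)
    return max(max((j + 1) / n - f[j], f[j] - j / n) for j in range(n))


def exponential_cdf(x, rate=2.0):
    # Illustrative choice of F: exponential with rate 2.
    return 1.0 - math.exp(-rate * x)


def normal_cdf(x):
    # Illustrative choice of F: standard normal.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tail_probability(draw, cdf, d, n, reps, rng):
    """Estimate P_F{D >= d} when the data really come from F."""
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if ks_statistic(sample, cdf) >= d:
            hits += 1
    return hits / reps


rng = random.Random(2024)
n, d, reps = 10, 0.3, 4000
p_exponential = tail_probability(
    lambda r: r.expovariate(2.0), exponential_cdf, d, n, reps, rng
)
p_normal = tail_probability(
    lambda r: r.gauss(0.0, 1.0), normal_cdf, d, n, reps, rng
)
# The two estimates should agree up to simulation error.
print(p_exponential, p_normal)
```

If the proposition holds, the two printed estimates differ only by Monte Carlo noise, whichever continuous $F$ generated the data.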


It follows from the above proposition that after the value of $D$ is determined from the
data, say, $D = d$, the $p$-value can be obtained by doing a simulation with the uniform
(0, 1) distribution. That is, we generate a set of $n$ random numbers $U_1, \ldots, U_n$ and then
check whether or not the inequality


$$\max_{0 \le y \le 1} \left| \frac{\#\{i : U_i \le y\}}{n} - y \right| \ge d$$

is valid. This is then repeated many times and the proportion of times that it is valid is
our estimate of the $p$-value of the data set. As noted earlier, the left side of the inequality
can be computed by ordering the random numbers and then using the identity


$$\max_{0 \le y \le 1} \left| \frac{\#\{i : U_i \le y\}}{n} - y \right| = \max\left\{ \frac{j}{n} - U_{(j)},\; U_{(j)} - \frac{j-1}{n},\; j = 1, \ldots, n \right\}$$

where $U_{(j)}$ is the $j$th smallest value of $U_1, \ldots, U_n$. For example, if $n = 3$ and $U_1 = .7$, $U_2 = .6$,
$U_3 = .4$, then $U_{(1)} = .4$, $U_{(2)} = .6$, $U_{(3)} = .7$ and the value of $D$ for this data set is


$$D = \max\left\{ \frac{1}{3} - .4,\; \frac{2}{3} - .6,\; 1 - .7,\; .4,\; .6 - \frac{1}{3},\; .7 - \frac{2}{3} \right\} = .4$$
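
The simulation procedure and the order-statistics identity described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `ks_distance_uniform` and `simulated_p_value`, the default number of repetitions, and the seed are all assumptions introduced here.

```python
import random


def ks_distance_uniform(u):
    """Max over y in [0, 1] of |#{i: U_i <= y}/n - y|, via the sorted values.

    With 0-based index j, the j-th sorted value is U_(j+1), so the two
    candidate terms are (j + 1)/n - U_(j+1) and U_(j+1) - j/n.
    """
    n = len(u)
    s = sorted(u)
    return max(max((j + 1) / n - s[j], s[j] - j / n) for j in range(n))


def simulated_p_value(d, n, reps=10_000, seed=None):
    """Estimate P{D >= d} under H0 using uniform (0, 1) random numbers."""
    rng = random.Random(seed)
    hits = sum(
        ks_distance_uniform([rng.random() for _ in range(n)]) >= d
        for _ in range(reps)
    )
    return hits / reps


# The worked example from the text: n = 3 with values .7, .6, .4.
example_d = ks_distance_uniform([0.7, 0.6, 0.4])
print(example_d)

# Estimated p-value for an observed D = .4 with n = 3 (seed is arbitrary).
print(simulated_p_value(example_d, n=3, reps=10_000, seed=1))
```

The first printed value reproduces the $D = .4$ of the worked example; the second varies with the seed and the number of repetitions, as any simulation estimate does.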

A significance level $\alpha$ test can be obtained by considering the quantity $D^*$ defined by

$$D^* = \left( \sqrt{n} + .12 + .11/\sqrt{n} \right) D$$

Letting $d_\alpha^*$ be such that

$$P_F\{D^* \ge d_\alpha^*\} = \alpha$$

then the following are accurate approximations for $d_\alpha^*$ for a variety of values:

$$d_{.1}^* = 1.224, \quad d_{.05}^* = 1.358, \quad d_{.025}^* = 1.480, \quad d_{.01}^* = 1.626$$
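
The approximate test based on $D^*$ can be sketched as follows. The rejection rule used here (reject $H_0$ when $D^* \ge d_\alpha^*$) is the natural reading of the definition of $d_\alpha^*$ and is stated as an assumption; the names `d_star`, `reject_h0`, and `CRITICAL_VALUES` are hypothetical, and only the four tabulated levels are supported.

```python
import math

# Approximate critical values d*_alpha quoted in the text.
CRITICAL_VALUES = {0.10: 1.224, 0.05: 1.358, 0.025: 1.480, 0.01: 1.626}


def d_star(d, n):
    """D* = (sqrt(n) + .12 + .11 / sqrt(n)) * D."""
    root_n = math.sqrt(n)
    return (root_n + 0.12 + 0.11 / root_n) * d


def reject_h0(d, n, alpha):
    """Assumed rule: reject H0 at level alpha when D* >= d*_alpha."""
    return d_star(d, n) >= CRITICAL_VALUES[alpha]


# Applied to the small example above (n = 3, D = .4).
print(d_star(0.4, 3))
print(reject_h0(0.4, 3, alpha=0.05))
```

For $n = 3$ and $D = .4$ the value of $D^*$ is about .766, well below every tabulated critical value, so under the assumed rule $H_0$ would not be rejected at any of these levels.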