Introduction to Probability and Statistics for Engineers and Scientists

484 Chapter 11: Goodness of Fit Tests and Categorical Data Analysis

to the problem of testing that sample data come from a specified probability distribution,
which we now assume is continuous. Rather than discretizing the data so as to be able to
use the test of Section 11.2, we treat the data as given and make use of the
Kolmogorov–Smirnov test.

11.2 Goodness of Fit Tests When all Parameters are Specified

Suppose that $n$ independent random variables $Y_1, \ldots, Y_n$, each taking on one of the
values $1, 2, \ldots, k$, are to be observed and we are interested in testing the null hypothesis
that $\{p_i, i = 1, \ldots, k\}$ is the probability mass function of the $Y_j$. That is, if $Y$ represents
any of the $Y_j$, then the null hypothesis is

$$H_0: P\{Y = i\} = p_i, \quad i = 1, \ldots, k$$

whereas the alternative hypothesis is

$$H_1: P\{Y = i\} \neq p_i, \quad \text{for some } i = 1, \ldots, k$$

To test the foregoing hypothesis, let $X_i$, $i = 1, \ldots, k$, denote the number of the $Y_j$'s that
equal $i$. Then as each $Y_j$ will independently equal $i$ with probability $P\{Y = i\}$, it follows
that, under $H_0$, $X_i$ is binomial with parameters $n$ and $p_i$. Hence, when $H_0$ is true,

$$E[X_i] = np_i$$

and so $(X_i - np_i)^2$ will be an indication as to how likely it appears that $p_i$ indeed equals
the probability that $Y = i$. When this is large, say, in relationship to $np_i$, then it is an
indication that $H_0$ is not correct. Indeed such reasoning leads us to consider the following
test statistic:



$$T = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i}$$

and to reject the null hypothesis when $T$ is large.
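
As a rough illustration of the test statistic just described, the following Python sketch counts how many observations equal each value and sums the squared deviations from n*p_i, each divided by n*p_i. The function name `chi_square_statistic` and the sample data are hypothetical and chosen only for illustration. The sketch assumes every hypothesized probability is positive.

```python
from collections import Counter


def chi_square_statistic(observations, probs):
    """Return T = sum over i of (X_i - n*p_i)^2 / (n*p_i).

    observations: sequence of values in 1..k (the Y_j)
    probs: list [p_1, ..., p_k] hypothesized under H0 (assumed all > 0)
    """
    n = len(observations)
    k = len(probs)
    counts = Counter(observations)  # X_i = number of Y_j equal to i
    t = 0.0
    for i in range(1, k + 1):
        expected = n * probs[i - 1]  # E[X_i] = n*p_i when H0 is true
        t += (counts.get(i, 0) - expected) ** 2 / expected
    return t


# Hypothetical data: n = 20 observations on k = 4 values, H0: all p_i = 1/4.
sample = [1, 2, 2, 3, 4, 4, 4, 1, 2, 3, 3, 3, 4, 1, 2, 2, 4, 4, 3, 1]
uniform = [0.25, 0.25, 0.25, 0.25]
t_value = chi_square_statistic(sample, uniform)
```

Here the observed counts are 4, 5, 5, and 6 against an expected count of 5 in each cell, so only the first and last cells contribute to the sum.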
To determine the critical region, we first need to specify a significance level $\alpha$ and then
we must determine the critical value $c$ such that

$$P_{H_0}\{T \geq c\} = \alpha$$

That is, we need to determine $c$ so that the probability that the test statistic $T$ is at least as
large as $c$, when $H_0$ is true, is $\alpha$. The test is then to reject the hypothesis, at the $\alpha$ level of
significance, when $T \geq c$ and to accept when $T < c$.
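
To make the defining condition on the critical value concrete, the sketch below estimates the probability that T is at least c when H0 is true, by repeatedly simulating samples from the hypothesized distribution. This is a simulation-based illustration, not the classical approach. The names `estimate_tail_probability` and `reject_h0`, and the parameters `num_sims` and `seed`, are hypothetical. A candidate c would be acceptable at level alpha when the estimate is close to alpha.

```python
import random
from collections import Counter


def chi_square_statistic(counts, n, probs):
    """T = sum over i of (X_i - n*p_i)^2 / (n*p_i), given the counts X_i."""
    return sum((counts.get(i + 1, 0) - n * p) ** 2 / (n * p)
               for i, p in enumerate(probs))


def estimate_tail_probability(c, n, probs, num_sims=10_000, seed=0):
    """Estimate P{T >= c} under H0 by simulating num_sims samples of size n."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    values = list(range(1, len(probs) + 1))
    hits = 0
    for _ in range(num_sims):
        # Draw Y_1, ..., Y_n independently with P{Y = i} = p_i.
        sample = rng.choices(values, weights=probs, k=n)
        if chi_square_statistic(Counter(sample), n, probs) >= c:
            hits += 1
    return hits / num_sims


def reject_h0(t_observed, c):
    """Decision rule: reject when T >= c, accept when T < c."""
    return t_observed >= c
```

For example, `estimate_tail_probability(c, 20, [0.25] * 4)` could be evaluated for several candidate values of c until the estimate is near the chosen significance level. The result is only as accurate as the number of simulated samples allows.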
It remains to determine $c$. The classical approach to doing so is to use the result that
when $n$ is large $T$ will have, when $H_0$ is true, approximately (with the approximation

