Introduction to Probability and Statistics for Engineers and Scientists

504 Chapter 11: Goodness of Fit Tests and Categorical Data Analysis

Therefore, the hypothesis that the percentages of women who feel they are being abused
on the job are the same for these countries is rejected at the 1 percent level of significance
(and, indeed, at any significance level above .02 percent). ■



Suppose now that $Y_1, \ldots, Y_n$ represents sample data from a continuous distribution, and
suppose that we wish to test the null hypothesis $H_0$ that $F$ is the population distribution,
where $F$ is a specified continuous distribution function. One approach to testing $H_0$ is to
break up the set of possible values of the $Y_j$ into $k$ distinct intervals, say,

$$(y_0, y_1),\ (y_1, y_2),\ \ldots,\ (y_{k-1}, y_k), \quad \text{where } y_0 = -\infty,\ y_k = +\infty$$

and then consider the discretized random variables $Y_j^d$, $j = 1, \ldots, n$, defined by

$$Y_j^d = i \quad \text{if } Y_j \text{ lies in the interval } (y_{i-1}, y_i)$$

The null hypothesis then implies that

$$P\{Y_j^d = i\} = F(y_i) - F(y_{i-1}), \quad i = 1, \ldots, k$$

and this can be tested by the chi-square goodness of fit test already presented.
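
The discretization approach can be sketched in Python as follows. This is only an illustrative sketch: the function name `discretized_chi_square`, the exponential choice of $F$, the cut points, and the sample values are all hypothetical and do not come from the text. It also assumes that the chi-square goodness of fit statistic takes the usual form $T = \sum_{i=1}^{k} (N_i - np_i)^2 / (np_i)$, where $N_i$ is the number of the $Y_j^d$ equal to $i$ and $p_i = F(y_i) - F(y_{i-1})$.

```python
import math
from bisect import bisect_left


def discretized_chi_square(data, cdf, cut_points):
    """Return (counts, probs, T) for the chi-square test on discretized data.

    cut_points holds the finite endpoints y_1 < ... < y_{k-1}; y_0 = -inf and
    y_k = +inf are implied, so there are k = len(cut_points) + 1 intervals.
    """
    n = len(data)
    k = len(cut_points) + 1
    # F at y_0, ..., y_k, using F(-inf) = 0 and F(+inf) = 1.
    F_vals = [0.0] + [cdf(y) for y in cut_points] + [1.0]
    # p_i = F(y_i) - F(y_{i-1}), stored with a zero-based index.
    probs = [F_vals[i + 1] - F_vals[i] for i in range(k)]
    counts = [0] * k
    for y in data:
        # Number of cut points below y = zero-based index of y's interval.
        counts[bisect_left(cut_points, y)] += 1
    T = sum((counts[i] - n * probs[i]) ** 2 / (n * probs[i]) for i in range(k))
    return counts, probs, T


def exponential_cdf(x):
    """Hypothetical null distribution F: exponential with mean 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


# Hypothetical cut points at the quartiles of F, so each p_i is 1/4.
cut_points = [-math.log(0.75), math.log(2.0), math.log(4.0)]
# Hypothetical sample, for illustration only.
sample = [0.1, 0.2, 0.5, 0.9, 1.5, 2.0, 0.3, 3.0]

counts, probs, T = discretized_chi_square(sample, exponential_cdf, cut_points)
print(counts, probs, T)
```

Under $H_0$ and for large $n$, $T$ would be compared with a chi-square distribution with $k - 1$ degrees of freedom. A sample this small is used only to keep the arithmetic easy to follow.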
There is, however, another way of testing that the $Y_j$ come from the continuous
distribution function $F$ that is generally more efficient than discretizing; it works as follows.
After observing $Y_1, \ldots, Y_n$, let $F_e$ be the empirical distribution function defined by

$$F_e(x) = \frac{\#\{i : Y_i \le x\}}{n}$$

That is, $F_e(x)$ is the proportion of the observed values that are less than or equal to $x$.
Because $F_e(x)$ is a natural estimator of the probability that an observation is less than or
equal to $x$, it follows that, if the null hypothesis that $F$ is the underlying distribution is
correct, $F_e(x)$ should be close to $F(x)$. Since this is so for all $x$, a natural quantity on which to
base a test of $H_0$ is the test quantity

$$D \equiv \max_x \, |F_e(x) - F(x)|$$

where the maximum is over all values of $x$ from $-\infty$ to $+\infty$. The quantity $D$ is called the
Kolmogorov–Smirnov test statistic.
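
A minimal Python sketch of the empirical distribution function and of $D$ is given below. The function names `empirical_cdf` and `ks_statistic`, the uniform (0, 1) choice of $F$, and the three sample values are assumptions made for illustration. The sketch relies on the fact that $F_e$ is a step function that jumps only at the observed values while $F$ is nondecreasing, so the largest gap must occur at, or just to the left of, one of the ordered observations $y_{(1)} \le \cdots \le y_{(n)}$. It is therefore enough to examine $j/n - F(y_{(j)})$ and $F(y_{(j)}) - (j-1)/n$ for $j = 1, \ldots, n$.

```python
from bisect import bisect_right


def empirical_cdf(data):
    """Return F_e, where F_e(x) is the proportion of observations <= x."""
    ordered = sorted(data)
    n = len(ordered)

    def F_e(x):
        # bisect_right counts how many ordered values are <= x.
        return bisect_right(ordered, x) / n

    return F_e


def ks_statistic(data, cdf):
    """Return D, the largest value of |F_e(x) - F(x)| over all x."""
    ordered = sorted(data)
    n = len(ordered)
    d = 0.0
    for j, y in enumerate(ordered, start=1):
        F_y = cdf(y)
        # At y_(j), F_e equals j/n; just to its left, F_e equals (j - 1)/n.
        d = max(d, j / n - F_y, F_y - (j - 1) / n)
    return d


def uniform_cdf(x):
    """Hypothetical null distribution F: uniform on (0, 1)."""
    return min(max(x, 0.0), 1.0)


# Hypothetical sample, for illustration only.
sample = [0.2, 0.5, 0.9]
F_e = empirical_cdf(sample)
D = ks_statistic(sample, uniform_cdf)
print(F_e(0.5), D)
```

Sorting once and scanning the ordered values avoids any search over a grid of $x$ values, which could miss the point where the gap is largest.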

* Optional section.
