Introduction to Probability and Statistics for Engineers and Scientists


506 Chapter 11: Goodness of Fit Tests and Categorical Data Analysis


Similarly, the maximum value of $F(x) - F_e(x)$ is also nonnegative and occurs immediately
before one of the jump points $y_{(j)}$; and so


$$
\max_x \{F(x) - F_e(x)\} = \max_{j=1,\ldots,n} \left\{ F(y_{(j)}) - \frac{j-1}{n} \right\} \tag{11.6.2}
$$

From Equations 11.6.1 and 11.6.2, we see that


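$$
\begin{aligned}
D &= \max_x |F_e(x) - F(x)| \\
  &= \max\left\{ \max_x \{F_e(x) - F(x)\},\ \max_x \{F(x) - F_e(x)\} \right\} \\
  &= \max\left\{ \frac{j}{n} - F(y_{(j)}),\ F(y_{(j)}) - \frac{j-1}{n},\ j = 1, \ldots, n \right\}
\end{aligned}
\tag{11.6.3}
$$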

Equation 11.6.3 can be used to compute the value of $D$.
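
As an illustration, the computation in Equation 11.6.3 can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the names `ks_statistic` and `uniform_cdf` are hypothetical, and the hypothesized distribution function $F$ is assumed to be supplied as an ordinary Python function.

```python
def ks_statistic(data, cdf):
    """Return D = max_x |F_e(x) - F(x)| using Equation 11.6.3.

    data: the observed values (in any order)
    cdf:  the hypothesized distribution function F (assumed continuous)
    """
    y = sorted(data)              # ordered values y_(1) <= ... <= y_(n)
    n = len(y)
    d = 0.0
    for j, yj in enumerate(y, start=1):
        f = cdf(yj)
        # The two candidate maxima at the jump point y_(j):
        # j/n - F(y_(j)) and F(y_(j)) - (j-1)/n
        d = max(d, j / n - f, f - (j - 1) / n)
    return d


def uniform_cdf(x):
    """Distribution function of the uniform (0, 1) distribution."""
    return min(max(x, 0.0), 1.0)


# Illustrative data only (not taken from the text).
print(ks_statistic([0.1, 0.4, 0.7], uniform_cdf))  # approximately 0.3
```

Only the $n$ ordered data values need to be examined, so no search over all $x$ is required.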
Suppose now that the $Y_j$ are observed and their values are such that $D = d$. Since a
large value of $D$ would appear to be inconsistent with the null hypothesis that $F$ is the
underlying distribution, it follows that the $p$-value for this data set is given by


$$
p\text{-value} = P_F\{D \ge d\}
$$

where we have written $P_F$ to make explicit that this probability is to be computed under
the assumption that $H_0$ is correct (and so $F$ is the underlying distribution).
The above $p$-value can be approximated by a simulation that is made easier by the
following proposition, which shows that $P_F\{D \ge d\}$ does not depend on the underlying
distribution $F$. This result enables us to estimate the $p$-value by doing the simulation
with any continuous distribution $F$ we choose [thus allowing us to use the uniform (0, 1)
distribution].
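
One way such a simulation might look is sketched below. It is a rough sketch under stated assumptions: the function names, the default number of runs, and the fixed seed are all hypothetical choices made for illustration. Each run generates $n$ uniform (0, 1) random numbers, computes $D$ for them with $F(x) = x$, and the $p$-value is estimated by the fraction of runs in which $D \ge d$.

```python
import random


def ks_statistic_uniform(sample):
    """D for a sample tested against the uniform (0, 1) distribution, F(x) = x."""
    u = sorted(sample)
    n = len(u)
    return max(
        max(j / n - uj, uj - (j - 1) / n)
        for j, uj in enumerate(u, start=1)
    )


def simulate_p_value(d, n, num_runs=10_000, seed=0):
    """Estimate P{D >= d} for sample size n by simulation.

    num_runs and seed are illustrative defaults, not values from the text.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_runs):
        sample = [rng.random() for _ in range(n)]
        if ks_statistic_uniform(sample) >= d:
            hits += 1
    return hits / num_runs


# Hypothetical usage: observed D = 0.3 from a sample of size 10.
print(simulate_p_value(0.3, 10))
```

The estimate is itself random, so more runs give a more stable value at the cost of more computation.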


PROPOSITION 11.6.1
$P_F\{D \ge d\}$ is the same for any continuous distribution $F$.
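
The idea behind this proposition can be checked numerically on a single sample. The sketch below assumes, purely for illustration, that $F$ is the exponential distribution function with rate 2; the names `exp_cdf`, `d_from_y`, and `d_from_u` are hypothetical. It computes $D$ once from the data $Y_i$ tested against $F$, and once from the transformed values $U_i = F(Y_i)$ tested against the uniform (0, 1) distribution. The two values should agree up to floating-point rounding.

```python
import math
import random


def ks_statistic(data, cdf):
    """D computed from the ordered data and the hypothesized cdf."""
    y = sorted(data)
    n = len(y)
    return max(
        max(j / n - cdf(yj), cdf(yj) - (j - 1) / n)
        for j, yj in enumerate(y, start=1)
    )


RATE = 2.0  # illustrative choice of exponential rate


def exp_cdf(x):
    """Exponential distribution function F(x) = 1 - exp(-RATE * x), x >= 0."""
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0


rng = random.Random(1)
y = [rng.expovariate(RATE) for _ in range(20)]   # Y_i drawn from F
u = [exp_cdf(v) for v in y]                      # U_i = F(Y_i)

d_from_y = ks_statistic(y, exp_cdf)              # D using the Y_i and F
d_from_u = ks_statistic(u, lambda t: t)          # D using the U_i and F(x) = x

print(d_from_y, d_from_u)
```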


Proof

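$$
\begin{aligned}
P_F\{D \ge d\} &= P_F\left\{ \max_x \left| \frac{\#\, i : Y_i \le x}{n} - F(x) \right| \ge d \right\} \\
&= P_F\left\{ \max_x \left| \frac{\#\, i : F(Y_i) \le F(x)}{n} - F(x) \right| \ge d \right\} \\
&= P\left\{ \max_x \left| \frac{\#\, i : U_i \le F(x)}{n} - F(x) \right| \ge d \right\}
\end{aligned}
$$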