Introduction to Probability and Statistics for Engineers and Scientists

Chapter 12: Nonparametric Hypothesis Tests


Suppose that we desire a significance level $\alpha$ test of $H_0$. If the observed value of $T$ is $T = t$, then $H_0$ should be rejected if either

$$P_{H_0}\{T \le t\} \le \frac{\alpha}{2} \quad \text{or} \quad P_{H_0}\{T \ge t\} \le \frac{\alpha}{2} \tag{12.4.1}$$
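For small samples, the two tail probabilities in rule (12.4.1) can be checked by brute force. The sketch below is our own illustration, not the text's method. It assumes there are no ties and that, as under $H_0$, every set of $n$ ranks drawn from $\{1, \ldots, n+m\}$ is equally likely to be the set of ranks of the first sample, so it enumerates all $\binom{n+m}{n}$ such sets. The function names `rank_sum_cdf`, `rank_sum_sf`, and `reject_h0` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def rank_sum_cdf(n: int, m: int, t: int) -> Fraction:
    """Exact P_{H0}{T <= t}, assuming no ties.

    Under H0 each n-subset of the ranks {1, ..., n + m} is equally likely
    to be the first sample's ranks, so count the subsets with sum <= t.
    """
    ranks = range(1, n + m + 1)
    hits = sum(1 for subset in combinations(ranks, n) if sum(subset) <= t)
    return Fraction(hits, comb(n + m, n))


def rank_sum_sf(n: int, m: int, t: int) -> Fraction:
    """Exact P_{H0}{T >= t}, by the same enumeration."""
    ranks = range(1, n + m + 1)
    hits = sum(1 for subset in combinations(ranks, n) if sum(subset) >= t)
    return Fraction(hits, comb(n + m, n))


def reject_h0(n: int, m: int, t: int, alpha: float) -> bool:
    """Two-sided rule (12.4.1): reject if either tail probability <= alpha/2."""
    # Convert alpha through its decimal string so the comparison is exact.
    half_alpha = Fraction(str(alpha)) / 2
    return rank_sum_cdf(n, m, t) <= half_alpha or rank_sum_sf(n, m, t) <= half_alpha
```

For instance, with $n = m = 3$ the smallest possible rank sum is $1 + 2 + 3 = 6$, attained by only one of the $\binom{6}{3} = 20$ subsets, so `reject_h0(3, 3, 6, 0.2)` rejects while `reject_h0(3, 3, 6, 0.05)` does not. Enumeration cost grows like $\binom{n+m}{n}$, which is why a recursion such as (12.4.3) is preferable for larger samples.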

That is, the hypothesis that the two samples are equivalent should be rejected if the sum of
the ranks from the first sample is either too small or too large to be explained by chance.
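Since, for integral $t$,

$$P\{T \ge t\} = 1 - P\{T < t\} = 1 - P\{T \le t - 1\}$$

it follows from Equation 12.4.1 that $H_0$ should be rejected if either

$$P_{H_0}\{T \le t\} \le \frac{\alpha}{2} \quad \text{or} \quad P_{H_0}\{T \le t - 1\} \ge 1 - \frac{\alpha}{2} \tag{12.4.2}$$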
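The point of rewriting (12.4.1) as (12.4.2) is that the test then needs only the distribution function of $T$, evaluated at $t$ and $t - 1$. A minimal sketch, again assuming no ties and using brute-force enumeration in place of the recursion (the names `rank_sum_cdf` and `reject_h0_cdf_only` are hypothetical), applies rule (12.4.2) and numerically confirms the identity $P\{T \ge t\} = 1 - P\{T \le t - 1\}$ over every attainable rank sum.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def rank_sum_cdf(n: int, m: int, k: int) -> Fraction:
    """Exact P_{H0}{T <= k}: share of n-subsets of {1, ..., n+m} with sum <= k."""
    subsets = combinations(range(1, n + m + 1), n)
    hits = sum(1 for s in subsets if sum(s) <= k)
    return Fraction(hits, comb(n + m, n))


def reject_h0_cdf_only(n: int, m: int, t: int, alpha: float) -> bool:
    """Rule (12.4.2): needs the CDF only, evaluated at t and at t - 1."""
    half_alpha = Fraction(str(alpha)) / 2  # exact arithmetic, no float rounding
    return (rank_sum_cdf(n, m, t) <= half_alpha
            or rank_sum_cdf(n, m, t - 1) >= 1 - half_alpha)


# Sanity check of P{T >= t} = 1 - P{T <= t - 1} over every attainable t.
n, m = 3, 4
lowest = n * (n + 1) // 2            # first sample holds ranks 1, ..., n
highest = n * (2 * m + n + 1) // 2   # first sample holds ranks m + 1, ..., m + n
for t in range(lowest, highest + 1):
    upper_hits = sum(1 for s in combinations(range(1, n + m + 1), n) if sum(s) >= t)
    assert Fraction(upper_hits, comb(n + m, n)) == 1 - rank_sum_cdf(n, m, t - 1)
```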

To compute the probabilities in Equation 12.4.2, let $P(N, M, K)$ denote the probability that the sum of the ranks of the first sample will be less than or equal to $K$ when the sample sizes are $N$ and $M$ and $H_0$ is true. We will now determine a recursive formula for $P(N, M, K)$, which will then allow us to obtain the desired quantities $P(n, m, t) = P_{H_0}\{T \le t\}$ and $P(n, m, t - 1)$.
To compute the probability that the sum of the ranks of the first sample is less than or equal to $K$ when $N$ and $M$ are the sample sizes and $H_0$ is true, let us condition on whether the largest of the $N + M$ data values belongs to the first or second sample. If it belongs to the first sample, then the sum of the ranks of this sample is equal to $N + M$ plus the sum of the ranks of the other $N - 1$ values from the first sample. Hence this sum will be less than or equal to $K$ if and only if the sum of the ranks of the other $N - 1$ values is less than or equal to $K - (N + M)$. But since the remaining $N - 1 + M$ values (that is, all but the largest) all come from the same distribution when $H_0$ is true, it follows that the sum of the ranks of $N - 1$ of them will be less than or equal to $K - (N + M)$ with probability $P(N - 1, M, K - N - M)$. By a similar argument we can show that, given that the largest value is from the second sample, the sum of the ranks of the first sample will be less than or equal to $K$ with probability $P(N, M - 1, K)$. Also, since the largest value is equally likely to be any of the $N + M$ values $X_1, \ldots, X_N, Y_1, \ldots, Y_M$, it follows that it will come from the first sample with probability $N/(N + M)$. Putting these together, we thus obtain that


$$P(N, M, K) = \frac{N}{N + M}\, P(N - 1, M, K - N - M) + \frac{M}{N + M}\, P(N, M - 1, K) \tag{12.4.3}$$
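Recursion (12.4.3) translates almost line for line into a memoized function. The sketch below is one possible implementation, not the text's program; its base cases are our own choice: $P(N, M, K) = 0$ whenever $K < 0$ (rank sums are never negative), and $P(0, 0, K) = 1$ for $K \ge 0$ (the empty sum is $0$). A term whose weight $N/(N+M)$ or $M/(N+M)$ is zero is simply skipped, so no negative sample size is ever requested. The name `P` is chosen only to mirror the text's notation.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def P(N: int, M: int, K: int) -> Fraction:
    """P(N, M, K) = P_{H0}{rank sum of first sample <= K}, via (12.4.3).

    Base cases (an assumption of this sketch, not quoted from the text):
      * K < 0            -> 0, since a sum of ranks is never negative
      * N == 0, M == 0   -> 1, since the empty sum 0 satisfies 0 <= K
    """
    if K < 0:
        return Fraction(0)
    if N == 0 and M == 0:
        return Fraction(1)
    total = Fraction(0)
    if N > 0:
        # Largest value is in the first sample: it carries rank N + M.
        total += Fraction(N, N + M) * P(N - 1, M, K - N - M)
    if M > 0:
        # Largest value is in the second sample: first-sample ranks unchanged.
        total += Fraction(M, N + M) * P(N, M - 1, K)
    return total
```

As a hand check, $P(1, 1, 1) = \tfrac{1}{2} P(0, 1, -1) + \tfrac{1}{2} P(1, 0, 1) = 0 + \tfrac{1}{2} \cdot 1 = \tfrac{1}{2}$, matching the fact that a single observation in a two-observation experiment is equally likely to have rank 1 or 2. With the observed $t$, the calls `P(n, m, t)` and `P(n, m, t - 1)` supply exactly the two quantities that rule (12.4.2) compares with $\alpha/2$ and $1 - \alpha/2$. Memoization limits the work to roughly one evaluation per distinct $(N, M, K)$ triple rather than one per subset of ranks.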