Introduction to Probability and Statistics for Engineers and Scientists


12.4 The Two-Sample Problem


This is given by

\[
\text{rank of } X_{(i)} = i + (\text{number of } j : Y_j < X_{(i)})
\]

Summing over i gives

\[
\sum_{i=1}^{n} \text{rank of } X_{(i)} = \sum_{i=1}^{n} i + \sum_{i=1}^{n} (\text{number of } j : Y_j < X_{(i)}) \tag{12.4.4}
\]

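But since the order in which we add terms does not change the sum obtained, we see that

\[
\sum_{i=1}^{n} \text{rank of } X_{(i)} = \sum_{i=1}^{n} \text{rank of } X_i = T \tag{12.4.5}
\]

\[
\sum_{i=1}^{n} (\text{number of } j : Y_j < X_{(i)}) = \sum_{i=1}^{n} (\text{number of } j : Y_j < X_i)
\]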

Hence, from Equations 12.4.4 and 12.4.5, we obtain that

\[
T = \sum_{i=1}^{n} i + \sum_{i=1}^{n} (\text{number of } j : Y_j < X_i)
  = \frac{n(n+1)}{2} + \sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij}
\]
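The identity above can be checked numerically. The following is a minimal sketch, assuming that W_ij is the indicator of the event Y_j < X_i (which is what the last step of the derivation requires; the definition itself lies outside this excerpt) and that the data contain no ties. The function names are illustrative only.

```python
import random


def rank_sum_direct(xs, ys):
    """T computed directly: sum of the ranks of the xs in the pooled sample.

    Assumes all pooled values are distinct (no ties).
    """
    pooled = sorted(xs + ys)
    rank = {value: r for r, value in enumerate(pooled, start=1)}
    return sum(rank[x] for x in xs)


def rank_sum_via_counts(xs, ys):
    """T computed as n(n+1)/2 plus the number of pairs (i, j) with Y_j < X_i."""
    n = len(xs)
    # Assumption: W_ij = 1 if Y_j < X_i, and 0 otherwise.
    w_total = sum(1 for x in xs for y in ys if y < x)
    return n * (n + 1) // 2 + w_total


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(6)]
    ys = [rng.random() for _ in range(9)]
    # Both routes to T should agree on any tie-free data set.
    print(rank_sum_direct(xs, ys) == rank_sum_via_counts(xs, ys))
```

For example, with X values 1.5, 3.2, 4.8 and Y values 0.7, 2.1, 5.0, 3.9, the X ranks are 2, 4, and 6, so T = 12, which matches 3(4)/2 + 6.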

12.4.1 The Classical Approximation and Simulation


The difficulty with employing the recursion in Equation 12.4.3 to compute the p-value
of the two-sample sum of ranks test statistic is that the amount of computation grows
enormously as the sample sizes increase. For instance, if n = m = 200, then even if we
choose the test statistic to be the smaller sum of ranks, since the sum of all the ranks is
1 + 2 + ··· + 400 = 80,200, it is possible that the test statistic could have a value as large
as 40,100. Hence, there can be as many as 200 × 200 × 40,100 = 1.604 × 10^9 values of
P(N, M, K) that would have to be computed to determine the p-value. Thus, for large sample
sizes the approach based on the recursion in Equation 12.4.3 is not viable. Two approximate
methods that can be utilized in such cases are (a) a classical method based on approximating
the distribution of the test statistic and (b) simulation.
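The count quoted above can be reproduced with a few lines of arithmetic. This is only a sketch of the bookkeeping, assuming (as the text's figure suggests) one value P(N, M, K) for each N up to n, each M up to m, and each K up to the largest possible value of the smaller rank sum; the function name is hypothetical.

```python
def recursion_table_size(n, m):
    """Return (sum of all ranks, largest smaller rank sum, number of P(N, M, K) values)."""
    total_ranks = (n + m) * (n + m + 1) // 2  # 1 + 2 + ... + (n + m)
    max_smaller_sum = total_ranks // 2        # the smaller of two sums is at most half the total
    table_size = n * m * max_smaller_sum      # assumed ranges: N <= n, M <= m, K <= max_smaller_sum
    return total_ranks, max_smaller_sum, table_size


if __name__ == "__main__":
    total, k_max, size = recursion_table_size(200, 200)
    print(total, k_max, size)  # 80200 40100 1604000000
```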


(a) The Classical Approximation. When the null hypothesis is true and so F = G, it
follows that all n + m data values come from the same distribution and thus all
(n + m)! possible rankings of the values X_1, ..., X_n, Y_1, ..., Y_m are equally likely.
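The equal likelihood of all rankings is also what makes a simulation approach workable: under the null hypothesis, the ranks of the X sample behave like n ranks drawn at random, without replacement, from 1, ..., n + m. The following is a hedged sketch of a Monte Carlo estimate built on that fact, not necessarily the procedure the text goes on to describe. It assumes a two-sided p-value of the form 2 min(P(T ≤ t), P(T ≥ t)); the function name, the default number of runs, and the seed parameter are illustrative choices.

```python
import random


def simulate_rank_sum_p_value(n, m, t_obs, num_sims=10_000, seed=None):
    """Monte Carlo estimate of a two-sided p-value for the rank sum T.

    Assumption: p-value = 2 * min(P(T <= t_obs), P(T >= t_obs)) under H0,
    capped at 1.
    """
    rng = random.Random(seed)
    ranks = list(range(1, n + m + 1))
    at_most = at_least = 0
    for _ in range(num_sims):
        # Under H0 every ranking is equally likely, so the X ranks are a
        # uniformly random subset of size n of the pooled ranks.
        t = sum(rng.sample(ranks, n))
        at_most += t <= t_obs
        at_least += t >= t_obs
    return min(1.0, 2 * min(at_most, at_least) / num_sims)


if __name__ == "__main__":
    # Smallest possible rank sum for n = 5 is 1 + 2 + 3 + 4 + 5 = 15.
    print(simulate_rank_sum_p_value(5, 5, t_obs=15, num_sims=2000, seed=1))
```

The estimate improves as num_sims grows, and the cost per run depends only on n, not on the size of the recursion table.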