12.4 The Two-Sample Problem
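This is given by

$$\text{rank of } X_{(i)} = i + \text{number}\{j : Y_j < X_{(i)}\}$$

Summing over $i$ gives

$$\sum_{i=1}^{n} \text{rank of } X_{(i)} = \sum_{i=1}^{n} i + \sum_{i=1}^{n} \text{number}\{j : Y_j < X_{(i)}\} \tag{12.4.4}$$

But since the order in which we add terms does not change the sum obtained, we see that

$$\begin{aligned}
\sum_{i=1}^{n} \text{rank of } X_{(i)} &= \sum_{i=1}^{n} \text{rank of } X_i = T \\
\sum_{i=1}^{n} \text{number}\{j : Y_j < X_{(i)}\} &= \sum_{i=1}^{n} \text{number}\{j : Y_j < X_i\}
\end{aligned} \tag{12.4.5}$$

Hence, from Equations 12.4.4 and 12.4.5, we obtain that

$$T = \sum_{i=1}^{n} i + \sum_{i=1}^{n} \text{number}\{j : Y_j < X_i\} = \frac{n(n+1)}{2} + \sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij}$$

12.4.1 The Classical Approximation and Simulation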
The difficulty with employing the recursion in Equation 12.4.3 to compute the $p$-value
of the two-sample rank sum test statistic is that the amount of computation grows
enormously as the sample sizes increase. For instance, if $n = m = 200$, then even if we
choose the test statistic to be the smaller of the two sums of ranks, it could still take
a value as large as 40,100, because the sum of all the ranks is
$1 + 2 + \cdots + 400 = 80{,}200$. Hence, since $N$ and $M$ can each range up to 200 and
$K$ up to 40,100, as many as $200 \times 200 \times 40{,}100 = 1.604 \times 10^9$ values
of $P(N, M, K)$ might have to be computed to determine the $p$-value. Thus, for large
sample sizes the approach based on the recursion in Equation 12.4.3 is not viable. Two
approximate methods that can be used in such cases are (a) a classical method based on
approximating the distribution of the test statistic and (b) simulation.
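The size of the computation quoted for $n = m = 200$ can be reproduced with a few lines of arithmetic. This sketch assumes, as the figures in the text imply, that $N$ ranges over $1, \ldots, n$, $M$ over $1, \ldots, m$, and $K$ up to half of the total rank sum; the function name `recursion_table_size` is hypothetical.

```python
def recursion_table_size(n, m):
    """Count the (N, M, K) triples for sample sizes n and m.

    Hypothetical helper. Assumes N runs over 1..n, M over 1..m, and K over
    1..(largest possible value of the smaller rank sum).
    """
    total_rank_sum = (n + m) * (n + m + 1) // 2  # 1 + 2 + ... + (n + m)
    max_k = total_rank_sum // 2                  # the smaller sum never exceeds half
    return total_rank_sum, max_k, n * m * max_k


total, max_k, count = recursion_table_size(200, 200)
print(f"{total:,} {max_k:,} {count:.4g}")
```

With $n = m = 200$ this reproduces the three numbers in the paragraph: a total rank sum of 80,200, a largest smaller-sum value of 40,100, and about $1.604 \times 10^9$ table entries.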
(a) The Classical Approximation. When the null hypothesis is true, and so $F = G$, it
follows that all $n + m$ data values come from the same distribution, and thus all
$(n+m)!$ possible rankings of the values $X_1, \ldots, X_n, Y_1, \ldots, Y_m$ are equally likely.
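The equal-likelihood statement is also what makes approach (b) workable. Under $H_0$, the set of ranks occupied by the $X$ sample is a uniformly random $n$-element subset of $\{1, \ldots, n+m\}$, so draws of $T$ can be simulated directly. The sketch below is a minimal Monte Carlo illustration under that reading, not necessarily the book's own simulation procedure. The function names are hypothetical, and the two-sided $p$-value $2\min\{P(T \le t), P(T \ge t)\}$ is an assumption of this sketch. An exact enumeration is included for checking small cases.

```python
import random
from itertools import combinations


def simulate_rank_sum(n, m, reps=100_000, seed=None):
    """Draw `reps` values of T (sum of the X-sample ranks) under H0: F = G.

    Every ranking is equally likely under H0, so the X ranks are a uniformly
    random n-subset of {1, ..., n + m}; T is the sum of that subset.
    """
    rng = random.Random(seed)
    ranks = range(1, n + m + 1)
    return [sum(rng.sample(ranks, n)) for _ in range(reps)]


def estimate_p_value(t, n, m, reps=100_000, seed=None):
    """Monte Carlo estimate of 2 * min(P(T <= t), P(T >= t)), capped at 1.

    The two-sided form is an assumption made for this sketch.
    """
    draws = simulate_rank_sum(n, m, reps, seed)
    lower = sum(d <= t for d in draws) / reps
    upper = sum(d >= t for d in draws) / reps
    return min(1.0, 2 * min(lower, upper))


def exact_lower_tail(t, n, m):
    """Exact P(T <= t) by enumerating every n-subset of ranks (small n, m only)."""
    subsets = list(combinations(range(1, n + m + 1), n))
    return sum(sum(s) <= t for s in subsets) / len(subsets)
```

For example, with $n = m = 2$ the six equally likely rank sets give $T \in \{3, 4, 5, 5, 6, 7\}$. So `exact_lower_tail(3, 2, 2)` is exactly $1/6$, and `estimate_p_value(3, 2, 2)` should come out close to $1/3$. Sampling only the ranks held by the $X$ sample, rather than generating data from some particular continuous $F$, is valid precisely because only the ranks matter under $H_0$.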