This is given by
$$\text{rank of } X_{(i)} = i + \text{number } j : Y_j < X_{(i)}$$
Summing over i gives
$$\sum_{i=1}^{n} \text{rank of } X_{(i)} = \sum_{i=1}^{n} i + \sum_{i=1}^{n} (\text{number } j : Y_j < X_{(i)}) \tag{12.4.4}$$
But since the order in which we add terms does not change the sum obtained, we see that
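$$\begin{aligned}
\sum_{i=1}^{n} \text{rank of } X_{(i)} &= \sum_{i=1}^{n} \text{rank of } X_i = T \\
\sum_{i=1}^{n} (\text{number } j : Y_j < X_{(i)}) &= \sum_{i=1}^{n} (\text{number } j : Y_j < X_i)
\end{aligned} \tag{12.4.5}$$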
Hence, from Equations 12.4.4 and 12.4.5, we obtain that
$$\begin{aligned}
T &= \sum_{i=1}^{n} i + \sum_{i=1}^{n} (\text{number } j : Y_j < X_i) \\
  &= \frac{n(n+1)}{2} + \sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij}
\end{aligned}$$
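The identity just derived can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes that W_ij equals 1 when Y_j < X_i and 0 otherwise (as the last equality implies), that there are no ties among the data values, and it uses a hypothetical helper name, rank_sum_two_ways.

```python
import random


def rank_sum_two_ways(xs, ys):
    """Return (T computed from ranks, T computed from the indicator formula).

    Assumes all values in xs and ys are distinct (no ties).
    """
    combined = sorted(xs + ys)
    # Rank of a value = its 1-based position in the combined ordered sample.
    rank = {value: pos for pos, value in enumerate(combined, start=1)}
    t_from_ranks = sum(rank[x] for x in xs)

    n = len(xs)
    # Assumed definition: W_ij = 1 if Y_j < X_i, else 0.
    w_sum = sum(1 for x in xs for y in ys if y < x)
    t_from_formula = n * (n + 1) // 2 + w_sum
    return t_from_ranks, t_from_formula


# Small hand-checkable case: X = (1, 4), Y = (2, 3, 5).
small_case = rank_sum_two_ways([1, 4], [2, 3, 5])

# Random case with distinct integer values (illustrative data only).
random.seed(0)
values = random.sample(range(1000), 12)
random_case = rank_sum_two_ways(values[:5], values[5:])
```

In the small case the X values have ranks 1 and 4, so T = 5, and the formula gives 2(3)/2 + 2 = 5 as well.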
12.4.1 The Classical Approximation and Simulation
The difficulty with employing the recursion in Equation 12.4.3 to compute the p-value
of the two-sample sum of rank test statistic is that the amount of computation grows
enormously as the sample sizes increase. For instance, if $n = m = 200$, then even if we
choose the test statistic to be the smaller sum of ranks, since the sum of all the ranks is
$1 + 2 + \cdots + 400 = 80{,}200$, it is possible that the test statistic could have a value as large
as 40,100. Hence, there can be as many as $1.604 \times 10^9$ values of $P(N, M, K)$ that would
have to be computed to determine the p-value. Thus, for large sample sizes the approach
based on the recursion in Equation 12.4.3 is not viable. Two approximate methods that can
be utilized in such cases are (a) a classical method based on approximating the distribution
of the test statistic and (b) simulation.
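The arithmetic behind the figure of 1.604 × 10^9 can be reproduced with a short calculation. This is only an illustrative sketch: the function name recursion_table_size is hypothetical, and the count assumes one value of P(N, M, K) for every N up to n, every M up to m, and every K up to half the total rank sum.

```python
def recursion_table_size(n, m):
    """Return (sum of all ranks, largest possible smaller rank sum, number of P values)."""
    total_rank_sum = (n + m) * (n + m + 1) // 2   # 1 + 2 + ... + (n + m)
    # The smaller of the two rank sums cannot exceed half of the total.
    max_statistic = total_rank_sum // 2
    # Assumed count: one P(N, M, K) per triple with N <= n, M <= m, K <= max_statistic.
    num_values = n * m * max_statistic
    return total_rank_sum, max_statistic, num_values


total, largest, count = recursion_table_size(200, 200)
print(total, largest, count)
```

For n = m = 200 this gives 80,200 for the sum of all ranks, 40,100 for the largest possible value of the statistic, and 1,604,000,000 values of P(N, M, K).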
(a) The Classical Approximation. When the null hypothesis is true and so $F = G$, it
follows that all $n + m$ data values come from the same distribution and thus all
$(n + m)!$ possible rankings of the values $X_1, \ldots, X_n, Y_1, \ldots, Y_m$ are equally likely.