CK-12 Probability and Statistics - Advanced


12.2. The Rank Sum Test and Rank Correlation http://www.ck12.org


When performing the rank sum test, there are several different conditions that need to be met. These include:



  • The population need not be normally distributed or have homogeneity of variance, but the observations
    must be continuously distributed.

  • The samples drawn from the population must be independent of one another.

  • Each sample must have 5 or more observations. The samples do not need to have the same number of
    observations.

  • The observations must be on a numeric or ordinal scale. They cannot be categorical variables.


Since the rank sum test evaluates both the median and the distribution of two independent samples, we establish two
null hypotheses. Our null hypotheses state that the two medians and the distributions of the independent samples are
equal. Symbolically, we could say that $H_0: m_1 = m_2$ and $\sigma_1 = \sigma_2$. The alternative hypotheses state
that there is a difference in the medians and the standard deviations of the samples.


Calculating the Mean and the Standard Deviation of Rank to Calculate a Z-Score


When performing the rank sum test, we need to calculate a figure known as the U statistic. This statistic takes both
the median and the total distribution of the two samples into account. The U statistic has its own distribution,
which we use when working with small samples (in this test a 'small sample' is defined as a sample of fewer than 20
observations). This distribution is used in the same way that we would use the t and the chi-square distributions.
Similar to the t distribution, the U distribution approaches the normal distribution as the size of both samples grows.
When we have samples of 20 or more, we do not use the U distribution. Instead, we use the U statistic to calculate
the standard z score.


To calculate the U statistic we must first arrange and rank the data from our two independent samples. First, we
rank all values from both samples from low to high without regard to which sample each value belongs to. If two
values are the same, then they both get the average of the two ranks for which they tie. The smallest number gets a
rank of 1 and the largest number gets a rank of n, where n is the total number of values in the two groups. After we
arrange and rank the data, we sum the ranks assigned to the observations in each of the samples. We record both the
sum of these ranks and the number of observations in each of the samples. After we have this information, we can
use the following formulas to determine the U statistic:
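
Before applying those formulas, the ranking and summing steps described above can be sketched in Python. This is only an illustration under stated assumptions: the function names average_ranks and rank_sums and the two small samples are hypothetical, not taken from the text, and ties among more than two values are assumed to share the average of all the ranks they occupy.

```python
def average_ranks(values):
    """Rank values from low to high (1 = smallest); tied values share the average rank."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` to cover every value tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end (0-based) occupy ranks pos+1..end+1.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def rank_sums(sample_a, sample_b):
    """Pool both samples, rank them together, and return each sample's rank sum."""
    combined = list(sample_a) + list(sample_b)
    ranks = average_ranks(combined)
    n_a = len(sample_a)
    return sum(ranks[:n_a]), sum(ranks[n_a:])


# Hypothetical data, chosen only to show a tie (the value 15 appears in both samples).
sample_a = [12, 15, 18, 20, 25]
sample_b = [14, 15, 22, 30, 31, 35]
r_a, r_b = rank_sums(sample_a, sample_b)
print(r_a, len(sample_a))  # rank sum and number of observations for the first sample
print(r_b, len(sample_b))  # rank sum and number of observations for the second sample
```

A quick check on any result is that the two rank sums together should equal n(n + 1)/2, where n is the total number of values in the two groups.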
