Robert V. Hogg, Joseph W. McKean, Allen T. Craig

634 Nonparametric and Robust Statistics

Table 10.8.1: Data for Example 10.8.1

Year   1500 m   Marathon∗     Year   1500 m   Marathon∗
1896   373.2    3530          1936   227.8    1759
1900   246.0    3585          1948   229.8    2092
1904   245.4    5333          1952   225.2    1383
1906   252.0    3084          1956   221.2    1500
1908   243.4    3318          1960   215.6     916
1912   236.8    2215          1964   218.1     731
1920   241.8    1956          1968   214.9    1226
1924   233.6    2483          1972   216.3     740
1928   233.2    1977          1976   219.2     595
1932   231.2    1896          1980   218.4     663
∗Actual marathon times are 2 hours + entry.

p-value = 3.319e-06; estimates: tau 0.6947368
The test results provide strong evidence against the hypothesis that the winning
times of the two races are independent.
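The reported estimate of tau can be reproduced from Table 10.8.1 with a short sketch. It assumes the usual definition of Kendall's estimate for data without ties, namely the number of concordant pairs minus the number of discordant pairs, divided by the number of pairs; the function name `kendall_tau` is chosen here for illustration. The p-value is not recomputed.

```python
from itertools import combinations

# Winning times from Table 10.8.1, in table order (1896-1932, then 1936-1980).
m1500 = [373.2, 246.0, 245.4, 252.0, 243.4, 236.8, 241.8, 233.6, 233.2, 231.2,
         227.8, 229.8, 225.2, 221.2, 215.6, 218.1, 214.9, 216.3, 219.2, 218.4]
marathon = [3530, 3585, 5333, 3084, 3318, 2215, 1956, 2483, 1977, 1896,
            1759, 2092, 1383, 1500, 916, 731, 1226, 740, 595, 663]


def kendall_tau(x, y):
    """(concordant - discordant) / C(n, 2); assumes the data have no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


tau = kendall_tau(m1500, marathon)
print(round(tau, 7))  # agrees with the estimate 0.6947368 reported above
```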

10.8.2 Spearman’s Rho

As above, assume that $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is a random sample from
a bivariate continuous cdf $F(x, y)$. The population correlation coefficient $\rho$ is a
measure of linearity between $X$ and $Y$. The usual estimate is the sample correlation
coefficient given by

\[
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}};
\tag{10.8.9}
\]

see Section 9.7. A simple rank analog is to replace $X_i$ by $R(X_i)$, where $R(X_i)$
denotes the rank of $X_i$ among $X_1, \ldots, X_n$, and likewise $Y_i$ by $R(Y_i)$, where $R(Y_i)$
denotes the rank of $Y_i$ among $Y_1, \ldots, Y_n$. Upon making this substitution, the
denominator of the above ratio is a constant, since each set of ranks is a
permutation of the integers $1, \ldots, n$. This results in the statistic


\[
r_S = \frac{\sum_{i=1}^{n} \left(R(X_i) - \frac{n+1}{2}\right)\left(R(Y_i) - \frac{n+1}{2}\right)}
           {n(n^2 - 1)/12},
\tag{10.8.10}
\]

which is called Spearman’s rho. The statistic $r_S$ is a correlation coefficient, so
the inequality $-1 \le r_S \le 1$ is true. Further, as the following theorem shows,
independence implies that the mean of $r_S$ is 0.
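As an illustration, the following sketch evaluates (10.8.10) on the data of Table 10.8.1 and checks two claims made above: that the rank sums of squares in the denominator equal the constant n(n^2 − 1)/12, and that Spearman's rho coincides with the sample correlation coefficient (10.8.9) computed on the ranks. The helper names `ranks`, `spearman_rho`, and `pearson_r` are chosen here for illustration, and the ranking routine assumes there are no ties.

```python
import math

# Winning times from Table 10.8.1, in table order (1896-1932, then 1936-1980).
m1500 = [373.2, 246.0, 245.4, 252.0, 243.4, 236.8, 241.8, 233.6, 233.2, 231.2,
         227.8, 229.8, 225.2, 221.2, 215.6, 218.1, 214.9, 216.3, 219.2, 218.4]
marathon = [3530, 3585, 5333, 3084, 3318, 2215, 1956, 2483, 1977, 1896,
            1759, 2092, 1383, 1500, 916, 731, 1226, 740, 595, 663]


def ranks(values):
    """Rank of each value within the list (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """r_S as in (10.8.10): centered rank cross-products over n(n^2 - 1)/12."""
    n = len(x)
    center = (n + 1) / 2
    numerator = sum((a - center) * (b - center)
                    for a, b in zip(ranks(x), ranks(y)))
    return numerator / (n * (n**2 - 1) / 12)


def pearson_r(u, v):
    """Sample correlation coefficient as in (10.8.9)."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    numerator = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    ss_u = sum((a - ubar) ** 2 for a in u)
    ss_v = sum((b - vbar) ** 2 for b in v)
    return numerator / (math.sqrt(ss_u) * math.sqrt(ss_v))


n = len(m1500)
# Sum of squared centered ranks, which should equal n(n^2 - 1)/12.
rank_ss = sum((i - (n + 1) / 2) ** 2 for i in range(1, n + 1))
r_s = spearman_rho(m1500, marathon)
r_on_ranks = pearson_r(ranks(m1500), ranks(marathon))

print(rank_ss, n * (n**2 - 1) / 12)
print(r_s, r_on_ranks)
```

The two numbers on each printed line should agree, the second pair up to floating-point rounding.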


Theorem 10.8.3. Suppose $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is a sample on $(X, Y)$,
where $(X, Y)$ has the continuous cdf $F(x, y)$. If $X$ and $Y$ are independent, then
$E(r_S) = 0$.
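For small n the conclusion of the theorem can be checked by direct enumeration. The sketch below is not the proof; it relies on the standard fact (supplied here as an assumption, not taken from the text above) that when X and Y are independent with a continuous joint cdf, every arrangement of the Y-ranks relative to the ordered X-ranks is equally likely. Averaging (10.8.10) over all n! arrangements with exact rational arithmetic then gives E(r_S). The function names are chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations


def spearman_from_ranks(y_ranks):
    """r_S of (10.8.10) when the X-ranks are 1..n and the Y-ranks are y_ranks."""
    n = len(y_ranks)
    center = Fraction(n + 1, 2)
    numerator = sum((i - center) * (r - center)
                    for i, r in enumerate(y_ranks, start=1))
    return numerator / Fraction(n * (n**2 - 1), 12)


def exact_mean_rs(n):
    """Average of r_S over all n! equally likely rank arrangements."""
    values = [spearman_from_ranks(p) for p in permutations(range(1, n + 1))]
    return sum(values, Fraction(0)) / len(values)


for n in range(2, 7):
    print(n, exact_mean_rs(n))  # the mean is exactly 0 for each n
```

The same helper also shows the extremes of the range: `spearman_from_ranks((1, 2, 3, 4))` is 1 and `spearman_from_ranks((4, 3, 2, 1))` is −1.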
