Robert_V._Hogg,_Joseph_W._McKean,_Allen_T._Craig

636 Nonparametric and Robust Statistics

where $\gamma = P[(X_2 - X_1)(Y_3 - Y_1) > 0]$. For large $n$, $E(r_S) \approx 6(\gamma - 1/2)$, which is a
harder parameter to interpret than the measure of concordance $\tau$.
Spearman’s rho is based on Wilcoxon scores; hence, it can easily be extended to
other rank score functions. Some of these measures are discussed in the exercises.
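
As a rough illustration of the extension to other score functions, the sketch below computes Spearman's rho as the Pearson correlation of the ranks and then swaps in a different score function. The data are simulated, and the helper name score_cor and the two score functions are illustrative assumptions, not code from the text. It assumes no ties and scores of the form a(i) = phi(i/(n + 1)) that sum to zero, in which case the Pearson correlation of the scored ranks coincides with the usual standardized sum of score products.

```r
# Simulated data for illustration only (not the data used in the text).
set.seed(1)
x <- rnorm(20)
y <- x + rnorm(20)

# Spearman's rho is the Pearson correlation of the ranks.
rs_ranks   <- cor(rank(x), rank(y))
rs_builtin <- cor(x, y, method = "spearman")

# Hypothetical helper: rank correlation based on a score function phi,
# using scores a(i) = phi(i / (n + 1)) applied to the ranks.
score_cor <- function(x, y, phi) {
  n  <- length(x)
  ax <- phi(rank(x) / (n + 1))
  ay <- phi(rank(y) / (n + 1))
  cor(ax, ay)
}

wilcoxon_scores <- function(u) sqrt(12) * (u - 0.5)  # linear (Wilcoxon) scores
normal_scores   <- function(u) qnorm(u)              # normal scores

r_wilcoxon <- score_cor(x, y, wilcoxon_scores)  # agrees with Spearman's rho
r_normal   <- score_cor(x, y, normal_scores)    # a normal-scores analogue
```

Because the Wilcoxon scores are a linear function of the ranks, r_wilcoxon matches rs_builtin up to rounding; the normal-scores version generally gives a different value.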


Remark 10.8.1 (Confidence Intervals). Distribution-free confidence intervals for
Kendall’s $\tau$ exist; see Section 8.5 of Hollander and Wolfe (1999). As outlined in
Exercise 10.8.6, it is easy to construct percentile bootstrap confidence intervals for
both parameters. The R function cor.boot.ci in the CRAN package npsm obtains
such confidence intervals; see Section 4.8 of Kloke and McKean (2014) for discussion.
It also requires the CRAN package boot developed by Canty and Ripley (2017).
We used this function to compute confidence intervals for $\tau$ and $\rho_S$:
library(boot); library(npsm)
cor.boot.ci(m1500,marathon,method="spearman"); # (0.719,0.955)
cor.boot.ci(m1500,marathon,method="kendall"); # (0.494,0.845)
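
For readers who want to see what a percentile bootstrap interval involves, here is a minimal base-R sketch. It is not the npsm implementation: the function name percentile_boot_ci, its arguments, and the simulated data are all assumptions made for illustration. The idea is to resample the (x, y) pairs with replacement, recompute the rank correlation on each resample, and take empirical quantiles of the resampled values.

```r
# Hypothetical helper: percentile bootstrap CI for a rank correlation.
# A sketch only; cor.boot.ci in npsm may differ in its details.
percentile_boot_ci <- function(x, y, method = "spearman",
                               nboot = 2000, conf = 0.95) {
  n <- length(x)
  boot_stats <- replicate(nboot, {
    idx <- sample.int(n, n, replace = TRUE)  # resample pairs, not x and y separately
    cor(x[idx], y[idx], method = method)
  })
  alpha <- 1 - conf
  # Percentile interval: lower and upper empirical quantiles of the resampled statistics.
  quantile(boot_stats, c(alpha / 2, 1 - alpha / 2), names = FALSE, na.rm = TRUE)
}

# Simulated data for illustration only (not the Olympic race data).
set.seed(20240101)
x <- rnorm(30)
y <- 0.8 * x + rnorm(30, sd = 0.6)

ci_spearman <- percentile_boot_ci(x, y, method = "spearman")
ci_kendall  <- percentile_boot_ci(x, y, method = "kendall")
```

Resampling whole pairs preserves the dependence between the two variables, which is what the correlation measures; resampling x and y independently would destroy it.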


EXERCISES


10.8.1. Show that Kendall’s $\tau$ satisfies the inequality $-1 \le \tau \le 1$.


10.8.2. Consider Example 10.8.1. Let $Y$ = winning times of the 1500 m race for a
particular year and let $X$ = winning times of the marathon for that year. Obtain
a scatterplot of $Y$ versus $X$, and determine the outlying point.


10.8.3. Consider the last exercise as a regression problem. Suppose we are interested
in predicting the 1500 m winning time based on the marathon winning time.
Assume a simple linear model and obtain the least squares and Wilcoxon (Section
10.7) fits of the data. Overlay the fits on the scatterplot obtained in Exercise 10.8.2.
Comment on the fits. What does the slope parameter mean in this problem?


10.8.4. With regard to Exercise 10.8.3, a more interesting prediction problem is
the prediction of the winning time of either race based on year.


(a) Make a scatterplot of the winning 1500 m race time versus year. Assume a
simple linear model (does the assumption make sense?) and obtain the least
squares and Wilcoxon (Section 10.7) fits of the data. Overlay the fits on the
scatterplot. Comment on the fits. What does the slope parameter mean in this
problem? Predict the winning time for 1984. How close was your prediction
to the true winning time?

(b) Same as part (a), except use the winning time of the marathon for that year.

10.8.5. Spearman’s rho is a rank correlation coefficient based on Wilcoxon scores.
In this exercise we consider a rank correlation coefficient based on a general score
function. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate
continuous cdf $F(x, y)$. Let $a(i) = \varphi(i/(n+1))$, where $\sum_{i=1}^{n} a(i) = 0$. In particular,