Hoaglin, Mosteller, and Tukey (1983) looked at the role of beta-endorphins in
response to stress. They were interested in testing whether beta-endorphin levels rose in
stressful situations. They recorded beta-endorphin levels in 19 patients 12 hours before
surgery and again, for the same patients, 10 minutes before surgery. The data^4 follow in
fmol/ml.
12 hours     10.0   6.5   8.0  12.0   5.0  11.5   5.0   3.5   7.5   5.8   4.7
10 min.      20.0  14.0  13.5  18.0  14.5   9.0  18.0   6.5   7.4   6.0  25.0
Difference   10.0   7.5   5.5   6.0   9.5  -2.5  13.0   3.0  -0.1   0.2  20.3

12 hours      8.0   7.0  17.0   8.8  17.0  15.0   4.4   2.0
10 min.      12.0  15.0  42.0  16.0  52.0  11.5   2.5   2.1
Difference    4.0   8.0  25.0   7.2  35.0  -3.5  -1.9   0.1
Because these are paired scores, we are primarily interested in the difference scores
(10-minute score minus 12-hour score). We want to test the null hypothesis that the average
difference score is 0.0, which would indicate that there was no change in endorphin levels
on average. The difference scores are shown in the Difference rows of the table, where it is
clear that most differences are positive and those that are negative are relatively small. If
you were to plot the differences in this example, you would find that they are very positively
skewed, which might discourage us from using a standard parametric t test. Moreover, if we
were particularly interested in the median of the differences, a t test would not be
appropriate. We will solve our problem by drawing on resampling statistics.
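As a rough check on the description above, the short Python sketch below (standard library only) recomputes the difference scores from the two rows of the table and compares their mean with their median. The variable names are illustrative, the values are transcribed from the table, and rounding to one decimal place is only there to remove floating-point noise from the subtraction.

```python
import statistics

# Beta-endorphin levels (fmol/ml) for the 19 patients, transcribed from the table above.
twelve_hours = [10.0, 6.5, 8.0, 12.0, 5.0, 11.5, 5.0, 3.5, 7.5, 5.8, 4.7,
                8.0, 7.0, 17.0, 8.8, 17.0, 15.0, 4.4, 2.0]
ten_minutes = [20.0, 14.0, 13.5, 18.0, 14.5, 9.0, 18.0, 6.5, 7.4, 6.0, 25.0,
               12.0, 15.0, 42.0, 16.0, 52.0, 11.5, 2.5, 2.1]

# Difference = 10-minute score minus 12-hour score, rounded to remove float noise.
diffs = [round(b - a, 1) for a, b in zip(twelve_hours, ten_minutes)]

# Zero differences would normally be dropped (see footnote 4); there are none here.
diffs = [d for d in diffs if d != 0.0]

n_positive = sum(d > 0 for d in diffs)
mean_diff = statistics.mean(diffs)
median_diff = statistics.median(diffs)

print(f"n = {len(diffs)}, positive = {n_positive}")
print(f"mean = {mean_diff:.2f}, median = {median_diff:.1f}")
```

Fifteen of the 19 differences are positive, and the few very large increases (20.3, 25.0, and 35.0) pull the mean (about 7.7) above the median (6.0), which is the pattern a positively skewed set of differences produces.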
Our resampling procedure is based on the idea that if the null hypothesis is true, a
patient’s 10-minute score was just as likely to be larger than his 12-hour score as it was
to be smaller. If a patient has scores of 8.0 and 13.5, and if the null hypothesis is true, the
13.5 could just as likely come from the 12-hour measurement as from the 10-minute
measurement. Under H₀ each difference had an equal chance of being positive or negative.
This tells us how to model what the data would look like under H₀. We will simply
draw a very large number of samples of 19 difference scores each, in such a way that each
difference score has a 50:50 chance of being positive or negative. For each sample we
will calculate the median of the differences, and then plot the sampling distribution of
these medians. Remember, this is the sampling distribution of the median difference when
H₀ is true. We can compare our obtained median difference against this distribution to
test H₀.
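The null model just described can be sketched in Python as follows. This is not the Resampling Stats program from Figure 18.4; it is an illustrative translation using only the standard library. The function name `null_medians`, the seed, and the one-tailed comparison are assumptions made for the example (the one-tailed direction follows the hypothesis that endorphin levels rose).

```python
import random
import statistics

# Difference scores (10 min. minus 12 hours), transcribed from the table above.
DIFFS = [10.0, 7.5, 5.5, 6.0, 9.5, -2.5, 13.0, 3.0, -0.1, 0.2, 20.3,
         4.0, 8.0, 25.0, 7.2, 35.0, -3.5, -1.9, 0.1]


def null_medians(diffs, n_samples=10_000, seed=None):
    """Sampling distribution of the median difference when H0 is true.

    Every resample keeps the observed magnitudes but gives each difference
    an independent 50:50 chance of being positive or negative.
    """
    rng = random.Random(seed)
    magnitudes = [abs(d) for d in diffs]
    medians = []
    for _ in range(n_samples):
        resample = [m if rng.random() < 0.5 else -m for m in magnitudes]
        medians.append(statistics.median(resample))
    return medians


obtained = statistics.median(DIFFS)
medians = null_medians(DIFFS, seed=1)

# One-tailed: proportion of H0 medians at least as large as the one observed.
p_one_tailed = sum(m >= obtained for m in medians) / len(medians)
print(f"obtained median = {obtained}, p (one-tailed) ~ {p_one_tailed:.4f}")
```

Because this is a Monte Carlo estimate, the printed proportion varies a little from seed to seed. Under this model a median of 6.0 or more requires all ten differences whose magnitude is 6.0 or larger to come out positive, so the estimate should land near 1/1024. A two-tailed version would instead count medians whose absolute value is at least as large as the obtained median.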
The way that we will conduct this test using Simon and Bruce’s Resampling Stats is to
take all 19 difference scores and randomly attach a sign to each one. (Assigning the sign
at random is exactly equivalent to randomly assigning one score to the 12-hour condition
and the other to the 10-minute condition.) We will then calculate the median difference
and store it. This procedure will be repeated many times (in this case, 10,000 times).
The program and results are shown in Figure 18.4, with the resulting histogram in
Figure 18.5.
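The parenthetical claim can be seen directly in a sketch that works from the raw scores instead of the differences: for each patient it randomly decides which score counts as the 12-hour measurement, which can only flip the sign of that patient's difference. Again, this is a hypothetical Python analogue of the procedure, not the Resampling Stats code in Figure 18.4; the helper names and the 1-fmol/ml bins used for the frequency table are illustrative choices.

```python
import math
import random
import statistics
from collections import Counter

# Raw beta-endorphin scores (fmol/ml) for the 19 patients, from the table above.
TWELVE_HOURS = [10.0, 6.5, 8.0, 12.0, 5.0, 11.5, 5.0, 3.5, 7.5, 5.8, 4.7,
                8.0, 7.0, 17.0, 8.8, 17.0, 15.0, 4.4, 2.0]
TEN_MINUTES = [20.0, 14.0, 13.5, 18.0, 14.5, 9.0, 18.0, 6.5, 7.4, 6.0, 25.0,
               12.0, 15.0, 42.0, 16.0, 52.0, 11.5, 2.5, 2.1]


def shuffled_difference(twelve_hr, ten_min, rng):
    """Randomly assign a patient's two scores to the two conditions.

    Swapping the scores only reverses the sign of the difference, so this is
    the same as attaching a random sign to the observed difference.
    """
    if rng.random() < 0.5:
        twelve_hr, ten_min = ten_min, twelve_hr
    return round(ten_min - twelve_hr, 1)


def resample_medians(n_reps=10_000, seed=None):
    """Repeat the random assignment n_reps times, storing each median difference."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_reps):
        diffs = [shuffled_difference(a, b, rng)
                 for a, b in zip(TWELVE_HOURS, TEN_MINUTES)]
        medians.append(statistics.median(diffs))
    return medians


medians = resample_medians(seed=2)

# Frequency distribution of the stored medians in 1-fmol/ml bins [lower, lower + 1).
freq = Counter(math.floor(m) for m in medians)
for lower in sorted(freq):
    print(f"{lower:>4} to {lower + 1:<4}{freq[lower]:>6}")
```

Since the median of 19 values is always one of those values, every stored median is one of the observed differences with either sign, just as in the random-sign version.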
From Figure 18.4 we can see that the obtained median difference score was 6. From
either the frequency distribution in Figure 18.4 or the histogram in Figure 18.5 we see the
results of drawing 10,000 samples from a model in which the null hypothesis is true.
666 Chapter 18 Resampling and Nonparametric Approaches to Data
^4 I have made two very trivial changes to avoid difference scores of 0.0, just to make the explanation easier. With
differences of zero, we normally simply remove those cases from the data.