a large value might be followed by a small value, or large and small values
could be clustered together (see Figure 8-19). In these cases, the residuals do
not show independence, because you can predict what the sign of the next
value will be on the basis of the current value.
Figure 8-19 Residuals versus predicted values
(panels: residuals with alternating signs; residuals of the same sign grouped together)
In examining residuals, we can examine the sign of the values (either
positive or negative) and determine how many values with the same sign
are clustered together. These groups of similarly signed values are called
runs. For example, consider a data set of 10 residuals containing 5 positive
values and 5 negative values. The values could follow an order with only
two runs, such as
+ + + + + − − − − −
In this case, we would suspect that the residuals were not independent,
because the positives and negatives are clustered together in the sequence.
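Counting runs by hand becomes tedious for longer series, so a short sketch in Python may help. It reproduces the clustered case just described (five positive residuals followed by five negative ones). The function name `count_runs` and the residual values are hypothetical, chosen only for illustration, and dropping residuals that are exactly zero is an assumption of this sketch rather than a rule stated in the text.

```python
from itertools import groupby


def count_runs(residuals):
    """Count runs of same-signed values in a sequence of residuals.

    Assumption: residuals that are exactly zero carry no sign and are
    dropped before counting.
    """
    # True marks a positive residual, False a negative one.
    signs = [r > 0 for r in residuals if r != 0]
    # groupby collapses consecutive equal signs into one group per run.
    return sum(1 for _ in groupby(signs))


# Hypothetical residuals: five positive values followed by five negative ones.
clustered = [0.8, 1.2, 0.5, 0.9, 0.3, -0.7, -1.1, -0.4, -0.6, -0.2]
print(count_runs(clustered))  # 2
```

Only the signs matter here, not the magnitudes, so any residual series with the same sign pattern gives the same count.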
On the other hand, we might have the opposite problem, where there could
be as many as ten runs, such as
+ − + − + − + − + −
Here, we suspect the residuals are not independent, because the residuals
are constantly switching sign. Finally, we might have something
in-between, such as
+ + − − − + + − − +
which has five runs. If the number of runs is very large or very small, we
would suspect that the residuals are not independent. How large (or how
small) does this value have to be? Using probability theory, statisticians
have calculated the p values for a runs test, associated with the number of
runs observed for different sample sizes. If we let n be the sample size, n_1
be the number of positive values, and n_2 be the number of negative values,
the expected number of runs μ is
\mu = \frac{2 n_1 n_2}{n}