Data Analysis with Microsoft Excel: Updated for Office 2007

Chapter 8 Regression and Correlation

a large value might be followed by a small value, or large and small values could be clustered together (see Figure 8-19). In these cases, the residuals do not show independence, because you can predict what the sign of the next value will be on the basis of the current value.

Figure 8-19 Residuals versus predicted values (panels: residuals with alternating signs; residuals of the same sign grouped together)

When examining residuals, we can look at the sign of each value (positive or negative) and count how many values with the same sign are clustered together. These groups of consecutive, similarly signed values are called runs. For example, consider a data set of 10 residuals containing 5 positive values and 5 negative values. The values could follow an order with only two runs, such as

+ + + + + − − − − −

In this case, we would suspect that the residuals are not independent, because the positive and negative values are clustered together in the sequence. On the other hand, we might have the opposite problem, with as many as ten runs, such as

+ − + − + − + − + −

Here, too, we would suspect that the residuals are not independent, because they constantly switch sign. Finally, we might have something in between, such as

+ + − − − + + − − +

which has five runs. If the number of runs is very large or very small, we would suspect that the residuals are not independent. How large (or how small) does the number of runs have to be? Using probability theory, statisticians have calculated the p values associated with each number of runs for different sample sizes; these form the basis of the runs test. If we let $n$ be the sample size, $n_1$ the number of positive values, and $n_2$ the number of negative values, the expected number of runs $\mu$ is

$$\mu = \frac{2 n_1 n_2}{n} + 1$$
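To make the counting concrete, here is a minimal Python sketch (the book itself works in Excel; Python is only an illustration here) that counts runs in a sequence of signs and evaluates $\mu$ from $n_1$ and $n_2$. The function names `residual_signs`, `count_runs`, and `expected_runs` are hypothetical, and dropping residuals that are exactly zero is an assumption of this sketch, not a rule stated in the text.

```python
from typing import Sequence


def residual_signs(residuals: Sequence[float]) -> list[str]:
    """Map each residual to '+' or '-'.
    Assumption: residuals that are exactly zero are dropped."""
    return ["+" if r > 0 else "-" for r in residuals if r != 0]


def count_runs(signs: Sequence[str]) -> int:
    """Count maximal blocks of consecutive identical signs (runs)."""
    if len(signs) == 0:
        return 0
    runs = 1
    for prev, cur in zip(signs, signs[1:]):
        if cur != prev:  # a sign change starts a new run
            runs += 1
    return runs


def expected_runs(signs: Sequence[str]) -> float:
    """Expected number of runs: mu = 2*n1*n2/n + 1."""
    n1 = sum(1 for s in signs if s == "+")  # number of positive values
    n2 = sum(1 for s in signs if s == "-")  # number of negative values
    n = n1 + n2
    if n == 0:
        raise ValueError("need at least one nonzero residual")
    return 2 * n1 * n2 / n + 1


# The three 10-residual sequences from the text (5 positive, 5 negative).
for pattern in ["+++++-----", "+-+-+-+-+-", "++---++--+"]:
    print(pattern, count_runs(pattern), expected_runs(pattern))
# prints:
# +++++----- 2 6.0
# +-+-+-+-+- 10 6.0
# ++---++--+ 5 6.0
```

With $n_1 = n_2 = 5$ and $n = 10$, $\mu = 2 \cdot 5 \cdot 5 / 10 + 1 = 6$, so the observed counts of 2 and 10 sit well below and above the expected value, while 5 is close to it.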
