not shown in Exhibit 15.1, but came from that SPSS analysis.^8 These values are obtained by choosing the “Save” option in the regression dialog box and then selecting the appropriate statistics.
If we look at these cases in the diagnostic statistics in Exhibit 15.3, we can see that some of them have large residuals and large studentized residuals. The only studentized residual that is particularly noteworthy is the one for State 48, West Virginia (RStudent = -2.4168). But when we look at Cook’s D, we see that no state comes even close to having an unusually large value. The highest Cook’s D for this data set is .1230. From these results we can safely conclude that no single state is having a disproportionate influence on our results.
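The columns of Exhibit 15.3 come from standard formulas, and a minimal pure-Python sketch may make them concrete. It is an illustration, not SPSS's or SAS's code. It fits ordinary least squares with an intercept and, for each case, computes the leverage h_i = x_i'(X'X)^-1 x_i, the internally studentized residual r_i = e_i / (s·sqrt(1 - h_i)), RStudent (the externally studentized residual) r_i·sqrt((n - k - 1)/(n - k - r_i²)) with k = p + 1 coefficients, and Cook's D = (r_i²/k)·h_i/(1 - h_i). The helper names `invert` and `regression_diagnostics` and the six data points are hypothetical, chosen only to show the arithmetic. The leverage computed here is the uncentered value, so an SPSS-style leverage would be lower by 1/N.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def regression_diagnostics(xs, y):
    """Hypothetical helper: OLS with an intercept, returning
    (residual, RStudent, Cook's D, leverage) for each case."""
    X = [[1.0] + [float(v) for v in row] for row in xs]   # prepend intercept column
    n, k = len(X), len(X[0])                              # k = p + 1 coefficients
    xtx_inv = invert([[sum(row[a] * row[b] for row in X) for b in range(k)]
                      for a in range(k)])
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    coef = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coef, row)) for row, yi in zip(X, y)]
    # uncentered leverage h_i = x_i' (X'X)^-1 x_i
    hat = [sum(row[a] * xtx_inv[a][b] * row[b] for a in range(k) for b in range(k))
           for row in X]
    s2 = sum(e * e for e in resid) / (n - k)              # residual mean square
    out = []
    for e, h in zip(resid, hat):
        r = e / math.sqrt(s2 * (1 - h))                   # internally studentized residual
        t = r * math.sqrt((n - k - 1) / (n - k - r * r))  # externally studentized (RStudent)
        d = (r * r / k) * (h / (1 - h))                   # Cook's distance
        out.append((e, t, d, h))
    return out


# Made-up data (not the SAT data): one predictor, last case pulled upward
x = [[1], [2], [3], [4], [5], [6]]
y = [2.0, 4.5, 5.5, 8.5, 9.5, 15.0]
for i, (e, t, d, h) in enumerate(regression_diagnostics(x, y), start=1):
    print(f"{i:2d} {e:9.4f} {t:8.4f} {d:7.4f} {h:7.4f}")
```

With these made-up values the last case, which pairs the largest residual with one of the two highest leverages, has both the largest |RStudent| and the largest Cook's D. Cook's D is built to flag that combination rather than a large residual alone.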
Diagnostic Plots
Just because no state or collection of states appears to have a disproportionate influence on our regression equation does not mean that we have nothing to worry about. There may be other problems with the data. In fact, there was one problem that I sidestepped by using the log of PctSAT.
Our tests on the regression coefficients assume that the residuals are homoscedastic, meaning that the variance of the residuals is constant conditional on the level of each of the predictor variables and on the overall predicted value (Ŷ) from the final regression equation. Two important things that we should always look at are a plot of the residuals against the predicted values and a normal Q-Q plot of the residuals to check for normality. The top of Figure 15.6 shows these two plots for the regression that used PctSAT instead of LogPctSAT, along with Expend, as predictors.
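A Q-Q plot pairs the sorted residuals with the quantiles that a normal distribution would be expected to produce for a sample of the same size; if the residuals are normal, the points fall close to a straight line. The sketch below builds those pairs with Python's `statistics.NormalDist`, using the plotting positions (i - 0.5)/n. That is one common convention, and statistical packages differ slightly in the positions they use. The function names `qq_points` and `qq_correlation` are hypothetical, and the correlation between the two columns is an added rough summary of straightness, not something the text relies on.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Pair each sorted residual with a standard-normal quantile at position (i - 0.5)/n."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


def qq_correlation(points):
    """Pearson correlation of the Q-Q pairs; values near 1 mean a nearly straight plot."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up residuals: one large positive value makes the distribution skewed
pts = qq_points([-1.0, -1.0, -1.0, -1.0, -1.0, 5.0])
for theo, obs in pts:
    print(f"{theo:7.3f} {obs:6.2f}")
print(round(qq_correlation(pts), 3))
```

For the skewed residuals above, the largest observed value sits far above the line traced by the rest, which is the kind of departure a Q-Q plot is meant to reveal.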
The line drawn through the plot in the upper left is a smoothed regression line fitted to the data. Notice that it is distinctly curved. There should be no pattern to the residuals, but clearly there is. Crawley (2007) suggests that this plot should look like the sky at night, with points scattered all over the place. That is not the case here. The lower left shows a similar plot with LogPctSAT and Expend as the predictors. Here there is much less of a pattern, which is why I chose to use LogPctSAT as my predictor.
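The smoothed line in Figure 15.6 is a visual check. A rough numeric stand-in, which is an assumption of this sketch and not the smoother used for the figure, is to order the cases by predicted value and compare the mean residual for the middle third with the mean for the outer thirds: a bowed pattern like the one in the upper-left panel gives a clearly nonzero difference. The data below are made up and use one predictor rather than PctSAT plus Expend. The criterion is constructed to fall off with the log of the predictor, so a fit on the raw predictor leaves a bow in the residuals and a fit on its log does not. The names `fit_line` and `bow` are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares line with one predictor; returns the fitted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [my + slope * (a - mx) for a in x]


def bow(fitted, resid):
    """Mean residual of the middle third of cases (ordered by fitted value) minus the
    mean residual of the outer thirds; near 0 when the residuals show no curvature."""
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    third = len(order) // 3
    middle = order[third:len(order) - third]
    outer = order[:third] + order[len(order) - third:]
    return (sum(resid[i] for i in middle) / len(middle)
            - sum(resid[i] for i in outer) / len(outer))


# Made-up data: criterion falls off with log(pct), plus a small alternating +/-5 wobble
pct = [4, 6, 9, 13, 18, 25, 33, 42, 51, 60, 68, 75]
y = [1100 - 60 * math.log(p) + (5 if i % 2 else -5) for i, p in enumerate(pct)]

results = {}
for label, predictor in [("raw pct", pct), ("log pct", [math.log(p) for p in pct])]:
    fitted = fit_line(predictor, y)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    results[label] = bow(fitted, resid)
    print(label, round(results[label], 2))
```

With these values the raw-predictor fit leaves the middle third of residuals well below the outer thirds, while the log fit leaves a difference near zero, which mirrors the contrast between the upper-left and lower-left panels.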
15.10 Regression Diagnostics 543
Exhibit 15.3 Diagnostic statistics
Observation    Residual   RStudent   Cook’s D   Hat Diag (h_i)
      1         -4.5159     -.1798      .0006       .0312
      2        -11.7667      .4899      .0122       .1121
      3          1.4609      .0580      .0000       .0248
      4        -51.6151    -2.0661      .0924       .0410
      5         -2.9717     -.1189      .0003       .0407
    ...
     29         54.9324     2.1943      .0972       .0371
     30        -25.6401    -1.1063      .0968       .1718
    ...
     48        -61.5098    -2.4168      .0508       .0054
     49         20.5930      .8368      .0227       .0688
     50        -34.5975    -1.3757      .0321       .0284
^8 SPSS calculates leverage, and hence the studentized residual, slightly differently than do SAS, JMP, SYSTAT, BMDP, and others. SPSS’s leverage values are lower by 1/N, but this makes no substantive difference in the interpretation (except that the mean leverage will now be p/N instead of (p + 1)/N).
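The footnote's arithmetic can be checked directly. The uncentered leverages from a model with an intercept always sum to p + 1 (the trace of the hat matrix), so their mean is (p + 1)/N, and subtracting 1/N from each value gives a mean of p/N. The sketch below computes both versions for made-up data with N = 8 and p = 2. The helper names are hypothetical, and it assumes, following the footnote, that the SPSS-style value is simply h_i - 1/N.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def hat_values(xs):
    """Uncentered leverages h_i = x_i'(X'X)^-1 x_i for a model with an intercept."""
    X = [[1.0] + [float(v) for v in row] for row in xs]
    k = len(X[0])
    inv = invert([[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)])
    return [sum(row[a] * inv[a][b] * row[b] for a in range(k) for b in range(k)) for row in X]


def centered_leverage(xs):
    """Hypothetical SPSS-style leverage: each h_i lowered by 1/N, as the footnote describes."""
    h = hat_values(xs)
    return [hi - 1 / len(h) for hi in h]


# Made-up data: N = 8 cases, p = 2 predictors
xs = [(2, 5), (3, 1), (5, 4), (7, 8), (8, 2), (10, 6), (4, 7), (6, 3)]
h = hat_values(xs)
c = centered_leverage(xs)
print(sum(h) / len(h))  # (p + 1)/N = 3/8, up to floating-point rounding
print(sum(c) / len(c))  # p/N = 2/8, up to floating-point rounding
```

Because the shift is the same constant for every case, the ordering of the cases by leverage, and therefore which ones look unusual, is unchanged.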