Statistical Methods for Psychology

large residuals (they lie far from the regression line). Such points may represent random error, they may be data that are incorrectly recorded, or they may reflect unusual cases that don’t really belong in this data set. (An example of this last point would arise if we were trying to predict physical reaction time as a function of cognitive processing features of a task, and our subjects included one individual who suffered from a neuromuscular disorder that seriously slowed his reaction time.) Residuals are a standard feature of all regression analyses, and you should routinely request and examine them in running your analyses.

Leverage (often denoted $h_i$, or “hat diag”) measures the degree to which a case is unusual with respect to the predictor variables. In the case of one predictor, leverage is simply a function of the deviation of the score on that predictor from the predictor mean. Point B in Figure 15.5 is an example of a point with high leverage because the X score for that point (13) is far from $\bar{X}$. Most programs for multiple regression compute and print the leverage of each observation if requested. Possible values of leverage range from a low of $1/N$ to a high of 1.0, with a mean of $(p + 1)/N$, where $p$ = the number of predictors. Stevens (1992) recommends looking particularly closely at those leverage values that exceed $3(p + 1)/N$.

Points that are high on either distance or leverage do not necessarily have an important influence on the regression, but they have the potential for it. In order for a point to be high on influence, it must have relatively high values on both distance and leverage. In Figure 15.5, Point B is very high on leverage, but it has a relatively small residual (distance). Point A, on the other hand, has a large residual but, because it is near the mean on X, has low leverage. Point C is high on leverage and has a large residual, suggesting that it is high on influence.
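The single-predictor leverage calculation described above can be sketched in a few lines of Python. This is only an illustration, not the procedure any particular package uses: it assumes the standard one-predictor formula $h_i = 1/N + (X_i - \bar{X})^2 / \sum (X - \bar{X})^2$, takes the X values from Exhibit 15.2, and the variable names (`leverage`, `cutoff`, `flagged`) are made up for the example.

```python
# X values from Exhibit 15.2; one predictor, so p = 1.
x = [1, 1, 3, 3, 3, 4, 5, 5, 6, 7, 10, 13]
n = len(x)
p = 1

x_bar = sum(x) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)  # sum of squared deviations of X

# Assumed single-predictor formula: h_i = 1/N + (X_i - X_bar)^2 / SS_X
leverage = [1 / n + (xi - x_bar) ** 2 / ss_x for xi in x]

mean_leverage = sum(leverage) / n  # works out to (p + 1)/N
cutoff = 3 * (p + 1) / n           # screening value from Stevens (1992)

# Observation numbers (1-based) whose leverage exceeds the cutoff
flagged = [i + 1 for i, h in enumerate(leverage) if h > cutoff]

for i, h in enumerate(leverage, start=1):
    print(f"obs {i:2d}  X = {x[i - 1]:2d}  leverage = {h:.2f}")
print("cutoff:", cutoff, " flagged:", flagged)
```

With these data the cutoff is 3(2)/12 = .50, and only observation 12 (X = 13, Point B) exceeds it.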
The most common measure of influence is known as Cook’s D. It is a function of the sum of the squared changes in $b_j$ that would occur if the $i$th observation were removed from the data and the analysis rerun.

Exhibit 15.2 contains various diagnostic statistics for the data shown in Figure 15.5. These diagnostics were produced by SAS, but similar statistics would be produced by almost any other program. To take the diagnostic statistics in order, consider first the column headed Resid, which is a measure of distance. This column reflects what we can already see in Figure 15.5: the 8th and 11th observations have the largest residuals. Considering that the Y values range only from 1 to 14, a residual of –5.89 seems substantial.
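The "remove the observation and rerun the analysis" idea behind Cook’s D can be made concrete with a short Python sketch. It is a hedged illustration rather than SAS’s algorithm: it assumes one common textbook form of the statistic, in which the change in the coefficients is expressed through the fitted values, $D_i = \sum_j (\hat{Y}_j - \hat{Y}_{j(i)})^2 / [(p + 1)\,MS_{residual}]$. The data are the X and Y columns of Exhibit 15.2, and `fit_line` is a helper written only for this example.

```python
# X and Y from Exhibit 15.2; one predictor, so p = 1.
x = [1, 1, 3, 3, 3, 4, 5, 5, 6, 7, 10, 13]
y = [1, 2, 3, 5, 7, 6, 8, 10, 5, 10, 4, 14]


def fit_line(xs, ys):
    """Return the least-squares (intercept, slope) for one predictor."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    ss_x = sum((xi - x_bar) ** 2 for xi in xs)
    sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sp_xy / ss_x
    return y_bar - slope * x_bar, slope


n, p = len(x), 1
a, b = fit_line(x, y)
pred = [a + b * xi for xi in x]
# Residual mean square from the full-sample fit
mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / (n - p - 1)

cooks_d = []
for i in range(n):
    # Refit with observation i removed
    a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    # Squared change in every fitted value caused by dropping case i
    shift = sum((pj - (a_i + b_i * xj)) ** 2 for pj, xj in zip(pred, x))
    cooks_d.append(shift / ((p + 1) * mse))

for i, d in enumerate(cooks_d, start=1):
    print(f"obs {i:2d}  Cook's D = {d:.2f}")
```

Refitting once per case is wasteful for large data sets, where programs use an algebraic shortcut based on the residual and the leverage, but the brute-force loop shows directly what the statistic measures.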

15.10 Regression Diagnostics

Exhibit 15.2 Diagnostic statistics for data in Figure 15.5

         OBS    X    Y    PRED    RESID   RSTUDENT   HAT DIAG H    MSE   COOK’S D
           1    1    1    3.23    –2.23     –0.87       0.20      8.22     0.10
           2    1    2    3.23    –1.22     –0.47       0.20      8.71     0.03
           3    3    3    4.71    –1.71     –0.62       0.11      8.55     0.03
           4    3    5    4.71     0.29      0.10       0.11      8.91     0.00
           5    3    7    4.71     2.29      0.85       0.11      8.26     0.05
           6    4    6    5.45     0.55      0.19       0.09      8.88     0.00
           7    5    8    6.19     1.81      0.65       0.08      8.52     0.02
“A” ->     8    5   10    6.19     3.81      1.49       0.08      7.16     0.09
           9    6    5    6.93    –1.93     –0.69       0.09      8.46     0.02
          10    7   10    7.77     2.33      0.86       0.11      8.24     0.05
“C” ->    11   10    4    9.89    –5.89     –3.54       0.26      3.73     1.01
“B” ->    12   13   14   12.11     1.89      0.98       0.54      8.06     0.55
