Statistical Methods for Psychology

15.6 The Multiple Correlation Coefficient


Exhibit 15.1 shows that the multiple correlation between SAT and two predictors (Expend
and LogPctSAT) is equal to .941. The multiple correlation coefficient is often denoted
$R_{0.123\ldots p}$. The notation denotes the fact that the criterion ($Y$ or $X_0$) is predicted from
predictors 1, 2, 3, ..., $p$ simultaneously. When there is no confusion as to which predictors are
involved, we generally drop the subscripts and use plain old $R$.
As we have seen, $R$ is defined as the correlation between the criterion ($Y$) and the best
linear combination of the predictors. As such, $R$ is really nothing but $r_{Y\hat{Y}}$, where

$$\hat{Y} = b_0 + b_1X_1 + b_2X_2 + \cdots + b_pX_p$$

Thus, if we wished, we could use the regression equation to generate $\hat{Y}$, and then correlate
$Y$ and $\hat{Y}$, as we did in Figure 15.2. Although no one would seriously propose calculating $R$
in this way, it is helpful to realize that this is what the multiple correlation actually represents.
In practice, $R$ (or $R^2$) is printed out by every multiple regression computer program.
For our data, the multiple correlation between SAT and Expend and LogPctSAT taken
simultaneously is .941.
The coefficient $R$ is a regular correlation coefficient and can be treated just like any
other Pearson product-moment correlation. (This is obviously true, because $R = r_{Y\hat{Y}}$.)
However, in multiple correlation (as is often the case with simple correlation) we are more
interested in $R^2$ than in $R$, because it can be directly interpreted in terms of the percentage of
accountable variation. Thus, $R^2 = .941^2 = .886$, and we can say that 88.6% of the variation
in SAT scores can be predicted on the basis of the two predictors. This is about 74 percentage
points more than could be predicted on the basis of Expend alone, where we explained 14.5% of
the variation.
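The percentages in this paragraph are simple arithmetic on the reported correlations. The short Python check below uses values typed in from the text, not values recomputed from raw data.

```python
# Values as reported in the text.
r_multiple = 0.941   # R for Expend and LogPctSAT together
r2_multiple = 0.886  # R^2 as reported
r_expend = -0.381    # simple correlation of SAT with Expend alone

r2_expend = r_expend ** 2  # squaring removes the sign
gain_points = 100 * (r2_multiple - r2_expend)

print(f"{r_multiple ** 2:.3f}")  # 0.885: squaring the *rounded* R
print(f"{r2_expend:.3f}")        # 0.145, i.e. 14.5% for Expend alone
print(f"{gain_points:.1f}")      # 74.1 percentage points gained
```

Squaring the rounded .941 gives .885 rather than .886. The reported .886 presumably comes from the unrounded $R$, so a last-digit discrepancy like this is rounding, not an error.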
Unfortunately, $R^2$ is not an unbiased estimate of the corresponding parameter in the
population ($R^{*2}_{0.123\ldots p}$). The extent of this bias depends on the relative size of $N$ and $p$.
When $N = p + 1$, prediction is perfect and $R = 1$, regardless of the true relationship between $Y$
and $X_1, X_2, \ldots, X_p$ in the population. (A straight line will perfectly fit any two points; a
plane, like the three legs of a milking stool, will perfectly fit any three points; and so on.) A
relatively unbiased estimate of $R^{*2}$ is given by

$$\text{est } R^{*2} = 1 - \frac{(1 - R^2)(N - 1)}{N - p - 1}$$

For our data ($N = 50$, $p = 2$),

$$\text{est } R^{*2} = 1 - \frac{(1 - .886)(49)}{47} = .881$$

This value agrees with the “Adjusted R Square” printed by the SPSS procedure in Exhibit 15.1.
It should be apparent from the definition of $R$ that it can take on values only between 0
and 1. This follows both from the fact that it is defined as the positive square root of $R^2$,
and from the fact that it can be viewed as $r_{Y\hat{Y}}$: we would hardly expect $\hat{Y}$ to be negatively
correlated with $Y$. This is an important point, because if we were to predict SAT just from
Expend, the multiple correlation would be .381, whereas we know that the simple correlation
was −.381. As long as you understand what is happening here, there should not be any
confusion.
Because $R^2_{adj}$ is a less biased estimate of the squared population coefficient than $R^2$,
you might expect that people would routinely report $R^2_{adj}$. In fact, $R^2_{adj}$ is seldom seen
except on computer printout. I don’t know why that should be, but $R$ or $R^2$ is what you
would normally report.


532 Chapter 15 Multiple Regression

