or, in terms of the variables we have been using,

SAT = b1(Expend) + b2(LogPctSAT) + b0

I obtained the predicted scores from

SAT = 11.130(Expend) - 78.205(LogPctSAT) + 1147.113

and stored the predicted scores as PredSAT. Now if I correlate actual SAT with PredSAT, the resulting correlation will be .941, which is our multiple correlation. (A scatterplot of this relationship is shown in Figure 15.2.)
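As a concrete sketch of that calculation (my own illustration, not code from the book), the following Python snippet assumes the state data sit in a file with the hypothetical name state_sat.csv, with one row per state and columns SAT, Expend, and LogPctSAT. It refits the regression, saves the predicted values as PredSAT, and correlates them with the actual SAT means.

```python
# A minimal sketch, not the book's code. Assumes a CSV (hypothetical name
# "state_sat.csv") with one row per state and columns SAT, Expend, LogPctSAT.
import numpy as np
import pandas as pd
import statsmodels.api as sm

states = pd.read_csv("state_sat.csv")

# Fit the two-predictor regression of SAT on Expend and LogPctSAT
X = sm.add_constant(states[["Expend", "LogPctSAT"]])
fit = sm.OLS(states["SAT"], X).fit()

# Predicted scores from the fitted equation (PredSAT in the text)
states["PredSAT"] = fit.predict(X)

# The simple Pearson correlation between SAT and PredSAT is the multiple R
r = np.corrcoef(states["SAT"], states["PredSAT"])[0, 1]
print(round(r, 3))                      # should match the .941 reported above
print(round(np.sqrt(fit.rsquared), 3))  # same value, taken from R-squared
```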
The point of this last approach is to show that you can think of a multiple correlation
coefficient as the simple Pearson correlation between the criterion (SAT) and the best lin-
ear combination of the predictors. When I say “best linear combination” I mean that there
is no set of weights (regression coefficients) that will do a better job of predicting
the state’s mean SAT score from those predictors. This is actually a very important point.
There are a number of advanced techniques in statistics, which we are not going to cover
in this book, that really come down to creating a new variable that is some optimal
weighted sum of other variables, and then using that variable in the main part of the analysis. This approach also explains why multiple correlations are always positive, even when the relationship between the criterion and an individual predictor is negative: you would certainly expect the predicted values to be positively correlated with the criterion.
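If you want to convince yourself that the least-squares weights really are the "best" in this sense, a quick numerical check is sketched below. This is again my own illustration, using simulated stand-ins for the predictors and criterion rather than the actual state data: no arbitrarily chosen set of weights applied to the same predictors correlates more highly with the criterion than the least-squares weights do, and that best correlation is not negative.

```python
# A small check of the "best linear combination" claim, using simulated
# stand-ins for Expend, LogPctSAT, and SAT; the real data would be used
# the same way.
import numpy as np

rng = np.random.default_rng(1)
n = 50
expend = rng.normal(6.0, 1.5, n)              # stand-in for Expend
logpct = rng.normal(1.5, 0.4, n)              # stand-in for LogPctSAT
sat = 1100 + 10 * expend - 80 * logpct + rng.normal(0, 30, n)

# Least-squares weights (intercept, b1, b2) and the predicted values
X = np.column_stack([np.ones(n), expend, logpct])
b = np.linalg.lstsq(X, sat, rcond=None)[0]
best_r = np.corrcoef(sat, X @ b)[0, 1]        # this is the multiple R

# Correlations produced by 10,000 arbitrary weightings of the same predictors
rs = [abs(np.corrcoef(sat, X @ rng.normal(size=3))[0, 1])
      for _ in range(10_000)]

print(best_r > 0)         # True: the multiple correlation is positive
print(best_r >= max(rs))  # True: no other weights correlate more highly
```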
Review
We now have several ways of thinking of multiple regression, and each of them gives us a
somewhat different view of what is going on. If one of them makes more sense to you than
the others, you can focus on that approach.
- We can treat a regression coefficient as the coefficient we would get if we had a whole group of states that did not differ on any of the predictors except the one under consideration. In other words, all predictors but one are held constant, and we look at what varying that one predictor does.
- We can think of a regression coefficient in multiple regression as the same thing we would have in simple regression if we adjusted our two variables for any of the variables we want to control. In this example it meant adjusting both SAT and Expend for LogPctSAT (by computing the difference between the obtained score for that variable and the score predicted from the “nuisance variable” [or the “to be controlled variable”]). The
[Figure 15.2 Scatterplot showing the relationship between SAT and the best linear combination of the predictors (unstandardized predicted values); r = 0.941]