coefficient (slope) that we obtain is the same coefficient we find in the multiple regression solution.
- We can think of the multiple correlation as the simple Pearson correlation between the criterion (call it Y) and another variable (call it Ŷ) that is the best linear combination of the predictor variables.
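This equivalence is easy to verify numerically. The sketch below uses made-up data (not the SAT example): it fits an ordinary least-squares model and shows that the correlation between Y and the fitted values Ŷ matches the multiple R obtained from R².

```python
import numpy as np

# Illustrative sketch with synthetic data: the multiple correlation R
# equals the simple Pearson correlation between Y and the fitted
# values Y-hat (the best linear combination of the predictors).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                    # two predictors
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

# Least-squares fit with an intercept column
A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b

# Multiple R two ways: corr(y, y_hat) and sqrt(1 - SSres/SStot)
r_simple = np.corrcoef(y, y_hat)[0, 1]
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
R = np.sqrt(1 - ss_res / ss_tot)
print(round(r_simple, 6), round(R, 6))          # the two agree
```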
The Educational Testing Service, which produces the SAT, tries to have everyone put a disclaimer on results broken down by states that says that the SAT is not a fair way to compare the performance of different states. Having gone through this example you can see
that one reason that they say this is that different states have different cohorts of students
taking the exam, and this makes the test inappropriate as a way of judging a state’s
performance, even if it is a good way of judging the performance of individuals. We could
create a new variable that is the SAT score adjusted for LogPctSAT, but I would be very
wary of using that measure to compare states. It is possible that it would be fair, but it is
also possible that there are a number of other variables that I have not taken into account.
15.2 Using Additional Predictors
Before we look at other characteristics of multiple regression we should ask what would
happen if we used additional variables to predict SAT. We have two potential variables in
our data that we have not used—the pupil/teacher ratio and teacher’s salaries. We could add
both of them to what we already have, but I am only going to add PTratio. Folklore would
have it that a lower ratio would be associated with better performance. At the same time,
lower pupil/teacher ratios cost money, so PTratio should overlap with Expend and might
not contribute significant new information.
Table 15.3 shows the results of using Expend, LogPctSAT, and PTratio to predict SAT.
There are several things to say about this table.
The regression equation that results from this analysis is now

Ŷ = 1132.033 + 11.665 Expend − 78.393 LogPctSAT + 0.742 PTratio

Notice that Expend and LogPctSAT are still significant (t = 3.302 and −17.293, respectively), but PTratio is far from significant (t = 0.418). This shows us that adding PTratio to our model did not improve our ability to predict. (Even the simple correlation between PTratio and SAT was not significant (r = .081).) You will see two new columns in Table 15.3, labeled Tolerance and VIF (Variance Inflation Factor). When predictor variables are correlated among themselves we have what is called collinearity or multicollinearity. Collinearity has the effect of increasing the standard error of a regression coefficient, which increases the
Table 15.3  Adding PTratio to the prediction equation

Coefficients^a

                 Unstandardized      Standardized
                  Coefficients       Coefficients                     Collinearity Statistics
Model               B     Std. Error     Beta         t       Sig.     Tolerance      VIF
1  (Constant)   1132.033    39.787                 28.452     .000
   Expend         11.665     3.533       .212       3.302     .002       .596        1.679
   LogPctSAT     –78.393     4.533     –1.042     –17.293     .000       .679        1.473
   PTratio          .742     1.774       .022        .418     .678       .854        1.171

a. Dependent Variable: SAT
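The B, Std. Error, and t columns of a table like Table 15.3 come directly from the least-squares algebra. The following sketch uses synthetic data, with three made-up predictors standing in for Expend, LogPctSAT, and PTratio, to show the computation.

```python
import numpy as np

# Hedged sketch (synthetic data, not the SAT example): computing the
# coefficient, standard-error, and t columns of an OLS summary table.
rng = np.random.default_rng(1)
n = 50
X = rng.normal(size=(n, 3))
y = 1000 + 12 * X[:, 0] - 80 * X[:, 1] + 0.5 * X[:, 2] \
    + rng.normal(scale=30, size=n)

A = np.column_stack([np.ones(n), X])        # intercept + 3 predictors
b, *_ = np.linalg.lstsq(A, y, rcond=None)   # "B" column

resid = y - A @ b
df = n - A.shape[1]                         # residual degrees of freedom
mse = resid @ resid / df                    # residual variance estimate
cov_b = mse * np.linalg.inv(A.T @ A)        # covariance of coefficients
se = np.sqrt(np.diag(cov_b))                # "Std. Error" column
t = b / se                                  # "t" column
print(np.round(b, 3))
print(np.round(t, 3))
```

A coefficient whose |t| is small relative to its standard error, like PTratio in Table 15.3, adds essentially nothing to the prediction.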
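The Tolerance and VIF columns can also be computed by hand: regress each predictor on the remaining predictors, take Tolerance as 1 − R² from that regression, and VIF as 1/Tolerance. The sketch below uses synthetic data in which the first two predictors are deliberately correlated.

```python
import numpy as np

# Sketch of Tolerance and VIF (synthetic data, not the SAT example):
# Tolerance_j = 1 - R^2 from regressing predictor j on the others;
# VIF_j = 1 / Tolerance_j.
rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)   # correlated with x1
x3 = rng.normal(size=n)                          # independent
X = np.column_stack([x1, x2, x3])

def tolerance_and_vif(X, j):
    """Tolerance and VIF for predictor column j of X."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    fitted = A @ b
    ss_res = np.sum((X[:, j] - fitted) ** 2)
    ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
    tol = ss_res / ss_tot                        # = 1 - R^2
    return tol, 1.0 / tol

for j in range(3):
    tol, vif = tolerance_and_vif(X, j)
    print(j, round(tol, 3), round(vif, 3))
```

The correlated predictors show a lower tolerance (and thus a higher VIF) than the independent one, which is exactly the pattern that inflates the standard errors of their coefficients.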