IN CHAPTER 9 WE CONSIDERED the situation in which we have one criterion variable (Y) and one predictor variable (X) and wish to predict Y on the basis of X. In this chapter we will consider the case in which we still have only one criterion (Y) but have multiple predictors ($X_1, X_2, X_3, \ldots, X_p$), and want to predict Y on the basis of simultaneous knowledge of all p predictors. The situation we examined in Chapter 9 can be viewed as a special case of the one discussed in this chapter; alternatively, this chapter can be viewed as an extension of Chapter 9. We will continue to use many familiar concepts such as the correlation coefficient, the slope, the standard error of estimate, and $SS_{\text{regression}}$.
In this chapter I am going to move away from hand calculation and focus mainly on
computer printout. There is very little to be gained, and much to be lost, by focusing on the
calculations. By freeing ourselves from computation we are able to concentrate on the
really important issues that lie behind choosing an appropriate regression solution. Gener-
ally, there can be little argument over formulae. On the other hand, questions about the
optimal number of predictors, the use of regression diagnostics, the relative importance of
various predictors, and the selection of predictors do not have universally accepted an-
swers. Be forewarned that the opinions expressed in this chapter are only opinions, and are
open to dispute—but then that is part of what makes statistics interesting. Excellent and
readable advanced sources for the study of multiple regression are Cohen, Cohen, West,
and Aiken (2003) and Stevens (1992).
15.1 Multiple Linear Regression
The problem of multiple regression is that of finding a regression equation to predict Y (sometimes denoted $X_0$) on the basis of p predictors ($X_1, X_2, X_3, \ldots, X_p$). Thus, we might wish to predict success in graduate school (Y) on the basis of undergraduate grade point average ($X_1$), Graduate Record Exam scores ($X_2$), number of courses taken in the major discipline ($X_3$), and some rating of "favorableness" of letters of recommendation ($X_4$). Similarly, we might wish to predict the time it takes to go from one point in a city to another on the basis of number of traffic lights ($X_1$), posted speed limit ($X_2$), presence or absence of "right turn on red" ($X_3$), and traffic density ($X_4$). These examples are both analyzed in the same way, although in the first we presumably care about predictions for individual applicants, whereas in the second we might be less interested in the prediction itself and more interested in the role played by each of the predictors. In fact, the most common use of multiple regression is to understand the relationship between variables rather than to actually make a prediction from the equation we derive.
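To make these two uses concrete, here is a minimal sketch in Python (the numbers are invented purely for illustration and do not come from this text): it estimates a four-predictor equation for the graduate-school example by least squares, then uses the result once to predict a score for a new applicant and once to inspect the role of each predictor.

```python
import numpy as np

# Hypothetical applicants: columns are X1 = undergraduate GPA, X2 = GRE,
# X3 = courses taken in the major, X4 = rating of letters of recommendation.
X = np.array([
    [3.4, 620, 10, 4],
    [2.9, 540,  6, 3],
    [3.8, 710, 12, 5],
    [3.1, 580,  8, 3],
    [3.6, 660,  9, 4],
    [2.7, 500,  5, 2],
    [3.3, 600,  7, 4],
])
Y = np.array([3.2, 2.6, 3.9, 2.9, 3.5, 2.4, 3.0])  # hypothetical success ratings

# Prepend a column of 1s so the first estimate is the intercept b0.
design = np.column_stack([np.ones(len(Y)), X])
b, *_ = np.linalg.lstsq(design, Y, rcond=None)     # least squares estimates b0..b4

# Use 1: predict success for a new applicant (leading 1 matches the intercept).
new_applicant = np.array([1, 3.5, 640, 9, 4])
print("predicted success:", new_applicant @ b)

# Use 2: examine the coefficient attached to each predictor.
for name, coef in zip(["b0", "b1 (GPA)", "b2 (GRE)", "b3 (courses)", "b4 (letters)"], b):
    print(name, round(coef, 4))
```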
The Regression Equation
In Chapter 9 we started with the equation of a straight line ($\hat{Y} = bX + a$) and solved for the two unknowns (a and b) subject to the constraint that $\sum(Y - \hat{Y})^2$ is a minimum. In multiple regression we are going to do the same thing, although in this case we will solve the equation

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$$

where $b_0$ represents the intercept and $b_1, b_2, \ldots, b_p$ are the regression coefficients for the predictors $X_1, X_2, \ldots, X_p$, respectively. We will retain the least squares restriction that $\sum(Y - \hat{Y})^2$ is to be minimized, because it still makes sense to find predicted values that come as close as possible to the obtained values of Y.¹ The calculations required to estimate the $b_i$ become more cumbersome as the number of predictors increases.
¹ There are alternatives to the standard least squares criterion that often produce estimates that are in some ways superior to the estimates obtained by least squares. These procedures are less common, but many of them can be found in Rousseeuw and Leroy (1987).
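For readers who would like to see what the least squares criterion implies computationally, the sketch below is my own illustration (the function name and the use of Python's numpy are assumptions, not anything prescribed by the text). It obtains $b_0, b_1, \ldots, b_p$ by solving the normal equations that follow from minimizing $\sum(Y - \hat{Y})^2$, using a design matrix whose first column is all 1s.

```python
import numpy as np

def multiple_regression(X, Y):
    """Least squares estimates for Y-hat = b0 + b1*X1 + ... + bp*Xp.

    X is an (n x p) array of predictors and Y a length-n vector.
    Minimizing sum((Y - Y_hat)**2) over b0..bp leads to the normal
    equations (D'D)b = D'Y, where D is X with a leading column of 1s.
    """
    Y = np.asarray(Y, dtype=float)
    D = np.column_stack([np.ones(len(Y)), np.asarray(X, dtype=float)])
    b = np.linalg.solve(D.T @ D, D.T @ Y)
    Y_hat = D @ b
    ss_residual = np.sum((Y - Y_hat) ** 2)   # the quantity being minimized
    return b, ss_residual                    # b[0] = intercept, b[1:] = slopes
```

Statistical packages arrive at these same estimates and report them, together with the diagnostic information discussed in this chapter, on the printouts we will be reading.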