often be used. One way to avoid this problem is to use several different model-building techniques and see if different models result. For example, we have found the same model for the
wine quality data using stepwise regression, forward selection, and backward elimination. The
same model was also one of the two best found from all possible regressions. Because the results from different variable selection methods frequently do not agree, this consistency across methods is a good indication that the three-variable model is the best regression equation.
If the number of candidate regressors is not too large, the all-possible regressions method is recommended. We usually recommend using the minimum $MS_E$ and $C_p$ evaluation criteria in conjunction with this procedure. The all-possible regressions approach can find the "best" regression equation with respect to these criteria, while stepwise-type methods offer no such assurance. Furthermore, the all-possible regressions procedure is not distorted by dependencies among the regressors, as stepwise-type methods are.
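As a concrete illustration, the sketch below enumerates every subset of a small synthetic candidate pool and scores each model by $MS_E$ and Mallows' $C_p$; the data, seed, and helper name fit_sse are placeholders of our own, not the wine quality data from the text.

```python
# A minimal all-possible-regressions sketch: score every subset of
# candidate regressors by MS_E and Mallows' C_p (synthetic data).
from itertools import combinations
import numpy as np

def fit_sse(X, y):
    """Ordinary least squares; returns the residual sum of squares."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(0)
n, k = 40, 4
X = rng.normal(size=(n, k))               # candidate regressors x1..x4
y = 2 + 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(scale=0.5, size=n)

ones = np.ones((n, 1))
sse_full = fit_sse(np.hstack([ones, X]), y)
sigma2_hat = sse_full / (n - (k + 1))     # MS_E of the full model

# Enumerate every nonempty subset of regressors.
for r in range(1, k + 1):
    for subset in combinations(range(k), r):
        Xp = np.hstack([ones, X[:, list(subset)]])
        p = r + 1                          # parameters incl. intercept
        sse = fit_sse(Xp, y)
        mse = sse / (n - p)                # minimum-MS_E criterion
        cp = sse / sigma2_hat - n + 2 * p  # Mallows' C_p criterion
        print(subset, round(mse, 3), round(cp, 2))
```

Good candidate models are those with small $MS_E$ and with $C_p$ close to the number of parameters $p$.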
12-6.4 Multicollinearity

In multiple regression problems, we expect to find dependencies between the response variable $Y$ and the regressors $x_j$. In most regression problems, however, we find that there are also dependencies among the regressor variables $x_j$. In situations where these dependencies are strong, we say that multicollinearity exists. Multicollinearity can have serious effects on the estimates of the regression coefficients and on the general applicability of the estimated model.
The effects of multicollinearity may be easily demonstrated. The diagonal elements of the matrix $\mathbf{C} = (\mathbf{X}'\mathbf{X})^{-1}$ can be written as

$$C_{jj} = \frac{1}{1 - R_j^2} \qquad j = 1, 2, \ldots, k$$

where $R_j^2$ is the coefficient of multiple determination resulting from regressing $x_j$ on the other $k - 1$ regressor variables. Clearly, the stronger the linear dependency of $x_j$ on the remaining regressor variables, and hence the stronger the multicollinearity, the larger the value of $R_j^2$ will be. Recall that $V(\hat{\beta}_j) = \sigma^2 C_{jj}$. Therefore, we say that the variance of $\hat{\beta}_j$ is "inflated" by the quantity $(1 - R_j^2)^{-1}$. Consequently, we define the variance inflation factor for $\beta_j$ as

$$\mathrm{VIF}(\beta_j) = \frac{1}{1 - R_j^2} \qquad j = 1, 2, \ldots, k \tag{12-50}$$
These factors are an important measure of the extent to which multicollinearity is present; as a common rule of thumb, VIFs exceeding 10 indicate serious multicollinearity.
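The following sketch computes Equation 12-50 directly, assuming only a NumPy matrix whose columns are the regressor observations; the function name variance_inflation_factors is our own.

```python
# A minimal sketch of Equation 12-50: VIF_j = 1/(1 - R_j^2), where R_j^2
# comes from regressing x_j on the remaining k - 1 regressors.
import numpy as np

def variance_inflation_factors(X):
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        # Regress x_j on the other regressors (with an intercept).
        others = np.hstack([np.ones((n, 1)), np.delete(X, j, axis=1)])
        beta, _, _, _ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1 - ss_res / ss_tot          # R_j^2
        vifs.append(1.0 / (1.0 - r2))     # Equation 12-50
    return np.array(vifs)
```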
Although the estimates of the regression coefficients are very imprecise when multicollinearity is present, the fitted model equation may still be useful. For example, suppose we
wish to predict new observations on the response. If these predictions are interpolations in the
original region of the x-space where the multicollinearity is in effect, satisfactory predictions
will often be obtained, because while the individual $\beta_j$ may be poorly estimated, the function $\sum_{j=1}^{k} \beta_j x_{ij}$ may be estimated quite well. On the other hand, if the prediction of new observations requires extrapolation beyond the original region of the x-space where the data were collected, generally we would expect to obtain poor results. Extrapolation usually requires good estimates of the individual model parameters.
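A small simulation (synthetic data, not an example from the text) illustrates the point: with two nearly collinear regressors, the individual coefficient estimates swing wildly from sample to sample, yet predictions at an interpolation point inside the x-space barely move.

```python
# Sketch: collinearity inflates coefficient variance but leaves
# interpolated predictions accurate (synthetic repeated sampling).
import numpy as np

rng = np.random.default_rng(1)
n = 50
betas, preds = [], []
x_new = np.array([1.0, 0.5, 0.52])  # interpolation point: x2 close to x1

for _ in range(500):
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)  # x2 ~ x1: strong collinearity
    X = np.column_stack([np.ones(n), x1, x2])
    y = 1 + 2 * x1 + 3 * x2 + rng.normal(scale=0.5, size=n)
    b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    betas.append(b)
    preds.append(x_new @ b)

betas = np.array(betas)
print("std of beta_1 estimates:", betas[:, 1].std())  # large: poorly estimated
print("std of predictions:    ", np.std(preds))       # small: well estimated
```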