Applied Statistics and Probability for Engineers

12-6 ASPECTS OF MULTIPLE REGRESSION MODELING 461

Multicollinearity arises for several reasons. It will occur when the analyst collects data
such that a linear constraint holds approximately among the columns of the X matrix. For
example, if four regressor variables are the components of a mixture, such a constraint will
always exist because the sum of the components is always constant. Usually, these constraints
do not hold exactly, and the analyst might not know that they exist.
The presence of multicollinearity can be detected in several ways. Two of the more easily
understood of these will be discussed briefly.


  1. The variance inflation factors, defined in equation 12-50, are very useful measures
    of multicollinearity. The larger the variance inflation factor, the more severe the mul-
    ticollinearity. Some authors have suggested that if any variance inflation factor ex-
    ceeds 10, multicollinearity is a problem. Other authors consider this value too liberal
    and suggest that the variance inflation factors should not exceed 4 or 5. Minitab will
    calculate the variance inflation factors. Table 12-4 presents the Minitab multiple
    regression output for the wire bond pull strength data. Since the variance inflation
    factors for both regressors are small, there is no problem with multicollinearity.

  2. If the F-test for significance of regression is significant, but tests on the individual
    regression coefficients are not significant, multicollinearity may be present.
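The VIF diagnostic in item 1 above can be computed directly from its definition: VIF_j = 1/(1 - R_j^2), where R_j^2 is the coefficient of determination from regressing the jth regressor on all the others. The sketch below uses made-up data (not the wire bond pull strength data of Table 12-4) in which two regressors are nearly collinear:

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of regressor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns (with an intercept).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        yj = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, yj, rcond=None)
        resid = yj - A @ beta
        r2 = 1.0 - (resid @ resid) / ((yj - yj.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Hypothetical data: x2 is nearly a linear function of x1,
# so the first two VIFs are inflated while the third stays near 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2 * x1 + rng.normal(scale=0.1, size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])
print(vif(X))
```

By the rule of thumb in item 1, the first two VIFs here would far exceed 10, flagging the collinearity between x1 and x2.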
Several remedial measures have been proposed for solving the problem of multicollinearity.
Augmenting the data with new observations specifically designed to break up the approximate
linear dependencies that currently exist is often suggested. However, this is sometimes
impossible because of economic reasons or because of the physical constraints that relate
the xj. Another possibility is to delete certain variables from the model, but this approach
has the disadvantage of discarding the information contained in the deleted variables.
Since multicollinearity primarily affects the stability of the regression coefficients, it would
seem that estimating these parameters by some method that is less sensitive to multicollinearity
than ordinary least squares would be helpful. Several methods have been suggested. One
alternative to ordinary least squares, ridge regression, can be useful in combating
multicollinearity. For more details on ridge regression, see Section 12-6.5 on the CD material
or the more extensive presentations in Montgomery, Peck, and Vining (2001) and Myers (1990).
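As a rough sketch of the idea (details are in Section 12-6.5 on the CD), the ridge estimator has the closed form b = (X'X + kI)^{-1} X'y, where k > 0 is a small constant chosen by the analyst; k = 0 recovers ordinary least squares. The data and the value of k below are hypothetical, chosen only to illustrate how ridge regression stabilizes (shrinks) coefficients computed from near-collinear regressors:

```python
import numpy as np

def ridge(X, y, k):
    """Ridge regression estimate: b = (X'X + k I)^{-1} X'y.

    Minimizes ||y - Xb||^2 + k ||b||^2. Regressors are assumed
    centered and scaled, and y centered, so no intercept is penalized.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Hypothetical near-collinear data.
rng = np.random.default_rng(1)
x1 = rng.normal(size=30)
x2 = x1 + rng.normal(scale=0.05, size=30)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = x1 + rng.normal(scale=0.5, size=30)
y = y - y.mean()

b_ols = ridge(X, y, 0.0)    # unstable: huge, offsetting coefficients
b_ridge = ridge(X, y, 1.0)  # shrunken toward zero, far more stable
print(b_ols, b_ridge)
```

The trade-off is a small amount of bias in exchange for a large reduction in the variance of the coefficient estimates.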


12-6.5 Ridge Regression (CD Only)

12-6.6 Nonlinear Regression (CD Only)

EXERCISES FOR SECTION 12-6
12-46. An article entitled "A Method for Improving the Accuracy of Polynomial Regression
Analysis" in the Journal of Quality Technology (1971, pp. 149-155) reported the following
data on y = ultimate shear strength of a rubber compound (psi) and x = cure temperature (°F).

x | 280  284  292  295  298  305  308  315
y | 770  800  840  810  735  640  590  560

(a) Fit a second-order polynomial to these data.
(b) Test for significance of regression using α = 0.05.
(c) Test the hypothesis that β11 = 0 using α = 0.05.
(d) Compute the residuals from part (a) and use them to evaluate model adequacy.

12-47. Consider the following data, which result from an experiment to determine the effect
of x = test time in hours at a particular temperature on y = change in oil viscosity:

x | 0.25  0.50  0.75  1.00  1.25  1.50  1.75  2.00  2.25  2.50
y | 1.42  1.39  1.55  1.89  2.43  3.15  4.05  5.15  6.43  7.89

(a) Fit a second-order polynomial to the data.
(b) Test for significance of regression using α = 0.05.
(c) Test the hypothesis that β11 = 0 using α = 0.05.
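As a starting point for part (a) of Exercise 12-47, a second-order polynomial y = b0 + b1*x + b11*x^2 can be fit by least squares; a minimal NumPy sketch is below (the significance tests in parts (b) and (c) are left to the reader):

```python
import numpy as np

# Data from Exercise 12-47: x = test time (hours), y = change in oil viscosity
x = np.array([0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50])
y = np.array([1.42, 1.39, 1.55, 1.89, 2.43, 3.15, 4.05, 5.15, 6.43, 7.89])

# np.polyfit returns coefficients in descending powers: [b11, b1, b0]
b11, b1, b0 = np.polyfit(x, y, 2)
fitted = b0 + b1 * x + b11 * x**2
print(b0, b1, b11)
```

The positive quadratic coefficient reflects the visibly accelerating growth of y over the test interval.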

