Applied Statistics and Probability for Engineers

12-6 ASPECTS OF MULTIPLE REGRESSION MODELING 457

These models should now be evaluated further using residual plots and the other techniques
discussed earlier in the chapter, to see if either model is satisfactory with respect to the
underlying assumptions and to determine if one of them is preferable. It turns out that the
residual plots do not reveal any major problems with either model. The value of PRESS for
the three-variable model is 56.0525 and for the four-variable model it is 60.3927. Since
PRESS is smaller in the model with three regressors, and since it is the model with the fewest
predictors, it would likely be the preferred choice.
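
The PRESS comparison above can be reproduced in principle with a few lines of code. The following is a minimal sketch, assuming PRESS is computed directly from its definition: each observation is predicted from a model fitted without it, and the squared prediction errors are summed. The function names (`solve`, `fit`, `predict`, `press`) are hypothetical helpers written for this illustration, not part of any package, and the data behind the values 56.0525 and 60.3927 are not reproduced here.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting.
    Assumes A is square and nonsingular (illustrative helper)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def fit(X, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    Xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    return solve(XtX, Xty)


def predict(beta, row):
    """Fitted value for one row of regressor values."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def press(X, y):
    """Prediction error sum of squares by explicit leave-one-out refitting."""
    total = 0.0
    for i in range(len(y)):
        beta = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # fit without point i
        total += (y[i] - predict(beta, X[i])) ** 2         # error at point i
    return total
```

Calling `press` on the regressor columns of each candidate model and comparing the results mirrors the comparison made in the text. Refitting n times is wasteful for large data sets, but it keeps the sketch close to the definition.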

Stepwise Regression
Stepwise regression is probably the most widely used variable selection technique. The
procedure iteratively constructs a sequence of regression models by adding or removing variables
at each step. The criterion for adding or removing a variable at any step is usually expressed
in terms of a partial F-test. Let $f_{\text{in}}$ be the value of the F-random variable for adding a variable
to the model, and let $f_{\text{out}}$ be the value of the F-random variable for removing a variable from
the model. We must have $f_{\text{in}} \geq f_{\text{out}}$, and usually $f_{\text{in}} = f_{\text{out}}$.
Stepwise regression begins by forming a one-variable model using the regressor variable
that has the highest correlation with the response variable Y. This will also be the regressor
producing the largest F-statistic. For example, suppose that at this step, $x_1$ is selected. At the
second step, the remaining $K - 1$ candidate variables are examined, and the variable for
which the partial F-statistic

$$F_j = \frac{SS_R(\beta_j \mid \beta_1, \beta_0)}{MS_E(x_j, x_1)} \tag{12-48}$$

is a maximum is added to the equation, provided that $f_j > f_{\text{in}}$. In equation 12-48, $MS_E(x_j, x_1)$
denotes the mean square for error for the model containing both $x_1$ and $x_j$. Suppose that this
procedure indicates that $x_2$ should be added to the model. Now the stepwise regression
algorithm determines whether the variable $x_1$ added at the first step should be removed. This is
done by calculating the F-statistic

$$F_1 = \frac{SS_R(\beta_1 \mid \beta_2, \beta_0)}{MS_E(x_1, x_2)} \tag{12-49}$$

If the calculated value $f_1 < f_{\text{out}}$, the variable $x_1$ is removed; otherwise it is retained, and we
would attempt to add a regressor to the model containing both $x_1$ and $x_2$.
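
A partial F-statistic of the kind used in equations 12-48 and 12-49 can be computed as an extra sum of squares divided by the error mean square of the larger model. The sketch below is illustrative only: it assumes the extra regression sum of squares is obtained as the drop in error sum of squares when the variable is added, that every model includes an intercept, and that there are more observations than parameters. The helper names (`solve`, `fit`, `predict`, `sse`, `partial_f`) are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting.
    Assumes A is square and nonsingular (illustrative helper)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def fit(X, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    Xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    return solve(XtX, Xty)


def predict(beta, row):
    """Fitted value for one row of regressor values."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def sse(X, y, cols):
    """Error sum of squares for the model using the listed columns of X."""
    rows = [[r[c] for c in cols] for r in X]
    beta = fit(rows, y)
    return sum((yi - predict(beta, r)) ** 2 for r, yi in zip(rows, y))


def partial_f(X, y, base, j):
    """Partial F for column j given the columns in `base` (and the intercept)."""
    full = list(base) + [j]
    sse_full = sse(X, y, full)
    ss_r = sse(X, y, base) - sse_full      # extra sum of squares, 1 df
    df_error = len(y) - (len(full) + 1)    # n minus number of parameters
    return ss_r / (sse_full / df_error)    # SS_R / MS_E of the larger model
```

With columns indexed from 0, `partial_f(X, y, [0], 1)` plays the role of $F_j$ in equation 12-48 for a second column entering after the first, and `partial_f(X, y, [1], 0)` plays the role of $F_1$ in equation 12-49.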
In general, at each step the set of remaining candidate regressors is examined, and the
regressor with the largest partial F-statistic is entered, provided that the observed value of
$f$ exceeds $f_{\text{in}}$. Then the partial F-statistic for each regressor in the model is calculated, and the
regressor with the smallest observed value of $F$ is deleted if the observed $f < f_{\text{out}}$. The
procedure continues until no other regressors can be added to or removed from the model.
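
The general procedure just described can be sketched as a loop that alternates an entry step and a removal step. This is a hedged illustration rather than the algorithm of any particular statistics package: the function names, the default thresholds of 4.0, and the `max_steps` guard are all assumptions made for the example, and the helpers assume an intercept in every model and more observations than parameters.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting.
    Assumes A is square and nonsingular (illustrative helper)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def sse(X, y, cols):
    """Error sum of squares for the intercept-plus-`cols` least-squares model."""
    Z = [[1.0] + [r[c] for c in cols] for r in X]
    p = len(Z[0])
    XtX = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    Xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, z))) ** 2
               for z, yi in zip(Z, y))


def partial_f(X, y, base, j):
    """Partial F for column j given the columns in `base` (and the intercept)."""
    full = list(base) + [j]
    sse_full = sse(X, y, full)
    df_error = len(y) - (len(full) + 1)
    return (sse(X, y, base) - sse_full) / (sse_full / df_error)


def stepwise(X, y, f_in=4.0, f_out=4.0, max_steps=100):
    """Return the column indices chosen by a simple stepwise search."""
    if f_in < f_out:
        raise ValueError("f_in must be at least f_out")
    selected = []
    for _ in range(max_steps):              # guard against endless looping
        changed = False
        # Entry step: add the candidate with the largest partial F, if > f_in.
        candidates = [j for j in range(len(X[0])) if j not in selected]
        if candidates:
            f_add = {j: partial_f(X, y, selected, j) for j in candidates}
            best = max(f_add, key=f_add.get)
            if f_add[best] > f_in:
                selected.append(best)
                changed = True
        # Removal step: drop the regressor with the smallest partial F, if < f_out.
        if selected:
            f_drop = {j: partial_f(X, y, [k for k in selected if k != j], j)
                      for j in selected}
            worst = min(f_drop, key=f_drop.get)
            if f_drop[worst] < f_out:
                selected.remove(worst)
                changed = True
        if not changed:                      # nothing entered or left: stop
            break
    return selected
```

Choosing the first regressor by the largest F-statistic is equivalent to choosing the one most highly correlated with the response, so the first pass of the loop matches the starting step described earlier. Raising `f_in` and `f_out` makes the search more conservative; setting them very high returns an empty model.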
Stepwise regression is almost always performed using a computer program. The analyst
exercises control over the procedure by the choice of $f_{\text{in}}$ and $f_{\text{out}}$. Some stepwise regression
computer programs require that numerical values be specified for $f_{\text{in}}$ and $f_{\text{out}}$. Since the
number of degrees of freedom on $MS_E$ depends on the number of variables in the model, which
changes from step to step, a fixed value of $f_{\text{in}}$ and $f_{\text{out}}$ causes the type I and type II error rates
to vary. Some computer programs allow the analyst to specify the type I error levels for $f_{\text{in}}$ and
$f_{\text{out}}$. However, the "advertised" significance level is not the true level, because the variable
selected is the one that maximizes (or minimizes) the partial F-statistic at that stage.
Sometimes it is useful to experiment with different values of $f_{\text{in}}$ and $f_{\text{out}}$ (or different advertised

