Applied Statistics and Probability for Engineers

12-6 ASPECTS OF MULTIPLE REGRESSION MODELING 453

regressors from a set that quite likely includes all the important variables, but we are sure that
not all these candidate regressors are necessary to adequately model the response Y.
In such a situation, we are interested in variable selection; that is, screening the candidate
variables to obtain a regression model that contains the “best” subset of regressor variables. We
would like the final model to contain enough regressor variables so that in the intended use of the
model (prediction, for example) it will perform satisfactorily. On the other hand, to keep model
maintenance costs to a minimum and to make the model easy to use, we would like the model to
use as few regressor variables as possible. The compromise between these conflicting objectives
is often called finding the “best” regression equation. However, in most problems, no single
regression model is “best” in terms of the various evaluation criteria that have been proposed. A
great deal of judgment and experience with the system being modeled is usually necessary to
select an appropriate set of regressor variables for a regression equation.
No single algorithm will always produce a good solution to the variable selection problem.
Most of the currently available procedures are search techniques, and to perform satisfactorily,
they require interaction with judgment by the analyst. We now briefly discuss some of the more
popular variable selection techniques. We assume that there are K candidate regressors, x1, x2,
…, xK, and a single response variable y. All models will include an intercept term β0, so the
model with all variables included would have K + 1 terms. Furthermore, the functional form of
each candidate variable (for example, x1 = 1/x, x2 = ln x, etc.) is assumed to be correct.

All Possible Regressions
This approach requires that the analyst fit all the regression equations involving one candidate
variable, all regression equations involving two candidate variables, and so on. Then these
equations are evaluated according to some suitable criteria to select the “best” regression
model. If there are K candidate regressors, there are 2^K total equations to be examined. For
example, if K = 4, there are 2^4 = 16 possible regression equations; while if K = 10, there are
2^10 = 1024 possible regression equations. Hence, the number of equations to be examined
increases rapidly as the number of candidate variables increases. However, there are some
very efficient computing algorithms for all possible regressions available and they are widely
implemented in statistical software, so it is a very practical procedure unless the number of
candidate regressors is fairly large.
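The search described above can be sketched with a short NumPy routine that fits every nonempty subset of the candidate regressors by least squares and records its R^2. The data, variable names, and helper functions below are illustrative, not from the text:

```python
# All-possible-regressions sketch: enumerate every nonempty subset of the
# K candidate regressors, fit each by least squares, and record R^2.
import itertools
import numpy as np

def fit_r2(X, y):
    """Fit least squares with an intercept term and return R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

def all_possible_regressions(X, y):
    """Evaluate every nonempty subset of the K candidate columns of X."""
    K = X.shape[1]
    results = {}
    for size in range(1, K + 1):
        for subset in itertools.combinations(range(K), size):
            results[subset] = fit_r2(X[:, list(subset)], y)
    return results

# Simulated example with K = 4 candidates, so 2^4 - 1 = 15 subset models
# (the 16th of the 2^K models is the intercept-only model).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(scale=0.5, size=50)
models = all_possible_regressions(X, y)
print(len(models))  # prints 15
```

The dictionary keys identify each subset, so the analyst can then rank the 2^K models by any of the criteria discussed next.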
Several criteria may be used for evaluating and comparing the different regression models
obtained. A commonly used criterion is based on the value of R^2 or the value of the
adjusted R^2, R^2(adj). Basically, the analyst continues to increase the number of variables in the
model until the increase in R^2 or R^2(adj) is small. Often, we will find that R^2(adj) will
stabilize and actually begin to decrease as the number of variables in the model increases.
Usually, the model that maximizes R^2(adj) is considered to be a good candidate for the best
regression equation. Because we can write R^2(adj) = 1 − {MSE/[SST/(n − 1)]} and SST/(n − 1)
is a constant, the model that maximizes the R^2(adj) value also minimizes the mean square error,
so this is a very attractive criterion.
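The equivalence between maximizing the adjusted R^2 and minimizing MSE can be verified numerically. The sketch below, on simulated data with illustrative names, computes R^2(adj) directly from the identity R^2(adj) = 1 − MSE/[SST/(n − 1)]; since SST/(n − 1) is fixed for a given data set, ranking models by R^2(adj) (descending) and by MSE (ascending) gives the same order:

```python
# Demonstrates the identity R^2_adj = 1 - MSE / [SST / (n - 1)].
import numpy as np

def adjusted_r2_and_mse(X, y):
    """Return (R^2_adj, MSE) for a least-squares fit with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]                               # parameters, including beta_0
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = np.sum((y - A @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    mse = sse / (n - p)                          # mean square error
    r2_adj = 1.0 - mse / (sst / (n - 1))         # the identity used in the text
    return r2_adj, mse

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 1.0 + X[:, 0] + rng.normal(size=40)
r2_adj, mse = adjusted_r2_and_mse(X, y)
```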
Another criterion used to evaluate regression models is the Cp statistic, which is a measure
of the total mean square error for the regression model. We define the total standardized
mean square error for the regression model as

Γp = (1/σ²) Σ(i=1..n) E[Ŷi − E(Yi)]²
   = (1/σ²) { Σ(i=1..n) [E(Yi) − E(Ŷi)]² + Σ(i=1..n) V(Ŷi) }
   = (1/σ²) [ (bias)² + variance ]
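In practice, this quantity is estimated by Mallows' Cp statistic, Cp = SSE(p)/σ̂² − n + 2p, where p is the number of parameters in the subset model and σ̂² is usually taken to be the mean square error of the full model containing all K candidates. A sketch of that estimate, on simulated data with illustrative names:

```python
# Mallows' Cp sketch: Cp = SSE(p)/sigma2_hat - n + 2p,
# with sigma^2 estimated by the MSE of the full K-regressor model.
import numpy as np

def sse(X, y):
    """Residual sum of squares for a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r

def mallows_cp(X_subset, X_full, y):
    """Cp for a subset model, using the full model's MSE for sigma^2."""
    n = len(y)
    p = X_subset.shape[1] + 1                    # subset parameters, incl. intercept
    k_full = X_full.shape[1] + 1
    sigma2_hat = sse(X_full, y) / (n - k_full)   # MSE of the full model
    return sse(X_subset, y) / sigma2_hat - n + 2 * p

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))
y = 3.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=60)
cp_sub = mallows_cp(X[:, :2], X, y)   # subset containing both active regressors
cp_full = mallows_cp(X, X, y)         # full model: Cp = K + 1 by construction
```

A subset model with negligible bias should have Cp close to its own parameter count p, while the full model always satisfies Cp = K + 1 exactly, since its SSE defines σ̂².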
