must raise the issue of whether this whole approach is generally appropriate. In many cases
it is not.
If we assume that you have a large set of variables and a large number of data points,
and are truly interested in a question of prediction (you want to predict who will do well
at some job and have no particular theoretical axe to grind), then one of these methods
may be for you. However, if you are trying to test some theoretical model by looking to
see if certain variables are related to some outcome (e.g., can you predict adolescents’
psychological symptoms on the basis of major stressful events, daily hassles, and
parental stress), then choosing a model on the basis of some criterion such as the
maximum R^2 or the minimum MS_residual is not likely to be particularly helpful. In fact, it may
be actively harmful by causing you to focus on statistically derived models that fit
only slightly, and perhaps nonsignificantly, better than some other more logically appro-
priate model. Conducting a stepwise analysis, for example, so as to report which of two
competing psychological variables is second to enter the equation often adds a spurious
form of statistical elegance to a poor theory. Solid arguments against the use of step-
wise regression for the purpose of ordering variables by importance have been given by
Huberty (1989). Henderson and Denison (1989), in an excellent article that summarizes
many of the important issues, suggest that “stepwise regression” should be called
“unwise regression.”
On the assumption that you still want to construct a regression model using some form
of variable-selection process, we will consider three alternative approaches: all subsets
regression, backward elimination, and stepwise regression. A readable and much more
thorough discussion of this topic can be found in Draper and Smith (1981, Chapter 6).
All Subsets Regression
The simplest of these methods at a conceptual level is called all subsets regression, for the
rather obvious reason that it looks at all possible subsets of the predictor variables and
chooses the set that is optimal in some way (such as maximizing R^2 or minimizing the
mean square error). With three or four predictors and some patience you could conduct
such an analysis by using any standard computer package to calculate multiple analyses.
However, with a large number of variables the only way to go about this is to use a special-
ized program, such as SAS PROC RSQUARE, which allows you to specify the largest and
smallest number of predictors to appear in each subset and the number of subsets of each
size. (For example, you can say, “Give me the eight models with the highest R^2 values using five
predictors.”)
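To make the mechanics concrete, here is a minimal Python sketch of the brute-force search. The function names and the reliance on numpy’s least-squares routine are illustrative choices of mine, not part of PROC RSQUARE or any other package named here:

```python
from itertools import combinations
import numpy as np

def r_squared(X, y):
    """R^2 for an ordinary least-squares fit of y on X (intercept column added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def all_subsets(X, y, names):
    """Fit every nonempty subset of predictors; return (R^2, predictor names), best first."""
    results = []
    for k in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), k):
            results.append((r_squared(X[:, list(cols)], y),
                            [names[c] for c in cols]))
    return sorted(results, reverse=True)

# e.g., the eight best five-predictor models, in the spirit of the request above:
# best_five = [r for r in all_subsets(X, y, names) if len(r[1]) == 5][:8]
```

With p predictors this search fits 2^p − 1 models, which is exactly why a specialized program becomes necessary once p grows large.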
You can define “best” in several different ways; these ways do not always lead to the
same models. You can select models on the basis of (1) the magnitude of R^2, (2) the magni-
tude of MS_residual, (3) a statistic called Mallow’s Cp, and (4) a statistic called PRESS. The
magnitudes of R^2 and MS_residual have already been discussed. We search for that combina-
tion of predictors with the highest R^2 (or better yet, adjusted R^2) or that set that minimizes
error. Mallow’s Cp statistic compares the relative magnitudes of the error term in any par-
ticular model with the error term in the complete model with all predictors present (see
Draper & Smith, 1981, p. 299). As such it only applies to nested models, as does the
PRESS statistic to follow. Because the error term in the reduced model must be greater than
(or equal to) the error term in the full model, we want to minimize that ratio.
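As a sketch of the computation, assuming the common textbook definition Cp = SSE_p / MS_full − n + 2p (with p counting the parameters in the reduced model, intercept included), the statistic could be computed as follows; the helper names are hypothetical:

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares for an OLS fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return resid @ resid

def mallows_cp(X_sub, X_full, y):
    """Mallow's Cp: the reduced model's error judged against the full model's MS_residual."""
    n = len(y)
    p = X_sub.shape[1] + 1                        # parameters in the reduced model (incl. intercept)
    ms_full = sse(X_full, y) / (n - X_full.shape[1] - 1)
    return sse(X_sub, y) / ms_full - n + 2 * p
```

When a subset model is adequate, Cp tends to fall near p; values much larger than p flag models whose error is inflated relative to the full model.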
PRESS (Predicted RESidual Sum of Squares) is a statistic similar to MS_residual in that it
looks at Σ(Yi − Ŷi)^2, but in the case of PRESS the predictions are made from a data set that
includes all cases except the one to be predicted. Ordering models on the basis of PRESS
would generally, though not always, be similar to ordering them on the basis of MS_residual.
The advantage of PRESS is that it is more likely to focus on influential data points (see
Draper & Smith, 1981, p. 325).
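A literal rendering of that definition in Python, refitting the model n times with one case held out each time (again a sketch with hypothetical names, not code from any package cited here):

```python
import numpy as np

def press(X, y):
    """PRESS: refit the model with each case deleted in turn, predict that case,
    and sum the squared prediction errors."""
    X1 = np.column_stack([np.ones(len(y)), X])
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                 # drop case i from the fit
        beta, *_ = np.linalg.lstsq(X1[keep], y[keep], rcond=None)
        total += float(y[i] - X1[i] @ beta) ** 2  # error in predicting the held-out case
    return total
```

In practice the same quantity can be obtained without refitting, since PRESS = Σ(ei / (1 − hii))^2, where ei is the ordinary residual and hii the leverage of case i; the looped version above simply mirrors the verbal definition.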