Applied Statistics and Probability for Engineers

454 CHAPTER 12 MULTIPLE LINEAR REGRESSION

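We use the mean square error from the full $K + 1$ term model as an estimate of $\sigma^2$; that is,
$\hat{\sigma}^2 = MS_E(K + 1)$. Then an estimator of $\Gamma_p$ is [see Montgomery, Peck, and Vining (2001) or
Myers (1990) for the details]:

$$C_p = \frac{SS_E(p)}{\hat{\sigma}^2} - n + 2p \qquad \text{(12-47)}$$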

If the $p$-term model has negligible bias, it can be shown that

$$E(C_p \mid \text{zero bias}) = p$$
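
A brief outline of why a model with negligible bias has an expected $C_p$ of about $p$, using $C_p = SS_E(p)/\hat{\sigma}^2 - n + 2p$ from Equation 12-47. This is only a sketch: it assumes that $\hat{\sigma}^2$ can be treated as the fixed value $\sigma^2$, which is an approximation and not the full argument given in the cited references.

```latex
\begin{aligned}
E\left[SS_E(p) \mid \text{zero bias}\right] &= (n - p)\,\sigma^2 \\
E\left(C_p \mid \text{zero bias}\right) &\approx \frac{(n - p)\,\sigma^2}{\sigma^2} - n + 2p = p
\end{aligned}
```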

Therefore, the values of $C_p$ for each regression model under consideration should be evaluated
relative to $p$. The regression equations that have negligible bias will have values of $C_p$ that are
close to $p$, while those with significant bias will have values of $C_p$ that are significantly greater
than $p$. We then choose as the “best” regression equation either a model with minimum $C_p$ or
a model with a slightly larger $C_p$ that does not contain as much bias (i.e., $C_p \cong p$).
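
The comparison of $C_p$ against $p$ can be sketched in code. The example below is a minimal illustration, not the procedure Minitab uses: the data are synthetic, and the helper names `sse`, `mallows_cp`, and `all_subsets_cp` are hypothetical. It computes $C_p = SS_E(p)/\hat{\sigma}^2 - n + 2p$ for every subset of regressors, with $\hat{\sigma}^2$ taken from the full $K + 1$ term model.

```python
import itertools
import numpy as np

def sse(X, y):
    """Residual sum of squares from a least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def mallows_cp(X, y, subset):
    """C_p = SSE(p) / sigma2_hat - n + 2p for the regressors in `subset`.

    X holds the K candidate regressors without an intercept column;
    p counts the intercept plus the chosen regressors.
    """
    n, K = X.shape
    ones = np.ones((n, 1))
    full = np.hstack([ones, X])
    sigma2_hat = sse(full, y) / (n - (K + 1))  # MSE of the full K+1 term model
    sub = np.hstack([ones, X[:, list(subset)]])
    p = sub.shape[1]
    return sse(sub, y) / sigma2_hat - n + 2 * p

def all_subsets_cp(X, y):
    """Return (subset, p, C_p) for every non-empty subset of regressors."""
    K = X.shape[1]
    out = []
    for k in range(1, K + 1):
        for subset in itertools.combinations(range(K), k):
            out.append((subset, k + 1, mallows_cp(X, y, subset)))
    return out

# Synthetic data (an assumption for illustration): only columns 0 and 2 matter.
rng = np.random.default_rng(0)
n, K = 40, 4
X = rng.normal(size=(n, K))
y = 2.0 + 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

results = all_subsets_cp(X, y)
# List the five models with the smallest C_p, alongside p for comparison.
for subset, p, cp in sorted(results, key=lambda r: r[2])[:5]:
    print(f"regressors={subset}  p={p}  Cp={cp:.2f}")
```

By construction the full model always has $C_p = K + 1$, so the statistic is informative only for the smaller candidate models.
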
The PRESS statistic can also be used to evaluate competing regression models. PRESS is
an acronym for Prediction Error Sum of Squares, and it is defined as the sum of the squares of
the differences between each observation $y_i$ and the corresponding predicted value based on a
model fit to the remaining $n - 1$ points, say $\hat{y}_{(i)}$. So PRESS provides a measure of how well
the model is likely to perform when predicting new data, or data that were not used to fit the
regression model. The computing formula for PRESS is

$$\text{PRESS} = \sum_{i=1}^{n} \left(y_i - \hat{y}_{(i)}\right)^2 = \sum_{i=1}^{n} \left(\frac{e_i}{1 - h_{ii}}\right)^2$$

where $e_i = y_i - \hat{y}_i$ is the usual residual. Thus PRESS is easy to calculate from the standard
least squares regression results. Models that have small values of PRESS are preferred.
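
The two forms of PRESS can be checked against each other numerically. The sketch below is an illustration under stated assumptions: the data are synthetic, the function names `press_statistic` and `press_by_refitting` are hypothetical, and $h_{ii}$ is taken to be the $i$th diagonal element of the hat matrix $H = X(X'X)^{-1}X'$. One function uses the computing formula from a single fit; the other refits the model $n$ times, leaving out one observation each time.

```python
import numpy as np

def press_statistic(X, y):
    """PRESS from one fit: sum of (e_i / (1 - h_ii))^2.

    X must already contain the intercept column and have full column rank.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                      # ordinary residuals e_i
    Q, _ = np.linalg.qr(X)                # reduced QR, so H = Q Q'
    h = np.sum(Q**2, axis=1)              # hat diagonals h_ii
    return float(np.sum((e / (1.0 - h)) ** 2))

def press_by_refitting(X, y):
    """PRESS from its definition: refit without point i, then predict y_i."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ beta_i) ** 2
    return float(total)

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=(n, 2))
y = 1.0 + 0.8 * x[:, 0] - 0.5 * x[:, 1] + rng.normal(scale=0.3, size=n)
Xd = np.hstack([np.ones((n, 1)), x])      # design matrix with intercept

press_fast = press_statistic(Xd, y)
press_slow = press_by_refitting(Xd, y)
print(press_fast, press_slow)             # the two values should agree
```

Because $0 < 1 - h_{ii} \le 1$ for a full-rank model, each PRESS residual is at least as large in magnitude as the ordinary residual, so PRESS is never smaller than $SS_E$.
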

EXAMPLE 12-13 Table 12-16 presents data on taste-testing 38 brands of pinot noir wine (the data were first
reported in an article by Kwan, Kowalski, and Skogenboe in the Journal of Agricultural and Food
Chemistry, Vol. 27, 1979, and they also appear as one of the default data sets
in Minitab). The response variable is $y$ = quality, and we wish to find the “best” regression
equation that relates quality to the other five parameters.
Figure 12-12 is the matrix of scatter plots for the wine quality data, as constructed by
Minitab. We notice that there are some indications of possible linear relationships between
quality and the regressors, but there is no obvious visual impression of which regressors
would be appropriate. Table 12-17 lists the all-possible-regressions output from Minitab. In
this analysis, we asked Minitab to present the best three equations for each subset size. Note
that Minitab reports the values of $R^2$, $R^2_{\text{adj}}$, $C_p$, and $S = \sqrt{MS_E}$ for each model. From Table
12-17 we see that the three-variable equation with $x_2$ = aroma, $x_4$ = flavor, and $x_5$ = oakiness
produces the minimum $C_p$ equation, whereas the four-variable model, which adds

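$$S = \sqrt{MS_E}$$

$$e_i = y_i - \hat{y}_i$$

$$C_p = \frac{SS_E(p)}{\hat{\sigma}^2} - n + 2p \qquad \text{(12-47)}$$

$$\text{PRESS} = \sum_{i=1}^{n} \left(y_i - \hat{y}_{(i)}\right)^2 = \sum_{i=1}^{n} \left(\frac{e_i}{1 - h_{ii}}\right)^2$$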
