Applied Statistics and Probability for Engineers

454 CHAPTER 12 MULTIPLE LINEAR REGRESSION

We use the mean square error from the full $K + 1$ term model as an estimate of $\sigma^2$; that is, $\hat{\sigma}^2 = MS_E(K + 1)$. Then an estimator of $\Gamma_p$ is [see Montgomery, Peck, and Vining (2001) or Myers (1990) for the details]:

$$C_p = \frac{SS_E(p)}{\hat{\sigma}^2} - n + 2p \tag{12-47}$$

If the $p$-term model has negligible bias, it can be shown that

$$E(C_p \mid \text{zero bias}) = p$$

Therefore, the values of $C_p$ for each regression model under consideration should be evaluated relative to $p$. The regression equations that have negligible bias will have values of $C_p$ that are close to $p$, while those with significant bias will have values of $C_p$ that are significantly greater than $p$. We then choose as the “best” regression equation either a model with minimum $C_p$ or a model with a slightly larger $C_p$ that does not contain as much bias (i.e., $C_p \approx p$).

The PRESS statistic can also be used to evaluate competing regression models. PRESS is an acronym for Prediction Error Sum of Squares, and it is defined as the sum of the squares of the differences between each observation $y_i$ and the corresponding predicted value based on a model fit to the remaining $n - 1$ points, say $\hat{y}_{(i)}$. So PRESS provides a measure of how well the model is likely to perform when predicting new data, or data that were not used to fit the regression model. The computing formula for PRESS is

$$\text{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2 = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^2$$

where $e_i = y_i - \hat{y}_i$ is the usual residual and $h_{ii}$ is the $i$th diagonal element of the hat matrix $H = X(X'X)^{-1}X'$. Thus PRESS is easy to calculate from the standard least squares regression results. Models that have small values of PRESS are preferred.
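As a rough illustration of the two criteria just described, the following Python sketch computes $C_p$ and PRESS with NumPy. Everything in it is an assumption made for illustration: the data are synthetic (they are not the wine data of the example that follows), and the helper names `fit_ols`, `mallows_cp`, `press`, and `press_brute_force` are hypothetical. The brute-force routine refits the model $n$ times with one observation deleted each time, which gives a check on the computing formula based on $e_i / (1 - h_{ii})$.

```python
import numpy as np


def fit_ols(X, y):
    """Least squares fit. X must already contain the intercept column.

    Returns the coefficients, the ordinary residuals e_i, and the
    leverages h_ii (diagonal of the hat matrix H = X (X'X)^{-1} X').
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # H = X @ pinv(X); only its diagonal is needed.
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    return beta, resid, h


def mallows_cp(sse_p, p, n, sigma2_hat):
    """C_p = SS_E(p) / sigma_hat^2 - n + 2p."""
    return sse_p / sigma2_hat - n + 2 * p


def press(resid, h):
    """PRESS from the computing formula: sum of (e_i / (1 - h_ii))^2."""
    return float(np.sum((resid / (1.0 - h)) ** 2))


def press_brute_force(X, y):
    """PRESS from its definition: refit without point i, then predict y_i."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ beta_i) ** 2
    return float(total)


# Synthetic data (an assumption for illustration): K = 3 candidate
# regressors, of which only the first two actually affect y.
rng = np.random.default_rng(0)
n, K = 30, 3
x = rng.normal(size=(n, K))
y = 2.0 + 1.5 * x[:, 0] - 0.8 * x[:, 1] + rng.normal(scale=0.5, size=n)

# Full (K + 1)-term model supplies sigma_hat^2 = MS_E(K + 1).
X_full = np.column_stack([np.ones(n), x])
_, e_full, _ = fit_ols(X_full, y)
sse_full = float(e_full @ e_full)
sigma2_hat = sse_full / (n - (K + 1))

# Candidate p = 3 term model: intercept, x1, x2.
X_sub = X_full[:, :3]
_, e_sub, h_sub = fit_ols(X_sub, y)
sse_sub = float(e_sub @ e_sub)

cp_sub = mallows_cp(sse_sub, X_sub.shape[1], n, sigma2_hat)
cp_full = mallows_cp(sse_full, K + 1, n, sigma2_hat)
press_sub = press(e_sub, h_sub)
press_loo = press_brute_force(X_sub, y)

print("C_p (p = 3 subset):", cp_sub)
print("C_p (full model):  ", cp_full)
print("PRESS (formula):   ", press_sub)
print("PRESS (refitting): ", press_loo)
```

Note that the full model always has $C_p = K + 1$ by construction, because $SS_E(K+1)/\hat{\sigma}^2 = n - (K + 1)$. For that reason $C_p$ is only informative when comparing the smaller candidate models against their own $p$.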

EXAMPLE 12-13 Table 12-16 presents data on taste-testing 38 brands of pinot noir wine (the data were first reported in an article by Kwan, Kowalski, and Skogenboe in the Journal of Agricultural and Food Chemistry, Vol. 27, 1979, and they also appear as one of the default data sets in Minitab). The response variable is $y$ = quality, and we wish to find the “best” regression equation that relates quality to the other five parameters. Figure 12-12 is the matrix of scatter plots for the wine quality data, as constructed by Minitab. We notice that there are some indications of possible linear relationships between quality and the regressors, but there is no obvious visual impression of which regressors would be appropriate. Table 12-17 lists the all-possible-regressions output from Minitab. In this analysis, we asked Minitab to present the best three equations for each subset size. Note that Minitab reports the values of $R^2$, $R^2_{\text{adj}}$, $C_p$, and $S = \sqrt{MS_E}$ for each model. From Table 12-17 we see that the three-variable equation with $x_2$ = aroma, $x_4$ = flavor, and $x_5$ = oakiness produces the minimum $C_p$ equation, whereas the four-variable model, which adds
