Applied Statistics and Probability for Engineers

12-5 MODEL ADEQUACY CHECKING 445

We would like to examine the influential points to determine whether they control many
model properties. If these influential points are “bad” points, or erroneous in any way, they should
be eliminated. On the other hand, there may be nothing wrong with these points, but at least we
would like to determine whether or not they produce results consistent with the rest of the data. In
any event, even if an influential point is a valid one, if it controls important model properties, we
would like to know this, since it could have an impact on the use of the model.
Montgomery, Peck, and Vining (2001) and Myers (1990) describe several methods for
detecting influential observations. An excellent diagnostic is the distance measure developed
by Dennis R. Cook. This is a measure of the squared distance between the usual least squares
estimate of $\boldsymbol{\beta}$ based on all $n$ observations and the estimate obtained when the $i$th point is
removed, say, $\hat{\boldsymbol{\beta}}_{(i)}$. The Cook distance measure is

$$D_i = \frac{(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}})' \mathbf{X}'\mathbf{X} (\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}})}{p\hat{\sigma}^2} \qquad i = 1, 2, \ldots, n$$

Clearly, if the $i$th point is influential, its removal will result in $\hat{\boldsymbol{\beta}}_{(i)}$ changing considerably from
the value $\hat{\boldsymbol{\beta}}$. Thus, a large value of $D_i$ implies that the $i$th point is influential. The statistic $D_i$ is
actually computed using

$$D_i = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{(1 - h_{ii})} \qquad i = 1, 2, \ldots, n \tag{12-44}$$
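The agreement between the deletion-based definition and the computing formula in Equation 12-44 can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available. The function names and the simulated data are illustrative and do not come from the text. It also assumes that $p$ is the number of regression coefficients (intercept included) and that $\hat{\sigma}^2$ is the mean square error of the full-data fit.

```python
import numpy as np

def cooks_distance_by_deletion(X, y):
    """D_i from the definition: refit the model with observation i removed."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # full-data least squares estimate
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)              # full-data mean square error
    XtX = X.T @ X
    D = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # mask that drops row i
        beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        d = beta_i - beta
        D[i] = d @ XtX @ d / (p * sigma2)
    return D

def cooks_distance_shortcut(X, y):
    """D_i from Equation 12-44, using residuals and hat matrix diagonals."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)         # hat matrix
    h = np.diag(H)
    e = y - H @ y                                 # ordinary residuals
    sigma2 = e @ e / (n - p)
    r2 = e**2 / (sigma2 * (1 - h))                # squared studentized residuals
    return r2 / p * h / (1 - h)

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
x1 = rng.normal(size=20)
x2 = rng.normal(size=20)
X = np.column_stack([np.ones(20), x1, x2])
y = 2 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=20)

D_def = cooks_distance_by_deletion(X, y)
D_fast = cooks_distance_shortcut(X, y)
print(np.allclose(D_def, D_fast))
```

The shortcut avoids refitting the model $n$ times, which is why the statistic is computed from Equation 12-44 in practice.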

From Equation 12-44 we see that $D_i$ consists of the squared studentized residual, which
reflects how well the model fits the $i$th observation $y_i$ [recall that $r_i = e_i / \sqrt{\hat{\sigma}^2 (1 - h_{ii})}$], and a
component that measures how far that point is from the rest of the data [$h_{ii}/(1 - h_{ii})$ is a
measure of the distance of the $i$th point from the centroid of the remaining $n - 1$ points]. A
value of $D_i > 1$ would indicate that the point is influential. Either component of $D_i$ (or both)
may contribute to a large value.
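To see how the two components combine, the sketch below splits $D_i$ into its squared studentized residual and its leverage term $h_{ii}/(1 - h_{ii})$, then flags points with $D_i > 1$. It assumes NumPy. The function name `cook_components` and the small data set are made up for illustration and are not from the text. The last point is deliberately placed far from the other $x$ values and off the trend of the rest.

```python
import numpy as np

def cook_components(X, y):
    """Return (r_i^2, h_ii / (1 - h_ii), D_i) for each observation."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)
    e = y - H @ y                           # ordinary residuals
    sigma2 = e @ e / (n - p)                # mean square error
    r2 = e**2 / (sigma2 * (1 - h))          # squared studentized residuals
    lev = h / (1 - h)                       # leverage component
    return r2, lev, r2 / p * lev

# Hypothetical data: eight points close to a line, plus one remote point
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.9, 8.1, 10.0])
X = np.column_stack([np.ones_like(x), x])

r2, lev, D = cook_components(X, y)
influential = np.flatnonzero(D > 1)         # indices with D_i > 1
print(influential)
```

In this made-up data set only the remote point exceeds the $D_i > 1$ guideline, and it also has the largest leverage term.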

EXAMPLE 12-10 Table 12-11 lists the values of the hat matrix diagonals $h_{ii}$ and Cook's distance measure $D_i$ for
the wire bond pull strength data in Example 12-1. To illustrate the calculations, consider the
first observation:

$$D_1 = \frac{r_1^2}{p} \cdot \frac{h_{11}}{(1 - h_{11})} = \frac{\left[ e_1 / \sqrt{MS_E (1 - h_{11})} \right]^2}{p} \cdot \frac{h_{11}}{(1 - h_{11})} = \frac{\left[ 1.57 / \sqrt{5.2352 (1 - 0.1573)} \right]^2}{3} \cdot \frac{0.1573}{(1 - 0.1573)} = 0.035$$

The Cook distance measure $D_i$ does not identify any potentially influential observations in the
data, for no value of $D_i$ exceeds unity.
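The arithmetic for the first observation can be reproduced in a few lines of Python. This is only a check of the calculation, using the values quoted in the example ($e_1 = 1.57$, $MS_E = 5.2352$, $h_{11} = 0.1573$, $p = 3$). The variable names are chosen here for illustration.

```python
# Values quoted in Example 12-10 for the first observation
e1 = 1.57      # ordinary residual e_1
mse = 5.2352   # mean square error MS_E
h11 = 0.1573   # hat matrix diagonal h_11
p = 3          # value of p used in the example

r1_sq = e1**2 / (mse * (1 - h11))   # squared studentized residual r_1^2
D1 = r1_sq / p * h11 / (1 - h11)    # Cook's distance from Equation 12-44

print(round(D1, 3))  # 0.035
```

Since the result is well below 1, the first observation is not flagged as influential.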

