More on influential observations
including:
How to use computer software?
How to proceed when influential
observations are identified?
Computer packages, e.g., SAS,
STATA, SPSS, produce their
own versions of influence
diagnostics
SAS’s LOGISTIC: influence and
iplots options at end of model
statement
Large collection of regression
diagnostic information
produced, including
Δβ (Delta-beta) values for each variable
and
C measures over
all variables
SAS’s LOGISTIC: C and Cbar
measures (similar but not identical)
EXAMPLE
MRSA no-interaction model
$$\operatorname{logit} P(\mathbf{X}) = \alpha + \beta_1\,\mathrm{PREVHOSP} + \beta_2\,\mathrm{PAMU} + \gamma_1\,\mathrm{AGE} + \gamma_2\,\mathrm{GENDER}$$
[Figures: influence measure C by case number for
the first 42 subjects. Influence option: case
number on the vertical axis, C on the horizontal
axis (1 unit = 0.01). Iplots option: case number
on the horizontal axis, C on the vertical axis.
Cases 9 and 16 are labeled in both figures.]
We now briefly illustrate, using the MRSA
example, how to use computer software to
diagnose influential observations. We also
discuss how to proceed with the analysis when
influential observations are identified.
Most computer software packages, such as
SAS, STATA, and SPSS, allow the user to obtain
influence diagnostics for logistic regression,
although each package has its own program code
and its own version of the statistics produced.
For example, with SAS’s LOGISTIC procedure,
the user can specify two options, “influence”
and “iplots”, at the end of the model statement.
Both of these LOGISTIC options produce a large
collection of regression diagnostic information
for any fitted model. This includes Delta-beta
measures for each variable in the model
plus overall Cook’s distance-type measures.
Here, we focus on the latter, which we
henceforth refer to as “C measures.”
Two slightly different C measures are produced
by the influence and iplots options, a “C” and a
“Cbar” measure (Pregibon, 1981). These measures
typically yield similar, though not always
identical, conclusions as to which subjects are
“influential”.
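To make the difference between the two measures concrete, the following is a minimal Python sketch. It assumes the definitions commonly documented for these statistics, C = r² h / (1 − h)² and Cbar = r² h / (1 − h), where r is the Pearson residual and h is the leverage (hat-matrix diagonal) of the observation. The function name `c_measures` and the input values are hypothetical and are not taken from the MRSA output.

```python
import math

def c_measures(y, p, h):
    """Return (C, Cbar) for a single observation.

    y: observed 0/1 outcome
    p: fitted probability from the logistic model (0 < p < 1)
    h: leverage, i.e. hat-matrix diagonal (0 <= h < 1)
    Assumed definitions: C = r^2 h / (1 - h)^2, Cbar = r^2 h / (1 - h),
    with r the Pearson residual.
    """
    pearson = (y - p) / math.sqrt(p * (1.0 - p))  # Pearson residual
    cbar = pearson ** 2 * h / (1.0 - h)
    c = cbar / (1.0 - h)  # one extra division by (1 - h)
    return c, cbar

# Hypothetical inputs: a case with outcome 1, fitted probability 0.2,
# and leverage 0.1.
c, cbar = c_measures(y=1, p=0.2, h=0.1)
print(round(c, 4), round(cbar, 4))
```

Under these definitions C is never smaller than Cbar, and the two differ noticeably only when the leverage is not close to zero, which is consistent with the two measures usually pointing to the same subjects.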
The influence option produces a figure that
vertically (on the Y-axis) lists each subject and
horizontally (on the X-axis) plots the value of
the influence measure (C or Cbar). The iplots
option, on the other hand, produces a figure
that lists the subjects horizontally and plots the
influence measure on the vertical axis.
The two figures in the example show the results
for the influence measure C for the first 42
subjects in the MRSA data set, for the
no-interaction model given with the figures. In
both figures, subjects 9 and 16 appear to have C
scores that are much higher than the other
scores.
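The visual screening described above can also be done numerically once the C values have been exported from the software. The sketch below is one possible approach, not a rule from the text: it flags cases whose C exceeds an arbitrary multiple of the median C. The function name `flag_high_c`, the `ratio` default, and the C values are all hypothetical; the values only mimic the pattern in which two cases stand out.

```python
from statistics import median

def flag_high_c(c_by_case, ratio=5.0):
    """Return case numbers whose C exceeds ratio * median(C),
    ordered from largest to smallest C.

    The cutoff is an illustrative assumption, not a standard threshold.
    """
    cutoff = ratio * median(c_by_case.values())
    flagged = [case for case, c in c_by_case.items() if c > cutoff]
    return sorted(flagged, key=lambda case: c_by_case[case], reverse=True)

# Hypothetical C values keyed by case number (not the actual MRSA output).
c_values = {
    1: 0.009, 2: 0.004, 3: 0.005, 7: 0.006, 8: 0.022,
    9: 0.131, 10: 0.009, 15: 0.026, 16: 0.101, 17: 0.005,
}
print(flag_high_c(c_values))
```

A relative cutoff is used here because the text judges influence by comparing each score with the other scores rather than against a fixed value; any flagged case would still need to be examined individually.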