Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)


More on influential observations
including:


 How to use computer software?


 How to proceed when influential
observations are identified?


Computer packages, e.g., SAS,
STATA, SPSS, produce their
own version of influence
diagnostics


SAS’s LOGISTIC: influence and
iplots options at end of model
statement

Large collection of regression
diagnostic information
produced, including
Δβ values for each variable
and
C measures over
all variables

SAS’s LOGISTIC: C and Cbar
measures (similar but not identical)


EXAMPLE

MRSA no-interaction model
Logit P(X) = α + β₁ PREVHOSP + β₂ PAMU
+ γ₁ AGE + γ₂ GENDER

[Figures: SAS LOGISTIC output for the influence measure C, first 42 subjects.
Influence option: case numbers listed vertically, C plotted horizontally (1 unit = 0.01).
Iplots option: C plotted on the vertical axis against case number on the horizontal axis.
Case 9 and Case 16 are labeled in both figures.]

    We now briefly illustrate, using the MRSA
    example, how to use computer software to
    diagnose influential observations. We also discuss
    how to proceed with the analysis when
    influential observations are identified.

    Most computer software packages, such as
    SAS, STATA, and SPSS, allow the user to obtain
    influence diagnostics for logistic regression,
    although each has its own version of
    the program code and the statistics produced.

    For example, with SAS’s LOGISTIC procedure,
    the user can specify two options, “influence”
    and “iplots”, at the end of the model
    statement.

    Both of these LOGISTIC options produce a large
    collection of regression diagnostic
    information for any fitted model. This includes
    Delta-beta measures for each variable in the model
    plus overall Cook’s distance-type measures.
    Here, we focus on the latter, which we
    henceforth refer to as “C measures.”
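
To make the Delta-beta idea concrete, the following minimal Python sketch refits a one-predictor logistic model with each subject deleted in turn and records how much each coefficient changes. It assumes toy data (not the MRSA data), a hand-written Newton-Raphson fit, and the sign convention "full-data estimate minus leave-one-out estimate"; the names `fit` and `delta_betas` are hypothetical. SAS's own Delta-beta output may be computed and scaled differently, so this illustrates the concept rather than reproducing the LOGISTIC procedure.

```python
import math

# Hypothetical toy data (NOT the MRSA data): one predictor, binary outcome.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]


def fit(x, y, iters=25):
    """Newton-Raphson fit of logit P = a + b*x; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (g0, g1) and the 2x2 information matrix X'WX.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        s00 = sum(w)
        s01 = sum(wi * xi for wi, xi in zip(w, x))
        s11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = s00 * s11 - s01 * s01
        # Newton step: (X'WX)^{-1} times the score, via the explicit 2x2 inverse.
        a += (s11 * g0 - s01 * g1) / det
        b += (s00 * g1 - s01 * g0) / det
    return a, b


a_full, b_full = fit(x, y)

# Exact leave-one-out change in each coefficient (assumed sign: full minus deleted).
delta_betas = []
for i in range(len(x)):
    a_i, b_i = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    delta_betas.append((a_full - a_i, b_full - b_i))

for case, (da, db) in enumerate(delta_betas, start=1):
    print(f"case {case:2d}  delta intercept = {da:+.4f}  delta slope = {db:+.4f}")
```

A subject whose deletion shifts a coefficient far more than the others would be a candidate influential observation for that variable.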

    Two slightly different C measures are produced
    by the influence and iplots options: a “C” and a
    “Cbar” measure (Pregibon, 1981). These
    measures typically yield similar, though not always
    identical, conclusions as to which subjects are
    “influential.”
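
As a rough illustration of how the two measures relate, the sketch below computes both for a toy one-predictor model. It assumes the commonly cited Pregibon-type forms C = r²h/(1−h)² and Cbar = r²h/(1−h), where r is the Pearson residual and h the leverage (hat-matrix diagonal); these formulas are not stated in the text above, and the data and the names `prob`, `info_matrix`, `fit`, and `c_measures` are hypothetical.

```python
import math

# Hypothetical toy data (NOT the MRSA data): one predictor, binary outcome.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]


def prob(a, b, xi):
    """Fitted probability under logit P = a + b*x."""
    return 1.0 / (1.0 + math.exp(-(a + b * xi)))


def info_matrix(a, b, x):
    """Weights w = p(1-p) and the elements of the 2x2 matrix X'WX."""
    w = [prob(a, b, xi) * (1.0 - prob(a, b, xi)) for xi in x]
    s00 = sum(w)
    s01 = sum(wi * xi for wi, xi in zip(w, x))
    s11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return w, s00, s01, s11


def fit(x, y, iters=25):
    """Newton-Raphson fit; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        _, s00, s01, s11 = info_matrix(a, b, x)
        g0 = sum(yi - prob(a, b, xi) for xi, yi in zip(x, y))
        g1 = sum(xi * (yi - prob(a, b, xi)) for xi, yi in zip(x, y))
        det = s00 * s11 - s01 * s01
        a, b = a + (s11 * g0 - s01 * g1) / det, b + (s00 * g1 - s01 * g0) / det
    return a, b


def c_measures(x, y, a, b):
    """Return (leverage, C, Cbar) per subject under the assumed formulas."""
    w, s00, s01, s11 = info_matrix(a, b, x)
    det = s00 * s11 - s01 * s01
    out = []
    for xi, yi, wi in zip(x, y, w):
        # Leverage: w_i * x_i' (X'WX)^{-1} x_i with x_i = (1, xi).
        h = wi * (s11 - 2.0 * s01 * xi + s00 * xi * xi) / det
        r = (yi - prob(a, b, xi)) / math.sqrt(wi)  # Pearson residual
        cbar = r * r * h / (1.0 - h)
        out.append((h, cbar / (1.0 - h), cbar))
    return out


a_hat, b_hat = fit(x, y)
results = c_measures(x, y, a_hat, b_hat)
for case, (h, c, cbar) in enumerate(results, start=1):
    print(f"case {case:2d}  h = {h:.4f}  C = {c:.4f}  Cbar = {cbar:.4f}")
```

Under these formulas C equals Cbar divided by (1 − h), so the two differ most for high-leverage subjects, which is one way to see why they usually, but not always, point to the same cases.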

    The influence option produces a figure that
    vertically (on the Y-axis) lists each subject and
    horizontally (on the X-axis) plots the value of
    the influence measure (C or Cbar). The iplots
    option, on the other hand, produces a figure
    that lists the subjects horizontally and plots the
    influence measure on the vertical axis.
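
The difference between the two layouts can be mimicked with a small text-plot sketch in Python. This is not SAS output: the values in `c_values` are made up for illustration, and the function names `influence_rows` and `iplots_grid` are hypothetical. The first function lists cases down the page with the measure plotted across; the second lists cases across the page with the measure plotted upward.

```python
# Made-up influence values for six cases (illustration only, not MRSA results).
c_values = [0.01, 0.02, 0.01, 0.15, 0.02, 0.01]


def influence_rows(values, width=40):
    """One row per case (Y-axis); '*' position across the row shows the value (X-axis)."""
    top = max(values)
    rows = []
    for case, v in enumerate(values, start=1):
        col = round(v / top * (width - 1)) if top > 0 else 0
        rows.append(f"{case:>4} {v:8.4f} |" + " " * col + "*")
    return rows


def iplots_grid(values, height=8):
    """One column per case (X-axis); '*' height in the grid shows the value (Y-axis)."""
    top = max(values)
    levels = [round(v / top * (height - 1)) if top > 0 else 0 for v in values]
    lines = []
    for level in range(height - 1, -1, -1):  # draw from the top row down
        lines.append("|" + "".join("*" if lv == level else " " for lv in levels))
    lines.append("+" + "-" * len(values))
    return lines


rows = influence_rows(c_values)
grid = iplots_grid(c_values)
print("\n".join(rows))
print()
print("\n".join(grid))
```

In either orientation, a case whose marker sits far from the rest (case 4 in this made-up example) is the kind of subject the figures are meant to expose.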

    The two accompanying figures show the results for
    the influence measure C for the first 42
    subjects in the MRSA data set, using the
    no-interaction model given with the figures. In both
    figures, subjects 9 and 16 appear to have C
    scores that are much higher than the other
    scores.
    Presentation: V. Influential Observations 277


