Deviance not always appropriate
for logistic regression GOF
Alternative approach:
Hosmer–Lemeshow
When $G \ll n$, we can assume
$\mathrm{Dev}_{ET}(\hat{\boldsymbol{\beta}})$ is approximately $\chi^2_{n-k-1}$
under $H_0$: good fit
EXAMPLE
Previous data: $n = 40$
$G = 2, 4, 4$ for Models 1, 2, 3,
respectively
$\Downarrow$
$\chi^2$ test for GOF is OK
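However, when $G \approx n$, we cannot
assume
$\mathrm{Dev}_{ET}(\hat{\boldsymbol{\beta}})$ is approximately $\chi^2_{n-p-1}$
under $H_0$: good fit
(Statistical theory: $n_g$ small, $n_g \not\to \infty$ as
$n \to \infty$)
$X_i$ continuous, e.g., $X_i = \mathrm{AGE}$ $\Rightarrow$ $G \approx n$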
Many situations where predictors
are continuous
$\Downarrow$
Cannot use deviance to test for GOF
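$G = n$:
Each covariate pattern:
1 subject,
$n_g \equiv 1$ for all $g$, $g = 1, \ldots, n$
$\mathrm{Dev}_{SS}(\hat{\boldsymbol{\beta}}) = \mathrm{Dev}_{ET}(\hat{\boldsymbol{\beta}})$
but not $\chi^2$ under $H_0$:
GOF adequate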
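$$
\mathrm{Dev}_{SS}(\hat{\boldsymbol{\beta}}) = -2 \sum_{i=1}^{n} \left[ \hat{P}(X_i) \ln\!\left( \frac{\hat{P}(X_i)}{1 - \hat{P}(X_i)} \right) + \ln\!\left( 1 - \hat{P}(X_i) \right) \right]
$$
Provides $\hat{P}(X_i)$ but not observed $Y_i$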
We are now ready to discuss why the use of the
deviance formula is not always appropriate for
assessing GOF for a logistic regression model,
and we will describe an alternative approach,
using the Hosmer–Lemeshow statistic, which
is typically used instead of the deviance.
When the number of covariate patterns ($G$) is
considerably smaller than the number of
observations ($n$), the ET deviance formula can be
assumed to have an approximate chi-square
distribution with $n - k - 1$ degrees of freedom.
For the data we have illustrated above involving
$n = 40$ subjects, $G = 2$ for Model 1, and
$G = 4$ for Models 2 and 3. So a chi-square test
for GOF is appropriate using the ET deviance
formula.
However, when $G$ is almost as large as $n$, in
particular, when $G$ equals $n$, then the deviance
cannot be assumed to have a chi-square
distribution (Collett, 1991). This follows from
large-sample statistical theory, where the primary
problem is that in this situation, the number
of subjects, $n_g$, for each covariate pattern
remains small, e.g., close to 1, as the sample
size increases.
Note that if at least one of the variables in the
model, e.g., AGE, is continuous, then $G$ will
tend to be close to $n$ whenever the age range
in the sample is reasonably wide. Since logistic
regression models typically allow continuous
variables, there are many situations in which
the chi-square distribution cannot be assumed
when using the deviance to test for GOF.
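
The contrast between $G \ll n$ and $G \approx n$ can be checked directly by counting distinct covariate patterns. The following is a minimal sketch in Python; the two data sets are hypothetical (they are not the $n = 40$ data used earlier), and the helper name `covariate_patterns` is introduced here only for illustration.

```python
from collections import Counter

def covariate_patterns(rows):
    """Return (G, sizes): the number of distinct covariate patterns
    and a Counter giving n_g, the number of subjects in each pattern."""
    sizes = Counter(tuple(r) for r in rows)
    return len(sizes), sizes

# Hypothetical data, n = 40 subjects in both cases.
# Two binary predictors: at most 2 * 2 = 4 covariate patterns.
binary_rows = [(i % 2, (i // 2) % 2) for i in range(40)]
# One binary predictor plus AGE in years; every age is distinct here.
age_rows = [(i % 2, 30 + i) for i in range(40)]

G_bin, sizes_bin = covariate_patterns(binary_rows)
G_age, sizes_age = covariate_patterns(age_rows)

print(G_bin, min(sizes_bin.values()))  # G = 4, each pattern has n_g = 10
print(G_age, max(sizes_age.values()))  # G = 40 = n, every n_g = 1
```

With only binary predictors, each $n_g$ grows as more subjects are sampled; with a continuous predictor such as AGE, new subjects tend to create new patterns, so $n_g$ stays near 1.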
When $G = n$, each covariate pattern involves
only one subject, and the SS deviance formula
is equivalent to the ET deviance formula,
which nevertheless cannot be assumed to
have a chi-square distribution under $H_0$.
Moreover, the SS deviance formula, shown
again here, contains only the predicted values
$\hat{P}(X_i)$ for each subject. Thus, this formula tells
us nothing about the agreement between observed
(0,1) outcomes and their corresponding
predicted probabilities.
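
One way to see why the observed outcomes drop out: at the ML estimates of a model with an intercept, the score equations force $\sum_i (Y_i - \hat{P}(X_i))$ and $\sum_i X_i (Y_i - \hat{P}(X_i))$ to equal zero, so $\sum_i Y_i \,\mathrm{logit}\, \hat{P}(X_i) = \sum_i \hat{P}(X_i) \,\mathrm{logit}\, \hat{P}(X_i)$. The sketch below checks this numerically. It assumes a single centered predictor, hypothetical data with $G = n = 10$, and a hand-written Newton-Raphson fit (`fit_logistic` is an illustrative helper, not a library routine).

```python
import math

def fit_logistic(x, y, iters=25):
    """Fit P(X) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector (first derivatives of the log likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix entries (2 x 2, symmetric).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += inverse(information) * score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: 10 subjects, all ages distinct, so G = n.
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
mean_age = sum(ages) / len(ages)
x = [age - mean_age for age in ages]  # centering keeps the iteration stable

a_hat, b_hat = fit_logistic(x, y)
p_hat = [1.0 / (1.0 + math.exp(-(a_hat + b_hat * xi))) for xi in x]

# Deviance written with the observed outcomes Y_i.
dev_with_y = -2.0 * sum(
    yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    for yi, pi in zip(y, p_hat)
)
# SS deviance formula, which uses only the predicted values.
dev_p_only = -2.0 * sum(
    pi * math.log(pi / (1.0 - pi)) + math.log(1.0 - pi) for pi in p_hat
)

print(dev_with_y, dev_p_only)  # the two values agree up to rounding error
```

Because the two expressions coincide at the fitted model, the deviance in this setting is a function of the predicted probabilities alone and cannot measure how well they match the observed outcomes.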