VI. SUMMARY
✓ Chapter 9: Assessing Goodness
of Fit for Logistic
Regression
Saturated model:
Contains as many parameters
(p + 1) as the number of
subjects (n) in the dataset
Provides perfect prediction of
the observed (0, 1) outcomes on
each subject
Fully parameterized model:
Contains the maximum
number of covariates that can
be defined from the basic
predictors (X) being considered
for the model
Provides perfect prediction of
the observed proportion of cases
within subgroups defined by
distinct covariate patterns of X
Subject-specific (SS) format:
Datalines listed by subjects
Used for GOF measure of
model fit for (0, 1) outcomes
Events–trials (ET) format:
Datalines listed by subgroups
based on (G) covariate patterns
Used for GOF measure of
model fit for subgroup
proportions
Deviance:
Likelihood ratio (LR) statistic
for comparing one’s current
model to the saturated model
Not recommended when G ≈ n
Hosmer–Lemeshow (HL) statistic:
GOF statistic appropriate when
G ≈ n
Computed using O and E cases
and noncases in percentile
subgroups
This presentation is now complete. We have
described how to assess the extent to which a
binary logistic model of interest predicts the
observed outcomes in one’s dataset.
We have identified two alternative models, a
saturated model and a fully parameterized
model, that can be used as possible gold standard
referent points for evaluating the fit of a
given model.
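As a minimal sketch of how these two referent models differ, the Python fragment below computes the predictions each one would reproduce. The function names and the toy data are hypothetical and chosen only for illustration. The saturated model returns each subject's own (0, 1) outcome, whereas the fully parameterized model returns the observed proportion of cases within each distinct covariate pattern.

```python
from collections import defaultdict

def saturated_predictions(y):
    """Saturated model: one parameter per subject, so each observed
    (0, 1) outcome is predicted perfectly."""
    return [float(yi) for yi in y]

def fully_parameterized_predictions(x, y):
    """Fully parameterized model: predicts, for every subject, the observed
    proportion of cases within that subject's covariate pattern."""
    totals = defaultdict(lambda: [0, 0])  # pattern -> [cases, subjects]
    for xi, yi in zip(x, y):
        totals[xi][0] += yi
        totals[xi][1] += 1
    return [totals[xi][0] / totals[xi][1] for xi in x]

# Hypothetical toy data: four subjects, two covariate patterns.
x = [(0,), (0,), (1,), (1,)]
y = [1, 0, 1, 1]
sat = saturated_predictions(y)
full = fully_parameterized_predictions(x, y)
```

In this toy example the two sets of predictions differ only for the pattern in which cases and noncases are mixed.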
We have also distinguished between two alternative
data layouts that can be used: subject-specific
(SS) vs. events–trials (ET) formats.
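The relationship between the two layouts can be sketched in Python as follows. The helper name `ss_to_et` and the toy datalines are assumptions made for illustration. Each SS dataline is one subject; each ET dataline is one covariate pattern with its number of events (cases) and trials (subjects).

```python
from collections import defaultdict

def ss_to_et(records):
    """Collapse subject-specific datalines (pattern, outcome) into
    events-trials datalines: pattern -> (events, trials)."""
    counts = defaultdict(lambda: [0, 0])
    for pattern, outcome in records:
        counts[pattern][0] += outcome  # events: number of cases
        counts[pattern][1] += 1        # trials: number of subjects
    return {pattern: (e, t) for pattern, (e, t) in counts.items()}

# Hypothetical SS data: n = 6 subjects, one binary predictor.
ss_data = [((0,), 1), ((0,), 0), ((0,), 0),
           ((1,), 1), ((1,), 1), ((1,), 0)]
et_data = ss_to_et(ss_data)
G = len(et_data)  # number of distinct covariate patterns
```

Here n = 6 subjects collapse to G = 2 covariate patterns.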
A widely used GOF measure for many mathematical
models is called the deviance. However,
the deviance is not recommended for a
binary logistic regression model in which the
number of covariate patterns (G) is close to
the number of subjects (n).
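A rough sketch of the deviance under each data layout is given below, assuming the fitted probabilities have already been obtained from some model. The function names are hypothetical. The SS version takes the saturated likelihood to be 1, so the deviance reduces to −2 times the log likelihood of the current model. The ET version compares the current model with the fully parameterized model, whose fitted values are the observed subgroup proportions.

```python
import math

def deviance_ss(y, p):
    """SS-format deviance: -2 * log likelihood of the current model
    (the saturated model has likelihood 1). Assumes 0 < p < 1."""
    return -2.0 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                      for yi, pi in zip(y, p))

def _term(a, b):
    # a * ln(a / b), with the usual convention that 0 * ln(0) = 0
    return 0.0 if a == 0 else a * math.log(a / b)

def deviance_et(events, trials, p):
    """ET-format deviance: LR statistic comparing the current model
    with the fully parameterized model."""
    total = 0.0
    for d, n, pg in zip(events, trials, p):
        total += _term(d, n * pg) + _term(n - d, n * (1 - pg))
    return 2.0 * total

# Hypothetical example: fitted probabilities equal to the observed
# proportions in each of two covariate patterns.
y = [1, 0, 0, 1, 1, 0]
p_ss = [1/3, 1/3, 1/3, 2/3, 2/3, 2/3]
dev_ss = deviance_ss(y, p_ss)
dev_et = deviance_et([1, 2], [3, 3], [1/3, 2/3])
```

With these inputs the ET deviance is essentially zero, because the fitted values reproduce the subgroup proportions, while the SS deviance remains positive for the same fit.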
In the latter situation, a popular alternative is
the Hosmer–Lemeshow (HL) statistic, which
is computed from a table of observed and
expected cases and noncases categorized by
percentile subgroups, e.g., deciles of
predicted probabilities.
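The HL computation can be sketched in Python as shown below. This is an illustrative implementation under stated assumptions, not the routine used by any particular package: the function name is hypothetical, subjects are split into roughly equal-sized groups after sorting by predicted probability, and ties are broken arbitrarily. The statistic sums (O − E)²/E over cases and noncases in each group and is commonly referred to a chi-square distribution with (number of groups − 2) degrees of freedom.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (HL statistic, degrees of freedom) from observed outcomes y
    and predicted probabilities p. Assumes 0 < p < 1 for every subject."""
    pairs = sorted(zip(p, y))  # order subjects by predicted probability
    n = len(pairs)
    stat = 0.0
    used = 0
    for q in range(groups):
        chunk = pairs[q * n // groups:(q + 1) * n // groups]
        if not chunk:
            continue  # skip empty groups when n < groups
        used += 1
        obs_cases = sum(yi for _, yi in chunk)
        exp_cases = sum(pi for pi, _ in chunk)
        obs_non = len(chunk) - obs_cases
        exp_non = len(chunk) - exp_cases
        stat += (obs_cases - exp_cases) ** 2 / exp_cases
        stat += (obs_non - exp_non) ** 2 / exp_non
    return stat, used - 2

# Hypothetical example with two groups of two subjects each.
hl_stat, hl_df = hosmer_lemeshow([0, 0, 1, 1], [0.2, 0.2, 0.8, 0.8], groups=2)
```

With real data one would typically use `groups=10` (deciles of risk) and compare the statistic with a chi-square distribution on 8 degrees of freedom.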