Classical GOF approach:
Saturated model gives perfect fit for individual subjects
Why?
$Y_i = 0$ or $1$: the only possible outcomes for subject $i$
However, problematic for logistic regression
Saturated model: $k + 1 = n$
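EXAMPLE
Previous example ($n = 40$, 4 covariate patterns):
Model 4 (SS saturated model)
$$\operatorname{logit} P(\mathbf{X}) = \omega_1 Z_1 + \omega_2 Z_2 + \omega_3 Z_3 + \cdots + \omega_{40} Z_{40}$$
$$Z_i = \begin{cases} 1 & \text{if subject } i,\ i = 1, 2, \ldots, 40 \\ 0 & \text{otherwise} \end{cases}$$
$$L_{SS} = \prod_{i=1}^{40} P(\mathbf{X}_i)^{Y_i} \left(1 - P(\mathbf{X}_i)\right)^{1 - Y_i}$$
where $\mathbf{X}_i$ denotes the values of $\mathbf{X}$ for subject $i$
Subject $i$: $Z_i = 1$; other $Z$s $= 0$
$\Downarrow$
$\operatorname{logit} P(\mathbf{X}_i) = \omega_i$ and $P(\mathbf{X}_i) = 1 / [1 + \exp(-\omega_i)]$
Saturated model
$\Downarrow$
$\hat{Y}_i \overset{\text{def}}{=} \hat{P}(\mathbf{X}_i) = Y_i,\ i = 1, 2, \ldots, n$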
The traditional (i.e., “classical”) GOF approach considers the saturated model as the ideal for “perfect fit.” This makes sense when the units of analysis are individual subjects, since their actual observed outcomes are 0 or 1, rather than some value in between. However, as we explain later, use of the saturated model to assess GOF for logistic regression is problematic.
Recall that we originally defined the saturated model as that model for which the number of parameters ($k + 1$) equals the sample size ($n$). For our example involving four covariate patterns for 40 subjects, the subject-specific (SS) saturated model is shown above. This model does not have an intercept term, but does contain 40 parameters, namely the $\omega_i$. The $Z_i$ are dummy variables that distinguish the 40 subjects.
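The structure of the SS saturated model can be sketched in a few lines of Python. This is only an illustration: the names `Z`, `omega`, and `linear_predictor` are hypothetical, and the coefficient values are arbitrary placeholders rather than estimates from the example data.

```python
n = 40  # sample size in the example

# Z[i][j] = 1 if column j is the dummy variable for subject i, else 0.
# With one dummy per subject and no intercept, this is an n x n identity matrix.
Z = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# Hypothetical coefficient values, used only to illustrate the indexing.
omega = [0.1 * (j + 1) for j in range(n)]


def linear_predictor(i):
    """Return logit P(X_i) = sum_j omega_j * Z_ij for subject i (0-based)."""
    return sum(omega[j] * Z[i][j] for j in range(n))


# The number of parameters equals the sample size, which is what
# makes the model saturated under the definition k + 1 = n.
num_parameters = len(omega)
```

Because each row of `Z` contains a single 1, `linear_predictor(i)` returns `omega[i]` for every subject, and `num_parameters` equals `n`.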
The likelihood function for this (SS) saturated model, $L_{SS}$, is shown above. In this formula, $Y_i$ denotes the observed value (either 0 or 1) for the $i$th individual in the dataset.
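As a minimal sketch of how this likelihood is evaluated, the function below multiplies one Bernoulli term per subject. The function name `ss_likelihood` and the four-subject outcome and probability vectors are hypothetical, chosen only to make the arithmetic easy to follow; they are not the data from the 40-subject example.

```python
def ss_likelihood(p, y):
    """Return prod_i p_i^{y_i} * (1 - p_i)^{1 - y_i} for 0/1 outcomes y."""
    if len(p) != len(y):
        raise ValueError("p and y must have the same length")
    likelihood = 1.0
    for p_i, y_i in zip(p, y):
        # A subject with y_i = 1 contributes p_i; one with y_i = 0 contributes 1 - p_i.
        likelihood *= p_i if y_i == 1 else (1.0 - p_i)
    return likelihood


# Hypothetical data: four subjects with observed outcomes and model probabilities.
y = [1, 0, 1, 1]
p = [0.9, 0.2, 0.8, 0.7]

value = ss_likelihood(p, y)  # 0.9 * 0.8 * 0.8 * 0.7, approximately 0.4032
```

Each subject contributes exactly one factor, so the product has as many terms as there are subjects.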
Note that for subject $i$, $Z_i = 1$ and $Z_k = 0$ for $k \neq i$, so $P(\mathbf{X}_i)$ can be written in terms of the regression coefficient $\omega_i$ that involves only that one subject.
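Written out, the substitution for subject $i$ is the following. This is a worked restatement of the step just described, using the same symbols as the model definition; only the summation index $k$ is introduced here.

```latex
\begin{align*}
\operatorname{logit} P(\mathbf{X}_i)
  &= \sum_{k=1}^{40} \omega_k Z_k
   = \omega_i \cdot 1 + \sum_{k \neq i} \omega_k \cdot 0
   = \omega_i, \\
P(\mathbf{X}_i)
  &= \frac{1}{1 + \exp(-\omega_i)}.
\end{align*}
```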
Furthermore, since the saturated model perfectly fits the data, it follows that the maximum likelihood (ML) estimate $\hat{Y}_i$, which equals $\hat{P}(\mathbf{X}_i)$ by definition, must be equal to the observed $Y_i$ for each subject $i$.
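A small numerical sketch can make this perfect fit concrete. Since $P(\mathbf{X}_i) = 1/[1 + \exp(-\omega_i)]$ lies strictly between 0 and 1 for any finite $\omega_i$, the fitted value can equal an observed 0 or 1 only in the limit as $\omega_i$ grows without bound in the direction of the outcome; that observation is a standard property of the logistic function rather than a statement from the text. The names `prob`, `log_likelihood`, and `results`, and the four hypothetical outcomes, are assumptions made for illustration.

```python
import math


def prob(omega_i):
    """Logistic probability P(X_i) = 1 / (1 + exp(-omega_i))."""
    return 1.0 / (1.0 + math.exp(-omega_i))


def log_likelihood(omega, y):
    """Log of the SS likelihood: sum of log p_i (y_i = 1) or log(1 - p_i) (y_i = 0)."""
    total = 0.0
    for w, y_i in zip(omega, y):
        p = prob(w)
        total += math.log(p) if y_i == 1 else math.log(1.0 - p)
    return total


# Hypothetical outcomes for four subjects.
y = [1, 0, 0, 1]

# Push each omega_i toward +c when y_i = 1 and toward -c when y_i = 0.
results = []
for c in (1.0, 5.0, 10.0, 20.0):
    omega = [c if y_i == 1 else -c for y_i in y]
    fitted = [prob(w) for w in omega]
    max_gap = max(abs(p - y_i) for p, y_i in zip(fitted, y))
    results.append((c, max_gap, log_likelihood(omega, y)))
```

As `c` increases, the largest gap between fitted and observed values shrinks toward 0 and the log-likelihood rises toward 0, which corresponds to a likelihood approaching its upper bound of 1.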