To illustrate GOF assessment when using binary logistic regression, consider the following observed data from a cohort study on 40 subjects. The outcome variable is called D, there is one binary exposure variable (E), and there is one binary covariate (V).

These data indicate that V is an effect modifier of the E, D relationship, since the odds ratios of 2.250 (for V = 1) and 0.184 (for V = 0) are very different and are on opposite sides of the null value of 1.

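The two stratum-specific odds ratios can be checked directly from the cell counts of the observed cohort data (6, 4, 4, 6 when V = 1 and 3, 7, 7, 3 when V = 0). The short Python sketch below assumes the standard cross-product odds ratio ad/(bc) for a 2 × 2 table laid out with rows D = 1, D = 0 and columns E = 1, E = 0. The names `strata` and `odds_ratio` are illustrative only.

```python
# Cell counts from the observed cohort data (40 subjects), stratified by V.
# Each table is ((a, b), (c, d)) with rows D = 1, D = 0 and columns E = 1, E = 0.
strata = {
    1: ((6, 4), (4, 6)),  # V = 1
    0: ((3, 7), (7, 3)),  # V = 0
}


def odds_ratio(table):
    """Cross-product odds ratio a*d / (b*c) for a 2 x 2 table."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


for v in (1, 0):
    print(f"OR(V={v}) = {odds_ratio(strata[v]):.3f}")
```

Running it prints 2.250 for V = 1 and 0.184 for V = 0, one above and one below the null value of 1.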
Three models that may be fit to these data are listed as Models 1, 2, and 3 in the example box. In model 1, E is the only predictor. In model 2, both E and V are predictors. Model 3 includes the product term EV in addition to both the E and V main-effect terms.

Since the total sample size is 40, whereas these models contain 2, 3, and 4 parameters, respectively, none of the three models is saturated, because $k + 1 < n$ for each model.

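The saturation rule used here ($k + 1 = n$, with the intercept counted in $k + 1$) is simple enough to encode directly. The Python sketch below applies it to the three logistic models with $n = 40$, and to the two-subject SBP linear example for contrast. The function name `is_saturated` and the model labels are hypothetical, chosen only for illustration.

```python
def is_saturated(num_params: int, sample_size: int) -> bool:
    """Saturated (for individual outcomes) when k + 1 == n."""
    return num_params == sample_size


n = 40  # subjects in the cohort example

# k + 1 (intercept included) for each logistic model; labels are illustrative.
models = {
    "Model 1 (E)": 2,
    "Model 2 (E, V)": 3,
    "Model 3 (E, V, EV)": 4,
}

for name, k_plus_1 in models.items():
    print(f"{name}: k + 1 = {k_plus_1}, n = {n}, "
          f"saturated = {is_saturated(k_plus_1, n)}")

# The two-subject SBP linear example, by contrast, has k + 1 = n = 2.
print("SBP linear model saturated:", is_saturated(2, 2))
```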
Saturated Model (general): $k + 1 = n$,
where $k + 1$ = # of parameters (including intercept)
and $n$ = sample size

The linear regression model in the SBP example involves only two parameters, $\beta_0$ and $\beta_1$, whose estimates yield predicted values equal to the two observed SBP values of 115 and 170.

Thus, in this example, a saturated model is obtained when the number of model parameters ($k + 1 = 2$) is equal to the number of subjects in the dataset. (Note: $k$ = # of variables in the model, and the "1" refers to the intercept parameter.)

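With only two subjects, the two-parameter line passes exactly through both points, so the fit reproduces the data. The Python sketch below recovers the estimates from the observed pairs (FOOT = 9, SBP = 115) and (FOOT = 11, SBP = 170). It assumes ordinary least squares, which for two points with distinct FOOT values reduces to the line through them. The intercept works out to $-132.5$ and the slope to $27.5$.

```python
# Two observations from the SBP example: (FOOT, SBP).
points = [(9, 115), (11, 170)]

(x1, y1), (x2, y2) = points
b1 = (y2 - y1) / (x2 - x1)  # slope: 55 / 2 = 27.5
b0 = y1 - b1 * x1           # intercept: 115 - 247.5 = -132.5

# Predicted values and residuals; a saturated fit leaves zero residuals.
fitted = [b0 + b1 * x for x, _ in points]
residuals = [y - yhat for (_, y), yhat in zip(points, fitted)]

print(b0, b1, fitted, residuals)
```

The residuals are all zero. This is the signature of a saturated model: it has no remaining degrees of freedom with which to measure lack of fit.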
More generally, the saturated model for a given dataset is defined as any model that contains as many parameters as the number of "observations" in the dataset, i.e., the sample size.

EXAMPLE
Observed Cohort Data

V = 1          E = 1   E = 0
  D = 1          6       4
  D = 0          4       6
  Total         10      10

V = 0          E = 1   E = 0
  D = 1          3       7
  D = 0          7       3
  Total         10      10

$\widehat{OR}_{V=1} = 2.250 > 1$    $\widehat{OR}_{V=0} = 0.184 < 1$
Very different
⇓
V is an effect modifier of $OR_{E,D}$

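Model 1: $\operatorname{logit} P(\mathbf{X}) = \alpha + \beta E$
Model 2: $\operatorname{logit} P(\mathbf{X}) = \alpha + \beta E + \gamma V$
Model 3: $\operatorname{logit} P(\mathbf{X}) = \alpha + \beta E + \gamma V + \delta EV$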
$n = 40$ but $k + 1 = 2$, 3, or 4
i.e., $k + 1 < n$ in all 3 models
i.e., no model is saturated
(for predicting individual outcomes)

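Model 3 has one parameter for each of the four (E, V) combinations. Assuming the usual maximum-likelihood fit, its fitted probabilities therefore equal the observed proportions with D = 1 in each combination, and its coefficients can be read off as differences of logits without iterative fitting. This is saturation over covariate patterns, not over individual subjects, so it is consistent with $k + 1 < n$ above. The Python sketch below computes $\hat\alpha$, $\hat\beta$, $\hat\gamma$, $\hat\delta$ this way. It confirms that $e^{\hat\beta}$ equals the V = 0 odds ratio and $e^{\hat\beta + \hat\delta}$ equals the V = 1 odds ratio.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


# Observed proportion with D = 1 in each (E, V) pattern (10 subjects each).
p = {
    (1, 1): 6 / 10,
    (0, 1): 4 / 10,
    (1, 0): 3 / 10,
    (0, 0): 7 / 10,
}

# Model 3: logit P = alpha + beta*E + gamma*V + delta*E*V.
# Assumption: ML fit with one parameter per pattern, so fitted = observed.
alpha = logit(p[(0, 0)])
beta = logit(p[(1, 0)]) - alpha
gamma = logit(p[(0, 1)]) - alpha
delta = logit(p[(1, 1)]) - alpha - beta - gamma

print(f"alpha={alpha:.4f} beta={beta:.4f} gamma={gamma:.4f} delta={delta:.4f}")
print(f"OR(V=0) = exp(beta) = {math.exp(beta):.3f}")
print(f"OR(V=1) = exp(beta + delta) = {math.exp(beta + delta):.3f}")
```

The product-term coefficient $\hat\delta$ is the log of the ratio of the two odds ratios, $\ln(2.250 / 0.184)$. This is why a large $\hat\delta$ corresponds to the effect modification seen in the table.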
EXAMPLE (continued)
$\widehat{SBP} = \hat\beta_0 + \hat\beta_1(\text{FOOT})$,
where $\hat\beta_0 = -132.5$ and $\hat\beta_1 = 27.5$,
so $\widehat{SBP} = -132.5 + 27.5(9) = 115$
and $\widehat{SBP} = -132.5 + 27.5(11) = 170$
Linear model example:
$k + 1\ (= n) = 2$,
where $k + 1$ = # of parameters (including intercept)
and $n$ = # of subjects
306 9. Assessing Goodness of Fit for Logistic Regression