Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)

Note, however, that concluding that “none of the
models are saturated” is based on the following
assumption: the unit of analysis is the subject.

This assumption is equivalent to listing the
dataset using 40 lines of data, one for each
subject. For Model 2, therefore, each dataline
would contain, for a given subject, the values of
the outcome (D) and the predictor variables
(E and V) in the model.

When each subject is the unit of analysis,
we can therefore claim that Model 2 is not
saturated because the number of parameters
(3) in the model is less than the total number of
units in the dataset (n = 40).

However, there is another way to view the
dataset we have been considering: the unit of
analysis is a group of subjects, all of whom have
the same covariate pattern within a group.

This assumption is equivalent to listing the
dataset using only four lines of data, one for
each covariate pattern. Each dataline would
contain, for a given group of subjects, the number
of cases (D = 1) in each group (d_g), the
number of subjects in each group (n_g), and
the values of each predictor variable being
modeled, as shown at the left for our dataset.

This type of data layout is called an events–trials
format, where there are d_g events and n_g trials.

Using events–trials format, we can argue that
the number of observations (n) consists of the
total number of datalines (4 in our example),
rather than the number of subjects (40).
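
The change of unit can be illustrated with a minimal Python sketch. It assumes the 40 subject-level records can be rebuilt from the four group counts in the example (the full subject listing is elided in the text, so the record order here is illustrative only), and the variable names are hypothetical.

```python
from collections import defaultdict

# Group-level counts from the example: (E, V) -> (cases d_g, subjects n_g)
groups = {(1, 1): (6, 10), (0, 1): (4, 10), (1, 0): (3, 10), (0, 0): (7, 10)}

# Rebuild one record (D, E, V) per subject; the ordering is an assumption.
subjects = []
for (e, v), (d_g, n_g) in groups.items():
    subjects += [(1, e, v)] * d_g + [(0, e, v)] * (n_g - d_g)

# Collapse the subject-level listing into events-trials format.
events_trials = defaultdict(lambda: [0, 0])  # (E, V) -> [d_g, n_g]
for d, e, v in subjects:
    events_trials[(e, v)][0] += d  # count events (D = 1)
    events_trials[(e, v)][1] += 1  # count trials (subjects)

print("datalines with subject as unit:", len(subjects))
print("datalines with group as unit:  ", len(events_trials))
for (e, v), (d_g, n_g) in events_trials.items():
    print(f"E={e} V={v} d_g={d_g} n_g={n_g}")
```

The same information is carried by both layouts; only the count of datalines, and therefore the value taken as n, differs.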

Here, the goal of model prediction no longer is
to predict an individual’s (0 or 1) outcome, but
rather to predict the observed proportion of
persons in a group (the unit of analysis) who
have the outcome, i.e., p̂_g = d_g/n_g.
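
As a small worked illustration, the observed proportions for the four covariate patterns in the example can be computed directly from the events and trials. This is a sketch only; the names rows and p_hat are hypothetical.

```python
# (group g, events d_g, trials n_g) taken from the example's group listing
rows = [(1, 6, 10), (2, 4, 10), (3, 3, 10), (4, 7, 10)]

# Observed proportion of cases in each group: p_hat_g = d_g / n_g
p_hat = {g: d_g / n_g for g, d_g, n_g in rows}

for g, p in p_hat.items():
    print(f"group {g}: p_hat = {p:.1f}")
```

These four proportions, rather than the 40 individual 0/1 outcomes, are the quantities a model is asked to reproduce under the group view.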

Thus, using events–trials format, we can declare
a model to be “group-saturated” if, for each
covariate pattern listed in the dataset, the model
perfectly predicts the observed proportion p̂_g.
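
The idea of perfect group prediction can be checked numerically. The sketch below assumes a hypothetical four-parameter logistic model containing E, V, and their product E×V (this model is not defined in the passage above; it is chosen only because its k + 1 = 4 parameters match the four covariate patterns). Instead of running a maximum likelihood fit, it solves the four logit equations directly, which is an illustrative shortcut.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    return 1 / (1 + math.exp(-x))

# (E, V, d_g, n_g) for the four covariate patterns in the example
data = [(1, 1, 6, 10), (0, 1, 4, 10), (1, 0, 3, 10), (0, 0, 7, 10)]
obs = {(e, v): d_g / n_g for e, v, d_g, n_g in data}

# Assumed model: logit P = b0 + b1*E + b2*V + b3*E*V (four parameters).
# With four patterns and four parameters, the coefficients are determined
# by matching the logit of each observed proportion.
b0 = logit(obs[(0, 0)])
b1 = logit(obs[(1, 0)]) - b0
b2 = logit(obs[(0, 1)]) - b0
b3 = logit(obs[(1, 1)]) - b0 - b1 - b2

pred = {(e, v): expit(b0 + b1 * e + b2 * v + b3 * e * v) for (e, v) in obs}

# Largest gap between predicted and observed group proportions
max_gap = max(abs(pred[key] - obs[key]) for key in obs)
print("largest |predicted - observed| =", max_gap)
```

Up to floating-point rounding, every predicted proportion coincides with the observed one, which is what “group-saturated” means here. Model 2, with only three parameters for four patterns, does not have enough parameters to guarantee this.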

EXAMPLE (continued)
Key assumption:
Unit of analysis is the subject

Datalines listed by subject (e.g.,
Model 2)
Subject (i)   D    E    V
1             1    1    1
2             1    1    1
...           ...  ...  ...
39            0    0    0
40            0    0    0

If subjects are units of analysis,
then Model 2 is not saturated
(k + 1 = 3 < n = 40)

Alternative assumption:
Unit of analysis is a group
(subjects with same covariate
pattern)

Datalines listed by group (e.g.,
Model 2)
Group (g)   d_g   n_g   E   V
1           6     10    1   1
2           4     10    0   1
3           3     10    1   0
4           7     10    0   0

Events–trials format
(d_g) (n_g)

n = # of observations = 4
(Model 2)

Goal of model prediction:
p̂_g = d_g/n_g
(group prediction rather than
individual prediction)

Events–trials format:
“Group-saturated” provided n = k + 1
for n covariate patterns
(i.e., perfectly predicts p̂_g)

Presentation: II. Saturated vs. Fully Parameterized Models 307