Guidelines applicable to:
Logistic regression
Multiple linear regression
Cox PH regression
Two modeling goals:
(1) To obtain a valid E–D estimate
(2) To obtain a good predictive
model
(different strategies for different
goals)
Prediction goal:
Use computer algorithms
Validity goal:
Our focus
For etiologic research
Standard computer algorithms
not appropriate
III. Overview of
Recommended Strategy
Three stages:
(1) Variable specification
(2) Interaction assessment
(3) Confounding assessment
followed by precision
Variable specification:
Restricts attention to clinically
or biologically meaningful
variables
Provides largest possible initial
model
Modeling strategy guidelines are also important
for modeling procedures other than logistic
regression. In particular, classical multiple linear
regression and Cox proportional hazards
regression, although having differing model
forms, all have in common with logistic regression
the goal of describing exposure–disease
relationships when used in epidemiologic
research. The strategy offered here, although
described in the context of logistic regression,
is applicable to a variety of modeling procedures.
There are typically two goals of mathematical
modeling: One is to obtain a valid estimate of
an exposure–disease relationship and the other
is to obtain a good predictive model. Depend-
ing on which of these is the primary goal of the
researcher, different strategies for obtaining
the “best” model are required.
When the goal is “prediction”, it may be more
appropriate to use computer algorithms, such as
backward elimination or all possible regressions,
which are built into computer packages for dif-
ferent models. [See Kleinbaum et al. (2008)]
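To make the idea of an automated selection algorithm concrete, the following is a minimal sketch of backward elimination in Python. It is not the procedure of any particular package: it assumes a criterion-based variant in which a lower value of a user-supplied `criterion` function (for example, an AIC computed from a fitted model) means a better model. The function names, the variable names, and the toy criterion are all hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List


def backward_elimination(
    candidates: Iterable[str],
    criterion: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Greedy backward elimination (criterion-based variant, an assumption).

    Start with every candidate variable in the model and repeatedly drop
    the single variable whose removal lowers the criterion the most.
    Stop when no single deletion lowers the criterion.
    """
    current = frozenset(candidates)
    best = criterion(current)
    while current:
        # Score every model obtained by deleting exactly one variable.
        trials = [(criterion(current - {v}), v) for v in sorted(current)]
        trial_score, dropped = min(trials)
        if trial_score >= best:
            break  # no deletion improves the criterion
        current, best = current - {dropped}, trial_score
    return sorted(current)


def toy_criterion(model: FrozenSet[str]) -> float:
    """Hypothetical stand-in for a fit statistic: each variable costs 2,
    and each of the two 'useful' variables improves the fit by 10."""
    useful = {"age", "smoking"}
    return 100 - 10 * len(useful & model) + 2 * len(model)


selected = backward_elimination(
    ["age", "smoking", "shoe_size", "zip_code"], toy_criterion
)
print(selected)
```

Note that the algorithm treats every variable alike. It has no notion of which variable is the exposure, which are potential confounders, and which are effect modifiers, which is why such procedures suit a prediction goal rather than a validity goal.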
Our focus in this presentation is on the goal of
obtaining a valid measure of effect. This goal is
characteristic of most etiologic research in
epidemiology. For this goal, standard computer
algorithms are not appropriate because the roles
that variables play in the model, such as
confounders and effect modifiers, must be given
special attention.
The modeling strategy we recommend involves
three stages: (1) variable specification, (2) interaction
assessment, and (3) confounding assessment
followed by consideration of precision. We
have listed these stages in the order that they
should be addressed.
Variable specification is addressed first because
this step allows the investigator to use the
research literature to restrict attention to clini-
cally or biologically meaningful independent
variables of interest. These variables can then be
defined in the model to provide the largest possi-
ble meaningful model to be initially considered.
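As an illustration of variable specification, the sketch below assembles the terms of a largest initial model from an exposure and a list of covariates chosen from the literature. It rests on assumptions this passage does not state: that interaction is represented by exposure-by-covariate product terms, and that any covariate used in a product term also appears as a main effect. The variable names (`smoking`, `age`, `sex`, `chd`) and the `initial_model_terms` helper are hypothetical, and the formula is built only as a string, with `:` standing for a product term.

```python
from typing import List, Optional, Sequence


def initial_model_terms(
    exposure: str,
    covariates: Sequence[str],
    modifiers: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the terms of the largest initial model.

    exposure   -- the exposure variable of interest
    covariates -- meaningful control variables (potential confounders)
    modifiers  -- covariates considered as potential effect modifiers;
                  defaults to all covariates (an assumption)
    """
    if modifiers is None:
        modifiers = covariates
    # Assumed rule: a product term requires its covariate as a main effect.
    unknown = [m for m in modifiers if m not in covariates]
    if unknown:
        raise ValueError(f"modifiers missing from covariates: {unknown}")
    main_effects = [exposure] + list(covariates)
    product_terms = [f"{exposure}:{m}" for m in modifiers]
    return main_effects + product_terms


terms = initial_model_terms("smoking", ["age", "sex"])
formula = "chd ~ " + " + ".join(terms)
print(formula)  # chd ~ smoking + age + sex + smoking:age + smoking:sex
```

The later stages of the strategy would then work down from this model, first assessing the product terms and then the potential confounders.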