Variable Specification Summary
Flow Diagram
Choose D, E, C1, ..., Cp
Choose Vs from Cs
Choose Ws from Cs as Vi or ViVj, i.e.,
interactions of the form EVi or EViVj
V. Causal Diagrams
Approach for variable selection:
Not just quantitative
Consider causal structure
Depends on the goal
Including covariate in model could
lead to bias:
If caused by exposure
If caused by outcome
Lung cancer causes an abnormal
X-ray, i.e.,
Lung cancer (D) → Chest X-ray (C)
We claim:
E, D association controlling for C is
biased
Model:
$\operatorname{logit} P(D = 1 \mid X) = \beta_0 + \beta_1\,\mathrm{SMOKE} + \beta_2\,\mathrm{XRY}$
where D is coded 1 for lung cancer,
0 for no lung cancer
SMOKE coded 1 for smokers,
0 for nonsmokers
XRY coded 1 for abnormal X-ray
0 for normal X-ray
$\exp(\beta_1) = \mathrm{OR}(\mathrm{SMOKE} = 1 \text{ vs. } 0)$
holding X-ray status
constant
In summary, at the variable specification stage,
the investigator defines the largest possible
model initially to be considered. The flow diagram
at the left shows first the choice of D, E,
and the Cs, then the choice of the Vs from the
Cs and, finally, the choice of the Ws in terms of
the Cs.
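
As a minimal sketch of the flow just summarized, the following lists the terms of an initial model once E, the Vs, and the Ws have been chosen. The function name largest_model_terms, the tuple representation of each W, and the variable names C1 and C2 are assumptions made for illustration; the sketch also assumes the Vs are simply some of the Cs.

```python
def largest_model_terms(exposure, vs, ws):
    """Return the term names of the initial model: E, the Vs, and the E*W products.

    vs: names of the V variables (assumed here to be chosen directly from the Cs).
    ws: tuples of V names; each W is a single Vi or a product Vi*Vj.
    """
    terms = [exposure] + list(vs)
    for w in ws:
        # Reject any W that is not a single V or a product of two Vs.
        if len(w) not in (1, 2) or any(v not in vs for v in w):
            raise ValueError(f"W term {w!r} is not of the form Vi or Vi*Vj")
        # Each W enters the model only through its product with the exposure.
        terms.append("*".join((exposure,) + tuple(w)))
    return terms


# Hypothetical choice: two Vs, with W1 = C1 and W2 = C1*C2.
terms = largest_model_terms("E", ["C1", "C2"], [("C1",), ("C1", "C2")])
print(terms)
```

Here the Ws contribute the product terms E*C1 and E*C1*C2, which correspond to the forms EVi and EViVj in the flow diagram.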
Deciding which variables to specify as
potential confounders should not be based only
on quantitative methods; we must also consider
the possible causal relationships between
the exposure, outcome, potential confounders,
and other relevant variables. Moreover, we
must be clear about the goal of our analysis.
Including in the model a variable that is
associated with the outcome could bias the
estimated exposure–disease relationship if the
level of that variable was caused by the exposure
and/or by the outcome.
Finding an abnormal X-ray could be a consequence
of lung cancer (we have indicated this
graphically by the one-sided arrow on the left, a
simple example of a causal diagram). If we were
interested in estimating the causal association
between cigarette smoking and lung cancer (as
opposed to developing our best predictive model
of lung cancer), it would bias our results to
include chest X-ray status as a covariate.
More specifically, consider a logistic model
with lung cancer as the outcome and smoking
status and chest X-ray status as covariates
(model stated on the left).
Now consider the interpretation of the odds
ratio for SMOKE derived from this model,
$\exp(\beta_1)$; i.e., the odds of lung cancer among the
smokers divided by the odds of lung cancer
among the nonsmokers, holding X-ray status
constant (i.e., adjusting for X-ray status).
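
The "holding X-ray status constant" reading of this odds ratio can be checked numerically. The sketch below uses hypothetical coefficient values, chosen only for illustration and not estimated from any data, to show that under the stated model the smoker-to-nonsmoker odds ratio is the same at either X-ray level and equals the exponentiated SMOKE coefficient. It illustrates only the interpretation of the coefficient, not the bias discussed above.

```python
import math

# Hypothetical coefficients for logit P(D=1|X) = b0 + b1*SMOKE + b2*XRY.
# These values are assumptions for illustration, not fitted estimates.
b0, b1, b2 = -3.0, 1.2, 2.5


def odds(smoke, xry):
    """Odds of lung cancer under the model, i.e., exp of the logit."""
    return math.exp(b0 + b1 * smoke + b2 * xry)


# Odds ratio for smokers vs. nonsmokers within each X-ray stratum.
or_by_xry = {xry: odds(1, xry) / odds(0, xry) for xry in (0, 1)}

for xry, odds_ratio in or_by_xry.items():
    print(f"XRY={xry}: OR(SMOKE=1 vs. 0) = {odds_ratio:.4f}")
print(f"exp(b1) = {math.exp(b1):.4f}")
```

Because the model contains no SMOKE-by-XRY product term, the b0 and b2*XRY contributions cancel in the ratio, leaving exp(b1) in both strata.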