Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)

Variable Specification Summary
Flow Diagram

Choose D, E, C1, ..., Cp
↓
Choose Vs from Cs
↓
Choose Ws from Cs as Vi or ViVj, i.e., interactions of form EVi or EViVj

V. Causal Diagrams


Approach for variable selection:
• Not just quantitative
• Consider causal structure
• Depends on the goal

Including a covariate in the model could lead to bias:
• If caused by exposure
• If caused by outcome

Lung cancer causes an abnormal
X-ray, i.e.,

Lung cancer (D) → Chest X-ray (C)


We claim:

E, D association controlling for C is biased

Model:
$$\text{logit}\, P(D = 1 \mid X) = \beta_0 + \beta_1\,\text{SMOKE} + \beta_2\,\text{XRY}$$

where D coded 1 for lung cancer, 0 for no lung cancer
SMOKE coded 1 for smokers, 0 for nonsmokers
XRY coded 1 for abnormal X-ray, 0 for normal X-ray

$$\exp(\beta_1) = \text{OR}(\text{SMOKE} = 1 \text{ vs. } 0)$$
holding X-ray status constant

In summary, at the variable specification stage, the investigator defines the largest possible model initially to be considered. The flow diagram at the left shows first the choice of D, E, and the Cs, then the choice of the Vs from the Cs and, finally, the choice of the Ws in terms of the Cs.
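The three steps in the flow diagram can be sketched as a small routine that lists the terms of the largest model to be considered, assuming the EVW form logit P(D=1|X) = α + βE + Σγi Vi + E Σδj Wj. This is a hypothetical illustration, not a procedure from the text. The function name `specify_terms` is invented. Taking the Vs to be the Cs themselves is an assumption, since Vs may also be functions of the Cs. The covariate names AGE and RACE are made up for the example.

```python
from itertools import combinations


def specify_terms(exposure, covariates, pairwise_w=False):
    """Hypothetical helper: list the terms of the largest model to be considered.

    Assumption: the Vs are taken to be the Cs themselves.
    The Ws are chosen as the Vi, plus the products ViVj when pairwise_w is True,
    so each W enters the model as an interaction E*Vi or E*Vi*Vj.
    """
    v_terms = list(covariates)                  # step 2: Vs from Cs
    w_terms = list(v_terms)                     # step 3: Ws of the form Vi ...
    if pairwise_w:
        # ... or ViVj for each unordered pair of Vs
        w_terms += [f"{a}*{b}" for a, b in combinations(v_terms, 2)]
    interactions = [f"{exposure}*{w}" for w in w_terms]
    return [exposure] + v_terms + interactions


# Hypothetical example: exposure SMOKE, candidate covariates AGE and RACE
print(specify_terms("SMOKE", ["AGE", "RACE"], pairwise_w=True))
```

With `pairwise_w=True`, the list ends with the three-way term `SMOKE*AGE*RACE`, which corresponds to an interaction of the form EViVj. Leaving it `False` restricts the Ws to the Vi alone.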

The decision about which variables to specify as potential confounders should not be based on quantitative methods alone; we must also consider the possible causal relationships among the exposure, outcome, potential confounders, and other relevant variables. Moreover, we must be clear about the goal of our analysis.

Including in the model a variable that is associated with the outcome could bias the estimate of the exposure–disease relationship if the level of that variable was caused by the exposure and/or by the outcome.

Finding an abnormal X-ray could be a consequence of lung cancer (we have indicated this graphically by the one-sided arrow on the left, a simple example of a causal diagram). If we were interested in estimating the causal association between cigarette smoking and lung cancer (as opposed to developing our best predictive model of lung cancer), including chest X-ray status as a covariate would bias our results.
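The screening rule in the margin can be sketched as a check for descendants in a causal diagram: a covariate caused, directly or indirectly, by the exposure or by the outcome is a candidate for bias. This is a minimal illustration, not a full confounder-selection method such as the back-door criterion. Several pieces are assumptions added for illustration:
• the helper names
• the dictionary encoding of the diagram
• the SMOKE → D arrow, which is the hypothesized effect under study
• the hypothetical confounder AGE

```python
def descendants(dag, node):
    """Return every node reachable from `node` by following directed edges."""
    seen = set()
    stack = list(dag.get(node, []))
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(dag.get(child, []))
    return seen


def flag_downstream_covariates(dag, exposure, outcome, covariates):
    """Hypothetical helper: covariates caused by the exposure and/or the outcome."""
    downstream = descendants(dag, exposure) | descendants(dag, outcome)
    return [c for c in covariates if c in downstream]


# Assumed diagram: AGE -> SMOKE, AGE -> D, SMOKE -> D, D -> XRY (lung cancer -> chest X-ray)
dag = {"AGE": ["SMOKE", "D"], "SMOKE": ["D"], "D": ["XRY"]}
print(flag_downstream_covariates(dag, "SMOKE", "D", ["AGE", "XRY"]))
```

In this assumed diagram only XRY is flagged, because it lies downstream of lung cancer. AGE, a cause of both smoking and lung cancer, is not flagged.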

More specifically, consider a logistic model
with lung cancer as the outcome and smoking
status and chest X-ray status as covariates
(model stated on the left).

Now consider the interpretation of the odds ratio for SMOKE derived from this model, exp(β1); i.e., the odds of lung cancer among the smokers divided by the odds of lung cancer among the nonsmokers, holding X-ray status constant (i.e., adjusting for X-ray status).
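Why exp(β1) has this "holding X-ray status constant" interpretation follows directly from the model stated on the left. Fix XRY at a value x (0 or 1) and compare the odds for smokers and nonsmokers:

$$\text{odds}(\text{SMOKE}=1,\ \text{XRY}=x) = \exp(\beta_0 + \beta_1 + \beta_2 x)$$

$$\text{odds}(\text{SMOKE}=0,\ \text{XRY}=x) = \exp(\beta_0 + \beta_2 x)$$

$$\text{OR} = \frac{\exp(\beta_0 + \beta_1 + \beta_2 x)}{\exp(\beta_0 + \beta_2 x)} = \exp(\beta_1)$$

Because x cancels, the same odds ratio applies within both X-ray strata. This relies on the model as stated, which contains no SMOKE × XRY interaction term.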
