Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)


VI. Other Considerations for Variable Specification


Data quality:


Measurement error, misclassification?


Correct or remove missing data?


(Qualitative) Collinearity:


Are covariates supplying qualitatively
redundant info?


Example:


Including both employment status and
personal income in the same model.


Controlling for same underlying
factor?
(leads to model instability)


Controlling for meaningfully different
factors?
(needed for proper control)


Sample size?
If large ⇒ can “tease out” effects
of similar covariates


Philosophical issue: complexity vs.
simplicity


 Complexity: If in doubt,
include the variable. Better
safe than sorry.
 Simplicity: If in doubt, keep it
out. It is a virtue to be simple.

There are other issues that need to be
considered at the variable specification stage.
We briefly discuss them in this section.

First, we should consider the quality of the
data: Does the variable contain the information
we want? Is there an unacceptable level of
measurement error or misclassification? What is
the number of missing observations? If an
observation is missing for any covariate in a
model, computer programs typically “throw out”
that observation when running that model.
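The effect of this "throwing out" on the usable sample size can be sketched in a few lines of Python. The records, variable names, and helper functions below are hypothetical and exist only for illustration. The sketch mimics the complete-case behavior described above and is not the routine of any particular statistical package.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"outcome": 1, "age": 54,   "income": 32000, "employed": 1},
    {"outcome": 0, "age": 47,   "income": None,  "employed": 1},
    {"outcome": 1, "age": None, "income": 18000, "employed": 0},
    {"outcome": 0, "age": 61,   "income": 45000, "employed": 1},
    {"outcome": 0, "age": 39,   "income": 27000, "employed": None},
]


def missing_counts(rows, variables):
    """Count the missing values for each listed variable."""
    return {v: sum(r[v] is None for r in rows) for v in variables}


def complete_cases(rows, variables):
    """Keep only rows with no missing value among the listed variables."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


# Two candidate specifications (illustrative names only).
small_model = ["outcome", "age"]
large_model = ["outcome", "age", "income", "employed"]

n_small = len(complete_cases(records, small_model))
n_large = len(complete_cases(records, large_model))

print("missing per variable:", missing_counts(records, large_model))
print("usable rows, small model:", n_small)
print("usable rows, large model:", n_large)
```

Each variable here is missing in only one record, yet the larger specification loses more rows because the missing values fall in different records. Checking these counts before fitting shows how much data each candidate model would actually use.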

We should also consider whether there is
collinearity between covariates. In this
context, we are not considering collinearity as
a model diagnostic, which we describe
quantitatively in Chap. 8. Rather, here we are
considering whether two covariates are
qualitatively redundant.

For example, suppose we include two variables
in a model to control for both employment
status and personal income. If these two
variables control for the same underlying
factor, then including them both in the same
model could lead to model instability. On the
other hand, if you believe that employment
status and personal income are meaningfully
different, then including them both may be
important for proper control.
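Whether two covariates are redundant is a subject-matter judgment, but a rough descriptive check can inform it. The sketch below uses made-up values for employment status (1 = employed, 0 = not employed) and personal income (in thousands). It compares mean income across employment groups and computes a Pearson correlation. The data and function names are assumptions for illustration, and this is not the collinearity diagnostic of Chap. 8.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_mean(values, groups, level):
    """Mean of `values` among observations whose group equals `level`."""
    selected = [v for v, g in zip(values, groups) if g == level]
    return sum(selected) / len(selected)


# Hypothetical data: employment indicator and income in thousands.
employed = [1, 1, 1, 1, 0, 0, 0, 0]
income = [42, 38, 51, 45, 9, 12, 40, 7]

r = pearson(employed, income)
mean_employed = group_mean(income, employed, 1)
mean_not_employed = group_mean(income, employed, 0)

print(f"mean income, employed:     {mean_employed:.1f}")
print(f"mean income, not employed: {mean_not_employed:.1f}")
print(f"correlation:               {r:.2f}")
```

A strong association does not by itself settle the question. It only signals that the two variables may be measuring the same underlying factor, which the investigator must then weigh against the possibility that they are meaningfully different.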

Whether a model can include similar, but not
identical, covariates also depends on the
sample size of the data. A large dataset can
better support the “teasing out” of subtle
effects than a dataset with a relatively small
number of observations.
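One rough way to see why sample size matters is to borrow an approximation from ordinary linear regression with two covariates: the standard error of a coefficient is proportional to 1 / sqrt(n * (1 - r^2)), where r is the correlation between the covariates. This is an assumption made only for illustration. The relationship is not exact for logistic regression, and the function names below are hypothetical.

```python
from math import sqrt


def relative_se(n, r):
    """Standard error of one coefficient, up to a constant factor, under the
    linear-regression approximation SE proportional to 1/sqrt(n*(1 - r^2))."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / sqrt(n * (1.0 - r ** 2))


def n_to_match(n_uncorrelated, r):
    """Sample size needed, with covariates correlated at r, to reach the
    precision that n_uncorrelated gives when the covariates are uncorrelated."""
    return n_uncorrelated / (1.0 - r ** 2)


for n in (100, 400, 1600):
    print(n, round(relative_se(n, 0.0), 4), round(relative_se(n, 0.8), 4))

print("n needed at r = 0.8 to match n = 100 at r = 0:",
      round(n_to_match(100, 0.8)))
```

Under this approximation, the more similar two covariates are, the more observations are needed to estimate their separate effects with a given precision.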

Another consideration is philosophical. Some
prefer simplicity: if in doubt, leave the
variable out. Others say: if in doubt, include
the variable, as it is better to be safe than
sorry. Albert Einstein is often credited with
saying “keep everything as simple as possible,
but not simpler.”

180 6. Modeling Strategy Guidelines
