Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)

(vip2019) #1
Confounding:no statistical testing
#
Validity------systematic error

(Statistical testing — random error)


Confounding in logistic
regression — a validity issue


Computer algorithms no good
(involve statistical testing)


Statistical issues beyond the scope
of this presentation:


 Multicollinearity


 Multiple testing


 Influential observations


Multicollinearity:


 Independent variables
approximately determined by
other independent variables


 Regression coefficients
unreliable


Multiple testing:


 The more tests, the more likely
significant findings, even if no
real effects
 Variable selection procedures
may yield an incorrect model
because of multiple testing


When later describing this last stage in more
detail we will emphasize thatthe assessment of
confounding is carried out without using statis-
tical testing. This follows from general epidemi-
ologic principles in that confounding is a
validity issue that addresses systematic rather
than random error. Statistical testing is appro-
priate for considering random error rather
than systematic error.

Our suggestions for assessing confounding
using logistic regression are consistent with
the principle that confounding is a validity
issue. Standard computer algorithms for vari-
able selection, such as forward inclusion or
backward elimination procedures, are not
appropriate for assessing confounding because
they involve statistical testing.

Before concluding this overview section, we
point out a few statistical issues needing atten-
tion but which are beyond the scope of this
presentation. These issues aremulticollinearity,
multiple testing, andinfluential observations.

Multicollinearityoccurs when one or more of
the independent variables in the model can be
approximately determined by some of the
other independent variables. When there is
multicollinearity, the estimated regression
coefficients of the fitted model can be highly
unreliable. Consequently, any modeling strat-
egy must check for possible multicollinearity at
various steps in the variable selection process.

Multiple testingoccurs from the many tests of
significance that are typically carried out when
selecting or eliminating variables in one’s
model. The problem with doing several tests
on the same data set is that the more tests one
does, the more likely one can obtain statistically
significant results even if there are no real asso-
ciations in the data. Thus, the process of vari-
able selection may yield an incorrect model
because of the number of tests carried out.
Unfortunately, there is no foolproof method
for adjusting for multiple testing, even though
there are a few rough approaches available.

172 6. Modeling Strategy Guidelines

Free download pdf