III. Screening Variables
Scenario:
Logistic model
E(0,1) vs. D(0,1)
C1, C2, ..., Cp
“large” p
Desired initial model:
$\text{logit}\,P(\mathbf{X}) = \alpha + \beta E + \sum_{j=1}^{p} \gamma_j C_j + \sum_{j=1}^{p} \delta_j E C_j$
Follow hierarchical BW elimination strategy (Chap. 7)
However, suppose:
Computer program does not run
or
Fitted model unreliable (“large” p)
What do you do?
OPTIONS (large-number-of-variables problem)
- Screening:
  Exclude some Cj one-at-a-time
  Begin again with reduced model
- Collinearity diagnostics on initial model:
  Exclude some Cj and/or ECj strongly related to other variables in the model
- Forward algorithm for interactions:
  Start with E and all Cj, j = 1, ..., p
  Sequentially add significant ECj
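The forward algorithm for interactions listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the book's procedure: the text does not say which significance test is used, so `interaction_pvalue` is a hypothetical callback that is assumed to fit the current model with and without a candidate product term and return a p-value (for example, from a likelihood ratio test). The cutoff `alpha = 0.05` is also an assumption, and the function names are illustrative only.

```python
from typing import Callable, List, Sequence


def forward_add_interactions(
    exposure: str,
    confounders: Sequence[str],
    interaction_pvalue: Callable[[List[str], str], float],  # hypothetical test
    alpha: float = 0.05,  # assumed entry cutoff
) -> List[str]:
    """Forward selection of E x Cj product terms.

    Starts from the model containing E and all main-effect Cj terms, then at
    each step adds the single remaining product term with the smallest
    p-value, provided that p-value is below `alpha`. Stops when no remaining
    product term is significant.
    """
    model = [exposure, *confounders]  # initial model: E plus all Cj
    candidates = [f"{exposure}*{c}" for c in confounders]
    while candidates:
        # p-value for adding each remaining product term to the current model
        pvals = {term: interaction_pvalue(model, term) for term in candidates}
        best = min(candidates, key=lambda t: pvals[t])
        if pvals[best] >= alpha:
            break  # no remaining product term is significant
        model.append(best)
        candidates.remove(best)
    return model


# Toy usage: fixed, made-up p-values stand in for real model fits.
toy = {"E*C1": 0.01, "E*C2": 0.40, "E*C3": 0.03}
print(forward_add_interactions("E", ["C1", "C2", "C3"], lambda m, t: toy[t]))
# ['E', 'C1', 'C2', 'C3', 'E*C1', 'E*C3']
```

With real data the p-value of a candidate term can change after another product term enters the model, which is why the sketch recomputes all candidate p-values at every step rather than ranking them once.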
In this section, we address the following scenario: Suppose you wish to fit a binary logistic model involving a binary exposure variable E and a binary outcome variable D, controlling for the potential confounding and effect-modifying effects of a “large” number of variables Cj, j = 1, 2, ..., p, that you have identified from the literature.
You would like to begin with a model containing E, the main effects of each Cj, and all product terms of the form ECj, and then follow the hierarchical backward elimination strategy described in Chap. 7 to obtain a “best” model.
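To see why a “large” p quickly becomes a burden, note that the desired initial model has one intercept (α), one exposure coefficient (β), p main-effect coefficients (γj), and p product-term coefficients (δj), for a total of 2p + 2 coefficients. The short Python sketch below enumerates the terms and counts the coefficients; the helper names `initial_model_terms` and `n_coefficients` are hypothetical and exist only for this illustration.

```python
from typing import List


def initial_model_terms(p: int, exposure: str = "E") -> List[str]:
    """Predictor terms of the desired initial model (hypothetical helper).

    logit P(X) = alpha + beta*E + sum_j gamma_j*C_j + sum_j delta_j*E*C_j
    """
    main_effects = [f"C{j}" for j in range(1, p + 1)]         # gamma_j terms
    products = [f"{exposure}*C{j}" for j in range(1, p + 1)]  # delta_j terms
    return [exposure, *main_effects, *products]


def n_coefficients(p: int) -> int:
    """Intercept alpha plus one coefficient per term: 2p + 2 in total."""
    return 1 + len(initial_model_terms(p))


for p in (5, 10, 20):
    print(p, n_coefficients(p))
# 5 12
# 10 22
# 20 42
```

Each additional Cj adds two coefficients (its main effect and its product with E), so the number of parameters to estimate grows twice as fast as the number of candidate variables.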
However, when you run a computer program
(e.g., SAS’s Proc Logistic) to fit this model, you
find that the model does not run or you decide
that, even if the model runs, the resulting fitted
model is too unreliable because of the large
number of variables being considered. What
do you do in this situation?
There are several possible options:
- Use some kind of “screening” technique to exclude some of the Cj variables from the model one-at-a-time, and then begin again with a reduced-sized model that you hope is reasonably reliable and/or at least will run.
- Use “collinearity” diagnostic methods starting with the initial model to exclude variables (typically product terms) that are strongly related to other variables in the model.
- Use a forward regression algorithm that starts with a model containing E and all main-effect Cj terms and proceeds to sequentially add statistically significant product terms.
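Why product terms are the usual candidates for exclusion under the collinearity option can be illustrated with a crude pairwise check. Formal collinearity diagnostics (for example, condition indices with variance decomposition proportions) examine the whole set of model variables jointly. The Python sketch below is only a simplified stand-in that flags a product term when its absolute Pearson correlation with some other column reaches an assumed cutoff of 0.9. It can miss near-dependencies that involve three or more columns. The data are invented for illustration, and `pearson` and `flag_product_terms` are hypothetical helpers.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def flag_product_terms(
    columns: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Flag product terms (names containing '*') whose absolute correlation
    with any other column is at least `threshold` (an assumed cutoff)."""
    flagged = []
    for name, values in columns.items():
        if "*" not in name:
            continue  # only product terms are candidates for exclusion here
        for other, other_values in columns.items():
            if other == name:
                continue
            r = pearson(values, other_values)
            if abs(r) >= threshold:
                flagged.append((name, other, r))
    return flagged


# Invented toy data: binary E, one confounder C1, and their product.
E = [0, 0, 0, 0, 1, 1, 1, 1]
C1 = [2, 3, 4, 5, 2, 3, 4, 5]
cols = {"E": E, "C1": C1, "E*C1": [e * c for e, c in zip(E, C1)]}
for name, other, r in flag_product_terms(cols):
    print(name, other, round(r, 2))
# E*C1 E 0.91
```

In this toy example the product term E*C1 is zero whenever E is zero, so it tracks E closely (r ≈ 0.91) even though neither E nor C1 alone is redundant.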
Presentation: III. Screening Variables 263