Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)

III. Screening Variables

Scenario: Logistic Model E(0,1) vs.D(0,1) C 1 ,C 2 ,...,Cp “large”p

Desired initial model:

Logit PðXÞ¼aþbEþ~

p

j¼ 1

gjCj

þ~

p

j¼ 1

djECj

Follow hierarchical BW elimination strategy (Chap. 7) However, suppose:

Computer program does not run or Fitted model unreliable (“large” p)

8

>>>

>>

>>>

>>

>>>

>>

>>>

>>

<

>>>

>>

>>>

>>

>>>

>>

>>>

>>

:

What do you do?

OPTIONS (large-number-of-variables problem)

Screening:
Exclude someCjone-at-a-
time
Begin again with reduced
model

Collinearity diagnostics on
initial model:
Exclude someCjand/or
ECjstrongly related to
other variables in the model

Forward algorithm for
interactions:
Start withEand allCj,
j¼1,...,p
Sequentially add significant
ECj

In this section, we address the following scenario: Suppose you wish to fit a binary logistic model involving a binary exposure and out- come variablesE and Dcontrolling for the potential confounding and effect-modifying effects of a “large” number of variables Cj, j¼1, 2,...,pthat you have identified from the literature.

You would like to begin with a model contain- ingE, the main effects of eachCj, and all product terms of the formECj, and then follow the hierarchical backward elimination strategy described in Chap. 7 to obtain a “best” model.

However, when you run a computer program (e.g., SAS’s Proc Logistic) to fit this model, you find that the model does not run or you decide that, even if the model runs, the resulting fitted model is too unreliable because of the large number of variables being considered. What do you do in this situation?

There are several possible options:

Use some kind of “screening” technique to
exclude some of the Cj variables from the
model one-at-a-time, and then begin again
with a reduced-sized model that you hope is
reasonably reliable and/or at least will run.

Use “collinearity” diagnostic methods
starting with the initial model to exclude vari-
ables (typically product terms) that are
strongly related to other variables in the model.

Use a forward regression algorithm that
starts with a model containing all main effect
Cjterms and proceed to sequentially add statis-
tically significant product terms.

Presentation: III. Screening Variables 263

Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)

8

>>>

>>

>>

>>

>>

>>>

>>

>>

>>

>>>

>>

>>

>>

>>

>>>

>>

>>

<

>>>

>>

>>

>>

>>>

>>

>>

>>

>>

>>>

>>

>>

>>

>>>

>>

>>

>>

:

Get our desktop app

Company

Features

Documentation

Resources