Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)


III. Screening Variables


Scenario:
Logistic model
E(0,1) vs. D(0,1)
C1, C2, ..., Cp
"large" p

Desired initial model:

Logit P(X) = α + βE + Σ_{j=1}^p γ_j C_j + Σ_{j=1}^p δ_j E C_j

Follow hierarchical BW elimination strategy (Chap. 7)

However, suppose:
- Computer program does not run, or
- Fitted model unreliable ("large" p)


What do you do?

OPTIONS (large-number-of-variables problem)


  1. Screening:
     - Exclude some Cj one at a time
     - Begin again with reduced model

  2. Collinearity diagnostics on initial model:
     - Exclude some Cj and/or ECj strongly related to other variables in the model

  3. Forward algorithm for interactions:
     - Start with E and all Cj, j = 1, ..., p
     - Sequentially add significant ECj


In this section, we address the following scenario: Suppose you wish to fit a binary logistic model involving binary exposure and outcome variables E and D, controlling for the potential confounding and effect-modifying effects of a "large" number of variables Cj, j = 1, 2, ..., p, that you have identified from the literature.

You would like to begin with a model containing E, the main effects of each Cj, and all product terms of the form ECj, and then follow the hierarchical backward elimination strategy described in Chap. 7 to obtain a "best" model.
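As a concrete illustration, such an initial model can be specified with statsmodels' formula interface, where `E*Cj` expands to the main effect of Cj plus the product term E:Cj. The variable names (E, C1-C3, D) and the simulated data below are assumptions for illustration only; the text considers an arbitrary number p of covariates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data standing in for a real study data set (illustrative only)
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "E": rng.integers(0, 2, n),
    "C1": rng.normal(size=n),
    "C2": rng.normal(size=n),
    "C3": rng.integers(0, 2, n),
})
true_logit = -0.5 + 0.8 * df.E + 0.3 * df.C1
df["D"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# E*Cj expands to Cj + E:Cj (E is deduplicated), giving the full
# initial model: E, all Cj main effects, and all ECj product terms
covs = ["C1", "C2", "C3"]
formula = "D ~ E + " + " + ".join(f"E*{c}" for c in covs)
model = smf.logit(formula, data=df).fit(disp=0)
print(model.params.index.tolist())
```

With p covariates this model has 2p + 2 parameters, which is exactly why, for "large" p, the fit may fail or be unreliable.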

However, when you run a computer program
(e.g., SAS’s Proc Logistic) to fit this model, you
find that the model does not run or you decide
that, even if the model runs, the resulting fitted
model is too unreliable because of the large
number of variables being considered. What
do you do in this situation?

There are several possible options:


  1. Use some kind of "screening" technique to
    exclude some of the Cj variables from the
    model one at a time, and then begin again
    with a reduced-size model that you hope is
    reasonably reliable and/or will at least run.
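One simple screening rule, sketched below, is an assumption for illustration rather than the text's prescribed procedure: retain each Cj only if it is statistically significant in a logistic model containing E and that Cj alone. The variable names, data, and the 0.10 threshold are all hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative simulated data: only C1 and C2 truly affect D
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({"E": rng.integers(0, 2, n)})
for j in range(1, 6):                       # C1..C5 stand in for p covariates
    df[f"C{j}"] = rng.normal(size=n)
true_logit = -0.3 + 0.7 * df.E + 0.9 * df.C1 + 0.8 * df.C2
df["D"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Screen each Cj one at a time in a model with E and Cj only
retained = []
for j in range(1, 6):
    fit = smf.logit(f"D ~ E + C{j}", data=df).fit(disp=0)
    if fit.pvalues[f"C{j}"] < 0.10:         # liberal screening threshold
        retained.append(f"C{j}")
print(retained)
```

A liberal threshold such as 0.10 is often preferred at the screening stage so that potential confounders are not discarded too aggressively.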

  2. Use “collinearity” diagnostic methods
    starting with the initial model to exclude vari-
    ables (typically product terms) that are
    strongly related to other variables in the model.
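One widely used collinearity diagnostic, shown here as a generic illustration (the chapter's own diagnostic approach may differ), is the variance inflation factor (VIF) computed for each column of the initial model's design matrix. The example also illustrates why product terms are the typical culprits: with an uncentered covariate, ECj is strongly related to E itself. All names and data are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative design matrix: C1 is uncentered (mean 5), so the
# product term E:C1 is nearly a linear function of E and C1
rng = np.random.default_rng(2)
n = 300
E = rng.integers(0, 2, n)
C1 = rng.normal(5, 1, n)
X = pd.DataFrame({
    "const": 1.0,
    "E": E.astype(float),
    "C1": C1,
    "E:C1": E * C1,        # product terms are the usual collinearity culprits
})

# VIF for each non-constant column (const is kept in the regressions)
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
for name, v in vifs.items():
    print(f"{name}: {v:.2f}")
```

A common rule of thumb flags VIF values above 10; centering the Cj before forming products usually reduces this kind of collinearity substantially.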

  3. Use a forward regression algorithm that
    starts with a model containing E and all main
    effect Cj terms, and proceeds to sequentially
    add statistically significant product terms.
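A generic forward algorithm of this kind, sketched below as an assumption rather than the text's exact procedure, starts from the main-effects model and repeatedly adds the most significant remaining ECj term, stopping when no candidate reaches the (hypothetical) 0.05 level. Here significance is judged by each term's Wald p-value.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative simulated data with one true interaction, E x C1
rng = np.random.default_rng(3)
n = 600
df = pd.DataFrame({"E": rng.integers(0, 2, n)})
for j in range(1, 4):
    df[f"C{j}"] = rng.normal(size=n)
true_logit = -0.2 + 0.5 * df.E + 0.4 * df.C1 + 1.2 * df.E * df.C1
df["D"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

base_terms = ["E"] + [f"C{j}" for j in range(1, 4)]
candidates = [f"E:C{j}" for j in range(1, 4)]
selected = []
while candidates:
    # refit with each remaining product term added, one at a time
    pvals = {}
    for term in candidates:
        formula = "D ~ " + " + ".join(base_terms + selected + [term])
        fit = smf.logit(formula, data=df).fit(disp=0)
        pvals[term] = fit.pvalues[term]
    best = min(pvals, key=pvals.get)
    if pvals[best] >= 0.05:                 # no significant term left: stop
        break
    selected.append(best)
    candidates.remove(best)
print(selected)
```

Because only product terms are candidates and all Cj main effects stay in the model throughout, this forward search preserves the hierarchy principle discussed in Chap. 7.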

