Introductory Biostatistics


10.3.4 Stepwise Regression


In many applications, our major interest is to identify important risk factors.
In other words, we wish to identify, from many available factors, a small subset
of factors that relate significantly to the outcome (e.g., the disease under
investigation). In that identification process, of course, we wish to avoid a
large type I (false positive) error rate. In a regression analysis, a type I
error corresponds to including a predictor that has no real relationship to the
outcome; such an inclusion can greatly confuse the interpretation of the
regression results. In a standard multiple regression analysis, this goal can be
achieved by using a strategy that adds to, or removes from, a regression model
one factor at a time according to a certain order of relative importance.
Therefore, the two important steps are as follows:



  1. Specify a criterion or criteria for selecting a model.

  2. Specify a strategy for applying the criterion or criteria chosen.


Strategies This step is concerned with specifying the strategy for selecting
variables. Traditionally, such a strategy is concerned with whether, and which,
particular variable should be added to a model, or whether any variable should
be deleted from a model, at a particular stage of the process. As computers
became more accessible and more powerful, these practices became more popular.


Forward selection procedure


  1. Fit a simple logistic linear regression model to each factor, one at a time.

  2. Select the most important factor according to a certain predetermined
    criterion.

  3. Test for the significance of the factor selected in step 2 and determine,
    according to a certain predetermined criterion, whether or not to add
    this factor to the model.

  4. Repeat steps 2 and 3 for those variables not yet in the model. At any
    subsequent step, if none meets the criterion in step 3, no more variables
    are included in the model and the process is terminated.
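
The forward selection steps above can be sketched in code. The following is a minimal illustration, not a definitive implementation: the function name `forward_selection`, the callback `entry_p_value`, the cutoff `alpha_enter`, and the factor names and p-values in the toy example are all assumptions introduced here for illustration. In practice the callback would fit the regression model and return the p-value of whatever test criterion has been chosen.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# Hypothetical callback: returns the p-value for adding `candidate`
# to a model that already contains the factors in `in_model`.
EntryPValue = Callable[[Sequence[str], str], float]


def forward_selection(candidates: Iterable[str],
                      entry_p_value: EntryPValue,
                      alpha_enter: float = 0.05) -> List[str]:
    """Add one factor at a time until no remaining factor meets the criterion."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        # Steps 1-2: evaluate each factor not yet in the model and
        # pick the most important one (smallest p-value).
        p_values: Dict[str, float] = {
            v: entry_p_value(selected, v) for v in remaining
        }
        best = min(p_values, key=lambda v: p_values[v])
        # Step 3: test the selected factor against the entry criterion.
        if p_values[best] >= alpha_enter:
            break  # Step 4: none qualifies, so the process terminates.
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_entry_p_value(in_model: Sequence[str], candidate: str) -> float:
    """Made-up p-values for illustration only (not real data)."""
    base = {"age": 0.001, "smoking": 0.010, "weight": 0.030, "height": 0.400}
    # Assumption: weight adds little once age is already in the model.
    if candidate == "weight" and "age" in in_model:
        return 0.20
    return base[candidate]


selected = forward_selection(
    ["age", "smoking", "weight", "height"], toy_entry_p_value
)
print(selected)
```

Passing the criterion in as a function keeps the strategy (the loop) separate from the criterion (the test), mirroring the two steps listed earlier in this section.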


Backward elimination procedure


  1. Fit the multiple logistic regression model containing all available
    independent variables.

  2. Select the least important factor according to a certain predetermined
    criterion; this is done by considering one factor at a time and treating it
    as though it were the last variable to enter.

  3. Test for the significance of the factor selected in step 2 and determine,
    according to a certain predetermined criterion, whether or not to delete
    this factor from the model.

  4. Repeat steps 2 and 3 for those variables still in the model. At any
    subsequent step, if none meets the criterion in step 3, no more variables
    are removed from the model and the process is terminated.
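
The backward elimination steps can be sketched in the same spirit. Again, this is only an illustrative sketch under stated assumptions: the function name `backward_elimination`, the callback `removal_p_value`, the cutoff `alpha_stay`, and the toy factor names and p-values are hypothetical and are not taken from the text. The callback is assumed to return the p-value of a factor when it is treated as the last variable to enter a model containing the other factors.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# Hypothetical callback: returns the p-value of `variable` when it is
# treated as the last one to enter a model containing `others`.
RemovalPValue = Callable[[Sequence[str], str], float]


def backward_elimination(variables: Iterable[str],
                         removal_p_value: RemovalPValue,
                         alpha_stay: float = 0.10) -> List[str]:
    """Start from the full model and delete one factor at a time."""
    in_model = list(variables)  # Step 1: model with all available factors.
    while in_model:
        # Step 2: evaluate each factor as though it entered last and
        # pick the least important one (largest p-value).
        p_values: Dict[str, float] = {
            v: removal_p_value([u for u in in_model if u != v], v)
            for v in in_model
        }
        worst = max(p_values, key=lambda v: p_values[v])
        # Step 3: test the selected factor against the removal criterion.
        if p_values[worst] < alpha_stay:
            break  # Step 4: every factor is retained, so the process ends.
        in_model.remove(worst)
    return in_model


def toy_removal_p_value(others: Sequence[str], variable: str) -> float:
    """Made-up p-values for illustration only (not real data)."""
    base = {"age": 0.001, "smoking": 0.010, "weight": 0.030, "height": 0.400}
    # Assumption: weight looks unimportant while height is also in the model.
    if variable == "weight" and "height" in others:
        return 0.20
    return base[variable]


retained = backward_elimination(
    ["age", "smoking", "weight", "height"], toy_removal_p_value
)
print(retained)
```

In this toy setup the two procedures need not agree: a factor's apparent importance depends on which other factors are in the model at the time it is tested.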


