Introductory Biostatistics


and that the effect of acid phosphatase on the response may be somewhat
stronger for a certain combination of levels of the other four variables.

Stepwise Regression In many applications (e.g., a case–control study on a
specific disease) our major interest is to identify important risk factors. In other
words, we wish to identify from many available factors a small subset of factors
that relate significantly to the outcome (e.g., the disease under investigation). In
that identification process, of course, we wish to avoid a large type I (false
positive) error. In a regression analysis, a type I error corresponds to including
a predictor that has no real relationship to the outcome; such an inclusion can
greatly confuse the interpretation of the regression results. In a standard
multiple regression analysis, this goal can be achieved by using a strategy that
adds factors to, or removes them from, a regression model one at a time according
to a certain order of relative importance. The two important steps are as
follows:

  1. Specify a criterion or criteria for selecting a model.

  2. Specify a strategy for applying the criterion or criteria chosen.

Strategies This step is concerned with specifying the strategy for selecting
variables. Traditionally, such a strategy addresses whether a particular variable
should be added to a model, and which one, or whether any variable should
be deleted from a model at a particular stage of the process. As computers
became more accessible and more powerful, these practices became more

Forward selection procedure

  1. Fit a simple logistic linear regression model to each factor, one at a
    time.

  2. Select the most important factor according to a certain predetermined
    criterion.

  3. Test for the significance of the factor selected in step 2 and determine,
    according to a certain predetermined criterion, whether or not to add
    this factor to the model.

  4. Repeat steps 2 and 3 for those variables not yet in the model. At any
    subsequent step, if none meets the criterion in step 3, no more variables
    are included in the model and the process is terminated.
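
The forward selection steps above can be sketched in Python as a simple loop. This is a minimal illustration, not the text's own implementation: the function name forward_selection, the callback p_value, and the cutoff alpha_enter are all assumptions introduced here. The callback stands in for whatever test the analyst has chosen as the criterion (for example, a p-value obtained from a fitted logistic model), and the numbers in TOY_P are made up purely to exercise the loop.

```python
from typing import Callable, Iterable, List, Sequence


def forward_selection(
    candidates: Iterable[str],
    p_value: Callable[[Sequence[str], str], float],
    alpha_enter: float = 0.05,
) -> List[str]:
    """Add one factor at a time until no remaining factor meets the criterion.

    p_value(selected, factor) is a caller-supplied (hypothetical) function that
    returns the p-value for adding `factor` to a model already containing
    `selected`. With `selected` empty this corresponds to step 1.
    """
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        # Step 2: the most important factor is taken to be the one with the
        # smallest p-value given the factors already in the model.
        scores = {f: p_value(selected, f) for f in remaining}
        best = min(remaining, key=lambda f: scores[f])
        # Step 3: add it only if it meets the entry criterion.
        if scores[best] >= alpha_enter:
            break  # Step 4: nothing qualifies, so the process terminates.
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up p-values, for illustration only. A real analysis would refit the
# model at every step, so these values would change as factors enter.
TOY_P = {"x1": 0.30, "x2": 0.01, "x3": 0.04, "x4": 0.20}


def toy_p_value(selected: Sequence[str], factor: str) -> float:
    return TOY_P[factor]


chosen = forward_selection(TOY_P, toy_p_value)
print(chosen)  # ['x2', 'x3']
```

Passing the criterion in as a function keeps the strategy (step 2 of the overall plan) separate from the criterion (step 1), so the same loop can be reused with a different test or a different entry cutoff.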

Backward elimination procedure

  1. Fit the multiple logistic regression model containing all available
    independent variables:

  2. Select the least important factor according to a certain predetermined
    criterion; this is done by considering one factor at a time and treating it
    as though it were the last variable to enter.
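
A backward elimination loop can be sketched in the same spirit. The excerpt shows only the first two steps of this procedure, so the stopping rule below (remove the least important factor if it fails the criterion, then repeat) is an assumption that mirrors steps 3 and 4 of forward selection. The names backward_elimination, p_value_last, and alpha_stay are hypothetical, and the p-values in TOY_P_LAST are made up for illustration.

```python
from typing import Callable, Iterable, List, Sequence


def backward_elimination(
    factors: Iterable[str],
    p_value_last: Callable[[Sequence[str], str], float],
    alpha_stay: float = 0.05,
) -> List[str]:
    """Start from the full model and drop one factor at a time.

    p_value_last(others, factor) is a caller-supplied (hypothetical) function
    returning the p-value of `factor` when it is treated as the last variable
    to enter a model that already contains `others`.
    """
    model = list(factors)  # Step 1: begin with all available variables.
    while model:
        # Step 2: consider each factor in turn as though it entered last.
        scores = {
            f: p_value_last([g for g in model if g != f], f) for f in model
        }
        worst = max(model, key=lambda f: scores[f])
        # Assumed stopping rule: keep the model once every factor qualifies.
        if scores[worst] <= alpha_stay:
            break
        model.remove(worst)
    return model


# Made-up p-values, for illustration only; real values change after each refit.
TOY_P_LAST = {"x1": 0.30, "x2": 0.01, "x3": 0.04, "x4": 0.20}


def toy_p_value_last(others: Sequence[str], factor: str) -> float:
    return TOY_P_LAST[factor]


kept = backward_elimination(TOY_P_LAST, toy_p_value_last)
print(kept)  # ['x2', 'x3']
```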
