Introductory Biostatistics

Stepwise regression procedure. Stepwise regression is a modified version
of forward regression that permits reexamination, at every step, of the
variables incorporated in the model in previous steps. A variable entered
at an early stage may become superfluous at a later stage because of its
relationship with other variables now in the model; the information it
provides becomes redundant. If that variable meets the elimination
criterion, it is removed, the model is refitted with the remaining
variables, and the forward process resumes. The entire process, one step
forward followed by one step backward, continues until no more variables
can be added or removed. Without an automatic computer algorithm, this
comprehensive strategy may be too tedious to implement.
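The forward-then-backward loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's or any package's algorithm: the callback `loglik` is hypothetical and stands in for refitting a model, the toy log-likelihood values in `TOY_LOGLIK` are invented purely to show a variable being dropped after later entries, and the likelihood ratio test with 1 df is used at every step (the book uses a score test for the very first step).

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability P(chi-square with 1 df >= x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def stepwise(candidates, loglik, alpha_enter=0.10, alpha_stay=0.10, max_steps=50):
    """Repeat one forward step and one backward step until the model stops changing.

    loglik(frozenset_of_names) is a hypothetical callback returning the maximized
    log-likelihood of the model containing exactly those covariates.
    """
    model = frozenset()
    history = []
    for _ in range(max_steps):  # guard against cycling
        changed = False
        # Forward step: LR test for adding each variable not yet in the model.
        best = None
        for v in sorted(set(candidates) - model):
            p = chi2_sf_1df(2.0 * (loglik(model | {v}) - loglik(model)))
            if best is None or p < best[1]:
                best = (v, p)
        if best is not None and best[1] < alpha_enter:
            model = model | {best[0]}
            history.append(("enter", best[0]))
            changed = True
        # Backward step: LR test for deleting each variable now in the model.
        worst = None
        for v in sorted(model):
            p = chi2_sf_1df(2.0 * (loglik(model) - loglik(model - {v})))
            if worst is None or p > worst[1]:
                worst = (v, p)
        if worst is not None and worst[1] > alpha_stay:
            model = model - {worst[0]}
            history.append(("remove", worst[0]))
            changed = True
        if not changed:
            break
    return model, history


# Invented log-likelihoods: A enters first but becomes redundant once B and C are in.
TOY_LOGLIK = {
    frozenset(): -100.0,
    frozenset({"A"}): -96.0,
    frozenset({"B"}): -97.0,
    frozenset({"C"}): -98.0,
    frozenset({"A", "B"}): -94.0,
    frozenset({"A", "C"}): -95.0,
    frozenset({"B", "C"}): -90.2,
    frozenset({"A", "B", "C"}): -90.0,
}

final, steps = stepwise(["A", "B", "C"], TOY_LOGLIK.__getitem__)
print(sorted(final), steps)
```

With these made-up values, A, B, and C enter in turn and A is then removed, leaving B and C. Keeping `alpha_enter` no larger than `alpha_stay` is a common safeguard against a variable entering and leaving repeatedly.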

Criteria. For the first step of the forward selection procedure, decisions are
based on individual score test results (chi-square, 1 df). In subsequent steps,
both forward and backward, the ordering of levels of importance (step 2)
and the selection (test in step 3) are based on the likelihood ratio chi-square
statistic:

$$\chi^2_{LR} = 2\left[\ln L(\hat{\beta};\ \text{all other } X\text{'s}) - \ln L(\hat{\beta};\ \text{all other } X\text{'s with one } X \text{ deleted})\right]$$
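Because the two models being compared differ by a single coefficient, the statistic is referred to a chi-square distribution with 1 degree of freedom. Its upper tail can be written through the standard normal distribution function Φ, which gives a quick hand check; as a worked example, take the value 4.136 reported for hours in Table 10.13:

```latex
p = \Pr(\chi^2_1 \ge x) = \Pr(|Z| \ge \sqrt{x}) = 2\left[1 - \Phi(\sqrt{x})\right],
\qquad x = 4.136:\quad \sqrt{4.136} \approx 2.034,\quad p \approx 2(1 - 0.9790) = 0.042
```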

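In the case of Poisson regression, a packaged computer program such as
SAS's PROC GENMOD does not have an automatic stepwise option; therefore,
the implementation is much more tedious and time consuming. In selecting
the first variable (step 1), we have to fit a simple regression model to every
factor separately, then decide, based on the computer output, on the first
selection before coming back for computer runs in step 2. At subsequent steps
we can take advantage of type 1 analysis results (sequential likelihood ratio
tests, in which each term is tested after the terms listed before it in the
MODEL statement): listing the variables already selected first and a candidate
last, the final row of the type 1 table gives the likelihood ratio chi-square
for adding that candidate.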
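The manual step 1 described above (one simple Poisson regression per factor, then a choice) can be sketched outside SAS. This is a hypothetical pure-Python stand-in for one PROC GENMOD run per factor, not GENMOD itself: it fits ln(μ) = b0 + b1·x by Newton-Raphson, compares each fit with the intercept-only model by the likelihood ratio chi-square, and applies an entry level. The data `y` and the factors `x1`, `x2` are invented for illustration, and the sketch ignores offsets and the DSCALE overdispersion scaling used in the example that follows.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*ln(mu) - mu - ln(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))


def fit_null(y):
    """Intercept-only model: the fitted mean is the sample mean."""
    mean = sum(y) / len(y)
    return poisson_loglik(y, [mean] * len(y))


def fit_simple(y, x, tol=1e-10, max_iter=100):
    """Newton-Raphson fit of ln(mu) = b0 + b1*x; returns (b0, b1, loglik)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X'(y - mu).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X' diag(mu) X, here 2 x 2, inverted explicitly.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    return b0, b1, poisson_loglik(y, mu)


def first_step(y, factors, alpha_enter=0.10):
    """Fit one simple model per factor; return the entering variable (or None)."""
    ll0 = fit_null(y)
    results = {}
    for name, x in factors.items():
        lr = 2.0 * (fit_simple(y, x)[2] - ll0)
        results[name] = (lr, math.erfc(math.sqrt(max(lr, 0.0) / 2.0)))
    best = min(results, key=lambda k: results[k][1])
    return (best if results[best][1] < alpha_enter else None), results


# Invented counts and factors, for illustration only.
y = [2, 3, 3, 5, 6, 8, 9, 12, 14, 18]
factors = {"x1": list(range(1, 11)), "x2": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]}
chosen, results = first_step(y, factors)
for name, (lr, p) in results.items():
    print(f"{name}: LR chi-square = {lr:.3f}, p = {p:.4f}")
print("enters:", chosen)
```

Here x1 carries a strong trend and enters, while the binary x2 does not reach the 0.10 level.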


Example 10.16 Refer to the data set on emergency service of Example 10.5
(Table 10.2) with all four covariates: workload (hours), residency, gender, and
revenue. This time we perform a regression analysis using forward selection in
which we specify that a variable has to be significant at the 0.10 level before it
can enter into the model. In addition, we fit all overdispersed models using
the DSCALE option in PROC GENMOD.
The results of the four simple regression analyses are shown in Table 10.13.
Workload (hours), the only variable with p < 0.10 (p = 0.0420), meets the
entrance criterion and is selected. In the next step,
we fit three models each with two covariates: hours and residency, hours and


TABLE 10.13
Variable     LR χ²    p Value
Hours        4.136    0.0420
Residency    2.166    0.1411
Gender       0.845    0.3581
Revenue      0.071    0.7897
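As a check on Table 10.13, the p values can be recomputed from the LR chi-square column (1 df each, since every simple model adds one coefficient to the intercept-only model) and the 0.10 entry criterion applied. This is a small sketch; the names `TABLE_10_13`, `p_value_1df`, and `entry_candidate` are illustrative and not part of any package.

```python
import math

# LR chi-square values (1 df) copied from Table 10.13.
TABLE_10_13 = {"Hours": 4.136, "Residency": 2.166, "Gender": 0.845, "Revenue": 0.071}


def p_value_1df(chi2):
    """P(chi-square with 1 df >= chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def entry_candidate(table, alpha_enter=0.10):
    """Return (variable with smallest p if below alpha_enter else None, all p values)."""
    pvals = {name: p_value_1df(x) for name, x in table.items()}
    best = min(pvals, key=pvals.get)
    return (best if pvals[best] < alpha_enter else None), pvals


chosen, pvals = entry_candidate(TABLE_10_13)
for name, p in pvals.items():
    print(f"{name:<10} p = {p:.4f}")
print("enters:", chosen)
```

The recomputed values agree with the p Value column to within the rounding of the tabled chi-square statistics, and only hours passes the entry criterion.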
