The Essentials of Biostatistics for Physicians, Nurses, and Clinicians

7.4 Sensitivity to Outliers and Robust Regression

more than made up for by the reduction in the bias that the outlier(s)
may cause. Later, we shall see that there are also diagnostics that can
be used in regression to tell when the least squares estimates are
influenced by outliers.
There are two strategies for dealing with outliers in regression. One
is to detect and remove the outliers. The other is to use a robust
regression procedure in place of least squares. Robust regression is
sometimes preferred because it accommodates outliers, whereas removing
an outlier amounts to asserting that the data point has no value for
estimating the parameters.
Whether an outlier is an erroneous observation cannot be determined
just by looking at the data, so an outlier should be removed only when
a check of the source that generated the data identifies an actual
error. Outliers in regression also have greater influence on the slope
when they lie near the upper or lower limits of the x-axis. Points in
these positions are called leverage points. In general, any point near
the upper or lower limits of the x-axis is a leverage point, whether or
not it is an outlier. If removing a leverage point does not change the
estimate very much, the point is not an outlier with respect to the
bivariate distribution of X and Y.
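The distinction between a leverage point and an outlier can be checked
numerically. The following is a minimal sketch, not a method taken from
this text: the data are made up, the helper names are hypothetical, and
leverage is measured with the standard hat-value formula for a
straight-line fit, h_i = 1/n + (x_i - xbar)^2 / Sxx, which is an
assumption brought in for illustration.

```python
import numpy as np

def leverage(x):
    """Hat values for a straight-line fit with an intercept (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    sxx = np.sum((x - x.mean()) ** 2)
    return 1.0 / len(x) + (x - x.mean()) ** 2 / sxx

def ols_slope(x, y):
    """Least squares slope for a straight-line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Made-up data: the last x value sits far out on the x-axis.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 20], dtype=float)

# Case 1: the extreme point lies on the same line as the others.
y_on_line = 2.0 * x + 1.0
# Case 2: same x values, but the extreme point's y is far from the line.
y_off_line = y_on_line.copy()
y_off_line[-1] = 10.0

h = leverage(x)  # the last point has by far the largest hat value

# Compare the slope with and without the extreme point in each case.
slope_on_with = ols_slope(x, y_on_line)
slope_on_without = ols_slope(x[:-1], y_on_line[:-1])
slope_off_with = ols_slope(x, y_off_line)
slope_off_without = ols_slope(x[:-1], y_off_line[:-1])

print("hat values:", np.round(h, 3))
print("on-line case, slope with/without:", slope_on_with, slope_on_without)
print("off-line case, slope with/without:", slope_off_with, slope_off_without)
```

In the first case the slope is essentially unchanged when the extreme
point is dropped, so it is a leverage point but not an outlier. In the
second case the slope changes substantially, so the same x position now
carries an outlier.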
One robust regression method is to find the estimated coefficients
that minimize the sum of absolute errors. Because the deviations are
not squared, the outliers have less influence than they do under least
squares. For estimating the center of a distribution, a robust
alternative to the sample mean is the sample median. It turns out that
the median minimizes the sum of absolute deviations of the observations
from the estimate, just as the mean minimizes the sum of squared
deviations. So minimizing the sum of absolute errors to estimate the
regression parameters is analogous to using the median, rather than the
sample mean, to estimate the center for a simple random sample. There
are many other robust regression procedures. We will not cover them
here. See Huber (1981) or Maronna et al. (2006) for the details.
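The analogy between the median and least absolute errors regression can
be illustrated with a small sketch. Everything here is an assumption
made for illustration: the data are invented, the function names are
hypothetical, and the brute-force search is not an algorithm from this
text. It relies on the known property that, for a straight-line fit,
some line minimizing the sum of absolute errors passes through two of
the data points, so trying every pair is enough for small samples.

```python
from itertools import combinations
import numpy as np

def sum_abs_dev(values, center):
    """Sum of absolute deviations of the values from a candidate center."""
    return float(np.sum(np.abs(values - center)))

def sum_abs_errors(a, b, x, y):
    """Sum of absolute errors for the line y = a + b*x."""
    return float(np.sum(np.abs(y - (a + b * x))))

def lad_line(x, y):
    """Least absolute errors line found by trying every pair of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid candidate
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        sae = sum_abs_errors(a, b, x, y)
        if best is None or sae < best[0]:
            best = (sae, a, b)
    return best[1], best[2]

# Part 1: the median has a smaller sum of absolute deviations than the mean.
values = np.array([2.0, 3.0, 4.0, 5.0, 100.0])  # 100 is an outlier
sad_median = sum_abs_dev(values, np.median(values))
sad_mean = sum_abs_dev(values, values.mean())

# Part 2: a straight line y = 1 + 2x with one outlying response at x = 10.
x = np.arange(1.0, 11.0)
y = 1.0 + 2.0 * x
y[-1] = 0.0

a_lad, b_lad = lad_line(x, y)
b_ols, a_ols = np.polyfit(x, y, 1)  # polyfit returns slope first, then intercept

print("sum of absolute deviations (median, mean):", sad_median, sad_mean)
print("least absolute errors fit (intercept, slope):", a_lad, b_lad)
print("least squares fit (intercept, slope):", a_ols, b_ols)
```

With these data the least absolute errors fit stays on the line followed
by the nine clean points, while the least squares slope is pulled toward
the outlier. For larger data sets a dedicated routine would be preferable
to the pairwise search, which grows quadratically with the sample size.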
I choose a very dramatic example from the 2000 presidential
election votes counted in the state of Florida. Although this is not a
medical example, it is a very familiar example that makes the case very
well. The number of votes received by Patrick Buchanan in Palm Beach
County was very high relative to other counties, and hence represents
an outlier.
You may recall that the Gore campaign contested the voting
results in Florida due to several irregularities that they believed cost
