104 CHAPTER 7 Correlation, Regression, and Logistic Regression
$$\hat{Y} = g(x, \theta) = \theta_1 + \theta_2 \exp(\theta_3 x),$$
is a simple nonlinear regression because it is nonlinear in the parameter
$\theta_3$, and it cannot be transformed into a linear regression. Multiple linear
regression is also linear in the parameters. There is a multiple nonlinear
regression analog to simple nonlinear regression. However, we will not cover
nonlinear regression, and the interested reader should refer to Gallant
(1987) and Bates and Watts (1988), which are both authoritative texts
on nonlinear regression.
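To make "nonlinear in the parameter" concrete, one informal check (a sketch added here, not taken from the cited texts) is to differentiate the model $g(x, \theta) = \theta_1 + \theta_2 \exp(\theta_3 x)$ with respect to each parameter. A model is linear in the parameters when none of these derivatives depends on the parameters.

```latex
\begin{align*}
\frac{\partial g}{\partial \theta_1} &= 1, \\
\frac{\partial g}{\partial \theta_2} &= \exp(\theta_3 x), \\
\frac{\partial g}{\partial \theta_3} &= \theta_2 \, x \exp(\theta_3 x).
\end{align*}
```

The second and third derivatives involve $\theta_3$, so the model is not linear in the parameters. If $\theta_3$ were held fixed at a known value, the model would be linear in $\theta_1$ and $\theta_2$, with $\exp(\theta_3 x)$ playing the role of the predictor.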
7.4 SENSITIVITY TO OUTLIERS AND
ROBUST REGRESSION
Outliers are unusual or extreme observations within a given data set.
We might expect laboratory data and other measured data taken on
humans to be normally distributed, with approximately 95% of the
cases falling within two standard deviations of the mean. Nevertheless,
particularly in large samples, extreme values may occur. This could be
due to the actual occurrence of an extreme value from the normal
distribution, or it could be a measurement, coding, or data entry error.
In small samples, extreme values can also occur, for the same reasons.
However, a genuine extreme outcome from a normal distribution
is much less likely to occur in a small sample.
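The effect of sample size can be illustrated with a short calculation. The sketch below assumes independent observations from an exactly normal distribution and treats "extreme" as more than three standard deviations from the mean. Both the threshold and the function names `prob_beyond` and `prob_at_least_one` are illustrative choices, not part of the text.

```python
import math


def prob_beyond(k):
    """Two-sided tail probability P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))


def prob_at_least_one(n, k=3.0):
    """Probability that at least one of n independent normal observations
    lies more than k standard deviations from the mean."""
    return 1.0 - (1.0 - prob_beyond(k)) ** n


# Roughly 5% of observations fall outside two standard deviations.
print(f"P(|Z| > 2) = {prob_beyond(2):.4f}")

# The chance of seeing at least one 3-SD value grows with the sample size.
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: P(at least one |z| > 3) = {prob_at_least_one(n):.3f}")
```

Under these assumptions, a 3-SD value is rare in a sample of 10 but nearly certain in a sample of several thousand. This is why an extreme value in a small sample more often points to a measurement or recording error.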
For the simple linear regression problem discussed in the previous
section, we showed how to compute the slope and intercept parameters
as a least squares solution. Since the method involves minimizing the
sum of squared residuals, these parameter estimates are very sensitive
to outliers. This is analogous to the sensitivity of the sample mean to
outliers. The sample mean is the least squares estimate of the mean
from a sample of independent identically distributed observations.
Because the sum of squares is minimized, outliers pull the estimate
toward their value, and hence exert greater influence than observations
near the true population mean.
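A small numerical illustration may help here. The sketch below uses made-up data lying exactly on the line y = 1 + 2x, then replaces the last response with a grossly wrong value, as a data entry error might. The data, the helper name `ols_fit`, and the size of the error are all hypothetical. The slope and intercept use the usual closed-form least squares formulas.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data lying exactly on y = 1 + 2x.
x = list(range(1, 11))
y = [1 + 2 * xi for xi in x]

# Same data, but the last response (21) is replaced by 61 to mimic a gross error.
y_out = y.copy()
y_out[-1] = 61

print("mean of y, clean vs. contaminated:", mean(y), mean(y_out))
print("(intercept, slope), clean:        ", ols_fit(x, y))
print("(intercept, slope), contaminated: ", ols_fit(x, y_out))
```

With these particular numbers, one bad point out of ten moves the sample mean of y from 12 to 16 and roughly doubles the fitted slope, from 2 to about 4.18. The size of the effect depends on where the outlier sits. A point at the edge of the x range, as here, has more leverage on the slope than one near the center.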
In regression, the slope is pulled up or down depending on the
direction of the outlier. Robust regression methods are used to minimize
the influence of outliers at the price of statistical efficiency.
However, when outliers are possible, the sacrifice in efficiency is often
