Robust Statistics 407
end we define the influence curve (IC), also called the influence function, which
measures the influence of a single observation x on a statistic θ for a given
distribution F. In practice, the influence curve is approximated by adding a
single point X to a sample Y, recomputing the statistic, and plotting its value
against that X value. For example, the IC of the mean is a straight line: the
mean shifts in direct proportion to X, so its influence is unbounded.
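Formally (Hampel's definition), IC(x; θ, F) = lim_{ε→0} [θ((1 − ε)F + εδ_x) − θ(F)] / ε, where δ_x is a point mass at x. The practical procedure described above corresponds to Tukey's sensitivity curve, SC_n(x) = n [θ(y_1, …, y_{n−1}, x) − θ(y_1, …, y_{n−1})]. The sketch below assumes NumPy and a simulated standard-normal sample; the helper name `sensitivity_curve` and the sample itself are illustrative, not taken from the text.

```python
import numpy as np

def sensitivity_curve(stat, sample, xs):
    """Tukey's sensitivity curve: append one point x to a fixed sample and
    record n times the resulting change in the statistic (n = new sample size)."""
    n = len(sample) + 1
    base = stat(sample)
    return np.array([n * (stat(np.append(sample, x)) - base) for x in xs])

rng = np.random.default_rng(0)
y = rng.standard_normal(99)        # hypothetical "clean" sample Y
xs = np.linspace(-10, 10, 201)     # positions of the added point X

sc_mean = sensitivity_curve(np.mean, y, xs)      # straight line in X
sc_median = sensitivity_curve(np.median, y, xs)  # flat once X leaves the data
```

For the mean, the curve works out algebraically to exactly `xs - y.mean()`, a line of slope 1 that grows without bound. For the median, the curve becomes constant once X lies beyond the bulk of the sample, which is the "bounded" behavior asked about in the questions that follow.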
Several aspects of the influence curve are of particular interest:
■ Is the curve “bounded” as the X-values become extreme? The IC of a robust
statistic should be bounded. That is, a robust statistic should not be unduly
influenced by a single extreme point.
■ What is the general behavior as the X observation becomes extreme?
For example, is it smoothly down-weighted as the values
become extreme?
■ What is the influence if the X point is in the “center” of the Y points?
Let’s now introduce several concepts that are important in applied work,
before turning to the robust estimators themselves.
Breakdown bound
The breakdown (BD) bound, or breakdown point, is the largest possible fraction
of observations that can be altered without restriction while the change in
the estimate remains bounded. For example, we can change any fraction of the
sample points below 50% without provoking unbounded changes of the median. By
contrast, changing one single observation can have an unbounded effect on the
mean.
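The contrast between the median and the mean can be checked directly by replacing k observations with an arbitrarily large value. This is a minimal sketch assuming NumPy; the sample 0, 1, …, 100 and the replacement value 1e9 are hypothetical choices made only for illustration.

```python
import numpy as np

def corrupted_estimates(sample, k, value=1e9):
    """Replace the first k observations with an arbitrarily large value and
    return the resulting (mean, median)."""
    corrupted = np.array(sample, dtype=float)  # copy, so the input is untouched
    corrupted[:k] = value
    return float(np.mean(corrupted)), float(np.median(corrupted))

sample = np.arange(101, dtype=float)  # hypothetical clean data: 0, 1, ..., 100

for k in (1, 50, 51):
    mean_k, median_k = corrupted_estimates(sample, k)
    print(f"k={k:2d}  mean={mean_k:.4g}  median={median_k:.4g}")
```

With n = 101, the median stays within the range of the clean data (51 for k = 1, 100 for k = 50) and only jumps to 1e9 at k = 51, when more than half of the points are corrupted. The mean is already dragged to roughly 9.9 million by a single bad point.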
Rejection point
The rejection point is defined as the point beyond which the IC becomes
zero. Note that observations beyond the rejection point make no contribution
to the final estimate except, possibly, through the auxiliary scale estimate.
Estimators that have a finite rejection point are said to be redescending
and are well protected against very large outliers. However, a finite rejection
point usually results in the underestimation of scale. This is because when
the observations in the tails of a distribution are ignored, an insufficient
fraction of the observations may remain for the estimation process. This in
turn adversely affects the efficiency of the estimator.
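For an M-estimator of location, the IC is proportional to its ψ function, so an estimator whose ψ is exactly zero beyond some constant c has a finite rejection point at c (measured in units of the auxiliary scale estimate). Tukey's biweight is a standard redescending example; it is used here only as an illustration, and c = 4.685 is the commonly quoted tuning constant giving about 95% efficiency at the normal. The sketch assumes NumPy, and the residual values are hypothetical.

```python
import numpy as np

def tukey_biweight_psi(u, c=4.685):
    """Tukey's biweight (bisquare) psi function.

    u is a standardized residual, (x - location) / scale, where the scale
    comes from an auxiliary estimate. For |u| > c the function is exactly
    zero, so c is the rejection point.
    """
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, u * (1.0 - (u / c) ** 2) ** 2, 0.0)

u = np.array([0.0, 1.0, 2.0, 4.0, 4.685, 6.0, 50.0])  # hypothetical residuals
print(np.round(tukey_biweight_psi(u), 3))
```

The output rises for small residuals, redescends smoothly toward zero as |u| approaches c, and is exactly zero at and beyond c, so the residuals 6.0 and 50.0 contribute nothing to the location estimate.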
Gross-error sensitivity
The gross-error sensitivity expresses asymptotically the maximum effect that
a contaminated observation can have on the estimator. It is the supremum of
the absolute value of the IC over all x, γ* = sup_x |IC(x; θ, F)|.
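At the standard normal distribution the IC has a closed form for a few estimators, which makes γ* easy to compare: IC(x) = x for the mean (unbounded), IC(x) = sign(x) / (2φ(0)) for the median, giving γ* = √(π/2), and IC(x) = ψ_k(x) / (2Φ(k) − 1) for Huber's location M-estimator with known scale, giving γ* = k / (2Φ(k) − 1). Huber's estimator and k = 1.345 are illustrative choices, not taken from the text. This sketch assumes NumPy and evaluates each IC on a finite grid.

```python
import math
import numpy as np

def std_normal_cdf(x):
    """Standard normal CDF Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mean_ic(x):
    """IC of the mean at the standard normal: x itself (unbounded)."""
    return np.asarray(x, dtype=float)

def median_ic(x):
    """IC of the median at the standard normal: sign(x) / (2 * phi(0))."""
    return np.sign(x) / (2.0 / math.sqrt(2.0 * math.pi))

def huber_ic(x, k=1.345):
    """IC of Huber's location M-estimator at the standard normal, scale known."""
    psi = np.clip(x, -k, k)
    return psi / (2.0 * std_normal_cdf(k) - 1.0)

xs = np.linspace(-50, 50, 10001)
for name, ic in [("mean", mean_ic), ("median", median_ic), ("huber", huber_ic)]:
    print(name, np.max(np.abs(ic(xs))))
```

On this grid the mean's value is simply the grid edge (50), and it grows without limit as the grid is widened; the median and Huber values are the finite constants √(π/2) and k / (2Φ(k) − 1).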