these outliers, whereas the mean does not. This property is called the resistance of the
estimator. In recent years, considerably more attention has been placed on developing re-
sistant estimators—such as the trimmed mean discussed earlier. These are starting to filter
down to the level of everyday data analysis, though they have a ways to go.
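A small illustration of resistance may help. The sketch below uses made-up scores, and `trimmed_mean` is a hypothetical helper that simply drops the `k` smallest and `k` largest values; it is not taken from any particular package. It compares how the mean, the median, and a trimmed mean react when one score is replaced by an extreme value.

```python
import statistics

def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest values (hypothetical helper)."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

clean = [2.0, 3.0, 4.0, 5.0, 6.0]          # made-up scores
with_outlier = [2.0, 3.0, 4.0, 5.0, 60.0]  # same scores, one extreme value

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(label,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "trimmed mean =", trimmed_mean(data))
```

In this toy data set the mean moves a long way once the outlier is introduced, while the median and the trimmed mean stay where they were. That is what it means for an estimator to be resistant.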
The Sample Variance as an Estimator of the Population Variance
The sample variance offers an excellent example of what was said in the discussion of
unbiasedness. You may recall that I earlier sneaked in the divisor of $N - 1$ instead of $N$ for the
calculation of the variance and standard deviation. Now is the time to explain why. (You
may be perfectly willing to take the statement that we divide by $N - 1$ on faith, but I get a lot
of questions about it, so I guess you will just have to read the explanation, or skip it.)
There are a number of ways to explain why sample variances require $N - 1$ as the
denominator. Perhaps the simplest is phrased in terms of what has been said about the sample
variance ($s^2$) as an unbiased estimate of the population variance ($\sigma^2$). Assume for the
moment that we have an infinite number of samples (each containing $N$ observations) from
one population and that we know the population variance. Suppose further that we are
foolish enough to calculate sample variances as

\[ \frac{\sum (X - \bar{X})^2}{N} \]

(Note the denominator.) If we take the average of these sample variances, we find

\[ \text{Average}\left[\frac{\sum (X - \bar{X})^2}{N}\right] = E\left[\frac{\sum (X - \bar{X})^2}{N}\right] = \frac{(N - 1)\sigma^2}{N} \]

where E[ ] is read as “the expected value of (whatever is in brackets).” Thus the average
value of $\sum (X - \bar{X})^2 / N$ is not $\sigma^2$. It is a biased estimator.
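The bias can also be seen by simulation. The sketch below is only an illustration under assumed settings: a normal population with a variance of 4, samples of $N = 5$, and 100,000 samples standing in for the "infinite number" of samples. The function name and its parameters are invented for this example. It averages the sum of squared deviations divided by $N$ and divided by $N - 1$ across all samples.

```python
import random

def simulate_variance_estimators(n=5, num_samples=100_000, mu=0.0, sigma=2.0, seed=1):
    """Average the divide-by-N and divide-by-(N-1) variances over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq_dev = sum((x - mean) ** 2 for x in sample)
        total_div_n += sum_sq_dev / n
        total_div_n_minus_1 += sum_sq_dev / (n - 1)
    return total_div_n / num_samples, total_div_n_minus_1 / num_samples

avg_div_n, avg_div_n_minus_1 = simulate_variance_estimators()
print("population variance:      ", 2.0 ** 2)
print("average dividing by N:    ", avg_div_n)
print("average dividing by N - 1:", avg_div_n_minus_1)
```

With these settings the divide-by-$N$ average should settle near $(N - 1)\sigma^2 / N = (4/5)(4) = 3.2$, whereas the divide-by-$(N - 1)$ average should settle near the population variance of 4. The exact figures vary slightly with the seed and the number of samples.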
Degrees of Freedom
The foregoing discussion is very much like saying that we divide by $N - 1$ because it
works. But why does it work? To explain this, we must first consider degrees of freedom
(df). Assume that you have in front of you the three numbers 6, 8, and 10. Their mean is 8.
You are now informed that you may change any of these numbers, as long as the mean is
kept constant at 8. How many numbers are you free to vary? If you change all three of them
in some haphazard fashion, the mean almost certainly will no longer equal 8. Only two of
the numbers can be freely changed if the mean is to remain constant. For example, if you
change the 6 to a 7 and the 10 to a 13, the remaining number is determined; it must be 4 if
the mean is to be 8. If you had 50 numbers and were given the same instructions, you
would be free to vary only 49 of them; the 50th would be determined.
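The arithmetic behind this example can be checked in a few lines. In the sketch below, `determined_value` is a hypothetical helper written for this illustration: given the values you chose freely and the mean that must be preserved, it returns the one value that is left with no freedom to vary.

```python
def determined_value(free_values, required_mean):
    """Return the single remaining value that forces the overall mean to required_mean."""
    n = len(free_values) + 1              # the free values plus the one that is determined
    return n * required_mean - sum(free_values)

# Change the 6 to a 7 and the 10 to a 13; the third number is then fixed.
last = determined_value([7, 13], required_mean=8)
print("remaining number:", last)

# Equivalently, the deviations from the mean must sum to zero.
scores = [7, 13, last]
deviations = [x - 8 for x in scores]
print("sum of deviations:", sum(deviations))
```

The same function covers the 50-number case: pass in any 49 values and it returns the only 50th value consistent with the required mean.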
Now let us go back to the formulae for the population and sample variances and see
why we lost one degree of freedom in calculating the sample variances.

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \qquad s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} \]

In the case of $\sigma^2$, $\mu$ is known and does not have to be estimated from the data. Thus, no
df are lost and the denominator is $N$. In the case of $s^2$, however, $\mu$ is not known and must be
estimated from the sample mean ($\bar{X}$). Once you have estimated $\mu$ from $\bar{X}$, you have fixed it
Section 2.8 Measures of Variability 47