these outliers, whereas the mean does not. This property is called the resistance of the
estimator. In recent years, considerably more attention has been paid to developing
resistant estimators, such as the trimmed mean discussed earlier. These are starting to filter
down to the level of everyday data analysis, though they have a ways to go.
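To make the idea of resistance concrete, here is a small sketch (added for illustration, not part of the original text) comparing the ordinary mean with a 10% trimmed mean when one wild score is added; scipy.stats.trim_mean is used simply as a convenient implementation, and the data values are made up.

# Illustration (not from the text): the mean versus a 10% trimmed mean
# on data with and without one extreme observation.
import numpy as np
from scipy import stats

clean = np.array([6, 7, 8, 8, 9, 9, 10, 10, 11, 12], dtype=float)
with_outlier = np.append(clean, 85.0)   # one wild score added

for label, x in [("clean data   ", clean), ("with outlier ", with_outlier)]:
    print(label,
          " mean =", round(x.mean(), 2),
          "  10% trimmed mean =", round(stats.trim_mean(x, 0.10), 2))

# The ordinary mean is pulled strongly toward the outlier (from 9.0 to about 15.9),
# while the trimmed mean barely moves, which is what "resistant" means here.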

The Sample Variance as an Estimator of the Population Variance

The sample variance offers an excellent example of what was said in the discussion of unbiasedness.
You may recall that I earlier sneaked in the divisor of N - 1 instead of N for the
calculation of the variance and standard deviation. Now is the time to explain why. (You
may be perfectly willing to take the statement that we divide by N - 1 on faith, but I get a lot
of questions about it, so I guess you will just have to read the explanation, or skip it.)
There are a number of ways to explain why sample variances require N - 1 as the denominator.
Perhaps the simplest is phrased in terms of what has been said about the sample
variance ($s^2$) as an unbiased estimate of the population variance ($\sigma^2$). Assume for the moment
that we have an infinite number of samples (each containing N observations) from
one population and that we know the population variance. Suppose further that we are foolish
enough to calculate sample variances as

$$\frac{\sum (X - \overline{X})^2}{N}$$

(Note the denominator.) If we take the average of these sample variances, we find

$$\text{Average}\left(\frac{\sum (X - \overline{X})^2}{N}\right) = E\left[\frac{\sum (X - \overline{X})^2}{N}\right] = \frac{(N - 1)\sigma^2}{N}$$

where $E[\,]$ is read as "the expected value of (whatever is in brackets)." Thus the average
value of $\sum (X - \overline{X})^2 / N$ is not $\sigma^2$. It is a biased estimator.
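That expectation is easy to check numerically. The sketch below (an illustration added here, not from the text) approximates the "infinite number of samples" with 200,000 samples of size N = 5 drawn from a normal population whose variance is 4; the particular population, sample size, and number of replications are arbitrary choices.

# Illustration (not from the text): averaging many sample variances computed
# with divisor N versus divisor N - 1.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0            # known population variance (an arbitrary choice)
N = 5                   # observations per sample
reps = 200_000          # stand-in for the "infinite number of samples"

samples = rng.normal(loc=50.0, scale=np.sqrt(sigma2), size=(reps, N))
biased = samples.var(axis=1, ddof=0)       # divisor N
unbiased = samples.var(axis=1, ddof=1)     # divisor N - 1

print("average with divisor N     :", biased.mean())     # close to (N - 1)*sigma2/N = 3.2
print("average with divisor N - 1 :", unbiased.mean())   # close to sigma2 = 4.0
print("(N - 1) * sigma2 / N       :", (N - 1) * sigma2 / N)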

Degrees of Freedom


The foregoing discussion is very much like saying that we divide by N - 1 because it
works. But why does it work? To explain this, we must first consider degrees of freedom
(df). Assume that you have in front of you the three numbers 6, 8, and 10. Their mean is 8.
You are now informed that you may change any of these numbers, as long as the mean is
kept constant at 8. How many numbers are you free to vary? If you change all three of them
in some haphazard fashion, the mean almost certainly will no longer equal 8. Only two of
the numbers can be freely changed if the mean is to remain constant. For example, if you
change the 6 to a 7 and the 10 to a 13, the remaining number is determined; it must be 4 if
the mean is to be 8. If you had 50 numbers and were given the same instructions, you
would be free to vary only 49 of them; the 50th would be determined.
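The same counting argument can be verified in a couple of lines (an added illustration, not from the text): choose any N - 1 of the values freely, and the constraint that the mean stay at 8 forces the remaining value.

# Illustration (not from the text): with the mean fixed at 8, only two of the
# three numbers are free to vary.
target_mean = 8.0
free_values = [7.0, 13.0]                  # the 6 changed to 7, the 10 to 13
n = len(free_values) + 1                   # three numbers in all

last = n * target_mean - sum(free_values)  # forced by the mean constraint
print(last)                                # 4.0, as in the text
print((sum(free_values) + last) / n)       # 8.0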
Now let us go back to the formulae for the population and sample variances and see
why we lost one degree of freedom in calculating the sample variances.

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} \qquad\qquad s^2 = \frac{\sum (X - \overline{X})^2}{N - 1}$$

In the case of $\sigma^2$, $\mu$ is known and does not have to be estimated from the data. Thus, no
df are lost and the denominator is N. In the case of $s^2$, however, $\mu$ is not known and must be
estimated from the sample mean ($\overline{X}$). Once you have estimated $\mu$ from $\overline{X}$, you have fixed it

