like taking a multiple-choice test on Mongolian history without having taken the course.
The data follow:

            Read Passage    Did Not Read Passage
Mean            69.6                46.6
SD              10.6                 6.8
CV              15.2                14.6
The ratio of the two standard deviations is 10.6/6.8 = 1.56, meaning that the Read
group had a standard deviation that was more than 50% larger than that of the Did Not
Read group. On the other hand, the coefficients of variation are virtually the same for the
two groups, suggesting that any difference in variability between the groups can be explained by the higher scores in the first group. (Incidentally, chance performance would have produced a mean of 20 with a standard deviation of 4. Even without reading the passage, students score well above chance levels just by intelligent guessing.)
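For readers who like to check such arithmetic, the following short Python sketch (using only the summary values from the table above; the variable names are mine, not from the text) reproduces the SD ratio and the two coefficients of variation.

# Summary statistics taken from the table above
mean_read, sd_read = 69.6, 10.6
mean_noread, sd_noread = 46.6, 6.8

sd_ratio = sd_read / sd_noread              # ratio of the two SDs, about 1.56
cv_read = 100 * sd_read / mean_read         # CV for the Read group, about 15.2
cv_noread = 100 * sd_noread / mean_noread   # CV for the Did Not Read group, about 14.6

print(f"SD ratio: {sd_ratio:.2f}")
print(f"CV (Read): {cv_read:.1f}  CV (Did Not Read): {cv_noread:.1f}")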
In using the coefficient of variation, it is important to keep in mind the nature of the
variable that you are measuring. If its scale is arbitrary, you might not want to put too much
faith in the coefficient. But perhaps you don’t want to put too much faith in the variance either. This is a place where a little common sense is particularly useful.
The Mean and Variance as Estimators
I pointed out in Chapter 1 that we generally calculate measures such as the mean and
variance to use as estimates of the corresponding values in the populations. Characteristics of samples are called statistics and are designated by Roman letters (e.g., X̄ or s²). Characteristics of populations are called parameters and are designated by Greek letters.
Thus, the population mean is symbolized by μ (mu). In general, then, we use statistics as
estimates of parameters.
If the purpose of obtaining a statistic is to use it as an estimator of a parameter, then it
should come as no surprise that our choice of a statistic (and even how we define it) is based
partly on how well that statistic functions as an estimator of the parameter in question. Actually, the mean is usually preferred over other measures of central tendency because of its
performance as an estimator of μ. The variance (s²) is defined as it is, with (N − 1) in the denominator, specifically because of the advantages that accrue when s² is used to estimate the population variance (σ²).
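The advantage of the (N − 1) denominator can be seen in a small simulation. The Python sketch below is illustrative only; the population values and sample size are arbitrary choices, not from the text. It draws many small samples from a population whose variance is 100 and shows that averaging the squared deviations over N systematically underestimates that variance, whereas dividing by N − 1 comes out close to the true value.

import random

random.seed(1)

POP_MEAN, POP_SD = 50, 10          # population variance is 10**2 = 100
N, REPS = 5, 100_000               # many small samples of size 5

sum_div_n, sum_div_n_minus_1 = 0.0, 0.0
for _ in range(REPS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    m = sum(sample) / N
    ss = sum((x - m) ** 2 for x in sample)   # sum of squared deviations from the sample mean
    sum_div_n += ss / N
    sum_div_n_minus_1 += ss / (N - 1)

print("Average of SS/N:     ", round(sum_div_n / REPS, 1))          # noticeably below 100
print("Average of SS/(N-1): ", round(sum_div_n_minus_1 / REPS, 1))  # close to 100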
Four properties of estimators are of particular interest to statisticians and heavily influence the choice of the statistics we compute. These properties are those of sufficiency, unbiasedness, efficiency, and resistance. They are discussed here simply to give you a feel for
why some measures of central tendency and variability are regarded as more important
than others. It is not critical that you have a thorough understanding of estimation and
related concepts, but you should have a general appreciation of the issues involved.
Sufficiency
A statistic is a sufficient statistic if it contains (makes use of) all the information in a sample. You might think this is pretty obvious because it certainly seems reasonable to base
your estimates on all the data. The mean does exactly that. The mode, however, uses only
the most common observations, ignoring all others, and the median uses only the middle
one, again ignoring the values of other observations. Similarly, the range, as a measure of
dispersion, uses only the two most extreme (and thus most unrepresentative) scores. Here
you see one of the reasons that we emphasize the mean as our measure of central tendency.
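One informal way to see the point about sufficiency: change a single, non-extreme score in a small data set and watch which summaries respond. In the Python sketch below (the scores are made up purely for illustration), only the mean shifts; the median, mode, and range are unchanged because they ignore that observation’s exact value.

from statistics import mean, median, mode

scores  = [3, 5, 5, 7, 9, 11, 15]      # made-up data
altered = [3, 5, 5, 7, 10, 11, 15]     # the 9 has been changed to 10

for name, stat in (("mean", mean), ("median", median), ("mode", mode)):
    print(name, stat(scores), "->", stat(altered))
print("range", max(scores) - min(scores), "->", max(altered) - min(altered))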