The Average Deviation
At first glance it would seem that if we want to measure how scores are dispersed around
the mean (i.e., deviate from the mean), the most logical thing to do would be to obtain all
the deviations (i.e., $X_i - \bar{X}$) and average them. You might reasonably think that the more
widely the scores are dispersed, the greater the deviations and therefore the greater the
average of the deviations. However, common sense has led you astray here. If you calculate
the deviations from the mean, some scores will be above the mean and have positive
deviations, whereas others will be below the mean and have negative deviations. In the end,
the positive and negative deviations will balance each other out and the sum of the
deviations will be zero. This will not get us very far.
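The cancellation described above is easy to check numerically. The following is a minimal sketch in Python; the five scores are hypothetical values chosen only for illustration and do not come from the data sets in this chapter.

```python
# Hypothetical scores, chosen only for illustration (not from the text).
scores = [1, 3, 4, 6, 11]

mean = sum(scores) / len(scores)           # arithmetic mean of the scores
deviations = [x - mean for x in scores]    # signed deviations from the mean

# Positive and negative deviations cancel, so their sum (and therefore
# their average) is zero regardless of how spread out the scores are.
sum_of_deviations = sum(deviations)

print(deviations)
print(sum_of_deviations)
```

Replacing the scores with a more widely dispersed set changes the individual deviations but not their sum, which is why the average deviation cannot serve as a measure of dispersion.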
The Mean Absolute Deviation
If you think about the difficulty in trying to get something useful out of the average of the
deviations, you might well be led to suggest that we could solve the whole problem by
taking the absolute values of the deviations. (The absolute value of a number is the value of
that number with any minus signs removed. The absolute value is indicated by vertical bars
around the number, e.g., $|-3| = 3$.) The suggestion to use absolute values makes sense
because we want to know how much scores deviate from the mean without regard to whether
they are above or below it. The measure suggested here is a perfectly legitimate one and
even has a name: the mean absolute deviation (m.a.d.). The sum of the absolute
deviations is divided by N (the number of scores) to yield an average (mean) deviation:
$$\text{m.a.d.} = \frac{\sum |X_i - \bar{X}|}{N}$$
For all its simplicity and intuitive appeal, the mean absolute deviation has not played an
important role in statistical methods. Much more useful measures, the variance and the
standard deviation, are normally used instead.
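As a rough illustration of the definition above, the mean absolute deviation can be computed in a few lines of Python. The function name `mean_absolute_deviation` and the scores are hypothetical and used only for this sketch.

```python
def mean_absolute_deviation(scores):
    """Sum of absolute deviations from the mean, divided by N."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n


# Hypothetical scores, chosen only for illustration (not from the text).
scores = [1, 3, 4, 6, 11]
mad = mean_absolute_deviation(scores)
print(mad)
```

For these scores the absolute deviations are 4, 2, 1, 1, and 6, which sum to 14, so the m.a.d. is 14/5 = 2.8.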
The Variance
The measure that we will consider in this section, the sample variance ($s^2$), represents a
different approach to the problem of the deviations themselves averaging to zero. (When
we are referring to the population variance, rather than the sample variance, we use
$\sigma^2$ as the symbol.) In the case of the variance we take advantage of
the fact that the square of a negative number is positive. Thus, we sum the squared
deviations rather than the absolute deviations. Because we want an average, we next divide that
sum by some function of N, the number of scores. Although you might reasonably expect
that we would divide by N, we actually divide by (N − 1). We use (N − 1) as a divisor for
the sample variance because, as we will see shortly, it leaves us with a sample variance that
is a better estimate of the corresponding population variance. (The population variance is
calculated by dividing the sum of the squared deviations, for each value in the population,
by N rather than (N − 1). However, we only rarely calculate a population variance; we
almost always estimate it from a sample variance.)
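The difference between the two divisors can be sketched in Python as follows. The helper names and the scores are hypothetical and serve only as an illustration; the standard library's `statistics.variance` (divisor N − 1) and `statistics.pvariance` (divisor N) are used as a cross-check.

```python
import math
import statistics


def sum_of_squared_deviations(scores):
    """Sum of squared deviations of the scores from their mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def sample_variance(scores):
    """Sum of squared deviations divided by N - 1."""
    return sum_of_squared_deviations(scores) / (len(scores) - 1)


def population_variance(scores):
    """Sum of squared deviations divided by N."""
    return sum_of_squared_deviations(scores) / len(scores)


# Hypothetical scores, chosen only for illustration (not from the text).
scores = [1, 3, 4, 6, 11]

s2 = sample_variance(scores)          # 58 / 4
sigma2 = population_variance(scores)  # 58 / 5

# Cross-check against the standard library implementations.
assert math.isclose(s2, statistics.variance(scores))
assert math.isclose(sigma2, statistics.pvariance(scores))

print(s2, sigma2)
```

Because the same sum of squared deviations is divided by a smaller number, the sample variance is always somewhat larger than the value obtained by dividing by N, and the gap shrinks as N grows.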
If it is important to specify more precisely the variable to which $s^2$ refers, we can
subscript it with a letter representing the variable. Thus, if we denote the data in Set 4 as X, the
variance could be denoted as $s_X^2$. You could refer to $s_{\text{Set 4}}^2$, but long subscripts are usually
awkward. In general, we label variables with simple letters like X and Y.
For our example, we can calculate the sample variances of Set 4 and Set 32 as
follows:^10
$$s_X^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$$
(^10) In these calculations and others throughout the book, my answers may differ slightly from those that you obtain
for the same data. If so, the difference is most likely caused by rounding. If you repeat my calculations and arrive
at a similar, though different, answer, that is sufficient.