Chapter 4 Describing Your Data 155
The total number of observations in the sample is represented by the
symbol n, and each individual value is represented by x followed by a
subscript. The first value is x₁, the second value is x₂, and so forth, up to the
last value (the nth value), which is represented by xₙ. The formula calls for us
to sum all of these values, an operation represented by the Greek symbol Σ
(pronounced “sigma”), a summation symbol. In this case, we’re instructed
to sum the values of xᵢ, where i changes in value from 1 up to n; in other
words, the formula tells us to calculate the value of x₁ + x₂ + ... + xₙ. The
average, or mean, is equal to this expression divided by the total number of
observations.
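For readers who find code easier to follow than summation notation, the calculation can be sketched in a few lines of Python. This is only an illustration: the function name mean_by_summation and the sample values are hypothetical and do not come from the text or its data files.

```python
def mean_by_summation(values):
    """Return (x_1 + x_2 + ... + x_n) / n for a non-empty list of numbers."""
    n = len(values)              # n: the total number of observations
    if n == 0:
        raise ValueError("the mean is undefined for an empty sample")

    total = 0.0
    for x_i in values:           # i runs from 1 up to n
        total += x_i             # accumulate x_1 + x_2 + ... + x_n
    return total / n             # divide the sum by the number of observations


# Hypothetical sample, used only to illustrate the formula.
sample = [2, 4, 4, 5, 10]
print(mean_by_summation(sample))  # 25 / 5 = 5.0
```

The loop spells out what the summation symbol abbreviates; in practice the built-in sum(values) / len(values) gives the same result.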
How do these two measures, the median and the mean, compare? One
weakness of the mean is that it can be influenced by extreme values. Figure 4-17
shows a distribution of professional baseball salaries. Note that most of the
salaries are less than $1 million per year, but there are a couple of players
who make more than $20 million per year. What, then, is a typical salary?
The median value for this distribution is about $3,500,000, but the mean
salary is almost $4,700,000. The median seems more representative of what the
typical player makes, whereas the mean salary is higher as a result of the
influence of a couple of much larger salaries. If you were a union
representative negotiating a new contract, which figure would you quote? If you
represented management, which value better reflects your expenses in salaries?
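The pull that a few large values exert on the mean can be seen in a small sketch using Python's standard statistics module. The salary figures below are invented for illustration; they are not the data behind Figure 4-17 and will not reproduce the values quoted above.

```python
import statistics

# Hypothetical salaries in dollars: six modest values and one very large one.
salaries = [400_000, 500_000, 600_000, 750_000, 900_000, 1_200_000, 22_000_000]

mean_all = statistics.mean(salaries)
median_all = statistics.median(salaries)

# Remove the single largest salary and recompute both measures.
without_top = sorted(salaries)[:-1]
mean_trimmed = statistics.mean(without_top)
median_trimmed = statistics.median(without_top)

print(f"All players:     mean = {mean_all:,.0f}   median = {median_all:,.0f}")
print(f"Without largest: mean = {mean_trimmed:,.0f}   median = {median_trimmed:,.0f}")
```

With these made-up numbers, removing one player cuts the mean from roughly $3.76 million to $725,000, while the median moves only from $750,000 to $675,000.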
Figure 4-17 Distribution of baseball salaries
The lesson from this example is that you should not blindly accept any
single summary measure. The mean is sensitive to extreme values; the
median overcomes this problem by ignoring the magnitude of the upper and