Data Analysis with Microsoft Excel: Updated for Office 2007


156 Fundamentals of Statistics


lower values. Both approaches have their limitations, and the best approach
is to examine the data, create a histogram or stem and leaf plot of the
distribution, and thoroughly understand your data before attempting to
summarize it. Even then, it may be best to include several summary measures to
compare.
The mean and median are the most common summary statistics, but there
are others. Let’s examine those now.
One method of reducing the effect of extreme values on the mean is to
calculate the trimmed mean. The trimmed mean is the mean of the data values
calculated after excluding a percentage of the values from the lower and
upper tails of the distribution. For example, the 10% trimmed mean would
be equal to the average of the middle 90% of the data after exclusion of the
lowest 5% and the highest 5% of the values. The trimmed mean can be
thought of as a compromise between the mean and the median.
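
To make the trimming step concrete, here is a minimal Python sketch. The function name trimmed_mean, the rounding rule (round the count dropped from each tail down), and the sample data are illustrative assumptions, not a definition taken from the text.

```python
import math

def trimmed_mean(values, percent):
    """Mean of the values left after dropping `percent` of them in total,
    half from the lower tail and half from the upper tail."""
    if not 0 <= percent < 1:
        raise ValueError("percent must be at least 0 and less than 1")
    data = sorted(values)
    n = len(data)
    # Number of values dropped from EACH tail, rounded down (an assumption).
    k = math.floor(n * percent / 2)
    kept = data[k:n - k]
    return sum(kept) / len(kept)

# Hypothetical data set with one extreme value in the upper tail.
data = [2, 3, 4, 5, 5, 6, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 200]

print("mean:            ", sum(data) / len(data))
print("10% trimmed mean:", trimmed_mean(data, 0.10))
```

With 20 values, the 10% trimmed mean drops one value from each tail, so the extreme value 200 no longer pulls the result upward.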
Another commonly used measure of the center is the geometric mean.
The geometric mean is the nth root of the product of the data values.

$$\text{Geometric mean} = \sqrt[n]{(x_1)(x_2)\cdots(x_n)}$$

Once again, the symbols $x_1$ to $x_n$ represent the individual data values from
a data set with n observations. The geometric mean is most often used when
the data come in the form of ratios or percentages. Certain drug experiments
are recorded as percentage changes in chemical levels relative to a baseline
value, and those values are best summarized by the geometric mean. The
geometric mean can also be used in situations where the distribution of the
values is highly skewed in the positive or negative direction. The geometric
mean cannot be used if any of the data values are negative or zero.
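
As a rough illustration of the definition, the following Python sketch computes the geometric mean through logarithms, which is algebraically the same as taking the nth root of the product. The function name geometric_mean_manual and the ratio values are hypothetical.

```python
import math

def geometric_mean_manual(values):
    """nth root of the product of the values, computed via logarithms."""
    if len(values) == 0:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        # The geometric mean is not defined for zero or negative values.
        raise ValueError("all values must be strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical ratios of a measured level to its baseline value.
ratios = [1.10, 0.95, 1.20, 1.05]

print("geometric mean: ", geometric_mean_manual(ratios))
print("arithmetic mean:", sum(ratios) / len(ratios))
```

Working with logarithms rather than the raw product is a design choice that avoids overflow when there are many values; it also makes plain why zero and negative values must be rejected.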
Another measure, not widely used today (though the ancient Greeks used it
extensively), is the harmonic mean. The formula for the harmonic mean H is

$$\frac{1}{H} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i}$$

The harmonic mean can be used to calculate the mean values of rates. For
example, a car traveling at a rate of S miles per hour to a destination and
then at a rate of T miles per hour on the return trip over the same distance
travels at an average rate equal to the harmonic mean of S and T.
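
The round-trip claim can be checked numerically. The sketch below assumes the outbound and return legs cover the same distance; the function names and the speeds of 60 and 40 miles per hour are illustrative only.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be strictly positive")
    return len(values) / sum(1 / v for v in values)

def round_trip_average_speed(distance, s, t):
    """Total distance divided by total time for an out-and-back trip."""
    total_distance = 2 * distance
    total_time = distance / s + distance / t
    return total_distance / total_time

S, T = 60, 40  # hypothetical speeds in miles per hour

print("harmonic mean of S and T:", harmonic_mean([S, T]))
print("average round-trip speed:", round_trip_average_speed(120, S, T))
print("arithmetic mean of S, T: ", (S + T) / 2)
```

Both calculations give about 48 miles per hour, below the arithmetic mean of 50, because the car spends more time traveling at the slower speed.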
Our final measure of the center is the mode. The mode is the most
frequently occurring value in a distribution. The mode is most often used when
we are working with qualitative data or discrete quantitative data, basically
any data in which there are a limited number of possible values. The mode
is not as useful in continuous quantitative data, because if the data are truly
continuous, we would expect few, if any, repeat values.
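
A short Python sketch shows the contrast between discrete and continuous data. The helper name modes and the sample values are hypothetical; the helper returns every value tied for the highest count.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Qualitative data: one clear mode.
colors = ["red", "blue", "red", "green", "blue", "red"]
print(modes(colors))

# Continuous measurements: no repeats, so every value ties as a "mode".
measurements = [1.02, 3.47, 2.91, 5.68]
print(modes(measurements))
```

When no value repeats, every observation ties for most frequent, which is why the mode says little about the center of truly continuous data.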
Table 4-5 displays the Excel functions used to calculate the various
measures of the distribution’s center.