Basic Statistics

MEASURES OF LOCATION AND VARIABILITY

5.1 MEASURES OF LOCATION

The number most often used to describe the center of a distribution is called the
average or arithmetic mean. Here, we call it the mean to avoid confusion: There are
many types of averages. Additional measures of location are described that are useful
in particular circumstances.


5.1.1 The Arithmetic Mean

The Greek letter mu, $\mu$, is used to denote the mean of a population; $\bar{X}$ (X-bar) is
used to denote the mean of a sample. In general, Greek letters denote parameters of
populations and Roman letters are used for sample statistics.
The mean for a sample is defined as the sum of all the observations divided by the
number of observations. In symbols, if n is the number of observations in a sample,
and the first, second, third, and so on, observations are called $X_1, X_2, X_3, \ldots, X_n$,
then $\bar{X} = (X_1 + X_2 + X_3 + \cdots + X_n)/n$.
As an example, consider the sample consisting of the nine observations 8, 1, 2, 9,
3, 2, 8, 1, 2. Here $n$, the sample size, is 9; $X_1$, the first observation, is 8; $X_2$, the
second observation, is 1. Similarly, $X_3 = 2$, $X_4 = 9$, and so on, with $X_9 = 2$. Then,


$$\bar{X} = (8 + 1 + 2 + 9 + 3 + 2 + 8 + 1 + 2)/9 = 36/9 = 4$$


The formula $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$ may be stated more concisely by
using summation notation. In this notation, the formula is written $\bar{X} = \sum_{i=1}^{n} X_i / n$.
The symbol $\sum$ means summation, and $\sum_{i=1}^{n} X_i$ may be read as “the sum of the $X_i$’s
from $X_1$ to $X_n$,” where $n$ is the sample size. The formula is sometimes simplified
by not including the subscript $i$ and writing $\bar{X} = \sum X / n$. Here $X$ stands for any
observation, and $\sum X$ means “the sum of all the observations.”
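
The summation formula can be checked numerically. The following is a minimal Python sketch, not part of the original text, that applies it to the nine observations from the example above. The variable names (`observations`, `total`, `x_bar`) are illustrative choices only.

```python
from statistics import mean

# The nine observations from the worked example.
observations = [8, 1, 2, 9, 3, 2, 8, 1, 2]

n = len(observations)       # sample size n
total = sum(observations)   # the sum of all the observations
x_bar = total / n           # sample mean: the sum divided by n

print(n, total, x_bar)

# Cross-check the hand computation against the standard library.
assert x_bar == mean(observations)
```

The built-in `sum` plays the role of the summation symbol here, and `statistics.mean` is used only as an independent check on the hand computation.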
A similar formula $\mu = \sum X / N$ holds for $\mu$, the population mean, where $N$ stands
for the number of observations in the population, or the population size. We seldom
calculate $\mu$, for we usually do not have the data from the entire population. Wishing
to know $\mu$, we compute $\bar{X}$ and use $\bar{X}$ as an approximation to $\mu$.
The mean has several interesting properties. Here, we mention several; others are
given in Section 6.3. If we were to physically construct a histogram making the bars
out of some material such as metal and not including the axes, the mean is the point
along the bottom of the histogram where the histogram would balance on a razor
edge.
The total sum of the deviations around the mean will always be zero. Around any
other value the sum of the differences will not be zero (see Weisberg [1992]). That
is, $\sum (X - \bar{X}) = 0$. Further, the sum of the squared deviations around the mean is
smaller than the sum of the squared deviations around any other value. That is, the
numerical value of $\sum (X - \bar{X})^2$ will be smaller than if $\bar{X}$ were replaced by any other
number.
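
Both properties can be illustrated with the nine observations used earlier (8, 1, 2, 9, 3, 2, 8, 1, 2). The sketch below is an illustration under assumptions rather than anything from the original text: the helper names `sum_dev` and `sum_sq_dev` and the list of alternative centers are hypothetical, and checking a handful of candidate values illustrates the minimum property without proving it.

```python
# The nine observations from the earlier example; their mean is 4.
observations = [8, 1, 2, 9, 3, 2, 8, 1, 2]
x_bar = sum(observations) / len(observations)

def sum_dev(center):
    """Sum of the deviations of the observations around `center`."""
    return sum(x - center for x in observations)

def sum_sq_dev(center):
    """Sum of the squared deviations around `center`."""
    return sum((x - center) ** 2 for x in observations)

# Property 1: the deviations around the mean sum to zero,
# while the deviations around another value do not.
print(sum_dev(x_bar))
print(sum_dev(3.0))

# Property 2: the sum of squared deviations is smallest at the mean.
# The candidate centers are arbitrary illustrative values.
candidates = [3.0, 3.5, 4.5, 5.0]
print(sum_sq_dev(x_bar))
for c in candidates:
    print(c, sum_sq_dev(c))
```

For these data the sum of squared deviations around the mean is 88, and each candidate center gives a larger value. The excess equals the sample size times the squared distance between the candidate and the mean.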
Since the mean is the total divided by the sample size, we can easily obtain the
mean from the total, and vice versa. For example, if we weigh three oranges whose


