Basic Statistics

the distribution into four equal parts, with 25% of the distribution in each part. We
have already introduced one of the quartiles, Q2, which is the median. The quartile
Q1 divides the lower half of the distribution into halves; Q3 divides the upper half of
the distribution into halves. Quartiles are computed by first ordering the data; the
location of Q1 in the ordered data is .25(n + 1), that of Q2 is .50(n + 1), and that of
Q3 is .75(n + 1). The interquartile range, Q3 − Q1, is available in many statistical
programs. The Q1 and Q3 quartiles are not easy
measures to compute by hand, as they often require interpolation. (Interpolation is
a method of estimating an unknown value by using its position among a series of
known values. An example of interpolation will be given in Section 6.2.2.) Since the
quartiles are not sensitive to the numerical values of extreme observations, they are
considered measures of location resistant to the effect of outliers.
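The location rule above can be sketched in a few lines of Python. This is a minimal sketch, not the method of any particular statistical program: it assumes linear interpolation between neighboring ordered observations when the location (n + 1)p is not a whole number, and it clamps locations below 1 or above n to the smallest or largest observation. The function name `quartile` and the data are hypothetical.

```python
def quartile(data, p):
    """Return the quartile at fraction p (0.25, 0.50, or 0.75) of `data`.

    Uses the (n + 1)p location rule in the ordered data. Assumption: when the
    location falls between two observations, the value is linearly
    interpolated; locations below 1 or above n are clamped to the smallest
    or largest observation.
    """
    x = sorted(data)              # quartiles require ordered data
    n = len(x)
    pos = p * (n + 1)             # 1-based location, e.g. .25(n + 1) for Q1
    if pos <= 1:
        return float(x[0])
    if pos >= n:
        return float(x[-1])
    k = int(pos)                  # whole part: the k-th ordered observation
    frac = pos - k                # fractional part: distance toward the (k+1)-th
    return x[k - 1] + frac * (x[k] - x[k - 1])


# Hypothetical data (n = 8), deliberately not in order.
data = [12, 3, 31, 7, 1, 20, 9, 4]
q1, q2, q3 = (quartile(data, p) for p in (0.25, 0.50, 0.75))
print(q1, q2, q3, q3 - q1)   # prints 3.25 8.0 18.0 14.75
```

Here the location of Q1 is .25(9) = 2.25, so Q1 lies one quarter of the way from the 2nd ordered value (3) to the 3rd (4), giving 3.25. For these particular data the results agree with Python's `statistics.quantiles(data, n=4)`, whose default `'exclusive'` method is also based on (n + 1)p positions; other programs may use different conventions and give slightly different quartiles.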
Note that the numerical value of the difference between the median and Q1 does
not have to equal the difference between Q3 and the median. If the distribution is
skewed to the right, then Q3 minus the median is usually larger than the median minus
Q1, even though each of these two intervals contains the same proportion (25%) of the
observations.
For small samples, fourths are simpler measures to compute. If n is even, we
simply compute the median of the lower and upper halves of the ordered observations
and call them the lower and upper fourths, respectively. If n is odd, we consider
the middle measurement, or median, to be part of the lower half of the measurements
and compute the median of this lower half, Q1. We then also assign the median to
the upper half of the measurements and compute the median of the upper half, Q3.
Another name for fourths is hinges. The difference between the upper and lower
fourths (called the fourth-spread) should be close to, but not necessarily equal to, the
interquartile range, since the quartiles are not necessarily equal to the fourths. For a
more complete discussion of fourths, see Hoaglin et al. [1983].
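The halving rule for fourths translates directly into code. The following is a minimal sketch; the function name `fourths` and the data are hypothetical, and the median of each half is taken with the standard library's `statistics.median`.

```python
from statistics import median


def fourths(data):
    """Return (lower fourth, upper fourth) by taking medians of the two halves."""
    x = sorted(data)
    n = len(x)
    half = n // 2
    if n % 2 == 0:
        lower, upper = x[:half], x[half:]          # two equal halves
    else:
        lower, upper = x[:half + 1], x[half:]      # the median belongs to both halves
    return median(lower), median(upper)


even = [12, 3, 31, 7, 1, 20, 9, 4]   # hypothetical, n = 8
odd = [1, 3, 4, 7, 9, 12, 20]        # hypothetical, n = 7
lo_e, hi_e = fourths(even)
lo_o, hi_o = fourths(odd)
print(lo_e, hi_e, hi_e - lo_e)       # prints 3.5 16.0 12.5 (fourth-spread 12.5)
print(lo_o, hi_o, hi_o - lo_o)       # prints 3.5 10.5 7.0
```

With the (n + 1)p rule and linear interpolation, the same eight values give Q1 = 3.25 and Q3 = 18.0, so the interquartile range (14.75) and the fourth-spread (12.5) differ, as the text notes they may.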
Quartiles or fourths are often used when the distribution is skewed or outliers are
expected. An additional reason for using them is given in Section 5.5.


5.3 SAMPLING PROPERTIES OF THE MEAN AND VARIANCE


To illustrate the behavior of the means and variances of samples, a population
consisting of just four numbers (2, 10, 4, 8) is considered. The population has a mean
$\mu = 24/4 = 6$ and a variance of $\sigma^2 = \sum (X - \mu)^2 / N = 40/4 = 10$. All possible samples
of size 2 from this small population are given in the first column of Table 5.2. There
are 4 × 4 = 16 possible samples because we sample with replacement and distinguish
the order of the two draws. The mean and variance of each sample have been calculated
and are listed in the second and last columns of the table. These columns are labeled
$\bar{X}$ and $s^2$.
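The construction of Table 5.2 can be reproduced with Python's standard library as a sketch. Sampling with replacement where order matters corresponds to `itertools.product(population, repeat=2)`; the population variance uses divisor N (as `statistics.pvariance` does), while each sample variance $s^2$ uses divisor n − 1 (as `statistics.variance` does). The row order produced here follows `itertools.product` and may not match the order used in Table 5.2.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [2, 10, 4, 8]
N = len(population)

# Population parameters: the variance uses divisor N, as in the text.
mu = sum(population) / N                                 # 24 / 4
sigma2 = sum((x - mu) ** 2 for x in population) / N      # 40 / 4
assert sigma2 == pvariance(population)                   # same divisor-N definition

# All ordered samples of size 2 drawn with replacement: 4 * 4 = 16 of them.
samples = list(product(population, repeat=2))

# For each sample: the sample mean and the sample variance s^2 (divisor n - 1).
rows = [(s, mean(s), variance(s)) for s in samples]
for s, xbar, s2 in rows:
    print(s, xbar, s2)
```

For example, the sample (2, 10) has mean 6 and $s^2 = [(2 - 6)^2 + (10 - 6)^2]/(2 - 1) = 32$.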
The 16 sample means may be considered to be a new population: a population
of sample means for samples of size 2. The new population contains 16 numbers,
and so has a mean and a variance. The mean and variance of this new population
of means are denoted by $\mu_{\bar{X}}$ and $\sigma^2_{\bar{X}}$, respectively. The mean of the sample means
is calculated by summing the means in the second column and dividing by 16, the
number of means. That is,
$\mu_{\bar{X}} = 96/16 = 6$
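The same calculation can be checked by brute force. This sketch enumerates the 16 samples of size 2 with replacement, treats their means as a new population, and averages them; the variable names are illustrative only.

```python
from itertools import product
from statistics import mean

population = [2, 10, 4, 8]

# The "new population": the 16 means of all samples of size 2 (with replacement).
sample_means = [mean(s) for s in product(population, repeat=2)]

mu_xbar = sum(sample_means) / len(sample_means)   # 96 / 16
mu = sum(population) / len(population)            # 24 / 4

print(mu_xbar, mu)   # prints 6.0 6.0
```

The mean of the sample means, 6, equals the population mean $\mu$.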
