Introductory Biostatistics

groups. However, the mean can be so expressed. If component groups are
of sizes $n_1$ and $n_2$ and have means $\bar{x}_1$ and $\bar{x}_2$, respectively, the mean of the
combined group is

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
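The combined-mean formula can be checked numerically. The following is a minimal sketch in Python; the function name `combined_mean` and the two small data sets are hypothetical, chosen only for illustration.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two pooled groups, computed from their sizes and means."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Hypothetical data, not taken from the text.
group1 = [2.0, 4.0, 6.0]   # n1 = 3
group2 = [10.0, 20.0]      # n2 = 2

mean1 = sum(group1) / len(group1)
mean2 = sum(group2) / len(group2)

# Mean obtained from the group sizes and group means alone.
pooled = combined_mean(len(group1), mean1, len(group2), mean2)

# Mean obtained directly from all observations, for comparison.
all_values = group1 + group2
direct = sum(all_values) / len(all_values)

print(pooled, direct)
```

The two printed values should agree, which is the sense in which the mean of a combined group can be expressed through the means of its components.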


  1. In large data sets, the median requires more work to calculate than the
    mean and is of little use in more elaborate statistical techniques (though
    it remains useful as a descriptive measure for skewed distributions).


A third measure of location, the mode, was introduced briefly in Section
2.1.3. It is the value at which the frequency polygon reaches a peak. The mode
is not used widely in analytical statistics, other than as a descriptive measure,
mainly because of the ambiguity in its definition, as the fluctuations of small
frequencies are apt to produce spurious modes. For these reasons, in the
remainder of the book we focus on a single measure of location, the mean.
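The ambiguity of the mode can be illustrated with a short Python sketch. It assumes Python 3.8 or later for `statistics.multimode`; the three small samples are hypothetical and chosen only to show how a change in a single observation can move the mode or produce several modes, while the mean changes only slightly.

```python
from statistics import mean, multimode

# Hypothetical small samples, not taken from the text.
original = [1, 2, 2, 3, 3, 3, 4, 4, 5]
shifted = [1, 2, 2, 3, 3, 4, 4, 4, 5]   # one observation of 3 recorded as 4
tied = [1, 2, 2, 3, 3, 4, 4, 5]         # three values share the top frequency

# multimode returns every value that attains the highest frequency.
modes_original = multimode(original)
modes_shifted = multimode(shifted)
modes_tied = multimode(tied)

# The mean responds far less to the same single-observation change.
mean_shift = mean(shifted) - mean(original)

print(modes_original, modes_shifted, modes_tied, mean_shift)
```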


2.2.3 Measures of Dispersion


When the mean $\bar{x}$ of a set of measurements has been obtained, it is usually a
matter of considerable interest to measure the degree of variation or dispersion
around this mean. Are the $x$'s all rather close to $\bar{x}$, or are some of them
dispersed widely in each direction? This question is important for purely
descriptive reasons, but it is also important because the measurement of dispersion or
variation plays a central part in the methods of statistical inference described in
subsequent chapters.
An obvious candidate for the measurement of dispersion is the range $R$,
defined as the difference between the largest value and the smallest value, which
was introduced in Section 2.1.3. However, there are a few difficulties with the use
of the range. The first is that the value of the range is determined by only two
of the original observations. Second, the interpretation of the range depends in
a complicated way on the number of observations, which is an undesirable
feature.
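Both difficulties with the range can be seen in a small Python sketch. The helper `sample_range` and the data values are hypothetical, chosen only for illustration.

```python
def sample_range(values):
    """Range R: largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical data, not taken from the text.
data = [4.1, 5.0, 5.2, 5.3, 9.8]
r = sample_range(data)

# Changing every interior observation leaves R unchanged,
# because R depends only on the two extreme values.
altered = [4.1, 4.2, 4.3, 4.4, 9.8]
r_altered = sample_range(altered)

# Adding observations can keep R the same or widen it, never shrink it,
# so R tends to grow with the number of observations.
extended = data + [3.0, 5.1]
r_extended = sample_range(extended)

print(r, r_altered, r_extended)
```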
An alternative approach is to make use of deviations from the mean, $x - \bar{x}$;
it is obvious that the greater the variation in the data set, the larger the
magnitude of these deviations will tend to be. From these deviations, the variance $s^2$
is computed by squaring each deviation, adding them, and dividing their sum
by one less than $n$:


$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
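The variance formula can be followed step by step in Python. This is a minimal sketch: the function name `sample_variance` and the data are hypothetical, and the standard-library `statistics.variance`, which also uses the divisor n - 1, is called only as a cross-check.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


# Hypothetical data, not taken from the text; the mean is 5.0.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

s2 = sample_variance(data)

# Cross-check against the standard library's sample variance.
s2_library = statistics.variance(data)

print(s2, s2_library)
```

For these data the squared deviations sum to 24 and n - 1 = 5, so both values should be close to 4.8.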

The use of the divisor $(n - 1)$ instead of $n$ is clearly not very important when $n$
is large. It is more important for small values of $n$, and its justification will be
explained briefly later in this section. The following should be noted:

