Basic Statistics


MEASURES OF LOCATION AND VARIABILITY


It should also be noted that knowing the mean and standard deviation does not tell
us everything there is to know about a distribution. The distributions shown in
Figure 4.5(a) and (c) may have the same mean and standard deviation; nevertheless,
they look quite different. Numerical descriptions of a data set should therefore always
be accompanied by graphical descriptions. Note also that the mean, variance, and
standard deviation given here are intended for continuous data. Section 5.4 indicates
which statistics are appropriate for each type of data.
Statistical programs readily compute the statistics described above, along with
other measures.
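The point that the mean and standard deviation do not fully describe a distribution can be checked with a small sketch. It uses Python's standard `statistics` module on two data sets invented for this illustration (they are not from the text or from Figure 4.5): both have the same mean and the same sample standard deviation, yet one piles up near the center and the other splits into two clusters. The crude text histogram stands in for the graphical description the text recommends.

```python
import statistics
from collections import Counter

# Two small illustrative data sets (invented for this example, not from the text)
a = [2, 4, 4, 4, 5, 5, 7, 9]   # values concentrated near the center
b = [3, 3, 3, 3, 7, 7, 7, 7]   # values split into two separate clusters

for name, data in (("a", a), ("b", b)):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    print(f"{name}: mean = {mean}, sd = {sd:.3f}")
    # Crude text histogram: one '*' per observation at each distinct value
    for value, count in sorted(Counter(data).items()):
        print(f"  {value:>2} {'*' * count}")
```

Both data sets report a mean of 5 and a standard deviation of about 2.138, but the printed histograms look nothing alike.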


5.2.2 Other Measures of Variability


Some useful measures of variation are based on the ordered values of the variable
under study. The simplest of these to obtain is the range, which is computed by taking
the largest value minus the smallest value, that is, $X_n - X_1$, where
$X_1 \le X_2 \le \dots \le X_n$ are the n observations arranged in ascending order.
Unlike the standard deviation, the range tends to increase as the sample size increases:
the more observations you have, the greater the chance of obtaining very small or very
large values. The range is a useful descriptive measure of variability and has been used
in certain applications, such as quality control work in factories, mainly where repeated
small samples of the same size are taken. It is easy to compute for a small sample,
and when the sample sizes are equal, interpreting the ranges is less of a problem. The
range is commonly included in the descriptive statistics output of statistical programs,
so it is readily available.
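The claim that the range grows with sample size while the standard deviation does not can be illustrated with a short simulation. This is a sketch under stated assumptions: samples are drawn from a standard normal population with Python's `random.gauss`, and the sample sizes (5, 50, 500), the 200 repetitions, and the seed are arbitrary choices for illustration. The helper `sample_range` is a hypothetical name introduced here.

```python
import random
import statistics

def sample_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

random.seed(1)  # fixed seed so the simulation is reproducible
results = {}
for n in (5, 50, 500):
    ranges, sds = [], []
    # Draw 200 samples of size n from a standard normal population (sd = 1)
    for _ in range(200):
        sample = [random.gauss(0, 1) for _ in range(n)]
        ranges.append(sample_range(sample))
        sds.append(statistics.stdev(sample))
    results[n] = (statistics.mean(ranges), statistics.mean(sds))
    print(f"n = {n:>3}: average range = {results[n][0]:.2f}, "
          f"average sd = {results[n][1]:.2f}")
```

The average standard deviation stays close to the population value of 1 at every sample size, while the average range climbs steadily as n increases.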
In many articles, the authors report the smallest and largest values in their tables
along with the mean, to let the reader know the limits of their measurements.
Unfortunately, they sometimes fail to report the standard deviation.
The range can be used to obtain a rough approximation to the standard deviation. If
the sample size is very large, say 500 or more, simply divide the range by 6. If the
sample size is 100-499 observations, divide the range by 5; if it is 15-99 observations,
divide the range by 4; if it is 8-14, divide the range by 3; if it is 3-7, divide by 2;
and if there are only 2 observations, divide the range by 1.1. (More detailed estimates
can be obtained from Table A-8b(1) in Dixon and Massey [1983].)
Such estimates assume that the data follow a particular distribution called the normal
distribution, which is discussed in Chapter 6.
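The rule of thumb above can be written as a small lookup function. This is a minimal sketch: the name `approx_sd_from_range` is hypothetical, the divisors are the ones given in the text, a sample of exactly 500 is assigned the divisor 6 here, and, like the rule itself, the result is only meaningful if the data are roughly normal.

```python
def approx_sd_from_range(smallest, largest, n):
    """Rough standard deviation estimate from the range of n observations.

    Uses the divisors from the rule of thumb in the text; assumes the data
    are approximately normally distributed.
    """
    if n < 2:
        raise ValueError("need at least 2 observations")
    if n >= 500:
        divisor = 6
    elif n >= 100:
        divisor = 5
    elif n >= 15:
        divisor = 4
    elif n >= 8:
        divisor = 3
    elif n >= 3:
        divisor = 2
    else:  # n == 2
        divisor = 1.1
    return (largest - smallest) / divisor


# Hypothetical article: n = 40, smallest value 62, largest value 118
print(approx_sd_from_range(62, 118, 40))  # (118 - 62) / 4 = 14.0
```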
If outliers are suspected in the data set, the range is a poor measure of the spread,
since it is computed from the largest and smallest values in the sample. Here, an outlier
is defined as an observation that differs appreciably from the other observations in the
sample. Outliers may simply be errors of measurement or of recording, or they may be
observations that do not come from the same population as the other observations.
In either case, using a measure of variation that is computed from the largest and
smallest values is risky unless there are no outliers or they have been removed before
the range is computed.
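How strongly a single outlier drives the range can be seen in a tiny example. The data are hypothetical: five measurements, and then the same measurements with one recording error in which the value 16 was mistyped as 61.

```python
# Hypothetical data: five measurements, then the same data with one
# recording error (the last value 16 mistyped as 61).
clean = [12, 13, 14, 15, 16]
with_outlier = [12, 13, 14, 15, 61]

ranges = {}
for label, data in (("clean", clean), ("with outlier", with_outlier)):
    ranges[label] = max(data) - min(data)  # range = largest - smallest
    print(f"{label}: range = {ranges[label]}")
```

The range jumps from 4 to 49, even though four of the five values are unchanged, so any quantity derived from it (such as the rough standard deviation estimate above) is inflated by the same factor.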
A safer procedure is to obtain a type of range that uses observations other than the
largest and smallest values. The interquartile range (IQR) is a commonly used example.
The interquartile range is defined as IQR = Q3 - Q1. Three quartiles divide
