Encyclopedia of Sociology

DESCRIPTIVE STATISTICS

                          Age
Sex                       15 and below  16–20   21–25   26–30   31–35   Total
Male    Count             9             12      19      9       8       57
        % of age group    45.0%         48.0%   52.8%   45.0%   42.1%   47.5%
        % of sex          15.8%         21.1%   33.3%   15.8%   14.0%
Female  Count             11            13      17      11      11      63
        % of age group    55.0%         52.0%   47.2%   55.0%   57.9%   52.5%
        % of sex          17.5%         20.6%   27.0%   17.5%   17.5%
Total   Count             20            25      36      20      19      120
        % of total        16.7%         20.8%   30.0%   16.7%   15.8%   100%

Table 2
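
The percentages in the table can be recomputed from the cell counts alone: under each count, the first percentage is the count divided by the total for that age group, and the second is the count divided by the total for that sex. The following Python sketch is an illustration only and is not part of the original entry; the variable and function names are assumptions chosen for readability.

```python
# Cell counts copied from the table, ordered from the youngest to the
# oldest age group. All names here are illustrative.
counts = {
    "Male": [9, 12, 19, 9, 8],
    "Female": [11, 13, 17, 11, 11],
}


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Marginal totals: one per age group (column) and one per sex (row).
col_totals = [sum(col) for col in zip(*counts.values())]
row_totals = {sex: sum(row) for sex, row in counts.items()}
grand_total = sum(col_totals)

# Share of each age group that is male or female.
col_pct = {
    sex: [pct(c, t) for c, t in zip(row, col_totals)]
    for sex, row in counts.items()
}
# Share of each sex that falls in each age group.
row_pct = {
    sex: [pct(c, row_totals[sex]) for c in row]
    for sex, row in counts.items()
}

for sex in counts:
    print(sex, counts[sex], col_pct[sex], row_pct[sex])
print("Total", col_totals, grand_total)
```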


the seventeenth and eighteenth persons. The median, like the
mean, identifies only the center of an array of numbers; it
says nothing about the dispersion. For example, the median of
21, 25, 30, and 100 is 27.5 and the median of 0, 27, 28, and 29
is also 27.5, but the two distributions are different. The mode
is the most common value, category, or attribute in a
distribution. Like the median, the mode has its limitations.
For the set of values 0, 2, 2, 4, 4, 4, 4, 5, and 10, the mode
is four. For the set of values 0, 0, 1, 4, 4, 4, 5, and 6, the
mode is also four. One cannot tell one distribution from the
other by examining the mode or the median alone. The mode and
the median can be used to describe the central tendency of both
continuous and discrete variables, and both are less affected
by extreme values, or outliers, than the mean is.
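
The two sets of values used for the mode can be checked with Python's standard `statistics` module. This is a minimal sketch rather than part of the original entry: the variable names are illustrative, and the `with_outlier` list is a hypothetical variation added only to show how an extreme value moves the mean while leaving the median and the mode in place.

```python
from statistics import mean, median, mode

# The two sets of values discussed in the text.
first = [0, 2, 2, 4, 4, 4, 4, 5, 10]
second = [0, 0, 1, 4, 4, 4, 5, 6]

for name, values in (("first", first), ("second", second)):
    print(name,
          "median:", median(values),
          "mode:", mode(values),
          "mean:", round(mean(values), 2))

# Hypothetical variation (not from the text): replace the largest
# value of the first set with a much more extreme one.
with_outlier = first[:-1] + [100]
print("with outlier",
      "median:", median(with_outlier),
      "mode:", mode(with_outlier),
      "mean:", round(mean(with_outlier), 2))
```

Both sets share the same median and the same mode even though their values differ, and only the mean shifts when the extreme value is introduced.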


One may also use the upper and lower quartiles and the
percentiles to measure the central tendency. The nth percentile
is a number such that n percent of the distribution falls below
it and (100−n) percent falls above it. The lower quartile is
the twenty-fifth percentile, the upper quartile is the
seventy-fifth percentile, and the median is the fiftieth
percentile. For example, for the set of values 1, 2, 3, 4, 5,
6, 7, and 8, the lower quartile, or twenty-fifth percentile, is
about two and the upper quartile, or seventy-fifth percentile,
is about seven; the exact figures depend on the convention used
to locate a percentile that falls between two observations.
Taken together, the upper and lower quartiles and the
percentiles provide more information about a distribution than
a single measure of central tendency does.
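
Quartiles for the set of values 1 through 8 can be computed as follows. This is a sketch under one assumption: it uses the `quantiles` function from Python's standard `statistics` module, whose two interpolation methods are only two of several conventions in use, so other software may report slightly different cut points. The variable names are illustrative.

```python
from statistics import median, quantiles

values = [1, 2, 3, 4, 5, 6, 7, 8]

# n=4 asks for the three cut points that split the data into quarters.
# The default method ("exclusive") places the p-th quantile at
# position (N + 1) * p and interpolates between neighbors.
q1, q2, q3 = quantiles(values, n=4)
print("exclusive:", q1, q2, q3)

# The "inclusive" method treats the smallest and largest values as the
# 0th and 100th percentiles, which pulls the quartiles inward.
q1_inc, q2_inc, q3_inc = quantiles(values, n=4, method="inclusive")
print("inclusive:", q1_inc, q2_inc, q3_inc)

# Under either method the middle cut point is the median.
print("median:", median(values))
```

With the default method the lower and upper cut points fall between the second and third values and between the sixth and seventh values, respectively, which round to the figures of two and seven given in the text.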


Dispersion. Measures of central tendency alone do not provide
much information about a distribution. Combined with measures
of dispersion, however, they become useful for studying a
distribution. The most popular measures of dispersion are the
range, the standard deviation, and the variance. The range is a
crude measure of a distribution, reported either as the span
from the lowest value to the highest value or as the difference
between the two. For example, the range for the set of values
1, 2, 3, 4, and 5 is one to five, or four. The range is
sensitive to extreme values and may not provide sufficient
information about the distribution. Alternatively, the
dispersion can be measured by the distance between the mean and
each value. The standard deviation is defined as the square
root of the arithmetic mean of the squared deviations from the
mean. For example, the standard deviation for the set of values
1, 2, 3, 4, and 5 is 1.41. The deviations are squared before
they are averaged because the deviations from the mean always
sum to zero. The variance is the square of the standard
deviation; it is two for the previous array of numbers. The
standard deviation is used as a standardized unit in
statistical inference. Compared with the standard deviation,
the variance is expressed in squared units and is therefore not
substantively meaningful. It is, however, valuable for
explaining the relationship between variables. Mathematically,
the variance determines the spread of the normal curve, while
the standard deviation represents the typical distance between
the mean and each data point. Because they are derived from
distances from the mean, the standard deviation and the
variance are sensitive to extreme values.
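
The definitions above can be worked through step by step for the set of values 1, 2, 3, 4, and 5. The sketch below is illustrative rather than part of the original entry. It follows the definition given in the text, which divides by the number of values (the population form), and compares the result with `pvariance` and `pstdev` from Python's standard `statistics` module; the variable names are assumptions.

```python
from statistics import mean, pstdev, pvariance, stdev

values = [1, 2, 3, 4, 5]

# Range as a single number: highest value minus lowest value.
value_range = max(values) - min(values)

# Deviations from the mean always sum to zero, which is why they are
# squared before being averaged.
center = mean(values)
deviations = [v - center for v in values]
print("sum of deviations:", sum(deviations))

# Variance: arithmetic mean of the squared deviations.
variance = sum(d ** 2 for d in deviations) / len(values)
# Standard deviation: square root of the variance.
std_dev = variance ** 0.5

print("range:", value_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))

# The library's population functions follow the same definition.
print("pvariance:", pvariance(values), "pstdev:", round(pstdev(values), 2))

# Note: the sample standard deviation divides by n - 1 instead of n
# and is therefore larger for the same data.
print("sample stdev:", round(stdev(values), 2))
```

Dividing by the number of values gives a variance of two and a standard deviation of about 1.41; dividing by one less than the number of values, as is usual for sample data, gives a larger figure.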

The interquartile range (IQR) and mean absolute deviation (MAD)
are also commonly used to measure the dispersion. The IQR is
defined as the difference between the third and first
quartiles. It is more stable than the range because it is not
affected by the most extreme values. MAD is the average