Encyclopedia of Sociology

DESCRIPTIVE STATISTICS

absolute values of the deviations of the observations
from the mean. Like the standard deviation, MAD
avoids the problem that the deviations from the mean
always sum to zero, but it is less useful in statistical
inference than the variance and the standard deviation.
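
The point about deviations summing to zero can be
checked with a minimal Python sketch. It assumes that
MAD here denotes the mean absolute deviation from the
mean, as the sentence above suggests; the scores and the
function name are hypothetical and chosen only for
illustration.

```python
from statistics import mean, pstdev

def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)

# Hypothetical observations, not taken from the source text.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
center = mean(scores)

# Signed deviations cancel out, so their sum carries no information.
signed_sum = sum(x - center for x in scores)

# Absolute or squared deviations do not cancel.
mad = mean_absolute_deviation(scores)  # 1.5
sd = pstdev(scores)                    # population standard deviation, 2.0

print(signed_sum, mad, sd)
```

Both measures summarize spread; the standard deviation
weights large deviations more heavily because the
deviations are squared before they are averaged.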


Bivariate Relationship. One may use the covariance
and the correlation coefficient to measure the
direction and size of a relationship between two
variables. The covariance is defined as the average
product of the two variables’ deviations from their
respective means. It reports the extent to which the
variables vary together: when one variable lies above
its mean, the covariance indicates whether, and how
strongly, the corresponding value of the other variable
tends to lie above or below its own mean. A positive
covariance suggests that, as the value of one variable
increases, that of the other variable tends to increase.
A negative covariance suggests that, as the value of
one variable increases, that of the other variable tends
to decrease. The correlation coefficient is defined as
the ratio of the covariance to the product of the
standard deviations of the two variables. It can also be
seen as a covariance rescaled by the standard
deviations of both variables. The value of the
correlation coefficient ranges from −1 to 1, where zero
means no linear correlation, −1 means a perfect
negative linear relationship, and 1 means a perfect
positive linear relationship. The covariance and
correlation are measures of the bivariate relationship
between continuous variables. Many measures of
association between categorical variables are
calculated using cell frequencies or percentages in the
cross-tabulation, for example, Yule’s Q, phi, Goodman
and Kruskal’s tau, Goodman and Kruskal’s gamma, and
Somers’ d. Though measures of association show the
direction and size of a bivariate relationship,
statistical inference is needed to test whether such a
relationship exists.
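
The definitions in this paragraph can be illustrated
with a short Python sketch. It uses the population form
(dividing by the number of cases), which is one reading
of "average product"; sample formulas divide by one less
than the number of cases instead. The variable names, the
data, and the 2 x 2 table are hypothetical.

```python
from math import sqrt
from statistics import mean

def covariance(xs, ys):
    """Average product of the paired deviations from each mean."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    sd_x = sqrt(covariance(xs, xs))  # covariance of x with itself = variance
    sd_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

def yules_q(a, b, c, d):
    """Yule's Q for a 2 x 2 cross-tabulation with cells [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)

def phi(a, b, c, d):
    """Phi coefficient for the same 2 x 2 cross-tabulation."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical continuous variables.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

cov_pos = covariance(hours, score)        # positive: the two rise together
cov_neg = covariance(hours, score[::-1])  # negative: one rises, one falls
r = correlation(hours, score)             # always between -1 and 1

# Hypothetical cell frequencies for two dichotomous variables.
q = yules_q(30, 10, 10, 30)
p = phi(30, 10, 10, 30)

print(cov_pos, cov_neg, round(r, 3), q, p)
```

Unlike the covariance, the correlation coefficient does
not change when either variable is measured in different
units, which is why it is easier to compare across
studies.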


RELATIONSHIPS BETWEEN GRAPHS AND
SUMMARY STATISTICS

The box plot is a useful tool for summarizing the
statistics and distribution of a variable. It consists
of a rectangular box divided by a line, with two
extended lines attached to the ends of the box. The
ends of the box mark the upper and lower quartiles. The
range of the distribution on each side is shown by
an extended line attached to each quartile. A line
dividing the box shows the median. The plot can
be placed vertically or horizontally. The box plot
became popular because it expresses the center
and spread of the data simultaneously. Several
boxes may be placed next to one another for
comparison.
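
The five values that a box plot of this kind displays
can be computed with a brief Python sketch. The data and
the function name are hypothetical, and the "inclusive"
quartile method is only one of several conventions in
use; other conventions can place the ends of the box
slightly differently.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return the five values drawn by a simple box plot."""
    # The three cut points that split the data into four equal parts.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),  # end of the lower extended line
        "q1": q1,            # lower end of the box
        "median": q2,        # line dividing the box
        "q3": q3,            # upper end of the box
        "max": max(values),  # end of the upper extended line
    }

# Hypothetical ages of nine respondents.
ages = [21, 23, 24, 27, 30, 34, 35, 41, 58]
summary = box_plot_summary(ages)
print(summary)
```

In this example the upper extended line is much longer
than the lower one, which already hints that the
distribution has a longer tail toward the high values.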

The order of the mode, median, and mean is
related to the shape of the distribution of a
continuous variable. If the mean, median, and mode are
equal to each other, the distribution is symmetric
around a single peak, and the histogram approximates a
bell curve. A uniform distribution, in which the cases
are spread equally across all values, is also
symmetric, so its mean and median coincide at the
center, but every value is equally frequent and no
single mode stands out; its histogram is rectangular,
with the range as the width and the count or relative
frequency as the height. In a symmetric bimodal
distribution, the two modes lie toward the two ends of
the distribution, equally distant from the center,
where the median and the mean are located.
We seldom see true bell-shaped, uniform, or
bimodal distributions. Most distributions
are more or less skewed to the left or to the right. If
the mean is greater than the median and the median is
greater than the mode, the distribution is skewed to
the right. If the mean is smaller than the median and
the median is smaller than the mode, the distribution
is skewed to the left. The direction of the skew is
determined mainly by the outliers, which pull the mean
toward the longer tail.
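
The ordering rule described above can be written as a
small Python sketch. The function name and the data are
hypothetical, and the rule is a rough diagnostic only: a
distribution whose three measures do not fall in either
strict order is reported as having no clear skew.

```python
from statistics import mean, median, mode

def skew_direction(values):
    """Classify skew from the order of the mean, median, and mode."""
    # statistics.mode returns the first mode met if several values tie.
    m, md, mo = mean(values), median(values), mode(values)
    if m > md > mo:
        return "right"
    if m < md < mo:
        return "left"
    return "no clear skew by this rule"

# Hypothetical data with a few large outliers in the upper tail.
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# The mirror image of the same data has its outliers in the lower tail.
left_tailed = [16 - x for x in right_tailed]

# A symmetric, single-peaked set in which all three measures equal 3.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(skew_direction(right_tailed))
print(skew_direction(left_tailed))
print(skew_direction(symmetric))
```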

The shape and direction of the scatter plot can
help diagnose the relationship between two variables.
When the points run from the lower-left side to the
upper-right side, the correlation coefficient is
positive; when they run from the upper-left side to
the lower-right side, the correlation coefficient is
negative. The correlation in a loosely scattered
plot is weaker than that in a tightly scattered plot.
A three-dimensional scatter plot can be used to
show either a bivariate relationship together with its
frequency distribution or a relationship among three
variables. The former is commonly used to examine a
joint distribution.
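
The link between the look of a scatter plot and the
correlation coefficient can be sketched numerically in
Python. The three sets of points are hypothetical: the
same rising trend with small and with large scatter
around it, and a falling trend. The helper function is
an illustrative implementation of the Pearson
coefficient.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5, 6, 7, 8]
wobble = [1, -1, 1, -1, 1, -1, 1, -1]  # fixed pattern standing in for noise

# Points hugging a rising line: a tightly scattered plot.
tight = [xi + 0.5 * w for xi, w in zip(x, wobble)]

# Same rising trend with much larger scatter: a loosely scattered plot.
loose = [xi + 4 * w for xi, w in zip(x, wobble)]

# Points running from the upper left to the lower right.
falling = [-y for y in tight]

r_tight = pearson_r(x, tight)
r_loose = pearson_r(x, loose)
r_falling = pearson_r(x, falling)
print(round(r_tight, 3), round(r_loose, 3), round(r_falling, 3))
```

The tight and loose sets share the same direction, so
both coefficients are positive, but the loose set yields
the smaller value; reversing the direction changes only
the sign.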

Descriptive statistics is the first step in studying
the distribution of the data. By omitting this step, one
might misuse advanced methods and thus be
led to wrong estimates and conclusions. Some
summary statistics, such as the standard deviation,
variance, mean, correlation, and covariance, are also
essential elements of statistical inference and
advanced statistical methods.