Encyclopedia of Sociology

DESCRIPTIVE STATISTICS

population are parameters. Descriptive statistics
are usually used to describe the characteristics of a
sample. The procedures and methods used to infer
parameters from statistics constitute statistical
inference. Descriptive statistics do not include
statistical inference.


Though descriptive statistics are usually used
to examine the distribution of single variables,
they may also be used to measure the relationship
between two or more variables. That is, descriptive
statistics may describe either a univariate
distribution or a bivariate relationship. Also, the
level of measurement of a variable (nominal, ordinal,
interval, or ratio) can influence the method chosen.


DATA DISTRIBUTION

To describe a set of data effectively, one should
order the data and examine the distribution. For a
small data set, a visual inspection of the ordered
array is often sufficient. For a large data set,
tables and graphs are necessary aids.


Tabulation. The table is expressed in counts
or rates. The frequency table can display the distri-
bution of one variable. It lists attributes, catego-
ries, or intervals with the number of observations
reported. Data expressed in a frequency distribution
are grouped data. To examine the central tendency
and dispersion of a large data set, using grouped
data is easier than using ungrouped data.
Data usually are categorized into intervals that are
mutually exclusive. One case or data point falls
into one category only. Displaying frequency dis-
tribution of quantitative or continuous variables
by intervals is especially efficient. For example, the
frequency distribution of age in an imaginary sam-
ple can be seen in Table 1.


Here, age has been categorized into five intervals,
i.e., 15 and below, 16–20, 21–25, 26–30, and 31–
35, and they are mutually exclusive. Any age falls
into one category only. This display is very effi-
cient for understanding the age distribution in our
imaginary sample. The distribution shows that
twenty cases are aged fifteen or younger, twenty-
five cases are sixteen to twenty years old, thirty-six
cases are twenty-one to twenty-five years old, twen-
ty cases are twenty-six to thirty years old, and
nineteen cases are thirty-one to thirty-five years
old. To compare categories or intervals and to
compare various samples or populations, reporting
the percent or relative frequency of each category
is important. The third column shows the percent of
the sample in each interval or category. For
example, 30 percent of the sample falls into the
range of twenty-one to twenty-five years old. The
fourth column shows the proportion of observations
in each interval or category. This proportion is
called the relative frequency. The cumulative
frequency, the cumulative percent, and the
cumulative relative frequency are other common
elements in frequency tabulation. They are the sums
of counts, percents, or proportions below or equal
to the corresponding category or interval. For
instance, the cumulative frequency for age thirty
shows that 101 persons, or 84.2 percent of the
sample, are aged thirty or younger.
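
The columns described above can be recomputed from the five counts reported in the text. The following is a minimal Python sketch, not the article's own procedure; the function name frequency_table and the dictionary keys are illustrative assumptions, and only the interval labels and counts come from the text.

```python
# Interval labels and counts as reported in the text for the imaginary sample.
intervals = ["15 and below", "16-20", "21-25", "26-30", "31-35"]
counts = [20, 25, 36, 20, 19]


def frequency_table(labels, freqs):
    """Build one row per interval (hypothetical helper, for illustration only).

    Each row holds the frequency, percent, proportion (relative frequency),
    cumulative frequency, and cumulative percent.
    """
    total = sum(freqs)
    rows = []
    running = 0  # running sum of counts at or below the current interval
    for label, freq in zip(labels, freqs):
        running += freq
        rows.append({
            "interval": label,
            "frequency": freq,
            "percent": 100 * freq / total,
            "proportion": freq / total,
            "cum_frequency": running,
            "cum_percent": 100 * running / total,
        })
    return rows


table = frequency_table(intervals, counts)
for row in table:
    print(f"{row['interval']:>12}  {row['frequency']:>3}  "
          f"{row['percent']:5.1f}  {row['proportion']:.3f}  "
          f"{row['cum_frequency']:>3}  {row['cum_percent']:5.1f}")
```

The counts sum to 120, so the twenty-one to twenty-five interval holds 36 of 120 cases, or 30 percent, and the cumulative frequency through age thirty is 101, matching the figures quoted in the text.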

The frequency distribution displays one vari-
able at a time. To study the joint distribution of two
or more variables, we cross-tabulate them first. For
example, the joint distribution of age and sex in
the imaginary sample can be expressed in Table 2.

This table is two-dimensional: age is the column
variable and sex is the row variable. We call it a
“two-by-five” table: two categories for sex and five
categories for age. The marginal frequency can be
seen as the frequency distribution of the
corresponding variable. For example, there are
fifty-seven men in this sample. The marginal
frequency for age is called the column frequency,
and the marginal frequency for sex is called the row
frequency. The joint frequency of age and sex is the
cell frequency. For example, there are seventeen
women twenty-one to twenty-five years old in this
sample. The second number in each cell is the column
percentage; that is, the cell frequency divided by
the column frequency, times 100 percent. For
example, 47 percent of those in the group of
twenty-one- to twenty-five-year-olds are women. The
third number in each cell is the row percentage;
that is, the cell frequency divided by the row
frequency, times 100 percent. For example, 27
percent of women are twenty-one to twenty-five years
old. The row and column percentages are useful in
examining the distribution of one variable
conditional on the other variable.
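
The marginal frequencies and the two kinds of cell percentage can be sketched in Python as follows. This is an illustration under stated assumptions rather than the article's method: the function crosstab_percentages and the small 2-by-2 table are hypothetical, and all marginal totals are assumed to be nonzero. The only figures taken from the text are the 17 women aged twenty-one to twenty-five, the 36 cases in that age group, and the 57 men in a sample whose reported age counts sum to 120.

```python
def crosstab_percentages(cells):
    """Compute marginals and percentages for a table of cell frequencies.

    cells is a list of rows, each a list of counts (hypothetical helper).
    Returns (row_totals, col_totals, col_pct, row_pct).
    Assumes every row total and column total is nonzero.
    """
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    # Column percentage: cell frequency / column frequency * 100.
    col_pct = [[100 * c / col_totals[j] for j, c in enumerate(row)]
               for row in cells]
    # Row percentage: cell frequency / row frequency * 100.
    row_pct = [[100 * c / row_totals[i] for c in row]
               for i, row in enumerate(cells)]
    return row_totals, col_totals, col_pct, row_pct


# Made-up 2-by-2 table, used only to exercise the function.
example = [[10, 30],
           [20, 40]]
row_totals, col_totals, col_pct, row_pct = crosstab_percentages(example)

# The two percentages quoted in the text, from the figures it reports.
women_21_25 = 17          # cell frequency
age_21_25_total = 36      # column frequency
women_total = 120 - 57    # row frequency: sample size minus the 57 men
column_percentage = 100 * women_21_25 / age_21_25_total
row_percentage = 100 * women_21_25 / women_total
print(round(column_percentage), round(row_percentage))
```

Rounded to whole percents, the last two values reproduce the 47 percent and 27 percent cited in the text.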

Charts and Graphs. Charts and graphs are
efficient ways to show data distribution. Popular