Encyclopedia of Sociology

DESCRIPTIVE STATISTICS

Codes          Frequency  Percent  Relative   Cumulative  Cumulative
                                   Frequency  Frequency   Percent
15 and below       20       16.7     .17          20        16.7
16–20              25       20.8     .21          45        37.5
21–25              36       30.0     .30          81        67.5
26–30              20       16.7     .17         101        84.2
31–35              19       15.8     .16         120       100
Total             120      100      1.0

Table 1. Age Distribution of an Imaginary Sample
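The derived columns of Table 1 can be reproduced from the raw frequencies alone. The following is a minimal sketch, assuming the five counts shown in the table; the function name `frequency_table` and the rounding choices (one decimal for percentages, two for relative frequencies) are illustrative assumptions chosen to match the printed values.

```python
# Counts per age category, taken from Table 1.
categories = ["15 and below", "16–20", "21–25", "26–30", "31–35"]
frequencies = [20, 25, 36, 20, 19]


def frequency_table(labels, counts):
    """Return one row per category with the derived columns of Table 1.

    The function name and the rounding are assumptions for illustration.
    """
    total = sum(counts)
    running = 0  # cumulative frequency so far
    rows = []
    for label, count in zip(labels, counts):
        running += count
        rows.append({
            "category": label,
            "frequency": count,
            "percent": round(100 * count / total, 1),
            "relative_frequency": round(count / total, 2),
            "cumulative_frequency": running,
            "cumulative_percent": round(100 * running / total, 1),
        })
    return rows


for row in frequency_table(categories, frequencies):
    print(row)
```

Each cumulative value is simply the running total of the frequencies up to and including that category, so the last row always reaches the sample size and 100 percent.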


graphs for single variables are bar graphs, histograms, and stem-and-leaf plots. The bar graph shows the relative frequency distribution of discrete variables. A bar is drawn over each category, with the height of the bar representing the relative frequency of observations in that category. The histogram can be seen as a bar graph for a continuous variable. By connecting the midpoints of the tops of all bars, a histogram becomes a frequency polygon. Histograms effectively show the shape of the distribution.


Stem-and-leaf plots represent each observation by its higher digit(s) and its lowest digit. The value of the higher digits is the stem, while the value of the final digit of each observation is the leaf. The stem-and-leaf plot conveys the same information as the bar graph or histogram. Additionally, it tells the exact value of each observation. Despite providing more information than bar graphs and histograms, stem-and-leaf plots are used mostly for small data sets.
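The stem and leaf construction described above can be sketched in a few lines. This is only an illustration, assuming nonnegative integer observations; the ages below are hypothetical and the function name `stem_and_leaf` is not from the source.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digits).

    Assumes nonnegative integers; the name is illustrative.
    """
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 27 -> stem 2, leaf 7
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical ages, invented for this example.
ages = [12, 15, 18, 21, 21, 24, 27, 30, 33, 35]

for stem, leaves in stem_and_leaf(ages).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

The length of each printed row plays the role of a bar in a histogram, while the leaves preserve the exact value of every observation.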


Other frequently used graphs include line graphs, ogives, and scatter plots. Line graphs and ogives show the relationship between time and the variable. The line graph usually shows trends. The ogive is a form of line graph for cumulative relative frequency or percentage. It is commonly used for survival data. The scatter plot shows the relationship between variables. In a two-dimensional scatter plot, the x and y axes label values of the data. Conventionally, we use the horizontal axis (x-axis) for the explanatory variable and the vertical axis (y-axis) for the outcome variable. The plane is naturally divided into four areas by the two axes. For continuous variables, the value at the intersection of the two axes is zero. Moving right along the x-axis or up along the y-axis, the values ascend; moving left along the x-axis or down along the y-axis, the values descend. The data points, determined by the joint attributes of the variables, are scattered in the four areas or along the axes.
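As a small illustration of how a data point falls into one of the four areas or onto an axis, the sketch below classifies a point by the signs of its coordinates. It assumes the axes cross at zero, as stated for continuous variables; the function name `locate_point` and the area labels are hypothetical.

```python
def locate_point(x, y):
    """Name the area of the plane that holds the point (x, y).

    Assumes the two axes intersect at (0, 0); labels are illustrative.
    """
    if x == 0 or y == 0:
        return "on an axis"
    horizontal = "right" if x > 0 else "left"
    vertical = "upper" if y > 0 else "lower"
    return f"{vertical} {horizontal}"


# Hypothetical (explanatory, outcome) pairs.
for point in [(2, 3), (-1, 4), (-2, -5), (3, -1), (0, 7)]:
    print(point, "->", locate_point(*point))
```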

SUMMARY STATISTICS

We may use measures of central tendency and dispersion to summarize the data. To measure the central tendency of a distribution is to measure its center or typicality. To measure the dispersion of a distribution is to measure its variation, heterogeneity, or deviation.

Central Tendency. Three popular measures of central tendency are the mean, median, and mode. The arithmetic mean or average is computed by taking the sum of the values and dividing by the number of values. It is the balance point of the sample or population weighted by values. The mean is an appropriate measure for continuous (ratio or interval) variables. However, the information might be misleading because the arithmetic mean is sensitive to extreme values or outliers in a distribution. For example, the ages of five students are 21, 19, 20, 18, and 20. The ages of another five students are 53, 9, 12, 13, and 11. Though their distributions are very different, the mean age for both groups is 19.6.
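
The two groups of students can be checked directly. This is a minimal sketch using the ages given in the text; the helper name `arithmetic_mean` is an assumption, and the range (largest minus smallest value) is added here only as a simple way to show how differently the two groups are spread.

```python
# Ages of the two groups of five students, as given in the text.
group_a = [21, 19, 20, 18, 20]
group_b = [53, 9, 12, 13, 11]


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


mean_a = arithmetic_mean(group_a)
mean_b = arithmetic_mean(group_b)

# The range is used only to illustrate the difference in spread.
range_a = max(group_a) - min(group_a)
range_b = max(group_b) - min(group_b)

print(f"Group A: mean = {mean_a}, range = {range_a}")
print(f"Group B: mean = {mean_b}, range = {range_b}")
```

Both sums equal 98, so both means are 19.6, yet the single age of 53 in the second group pulls its mean well above the ages of the other four students.
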
The median is the value or attribute of the central case in an ordered distribution. If the number of cases is even, the median is the arithmetic average of the central two cases. In an ordered age distribution of thirty-five persons, the median is the age of the eighteenth person, while, in a distribution of thirty-six persons, the median is the average age of