Basic Statistics

CATEGORICAL DATA: ANALYSIS OF TWO-WAY FREQUENCY TABLES

11.1 DIFFERENT TYPES OF TABLES

Since the analyses performed and the inferences drawn depend on the type of samples
that were taken, we first discuss tables in terms of the type of study and the sampling
used to obtain the counts.


11.1.1 Tables Based on a Single Sample

A common type of table is based on data from a single sample concerning two
variables. For example, in a hypothetical survey of respondents in their 50s, it was
determined which respondents were current smokers (yes, no) and which had or did
not have low vital capacity (a measure of lung function). Here, smoking could be
considered the exposure or risk variable and low vital capacity the outcome or
disease variable. The total number of respondents is n = 120; the counts are given in
Table 11.1A.
In tables in which observations are from a single sample, the purpose of the analysis
is to determine whether the distribution of results on one variable is the same regardless
of the results on the other variable. For example, in Table 11.1A, is the outcome of
low vital capacity in a person in his or her 50s the same regardless of whether or
not the person smokes? From a statistical standpoint, the question being asked is
whether low vital capacity is independent of smoking or if there is an association
between smoking and low vital capacity in this age group.
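One informal way to make the independence question concrete is to compare the proportion with low vital capacity among smokers with the same proportion among nonsmokers. The sketch below assumes the counts reported for Table 11.1A (11 of 30 smokers and 10 of 90 nonsmokers with low vital capacity); the variable names are illustrative only. It is purely descriptive and does not indicate whether the difference exceeds what sampling variation alone would produce.

```python
# Counts as reported for Table 11.1A (hypothetical survey, n = 120).
low_vc = {"smoker": 11, "nonsmoker": 10}      # respondents with low vital capacity
group_size = {"smoker": 30, "nonsmoker": 90}  # respondents in each smoking group

# Proportion with low vital capacity within each smoking group.
# Under independence these two proportions would be roughly equal.
prop_low = {group: low_vc[group] / group_size[group] for group in group_size}

for group, p in prop_low.items():
    print(f"{group}: {p:.3f}")
```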
In Table 11.1A, rows provide information on vital capacity outcomes, and columns
provide information on smoking status. The sums over the two rows are added and
placed in the total row, showing 30 smokers and 90 nonsmokers. Likewise, the sums
over the two columns show the number with and without low vital capacity (21 and
99). The overall sample size can be computed either from the sum of the row totals
or the sum of the column totals. The locations in the interior of the table that include
the frequencies 11, 10, 19, and 80 are called cells.
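The row and column totals described above can be checked with a short sketch. The cell layout used here (rows for vital capacity, columns for smoking status) is inferred from the marginal totals quoted in the text, and the dictionary keys are illustrative names rather than labels taken from the table.

```python
# Cell counts from Table 11.1A; rows = vital capacity, columns = smoking status.
# Layout inferred from the quoted totals (21 and 99; 30 and 90).
counts = {
    "low": {"smoker": 11, "nonsmoker": 10},
    "normal": {"smoker": 19, "nonsmoker": 80},
}

# Totals for each vital capacity outcome (sum across smoking status).
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

# Totals for each smoking status (sum across vital capacity outcomes).
col_totals = {
    col: sum(counts[row][col] for row in counts)
    for col in ("smoker", "nonsmoker")
}

# The overall sample size is the same whichever set of totals is summed.
n = sum(row_totals.values())
assert n == sum(col_totals.values())

print(row_totals)  # {'low': 21, 'normal': 99}
print(col_totals)  # {'smoker': 30, 'nonsmoker': 90}
print(n)           # 120
```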
In Table 11.1B, symbols have been used to replace the counts given in Table 11.1A.
The total sample size is n; a + b respondents have low vital capacity and a + c respondents
smoke. Only a respondents have low vital capacity and also smoke.
In Table 11.1A there is only one sample of size 120. All the frequencies in the
table are divided by n = 120 to obtain the proportions given in Table 11.2. We can
see that 0.25 or 25% of the respondents smoked and 0.175 or 17.5% had low vital capacity.
About two-thirds of the respondents neither smoked nor had low vital capacity. No matter
whether counts or percentages are displayed in the table, it is difficult to see whether
there is any association between smoking and low vital capacity. Descriptive statistics
and tests of hypotheses are thus needed.
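As a check on the arithmetic, the proportions can be reproduced by dividing each count by n. This is a minimal sketch assuming the four cell counts reported for Table 11.1A; the key names are illustrative, and Table 11.2 itself is not reproduced here.

```python
# Cell counts reported for Table 11.1A, keyed by (vital capacity, smoking status).
counts = {
    ("low", "smoker"): 11,
    ("low", "nonsmoker"): 10,
    ("normal", "smoker"): 19,
    ("normal", "nonsmoker"): 80,
}

n = sum(counts.values())  # single sample of size 120

# Divide every frequency by n to get the proportion in each cell.
proportions = {cell: count / n for cell, count in counts.items()}

# Marginal proportions: add the cells that share a smoking status or an outcome.
p_smoker = sum(p for (vc, smoke), p in proportions.items() if smoke == "smoker")
p_low_vc = sum(p for (vc, smoke), p in proportions.items() if vc == "low")

print(f"smokers: {p_smoker:.3f}")             # smokers: 0.250
print(f"low vital capacity: {p_low_vc:.3f}")  # low vital capacity: 0.175
print(f"neither: {proportions[('normal', 'nonsmoker')]:.3f}")  # neither: 0.667
```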
Graphs such as the pie charts and bar charts introduced in Section 10.1.1 are often
useful in displaying the frequencies. Separate pie charts for smokers and nonsmokers
would be displayed side by side and the proportion with low vital capacity and normal
vital capacity would be given in each pie. Similarly, bar graphs (see Figure 10.2) for
smokers and nonsmokers could be placed in the same figure. The height of the bars
