column variable describes from which population an observation comes, and the row
variable is a categorical response (measured) variable.
The research hypothesis is that the distribution of proportions for one categorical
variable, the row variable, differs across the k populations (defined by the other
categorical variable, the column variable). A more general way of stating this research
hypothesis is to say that the proportions differ across the k groups. To test the null
hypothesis we compare the observed cell counts with the cell counts expected under the
assumption that there is no difference in proportions.
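The comparison of observed and expected counts can be written out as a short worked formula. The notation is an assumption made for illustration and is not taken from the text: O_{ij} and E_{ij} are the observed and expected counts in row i and column j, R_i and C_j are the row and column totals, and N is the total number of subjects in a table with r rows and k columns.

```latex
% Expected count under the null hypothesis of no difference in proportions
E_{ij} = \frac{R_i \, C_j}{N}

% Test statistic: squared deviations scaled by the expected count, summed over all cells
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad \text{df} = (r - 1)(k - 1)
```

Large values of the statistic indicate that the observed counts depart from what equal proportions would produce.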
For the χ^2 test of independence the null hypothesis is that the two categorical variables
are independent. The alternative hypothesis is that there is a relationship between the row
and column variables. No matter which sampling design is used, it is essential that each
subject appears in only one cell (is counted once) in the contingency table.
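The two null hypotheses can be stated side by side as a sketch. The symbols are assumptions introduced here, not the book's notation: p_{i|j} is the proportion of population j falling in response category i, and \pi_{ij} is the probability that a subject from a single sample falls in cell (i, j), with margins \pi_{i\cdot} and \pi_{\cdot j}.

```latex
% Homogeneity (k independent samples): the same response distribution in every population
H_0 : p_{i|1} = p_{i|2} = \dots = p_{i|k} \quad \text{for every row } i

% Independence (one sample, two classifications): joint probability factorises into the margins
H_0 : \pi_{ij} = \pi_{i\cdot} \, \pi_{\cdot j} \quad \text{for all } i, j
```

Both hypotheses lead to the same expected counts and the same test statistic, so the distinction lies in the sampling design and in the inference drawn.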
Test Assumptions
In general these are the same as for the two-sample χ^2 test (see section 6.1), except that
at least one of the categorical variables has three or more categories.
Example from the Literature
In a study which investigated aspects of maternal employment and child health care,
Dowswell and Hewison (1995) examined the relationship between maternal employment
(full-time, part-time or at home) and educational achievements (no examinations, CSEs or
O-levels or above). In this study a sample of mothers was selected from a target population
consisting of white families where the youngest child was attending school. The
statistical analysis used by the investigators was a Chi-square test of independence, since
a single sample of mothers was selected.
If, however, we were to assume that samples were selected from three independent
populations based on maternal employment categories, this sampling design would
require an r×k χ^2 test of homogeneity (3-sample design). Whichever sampling design is
used, the computational details are the same; only the nature of the inferences differs.
The data were set out in a contingency table as follows:
Mothers’ educational            Maternal employment
achievements             At home    Part-time    Full-time
                          n=41        n=65         n=30
No examinations            19          32           10
CSEs                        9          10            5
O-levels or above          13          23           15
In this example the column variable represents the three populations of maternal
employment (full-time, part-time or at home) and the row variable (examination
achievement) is the categorical response variable. A research question that could have
been considered by the investigators was whether the distribution of responses for the
variable examination achievement was the same in the three populations. For that
question a χ^2 test of homogeneity would be appropriate.
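The calculation for this table can be sketched in a few lines of Python. This is an illustrative sketch rather than the investigators' own analysis: the counts are those in the contingency table above, the function name chi_square_rxk is hypothetical, and only the standard expected-count and χ^2 formulas are assumed.

```python
# Observed counts from the contingency table above.
# Rows: no examinations, CSEs, O-levels or above.
# Columns: at home, part-time, full-time.
observed = [
    [19, 32, 10],
    [9, 10, 5],
    [13, 23, 15],
]


def chi_square_rxk(table):
    """Return (statistic, df, expected) for an r x k table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count for each cell: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum (observed - expected)^2 / expected over every cell.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (rows - 1) * (columns - 1).
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


statistic, df, expected = chi_square_rxk(observed)
print(f"chi-square = {statistic:.2f} on {df} df")
for row in expected:
    print([round(e, 2) for e in row])
```

The statistic would then be compared with the tabled critical value of χ^2 for (r−1)(k−1) degrees of freedom. The same arithmetic serves for either the test of homogeneity or the test of independence.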