CHAPTER 14
Inference for Categorical Data: Chi-square
IN THIS CHAPTER
Summary: In this final chapter, we will look at inference for categorical variables. Up until now, we
have primarily studied data analysis and inference for one or two numerical variables (those whose
outcomes we can measure) and for proportions, which arise from categorical variables with only two possible
values (success and failure). The material in this chapter opens up for us a wide range of research topics,
allowing us to compare categorical variables across several values. For example, we will ask new
questions like, “Is there an association between gender and political party preference?”
Key Ideas
Chi-Square Goodness-of-Fit Test
Chi-Square Test for Independence
Chi-Square Test for Homogeneity of Proportions (Populations)
χ² versus z²
Chi-Square Goodness-of-Fit Test
The following are the approximate percentages for the different blood types among white Americans: A:
40%; B: 11%; AB: 4%; O: 45%. A random sample of 1000 black Americans yielded the following blood
type data: A: 270; B: 200; AB: 40; O: 490. Does this sample provide evidence that the distribution of
blood types among black Americans differs from that of white Americans, or could the sample values
simply be due to sampling variation? This is the kind of question we can answer with the chi-square
goodness-of-fit test. (“Chi” is the Greek letter χ; chi-square is, logically enough, χ².) With the chi-square
goodness-of-fit test, we note that there is one categorical variable (blood type) and one
population (black Americans). In this chapter we will also encounter a situation in which there is one