Basic Statistics


11.4.3 Chi-square Test for Larger Tables; More Than Two Samples or Outcomes

We can also analyze larger tables that have more than two samples or treatments
and/or more than two outcomes. The method of computing chi-square is precisely
the same as that just given for the case of a single sample. The d.f.’s are also the
same, namely, (r - 1)(c - 1), where r is the number of rows and c is the number of
columns. The null hypothesis is that the proportions of cases in the outcome
categories in the population are the same regardless of which sample they were in
(or treatment received).
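As a minimal sketch of the computation described above, the following Python function sums (observed - expected)^2 / expected over all cells and returns the degrees of freedom as (rows - 1)(columns - 1). The function name `chi_square` and the counts are hypothetical, chosen only for illustration, and the sketch assumes no continuity correction is applied.

```python
def chi_square(table):
    """Return (chi-square statistic, d.f.) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected value: row total times column total, divided by n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: three treatments (rows) by three outcomes (columns).
counts = [
    [20, 30, 50],
    [30, 30, 40],
    [40, 30, 30],
]
stat, df = chi_square(counts)
print(f"chi-square = {stat:.2f} with {df} d.f.")
```

The statistic would then be compared with the tabled chi-square value for the computed d.f.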
If the P value is significant, the results should be interpreted. It is often useful to
look at the results by treatment group and examine what outcomes occur frequently
or infrequently, depending on which group they are in. Visually, bar graphs can be
displayed for each group. The use of bar graphs is often the simplest way to interpret
the results, and this option is available in many statistical programs. Alternatively,
the proportions in each group can be displayed in a two-way table.
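The two-way table of proportions mentioned above can be sketched as follows. The helper `row_proportions`, the group names, and the counts are all hypothetical; each row (group) is assumed to hold the counts for its outcome categories.

```python
def row_proportions(table):
    """Convert each row of counts to proportions that sum to 1 across the row."""
    return [[count / sum(row) for count in row] for row in table]


# Hypothetical counts: three treatment groups (rows) by three outcomes (columns).
groups = ["Treatment A", "Treatment B", "Treatment C"]
counts = [
    [20, 30, 50],
    [30, 30, 40],
    [40, 30, 30],
]
proportions = row_proportions(counts)

# Print one line per group so outcomes can be compared across groups.
for name, props in zip(groups, proportions):
    print(name, " ".join(f"{p:.2f}" for p in props))
```

Reading down a column of this table shows how the proportion with a given outcome changes from group to group.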


11.4.4 Necessary Sample Size for Large Tables


The likelihood of having expected values too small in some cells to obtain an accurate
estimate of the P value from Table A.4 tends to increase as the number of rows and
columns in a table increases. Even if the overall sample size is large, it is possible
that one or more rows or columns may contain very few individuals. For example,
in biomedical applications some symptoms or test results can occur very rarely. If a
row or column total is small, often at least some of the cells in that row or column
will have small expected values. Note that in computing the expected value for a cell,
we multiply the total of the row by the total of the column that the cell falls in and
divide by the overall n. If, for example, a row total were only 1, the expected value
would be the column total divided by n, which is < 1 unless every observation falls in
that column.
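To illustrate the arithmetic, here is a small sketch that computes the expected value for every cell of a table whose last row has a total of only 1. The function name `expected_counts` and the counts are hypothetical.

```python
def expected_counts(table):
    """Expected value per cell: row total times column total, divided by n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts: n = 200 overall, but the last row contains one individual.
counts = [
    [40, 35, 25],
    [30, 45, 24],
    [0, 1, 0],
]
expected = expected_counts(counts)

# Each expected value in the last row equals its column total divided by n.
print([round(e, 3) for e in expected[-1]])
```

With these counts, every expected value in the last row falls below 1 even though the overall sample size is 200.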
When there are large tables (> 1 d.f.), a few cells having an expected value of
about 1 can be tolerated. The total sample size should be at least four or five times the
number of cells (see Wickens [1989]). The rules vary from author to author. From a
practical standpoint it does not seem sensible to have such small numbers that shifting
one answer from one cell to another will change the conclusion.
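One way to screen a table against these rules of thumb is sketched below. The function `check_table_size` and its parameters `multiplier` and `small` are hypothetical names; the defaults (five times the number of cells, and an expected value below 1 counted as small) are assumptions taken from the guidelines above, and other authors would set them differently.

```python
def check_table_size(table, multiplier=5, small=1.0):
    """Summarize sample size against rule-of-thumb thresholds (assumed defaults)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    n_cells = len(row_totals) * len(col_totals)
    # Count cells whose expected value falls below the chosen threshold.
    small_cells = sum(
        1 for r in row_totals for c in col_totals if r * c / n < small
    )
    return {
        "n": n,
        "cells": n_cells,
        "n_large_enough": n >= multiplier * n_cells,
        "small_expected_cells": small_cells,
    }


# Hypothetical counts: large overall n, but one very sparse row.
counts = [
    [40, 35, 25],
    [30, 45, 24],
    [0, 1, 0],
]
result = check_table_size(counts)
print(result)
```

In this hypothetical table the overall sample size passes the check, yet three cells still have small expected values, which is the situation described in the preceding paragraphs.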
When faced with too-small expected values in large tables, the most common
practice is to combine a row or column that has a small total with another row or
column. For example, if the survey regarding health status and access to medical
care had been performed on college students, we would probably have very few
students who stated that they had poor health. If that happened, we might combine
the categories of poor and fair health to get a larger row total. The choice of what to
combine also depends on the purpose of the study. Rows or columns should not be
combined unless the resulting category is sensible. If no sensible combination exists,
it is sometimes better to compute an inaccurate chi-square than to disregard meaningful
results. Whenever this is done, the reader of the results should be warned that the
chi-square may be quite inaccurate.
Usually, combinations of rows and columns can be found such that the resulting
categories are worth analyzing.
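Combining two rows, as in the poor and fair health example, can be sketched as follows. The function `combine_rows`, the column meanings, and the counts are hypothetical and are not data from the survey.

```python
def combine_rows(labels, table, first, second, new_label):
    """Merge two rows of counts into one, placed where the first row stood."""
    i, j = labels.index(first), labels.index(second)
    merged = [x + y for x, y in zip(table[i], table[j])]
    new_labels, new_table = [], []
    for k, (label, row) in enumerate(zip(labels, table)):
        if k == i:
            new_labels.append(new_label)
            new_table.append(merged)
        elif k != j:
            new_labels.append(label)
            new_table.append(list(row))
    return new_labels, new_table


# Hypothetical counts: health status (rows) by access to medical care (columns).
labels = ["excellent", "good", "fair", "poor"]
counts = [
    [60, 20],
    [50, 25],
    [15, 10],
    [1, 2],
]
new_labels, new_counts = combine_rows(labels, counts, "fair", "poor", "fair or poor")
print(new_labels)
print(new_counts)
```

Combining the two rows reduces the number of rows from four to three, so the d.f. for this table drop from 3 to 2.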
