Basic Statistics

156 CATEGORICAL DATA: ANALYSIS OF TWO-WAY FREQUENCY TABLES

The computed chi-square has a chi-square distribution with 1 d.f. If we decided to use
α = .05, we would reject the null hypothesis since the computed value of chi-square
(7.26) is greater than the value (3.84) in Table A.4. In the population, we appear to
have more instances where the case was exposed to the risk factor and the control
was not than when the control was exposed and the case was not.
This test is commonly called McNemar’s test. Note that when computing McNemar’s
test, we do not use a = 59 or d = 10 in the computation. This mirrors the
results in Section 11.1.3, where we showed that the differences in the proportions did
not depend on the ties for paired data.
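As a rough illustration of why the ties drop out, the following Python sketch computes a McNemar chi-square from the two discordant counts only. The counts b = 25 and c = 8 are hypothetical (this section does not restate the discordant counts from the example), the function names are our own, and the continuity-corrected form shown is one common version of the statistic that may differ from the formula used earlier in the chapter.

```python
import math


def mcnemar_chi2(b, c, correction=True):
    """McNemar chi-square from the discordant-pair counts b and c.

    The concordant (tied) counts a and d are not arguments at all,
    which is the point made in the text: they do not affect the test.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    diff = abs(b - c)
    if correction:
        # Continuity correction (assumed here; one common convention).
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    # For 1 d.f., P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))


# Hypothetical discordant counts, for illustration only.
b, c = 25, 8
stat = mcnemar_chi2(b, c)
print(f"chi-square = {stat:.2f}, p = {chi2_1df_pvalue(stat):.4f}")
print("reject at alpha = .05" if stat > 3.84 else "do not reject at alpha = .05")
```

Changing a or d in the underlying table would leave this result untouched, since only b and c enter the calculation.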


11.3.4 Assumptions for the Chi-square Test

To use the chi-square distribution for testing hypotheses from two-way frequency
table data (single sample or two sample), we need to make several assumptions. Note
these assumptions apply to any size table. One assumption is that we have either a
simple random sample from a single population or two simple random samples from
two populations. Second, within each sample, the outcomes are distributed in an
identical fashion. For example, if we have a sample of patients, we are assuming that
the chance of a successful treatment is the same for all the patients. These assumptions
may not be completely met in practice.
For the matched sample chi-square test, we have to assume that a simple random
sample of pairs has been taken. Further, the sample size must be large enough to justify
using the chi-square distribution. This is discussed in Section 11.3.5 for two-by-two
tables and in Section 11.4.4 for larger tables.


11.3.5 Necessary Sample Size: Two-by-Two Tables

The results given in Table A.4 for the chi-square distribution are a satisfactory
approximation for testing hypotheses only when the expected frequencies are of sufficient
size. There is some difference of opinion on the needed size of the expected values
for the single-sample and the two-sample cases. For tables with two rows and two
columns, many authors say that no expected frequency should be less than 5. Wickens [1989]
says that all expected values should be greater than 2 or 3 and lists additional conditions.
Small expected frequencies occur either when the overall sample size is small or when one
of the row or column totals is very small.
For example, if a disease is rare, a prospective study where patients are followed until
they get the disease may find very few diseased subjects.
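The check described above can be sketched in a few lines of Python. Each expected frequency is taken as (row total × column total) / n, and the smallest one is compared with a chosen cutoff. The table below is hypothetical (a rare disease among 200 subjects), the helper names are our own, and the cutoff of 5 is just the common rule of thumb mentioned in the text.

```python
def expected_counts(table):
    """Expected frequencies for a two-way table under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for a cell = row total * column total / n.
    return [[r * c / n for c in col_totals] for r in row_totals]


def smallest_expected(table):
    """Smallest expected frequency in the table."""
    return min(min(row) for row in expected_counts(table))


# Hypothetical data: rows = exposed / not exposed,
# columns = diseased / not diseased.
table = [[3, 17],
         [7, 173]]

cutoff = 5  # rule-of-thumb minimum expected frequency (an assumption)
smallest = smallest_expected(table)
if smallest < cutoff:
    print(f"smallest expected frequency {smallest:.1f} is below {cutoff}")
else:
    print(f"all expected frequencies are at least {cutoff}")
```

Here the small column total for diseased subjects (10 of 200) is what drives one expected frequency below the cutoff, even though the overall sample is not small.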
Fortunately, there is a test that is widely available in statistical programs that can
be used for tables with two rows and two columns when the expected frequencies are
less than the recommended size. The test is called Fisher’s exact test. An explanation
of this test is beyond the scope of this book (see Agresti [1996], Fleiss [1981], and
particularly Wickens [1989] on the pros and cons of using this test).
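For readers curious about what a program does internally, the following is a minimal Python sketch, not a substitute for a statistical package. It assumes the usual hypergeometric formulation with all margins fixed, and it uses one common two-sided convention: summing the probabilities of all tables that are no more likely than the observed one. The function name and the example data are our own, and other two-sided conventions exist.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d          # row totals
    c1 = a + c                     # first column total
    n = r1 + r2
    denom = comb(n, c1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    lo = max(0, c1 - r2)           # smallest feasible top-left count
    hi = min(r1, c1)               # largest feasible top-left count
    p_obs = prob(a)
    # Small relative tolerance so tables tied with the observed one count.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(total, 1.0)


# Hypothetical small-sample table, for illustration only.
table = [[3, 1],
         [1, 3]]
print(f"two-sided p = {fisher_exact_two_sided(table):.4f}")
```

Even for a table this small, the calculation requires enumerating every table with the same margins, which is why a program is the practical route.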
Fisher’s exact test is widely used when the expected value in any cell is small. We
recommend using a statistical program to perform the test since it takes considerable
effort to do it otherwise. One difference between Fisher’s exact test and the chi-
