Basic Statistics

152 CATEGORICAL DATA: ANALYSIS OF TWO-WAY FREQUENCY TABLES

Table 11.6 Association between Smoking and Vital Capacity: Expected Frequencies

                                Smoking
Low Vital Capacity    Yes            No             Total
Yes                   11 (5.25)      10 (15.75)       21
No                    19 (24.75)     80 (74.25)       99
Total                 30             90              120
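
The expected frequencies shown in parentheses in Table 11.6 can be reproduced from the row and column totals. The sketch below assumes the usual calculation under the hypothesis of no association, namely expected = (row total x column total) / grand total; the variable names are illustrative only.

```python
# Observed counts from Table 11.6.
# Rows: low vital capacity (yes, no); columns: smoking (yes, no).
observed = [[11, 10],
            [19, 80]]

row_totals = [sum(row) for row in observed]          # 21 and 99
col_totals = [sum(col) for col in zip(*observed)]    # 30 and 90
grand_total = sum(row_totals)                        # 120

# Assumed formula: expected = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print(row)
# [5.25, 15.75]
# [24.75, 74.25]
```

The printed values agree with the parenthesized entries of the table.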

from the observed frequency, square this difference, and then divide it by the expected
frequency. These are summed over all the cells to obtain χ². We have

\[
\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}
\]

Chi-square will serve as a measure of how different the observed frequencies are
from the expected frequencies; a large value of χ² indicates lack of agreement, a
small value of χ² indicates close agreement between what was expected under H0
and what actually occurred. In the example,

\[
\chi^2 = \frac{(11 - 5.25)^2}{5.25} + \frac{(10 - 15.75)^2}{15.75} + \frac{(19 - 24.75)^2}{24.75} + \frac{(80 - 74.25)^2}{74.25}
\]

or

\[
\chi^2 = \frac{5.75^2}{5.25} + \frac{(-5.75)^2}{15.75} + \frac{(-5.75)^2}{24.75} + \frac{5.75^2}{74.25}
       = 6.298 + 2.099 + 1.336 + 0.445 = 10.178
\]
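
The arithmetic of the example can be checked with a few lines of code. This is only a sketch: it takes the observed and expected frequencies directly from Table 11.6, and the names `observed`, `expected`, and `terms` are illustrative.

```python
# Observed and expected frequencies for the four cells of Table 11.6,
# listed in the same order as the terms of the sum in the text.
observed = [11, 10, 19, 80]
expected = [5.25, 15.75, 24.75, 74.25]

# One term per cell: (observed - expected)^2 / expected.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)

print([round(t, 3) for t in terms])   # [6.298, 2.099, 1.336, 0.445]
print(round(chi_square, 3))           # 10.178
```

Note that every difference has the same absolute size, 5.75, so the cells with the smallest expected frequencies contribute most to the total.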


The value of chi-square computed from the particular experiment is thus 10.178, and
on the basis of 10.178 we must decide whether or not the null hypothesis is true.
If the experiment were repeated over and over, the chi-square calculated would vary
from one time to the next. The values of chi-square thus have a sampling distribution,
just as does any sample statistic. The distribution of the values of chi-square is of
the general shape pictured in Figure 11.1, a skewed distribution, with, of course, no
values of χ² below 0. When H0 is true, a χ² value greater than 3.84 is expected to
occur 5% of the time (for 1 degree of freedom).
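
The idea of repeating the experiment over and over can be imitated by simulation. The sketch below is illustrative only and rests on several assumptions that are not part of the text: each of 120 subjects is classified independently on the two characteristics (so the null hypothesis holds), the marginal probabilities are set equal to the sample proportions 21/120 and 30/120, and the number of repetitions and the random seed are arbitrary choices.

```python
import random

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table given as [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i in range(2):
        for j in range(2):
            exp = row_totals[i] * col_totals[j] / n
            if exp == 0:      # degenerate table with an empty row or column
                return 0.0
            total += (table[i][j] - exp) ** 2 / exp
    return total

def simulate(n_subjects=120, p_row=21 / 120, p_col=30 / 120,
             repeats=5000, seed=1):
    """Chi-square values from repeated experiments in which H0 is true."""
    rng = random.Random(seed)
    values = []
    for _ in range(repeats):
        table = [[0, 0], [0, 0]]
        for _ in range(n_subjects):
            i = 0 if rng.random() < p_row else 1   # low vital capacity?
            j = 0 if rng.random() < p_col else 1   # smoker?
            table[i][j] += 1
        values.append(chi_square_2x2(table))
    return values

values = simulate()
share_above = sum(v > 3.84 for v in values) / len(values)
print("smallest value:", min(values))
print("share above 3.84:", round(share_above, 3))
```

Under these assumptions no simulated value falls below 0, and the share of values above 3.84 should come out near 5%, although it will not be exactly 5% because the counts are discrete and the number of repetitions is finite.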
The necessary distribution has been tabled in Table A.4, and by using this table we
may find P approximately. (If H0 is true, P is the probability of obtaining a value of
chi-square at least as large as 10.178.) In Table A.4, the first column is headed d.f.
(degrees of freedom). The area under the χ² curve from 0 to χ²[A] is listed across the
top of Table A.4 and the values of χ² are given in the body of the table. The shape of
the distribution of chi-square differs for different degrees of freedom. The d.f.'s for
the problem may be found by counting the number of independent cell frequencies
in Table 11.6. That is, we count the number of cells that could be filled in arbitrarily
and still keep all the same totals as in the table.
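
The counting argument can be tried directly. In the sketch below, the function `fill_table` (an illustrative name) completes a 2 x 2 table from the totals of Table 11.6 once a single cell has been chosen, which suggests that only one cell frequency is independent here. The second function then computes P without a table, assuming 1 degree of freedom and using the fact that a chi-square variable with 1 d.f. is the square of a standard normal variable; both the function name and the choice of this shortcut are assumptions of the sketch, not part of the text.

```python
import math

# Totals from Table 11.6.
row_totals = (21, 99)   # low vital capacity: yes, no
col_totals = (30, 90)   # smoking: yes, no

def fill_table(free_cell):
    """Complete the 2x2 table from its totals and one freely chosen cell."""
    a = free_cell
    b = row_totals[0] - a   # rest of the first row
    c = col_totals[0] - a   # rest of the first column
    d = row_totals[1] - c   # last cell, forced by the second row total
    return [[a, b], [c, d]]

print(fill_table(11))       # [[11, 10], [19, 80]]

def upper_tail_1df(x):
    """P(chi-square >= x) for 1 d.f. (assumed), since chi-square(1) = Z**2."""
    return math.erfc(math.sqrt(x / 2))

print(round(upper_tail_1df(3.84), 3))   # 0.05
print(upper_tail_1df(10.178))
```

Under the 1 d.f. assumption, the last line gives a P between 0.001 and 0.002 for the observed value of 10.178.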
