Robert V. Hogg, Joseph W. McKean, Allen T. Craig

4.7. Chi-Square Tests

                          Hair color
Eye color    Fair    Red  Medium   Dark  Black  Margin
Blue         1368    170    1041    398      1    2978
Light        2577    474    2703    932     11    6697
Medium       1390    420    3826   1842     33    7511
Dark          454    255    1848   2506    112    5175
Margin       5789   1319    9418   5678    157   22361

The table indicates that hair and eye color are dependent random variables. For
example, the observed frequency of children with blue eyes and black hair is 1,
while the expected frequency under independence is 2978 × 157/22361 = 20.9. The
contribution to the test statistic from this one cell is (1 − 20.9)²/20.9 = 18.95,
which by itself nearly exceeds the test statistic's χ² critical value at level 0.05,
which is qchisq(.95, 12) = 21.026. The χ²-test statistic for independence is tedious
to compute by hand, and the reader is advised to use a statistical package. For R,
assume that the contingency table without margin sums is in the matrix scotteyehair.
Then the code chisq.test(scotteyehair) returns the χ²-test statistic and the
p-value as: X-squared = 3683.9, df = 12, p-value < 2.2e-16. Thus the result
is highly significant. Based on this study, hair color and eye color of Scottish
children are dependent on one another. To investigate where the dependence is
strongest in a contingency table, we recommend considering the table of expected
frequencies and the table of Pearson residuals. The latter are the square roots
(with the sign of the numerators) of the summands in expression (4.7.2) defining the
test statistic. The sum of the squared Pearson residuals equals the χ²-test statistic.
In R, the following code obtains both of these items:
fit = chisq.test(scotteyehair); fit$expected; fit$residuals
Based on running this code, the largest residual is 32.8, for the cell with dark hair
and dark eyes. The observed frequency is 2506, while the expected frequency under
independence is 1314.
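The computation just described can be reproduced end to end. The following R sketch rebuilds the contingency table from the counts in this section (the matrix name scotteyehair follows the text; the dimnames are added here only for readability) and recovers the expected frequencies and Pearson residuals discussed above.

```r
# Contingency table of eye color (rows) by hair color (columns),
# without the margin sums, exactly as used in the text.
scotteyehair <- matrix(c(1368,  170, 1041,  398,   1,
                         2577,  474, 2703,  932,  11,
                         1390,  420, 3826, 1842,  33,
                          454,  255, 1848, 2506, 112),
                       nrow = 4, byrow = TRUE,
                       dimnames = list(eye  = c("Blue", "Light", "Medium", "Dark"),
                                       hair = c("Fair", "Red", "Medium", "Dark", "Black")))

fit <- chisq.test(scotteyehair)

fit$statistic                  # the chi-square test statistic on 12 degrees of freedom
fit$expected["Blue", "Black"]  # about 20.9, versus an observed count of 1
fit$residuals["Dark", "Dark"]  # the largest Pearson residual, for dark eyes and dark hair
sum(fit$residuals^2)           # the squared residuals sum back to the test statistic
```

Inspecting the full fit$expected and fit$residuals tables, rather than single cells, is the quickest way to see where the departure from independence concentrates.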


In each of the four examples of this section, we have indicated that the statistic
used to test the hypothesis H0 has an approximate chi-square distribution, provided
that n is sufficiently large and H0 is true. To compute the power of any of these tests
for values of the parameters not described by H0, we need the distribution of the
statistic when H0 is not true. In each of these cases, the statistic has an approximate
distribution called a noncentral chi-square distribution. The noncentral chi-square
distribution is discussed later in Section 9.3.
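To illustrate the mechanics of such a power calculation, the sketch below uses R's built-in noncentral chi-square distribution (the ncp argument of pchisq). The noncentrality values lambda used here are purely hypothetical, chosen only to show how power behaves; Section 9.3 derives the appropriate noncentrality parameter for each test.

```r
# Power of a level-0.05 chi-square test with 12 degrees of freedom,
# assuming the statistic has a noncentral chi-square distribution
# under the alternative (hypothetical noncentrality values lambda).
df     <- 12
crit   <- qchisq(0.95, df)        # critical value, 21.026
lambda <- c(0, 5, 10, 20)         # hypothetical noncentrality parameters
power  <- 1 - pchisq(crit, df, ncp = lambda)
round(power, 3)
# When lambda = 0 (H0 true), the "power" reduces to the size 0.05;
# it increases toward 1 as lambda grows.
```

Note that the noncentral distribution with ncp = 0 coincides with the ordinary (central) chi-square distribution, which is why the calculation returns the significance level in that case.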


EXERCISES

4.7.1. Consider Example 4.7.2. Suppose the observed frequencies of A1, ..., A4
are 20, 30, 92, and 105, respectively. Modify the R code given in the example to
calculate the test for these new frequencies. Report the p-value.

4.7.2. A number is to be selected from the interval {x : 0 < x < 2} by a random
process. Let Ai = {x : (i − 1)/2 < x ≤ i/2}, i = 1, 2, 3, and let A4 = {x :
3/2 < x < 2}. For i = 1, 2, 3, 4, suppose a certain hypothesis assigns probabilities
pi0 to these sets in accordance with

    pi0 = ∫_{Ai} (1/2)(2 − x) dx,   i = 1, 2, 3, 4.

This