3 Population covariances for all pairs of treatments are the same. Heterogeneity of
variances and covariances is common in studies involving binary repeated measures.
Interactions between subjects and treatment conditions would indicate heterogeneity
of covariance. Severe heterogeneity introduces a positive bias to the Type I error rate
and, if this is suspected, Q should be corrected (see Myers et al., 1982). In most cases
the procedure described by Myers et al. should be followed:
a Evaluate the Q statistic against a conservative criterion, (c−1)χ^2_1 (χ^2_1 means
Chi-square with 1 df and c is the number of columns). For example, with alpha of 5 per
cent and with four columns in a contingency table, we would evaluate Q against the
conservative critical value of (4−1)×3.84=11.52. If the test statistic Q
exceeds the critical value of 11.52 we would reject the null hypothesis.
b If the first conservative test (this allows for heterogeneity of covariance) is not
significant, at the same alpha level compare the test statistic Q with a liberal
criterion, χ^2 with (c−1) df. If this liberal
criterion is not significant, do not reject the null hypothesis and stop at this point.
c If the first test against the conservative critical value is not significant but the second
test against the more liberal critical value is, consider computing an adjusted Q (see
Myers, 1975) and test this against χ^2 with (c−1) df.
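The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names chi2_cdf, chi2_critical and q_decision are hypothetical, the critical values are found numerically using only the standard library, and the adjusted Q of step c is not computed because its formula is not given here.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF from the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a          # first term of the series
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))


def chi2_critical(alpha, df):
    """Upper-tail critical value of chi-square, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:   # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def q_decision(q, c, alpha=0.05):
    """Apply steps a to c to an observed Q with c columns (hypothetical helper)."""
    conservative = (c - 1) * chi2_critical(alpha, 1)   # step a: (c-1) * chi-square, 1 df
    liberal = chi2_critical(alpha, c - 1)              # step b: chi-square, c-1 df
    if q > conservative:
        return "reject H0"
    if q < liberal:
        return "do not reject H0"
    return "compute adjusted Q"                        # step c: between the two criteria


if __name__ == "__main__":
    print(round((4 - 1) * chi2_critical(0.05, 1), 2))  # conservative criterion, c = 4
    print(round(chi2_critical(0.05, 3), 2))            # liberal criterion, c = 4
    print(q_decision(9.0, 4))
```

With four columns and alpha of 5 per cent the sketch gives critical values of about 11.52 and 7.81, so a Q between these two values would fall under step c.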
Example from the Literature
In a comparative study of four diagnostic systems (majority opinion of medical
specialists, and three computer-based diagnostic systems) Gustafson et al. (1973) tested
the null hypothesis that all four diagnostic methods were equally effective. Each
diagnosis for eleven hypothyroid patients was coded as correct, 1, or incorrect, 0.
Alpha was set to 5 per cent and the null hypothesis was that all four methods were
equally effective in diagnosing hypothyroid patients. The alternative hypothesis was that
the four methods differ in their ability to produce a correct diagnosis. There were 44
(11×4) cells in the original contingency table but this was reduced to 28 (7×4) because four
subjects had either all 1’s or all 0’s. These rows of 1’s or 0’s do not contribute to the value of
Q (Q would be the same if all eleven subjects were included in the calculation). As the
obtained number of cells, 28, exceeds 24 the χ^2 approximation is valid. The obtained test
statistic Q was 7.70, which is not significant at the 5 per cent level.
As Q has an approximate χ^2 distribution with c−1 df (here 3), a Q (or χ^2) value ≥7.81 is required
to reject the null hypothesis. The investigators were able to conclude that there was no
significant difference in the diagnostic performance of the four methods.
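The point that rows of all 1's or all 0's leave Q unchanged can be checked numerically. The sketch below uses the standard form of Cochran's Q, Q = (c−1)[cΣC^2 − T^2]/(cT − ΣR^2), where C are column totals, R are row totals and T is the grand total; the notation may differ from that used elsewhere in this chapter. The data are hypothetical and are not the Gustafson et al. (1973) data; the function name cochran_q is also an assumption.

```python
def cochran_q(rows):
    """Cochran's Q for a list of equal-length rows of 0/1 scores (hypothetical helper)."""
    c = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(c)]
    row_totals = [sum(r) for r in rows]
    grand = sum(row_totals)
    numerator = (c - 1) * (c * sum(t * t for t in col_totals) - grand ** 2)
    denominator = c * grand - sum(t * t for t in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined when every row is all 1's or all 0's")
    return numerator / denominator


# Hypothetical data: 7 subjects with mixed scores on 4 methods ...
mixed = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
]
# ... plus 4 subjects whose rows are constant (all correct or all incorrect).
constant = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]

q_reduced = cochran_q(mixed)               # 7 x 4 table
q_full = cochran_q(mixed + constant)       # 11 x 4 table
informative_cells = len(mixed) * len(mixed[0])

if __name__ == "__main__":
    print(q_reduced == q_full)             # constant rows do not change Q
    print(informative_cells)               # cells left after dropping constant rows
```

Only the rows with mixed scores count towards the number of cells used to judge whether the χ^2 approximation is adequate, which is why that count is computed from the reduced table.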
Worked Example
Returning to the example of the vocabulary acquisition experiment which was part of a
student’s dissertation study, the complete data set for eleven pupils is presented in Figure
6.7.
Treatments (Measurement occasions)