In Chapter 7 we considered the Levene (1960) test for heterogeneity of variance, and
I mentioned a similar test by O’Brien (1981). The Levene test is essentially a t test on the
deviations (absolute or squared) of observations from their sample mean or median. If one
group has a larger variance than another, then the deviations of scores from the mean or
median will also, on average, be larger than for a group with a smaller variance. Thus, a
significant t test on the absolute values of the deviations represents a test on group
variances. Both Levene’s test and O’Brien’s test extend readily to more than two groups:
the t test on the deviations is simply replaced by a one-way analysis of variance on those
deviations. There is evidence to suggest that the Levene test is the weaker of the two, but
it is the one traditionally reported by most statistical software. Wilcox (1987b) reports that this test appears
to be conservative.
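The idea that Levene's test is "just" a t test or ANOVA on deviations can be made concrete with a minimal sketch that uses only Python's standard library. The helper names `one_way_anova_f`, `levene_f`, and `pooled_t` are hypothetical, and the toy data are made up for illustration. With two groups, the ANOVA F on the deviations equals the square of the pooled-variance t on the same deviations, so the two-group and k-group versions are the same test. For real analyses, SciPy's `scipy.stats.levene` computes this statistic together with a p value. Its `center` argument selects mean or median deviations, and the default is the median.

```python
from math import sqrt
from statistics import mean, median, variance


def one_way_anova_f(groups):
    """Ordinary one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_f(groups, center="mean"):
    """Levene-type test: ANOVA on absolute deviations from each group's center."""
    if center not in ("mean", "median"):
        raise ValueError("center must be 'mean' or 'median'")
    locate = mean if center == "mean" else median
    deviations = [[abs(x - locate(g)) for x in g] for g in groups]
    return one_way_anova_f(deviations)


def pooled_t(a, b):
    """Two-sample t with pooled variance (the two-group case)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


# Made-up example: group a is more spread out than group b.
a, b = [1, 3, 5], [2, 3, 4]
dev_a = [abs(x - mean(a)) for x in a]
dev_b = [abs(x - mean(b)) for x in b]
print("Levene F on |deviations|:", levene_f([a, b]))
print("t^2 on the same deviations:", pooled_t(dev_a, dev_b) ** 2)
```

Using squared rather than absolute deviations would only change the list comprehension inside `levene_f`.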
If you are not willing to ignore the existence of heterogeneity or nonnormality in your
data, there are alternative ways of handling the problems that result. Many years ago Box
(1954a) showed that with unequal variances the appropriate F distribution against which to
compare $F_{\text{obt}}$ is a regular F with altered degrees of freedom. If we define the true critical
value of F (adjusted for heterogeneity of variance) as $F'_\alpha$, then Box has proven that

$$F_\alpha(1,\ n-1) \;\ge\; F'_\alpha \;\ge\; F_\alpha[k-1,\ k(n-1)]$$

In other words, the true critical value of F lies somewhere between the critical value of
F on 1 and (n − 1) df and the critical value of F on (k − 1) and k(n − 1) df. This latter limit
is the critical value we would use if we met the assumptions of normality and homogeneity
of variance. Box suggested a conservative test by comparing $F_{\text{obt}}$ to the larger bound,
$F_\alpha(1,\ n-1)$. If this leads to a significant result, then the means are significantly different
regardless of the equality, or inequality, of variances. (For those of you who raised your eyebrows when I
cavalierly declared the variances in Eysenck’s study to be “close enough,” it is comforting
to know that even Box’s conservative approach would lead to the conclusion that the
groups are significantly different: $F_{.05}(1, 9) = 5.12$, whereas our obtained F was 9.08.)
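The sketch below computes both of Box's bounds for the Eysenck example, assuming k = 5 groups of n = 10 subjects and the obtained F of 9.08. To stay within the standard library, it evaluates the F distribution through a hand-written regularized incomplete beta function (the continued-fraction method described in *Numerical Recipes*) and finds critical values by bisection. The names `reg_inc_beta`, `f_cdf`, `f_critical`, and `box_bounds` are hypothetical helpers, not part of any statistics package. In practice you would read these values from a table or a library such as SciPy.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, df1, df2):
    """P(F <= f) for an F distribution on df1 and df2 degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * f / (df1 * f + df2))


def f_critical(alpha, df1, df2):
    """Upper-alpha critical value of F, found by bisection on the CDF."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, df1, df2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def box_bounds(k, n, alpha=0.05):
    """Box's bounds: F_alpha(1, n - 1) >= F'_alpha >= F_alpha[k - 1, k(n - 1)]."""
    conservative = f_critical(alpha, 1, n - 1)
    usual = f_critical(alpha, k - 1, k * (n - 1))
    return conservative, usual


# Eysenck example (assumed k = 5 groups of n = 10), obtained F = 9.08.
conservative, usual = box_bounds(k=5, n=10)
f_obt = 9.08
print(f"Conservative F.05(1, 9) = {conservative:.2f}")
print(f"Usual F.05(4, 45)       = {usual:.2f}")
print("Significant under Box's conservative test:", f_obt > conservative)
```

Because $F_\alpha(1, n-1)$ uses far fewer degrees of freedom, it is the larger of the two values. Rejecting against it guarantees rejection against any $F'_\alpha$ in between, and that is also why the test is so conservative.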
The only difficulty with Box’s approach is that it is extremely conservative. A different
approach is one proposed by Welch (1951), which we will consider in the next section, and
which is implemented by much of the statistical software that we use.
Wilcox (1987b) has argued that, in practice, variances frequently differ by more than a
factor of four, which is often considered a reasonable limit on heterogeneity. He has some
strong opinions concerning the consequences of heterogeneity of variance. He recommends
Welch’s procedure with samples having different variances, especially when the
sample sizes are unequal. Tomarken and Serlin (1986) have investigated the robustness and
power of Welch’s procedure and the procedure proposed by Brown and Forsythe (1974).
They have shown Welch’s test to perform well under several conditions. The Brown and
Forsythe test also has advantages in certain situations. The Tomarken and Serlin paper is a
good reference for those concerned with heterogeneity of variance.
The Welch Procedure
Kohr and Games (1974) and Keselman, Games, and Rogan (1979) have investigated alternative
approaches to the treatment of samples with heterogeneous variances (including
the one suggested by Box) and have shown that the procedure proposed by Welch (1951)
has considerable advantages in terms of both power and protection against Type I errors,
at least when sampling from normal populations. The formulae and calculations are somewhat
awkward, but not particularly difficult, and you should use them whenever a test,
such as Levene’s, indicates heterogeneity of variance—especially when you have unequal
sample sizes.
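As a rough preview of what those calculations involve, here is a minimal standard-library sketch of Welch's statistic in its commonly published form. Each group gets a weight $w_i = n_i/s_i^2$. The weighted between-groups term is divided by an adjustment factor, and the denominator degrees of freedom are approximated from the same weights. The function name `welch_anova` and the toy data are hypothetical. Check the sketch against the formulae in the text before relying on it. It returns only $F'$ and its degrees of freedom, not a p value.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's (1951) heteroscedastic one-way ANOVA.

    Returns (F', df1, df2); F' is referred to an F distribution on df1, df2.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight each group by n_i / s_i^2, so noisier groups count for less.
    w = [n_i / variance(g) for n_i, g in zip(n, groups)]
    w_sum = sum(w)
    weighted_grand = sum(w_i * m for w_i, m in zip(w, means)) / w_sum
    numerator = sum(w_i * (m - weighted_grand) ** 2
                    for w_i, m in zip(w, means)) / (k - 1)
    # Adjustment term shared by the denominator and df2.
    lam = sum((1 - w_i / w_sum) ** 2 / (n_i - 1) for w_i, n_i in zip(w, n))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    f_prime = numerator / denominator
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_prime, k - 1, df2


# Made-up data: the second group has four times the variance of the others.
groups = [[1, 2, 3], [2, 4, 6], [6, 7, 8]]
f_prime, df1, df2 = welch_anova(groups)
print(f"Welch F' = {f_prime:.3f} on {df1} and {df2:.2f} df")
```

Note that df2 is usually not an integer. That is expected, and it is why software reports Welch's test with fractional degrees of freedom.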
Section 11.8 Violations of Assumptions 335