deviation dij = (Xij − X̄j)², where i and j represent the ith subject in the jth group. He then
proposed running a standard two-sample t test on the dij's. This test makes intuitive sense, be-
cause if there is greater variability in one group, the absolute, or squared, values of the devia-
tions will be greater. If t is significant, we would then declare the two groups to differ in their
variances. Alternative approaches have been proposed; see, for example, O’Brien (1981), but
variances. Alternative approaches have been proposed (see, for example, O'Brien, 1981), but
they are rarely implemented in standard software, and I will not elaborate on them here.
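The deviation procedure just described can be sketched in a few lines. The following is a minimal illustration only (the function name and the simulated data are my own), using squared deviations from each group's mean:

```python
import numpy as np
from scipy import stats

def deviation_t_test(x, y, squared=True):
    # Transform each score into its squared (or absolute) deviation
    # from its own group's mean, then run an ordinary two-sample
    # t test on the transformed scores.
    dx = (x - x.mean()) ** 2 if squared else np.abs(x - x.mean())
    dy = (y - y.mean()) ** 2 if squared else np.abs(y - y.mean())
    return stats.ttest_ind(dx, dy)

rng = np.random.default_rng(0)
group1 = rng.normal(50, 1, 40)   # standard deviation 1
group2 = rng.normal(50, 3, 40)   # standard deviation 3
t, p = deviation_t_test(group1, group2)
```

A significant t here is taken as evidence that the group variances differ.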
The procedures just described are suggested as replacements for the more traditional F
test, which is a ratio of the larger sample variance to the smaller. This F has been shown by
many people to be severely affected by nonnormality of the data, and should not be used.
The F test is still computed and printed by many of the large computer packages, but I do
not recommend using it.
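For concreteness, the variance-ratio F the text advises against looks like the following. This is a hypothetical sketch, not a recommendation; the function name is my own:

```python
import numpy as np
from scipy import stats

def variance_ratio_test(x, y):
    # Classical F test for equal variances: larger sample variance
    # over smaller, referred to an F distribution. As noted in the
    # text, it is badly affected by nonnormality.
    vx, vy = np.var(x, ddof=1), np.var(y, ddof=1)
    if vx >= vy:
        f, df_num, df_den = vx / vy, len(x) - 1, len(y) - 1
    else:
        f, df_num, df_den = vy / vx, len(y) - 1, len(x) - 1
    p = min(2 * stats.f.sf(f, df_num, df_den), 1.0)  # two-tailed
    return f, p
```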
The Robustness of t with Heterogeneous Variances
I mentioned that the t test is what is described as robust, meaning that it is more or less unaf-
fected by moderate departures from the underlying assumptions. For the t test for two inde-
pendent samples, we have two major assumptions and one side condition that must be
considered. The two assumptions are those of normality of the sampling distribution of dif-
ferences between means and homogeneity of variance. The side condition is the condition of
equal sample sizes versus unequal sample sizes. Although we have just seen how the prob-
lem of heterogeneity of variance can be handled by special procedures, it is still relevant to
ask what happens if we use the standard approach even with heterogeneous variances.
Box (1953), Norton (1953), Boneau (1960), and many others have investigated the ef-
fects of violating, both independently and jointly, the underlying assumptions of t. The gen-
eral conclusion to be drawn from these studies is that for equal sample sizes, violating the
assumption of homogeneity of variance produces very small effects—the nominal value of
α = .05 is most likely within ±0.02 of the true value of α. By this we mean that if you set
up a situation with unequal variances but with a true H₀ and proceed to draw (and compute t
on) a large number of pairs of samples, you will find that somewhere between 3% and 7%
of the sample t values actually exceed ±t.025. This level of inaccuracy is not intolerable.
The same kind of statement applies to violations of the assumption of normality, provided
that the true populations are roughly the same shape or else both are symmetric. If the dis-
tributions are markedly skewed (especially in opposite directions), serious problems arise
unless their variances are fairly equal.
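The kind of sampling study just described can be mimicked directly. This sketch (the parameter values are illustrative) draws many pairs of equal-sized samples with unequal variances under a true H₀ and tallies how often the pooled t test rejects at the nominal α = .05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps, rejections = 20, 5000, 0
for _ in range(reps):
    a = rng.normal(0.0, 1.0, n)    # equal population means (H0 true) ...
    b = rng.normal(0.0, 2.0, n)    # ... but unequal standard deviations
    t, p = stats.ttest_ind(a, b)   # standard pooled-variance t test
    if p < 0.05:
        rejections += 1
empirical_alpha = rejections / reps  # lands near the nominal .05
```

With equal sample sizes the empirical rejection rate stays close to .05 even though the homogeneity assumption is violated, which is the point of the studies cited above.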
With unequal sample sizes, however, the results are more difficult to interpret. I would
suggest that whenever your sample sizes are more than trivially unequal you employ the
Welch–Satterthwaite approach. You have little to lose and potentially much to gain.
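In most statistical software the Welch–Satterthwaite approach is a one-argument switch. A minimal sketch, with data simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
small_noisy = rng.normal(100, 15, 12)  # small n paired with large variance:
large_quiet = rng.normal(100, 5, 60)   # the worst case for the pooled test

pooled = stats.ttest_ind(small_noisy, large_quiet)                  # assumes equal variances
welch = stats.ttest_ind(small_noisy, large_quiet, equal_var=False)  # Welch-Satterthwaite df
```

The Welch version replaces the pooled error term with separate variance estimates and adjusts the degrees of freedom downward accordingly.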
The investigator who has collected data that she thinks may violate one or more of the
underlying assumptions should refer to the article by Boneau (1960). This article may be
old, but it is quite readable and contains an excellent list of references to other work in the
area. A good summary of alternative procedures can be found in Games, Keselman, and
Rogan (1981).
Wilcox (1992) has argued persuasively for the use of trimmed samples for comparing
group means with heavy-tailed distributions. (Interestingly, statisticians seem to have a
fondness for trimmed samples, whereas psychologists and other social science practition-
ers seem not to have heard of trimming.) He provides results showing dramatic increases
in power when compared to more standard approaches. Alternative nonparametric ap-
proaches, including “resampling statistics,” are discussed in Chapter 18 of this book. These
can be very powerful techniques that do not require unreasonable assumptions about the
populations from which you have sampled. I suspect that resampling statistics and related
procedures will be in the mainstream of statistical analysis in the not-too-distant future.
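As a small illustration of why trimming appeals with heavy-tailed data (the values here are made up), a 20% trimmed mean simply discards the most extreme scores in each tail before averaging:

```python
import numpy as np
from scipy import stats

scores = np.array([2.1, 2.3, 2.4, 2.5, 2.6, 2.8, 9.9])  # one wild score

ordinary = scores.mean()                 # pulled upward by the 9.9
trimmed = stats.trim_mean(scores, 0.2)   # drops the lowest and highest score here
```

The trimmed mean is far less affected by the outlying 9.9 than the ordinary mean, which is the property Wilcox exploits for comparing groups drawn from heavy-tailed distributions.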