A Caution
When Welch, Satterthwaite, Behrens, and Fisher developed tests on means that are not
dependent on homogeneous variances, they may not have been doing us as much of a favor as
we think. Venables (2000) pointed out that such a test “gives naive users a cozy feeling of
protection that perhaps their test makes sense even if the variances happen to come out
wildly different.” His point is that we are often so satisfied that we don’t have to worry
about the fact that the variances are different that indeed we often don’t worry about the
fact that variances are different. That sentence may sound circular, but we really should pay
attention to unequal variances. It is quite possible that the variances are of more interest
than the means in some experiments. For example, it is entirely possible that a study com-
paring family therapy with cognitive behavior therapy for treatment of anorexia could
come out with similar means but quite different variances. In that situation perhaps we
should focus on the thought that one therapy might be very effective for some people and
very ineffective for others, leading to a high variance. Venables also points out that if one
treatment produces a higher mean than another that may not be of much interest if it also
has a high variance and is thus unreliable. Finally, Venables pointed out that we are all
happy and comfortable with the fact that we can now run a t test without worrying overly
much about heterogeneity of variance. However, when we come to the analysis of variance
in Chapter 11 we will not have such a correction and, as a result, we will happily go our
way acting as if the lack of equality of variances is not a problem.
I am not trying to suggest that people ignore corrections for heterogeneity of variance.
I think that they should be used. But I think that it is even more important to consider what
those different variances are telling us. They may be the more important part of the story.
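To make the point concrete, here is a minimal sketch in Python, using only the standard library, of two groups with identical means but very different variances. The scores are invented for illustration and do not come from any study, and the function name welch_summary is my own. The function computes the unequal-variance t statistic, the Welch-Satterthwaite degrees of freedom, and the ratio of the larger to the smaller sample variance.

```python
from math import sqrt
from statistics import mean, variance


def welch_summary(x, y):
    """Return (t, df, variance_ratio) for two independent samples.

    t and df follow the unequal-variance (Welch-Satterthwaite) formulas;
    variance_ratio is the larger sample variance over the smaller one.
    """
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)      # sample variances (n - 1 denominator)
    se_sq = v1 / n1 + v2 / n2              # squared standard error of the difference
    t = (mean(x) - mean(y)) / sqrt(se_sq)
    df = se_sq ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    ratio = max(v1, v2) / min(v1, v2)
    return t, df, ratio


# Hypothetical outcome scores, made up for illustration only.
therapy_a = [4, 5, 6, 5, 4, 6, 5, 5]          # everyone improves a little
therapy_b = [-5, 15, 0, 10, -3, 13, 2, 8]     # some improve a lot, some get worse

t, df, ratio = welch_summary(therapy_a, therapy_b)
print(f"t = {t:.2f}, df = {df:.1f}, variance ratio = {ratio:.1f}")
```

With these made-up scores the means are equal, so the test on means has nothing to report, yet one variance is many times the other. The variance ratio here is only a descriptive summary, not a formal test of homogeneity.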
7.8 Hypothesis Testing Revisited
In Chapter 4 we spent time examining the process of hypothesis testing. I pointed out that
the traditional approach involves setting up a null hypothesis, and then generating a statis-
tic that tells us how likely we are to find the obtained results if, in fact, the null hypothesis
is true. In other words we calculate the probability of the data given the null, and if that
probability is very low, we reject the null.
In that chapter we also looked briefly at a proposal by Jones and Tukey (2000) in which
they approached the problem slightly differently. Now that we have several examples, this
is a good point to go back and look at their proposal. In discussing Adams et al.’s study of
homophobia I suggested that you think about how Jones and Tukey would have approached
the issue. I am not going to repeat the traditional approach, because that is laid out in each
of the examples of how to write up our results.
The study by Adams et al. (1996) makes a good example. I imagine that all of us would
be willing to agree that the null hypothesis of equal population means in the two conditions
is highly unlikely to be true. Even laying aside the argument about differences in the
10th decimal place, it just seems unlikely that people who differ appreciably in terms of
homophobia would show exactly the same mean level of arousal to erotic videos. We may
not know which group will show the greater arousal, but one population mean is certain to
be larger than the other. So we can rule out the null hypothesis ($H_0: \mu_H - \mu_N = 0$) as a
viable possibility. That leaves us with three possible conclusions we could draw as a result of
our test. The first is that $\mu_H < \mu_N$, the second is that $\mu_H > \mu_N$, and the third is that we do
not have sufficient evidence to draw a conclusion.
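The three possible conclusions can be written as a simple decision rule. What follows is only a sketch, assuming we already have a t statistic computed on the difference between the homophobic and nonhomophobic means and a positive critical value chosen by the reader. The function name three_way_conclusion and the critical value of 2.0 in the example are hypothetical.

```python
def three_way_conclusion(t_stat, t_crit):
    """Map a t statistic for (mean_H - mean_N) onto one of three conclusions.

    t_crit is a positive critical value supplied by the reader (an assumption
    of this sketch); it is not computed here.
    """
    if t_crit <= 0:
        raise ValueError("t_crit must be positive")
    if t_stat <= -t_crit:
        return "mu_H < mu_N"      # homophobic mean judged smaller
    if t_stat >= t_crit:
        return "mu_H > mu_N"      # homophobic mean judged larger
    return "no conclusion"        # insufficient evidence about direction


# Hypothetical values, chosen only to exercise each branch.
for t_stat in (-3.1, 0.4, 2.6):
    print(t_stat, "->", three_way_conclusion(t_stat, t_crit=2.0))
```

Notice that no branch returns "the means are equal." Under this way of framing the problem, that outcome has been ruled out before any data are collected.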
Now let’s look at the possibilities of error. It could actually be that $\mu_H < \mu_N$, but that
we draw the opposite conclusion by deciding that the nonhomophobic participants are