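against. In this case, the value of $f'$ is given by Welch (1938) and
Satterthwaite (1946) as

$$f' = \frac{(u + v)^2}{\dfrac{u^2}{df_u} + \dfrac{v^2}{df_v}}$$

where $u = SS_{\text{Ss w/in groups}}$, $v = SS_{I \times \text{Ss w/in groups}}$, and
$df_u$ and $df_v$ are the corresponding degrees of freedom. For our example,

$$u = 384{,}722.03, \quad df_u = 21; \qquad v = 281{,}199.34, \quad df_v = 105$$

$$f' = \frac{(384{,}722.03 + 281{,}199.34)^2}{\dfrac{384{,}722.03^2}{21} + \dfrac{281{,}199.34^2}{105}} = 56.84$$

Rounding to the nearest integer gives $f' = 57$. Thus, our $F$ is distributed on $(g - 1, f') =
(2, 57)$ df under $H_0$. For 2 and 57 df, $F_{.05} = 3.16$. Only the difference at Interval 1 is
significant. By the end of 30 minutes, the three groups were performing at equivalent levels. It is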
logical to conclude that somewhere between the first and the sixth interval the three groups
become nonsignificantly different, and many people test at each interval to find that point.
However, I strongly recommend against this practice as a general rule. We have already run
a number of significance tests, and running more of them serves only to increase the error
rate. Unless there is an important theoretical reason to determine the point at which the
group differences become nonsignificant—and I suspect that there are very few such
cases—then there is nothing to be gained by testing each interval. Tests should be carried
out to answer important questions, not to address idle curiosity or to make the analysis look
“complete.”
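The Welch-Satterthwaite degrees of freedom used at the start of this section are easy to check by hand or by machine. The following is a minimal sketch in Python; the function name `welch_satterthwaite_df` and the variable names are my own choices for illustration, while the sums of squares and degrees of freedom are the ones from the example.

```python
def welch_satterthwaite_df(u, df_u, v, df_v):
    """Approximate df for an error term formed by adding u and v.

    u and v are the two sums of squares; df_u and df_v are their
    degrees of freedom.
    """
    return (u + v) ** 2 / (u ** 2 / df_u + v ** 2 / df_v)


# Values from the example in the text
u, df_u = 384_722.03, 21     # SS for Ss w/in groups
v, df_v = 281_199.34, 105    # SS for I x Ss w/in groups

f_prime = welch_satterthwaite_df(u, df_u, v, df_v)
f_prime_rounded = round(f_prime)   # nearest integer, as in the text

print(f"f' = {f_prime:.2f}, rounded to {f_prime_rounded}")
```

If SciPy happens to be installed, `scipy.stats.f.ppf(0.95, 2, f_prime_rounded)` should return a critical value close to the tabled 3.16; that call is left out of the sketch so that it runs with the standard library alone.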
Multiple Comparisons
Several studies have investigated the robustness of multiple-comparison procedures for
testing differences among means on the within-subjects variable. Maxwell (1980) studied a
simple repeated-measures design with no between-subject component and advised adopting
multiple-comparison procedures that do not use a pooled error term. We discussed such
a procedure (the Games-Howell procedure) in Chapter 12. (I did use a pooled error term in
the analysis of the migraine study, but there it was reasonable to assume homogeneity of
variance and I was using all of the weeks. If I had only been running a contrast involving
three of the weeks, I would seriously consider calculating an error term based on just the
data from those weeks.)
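To make the idea of an unpooled error term concrete: for a comparison of just two levels of the within-subjects variable, an error term based only on those two levels is what an ordinary paired t test on the difference scores uses. The sketch below illustrates that one case. It is not the Games-Howell procedure and not necessarily the exact procedure Maxwell studied, and the scores, the names `week1` and `week2`, and the helper `paired_t` are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for two within-subject levels.

    The error term comes only from the difference scores of the two
    levels being compared, not from a pooled term across all levels.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)   # stdev uses n - 1
    return mean(diffs) / standard_error, n - 1


# Hypothetical scores for five subjects at two of the weeks
week1 = [10, 12, 9, 14, 11]
week2 = [8, 11, 9, 10, 9]

t, df = paired_t(week1, week2)
print(f"t({df}) = {t:.2f}")
```

When several such comparisons are run, each t would still need to be evaluated against a critical value that controls the familywise error rate.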
Keselman and Keselman (1988) extended Maxwell’s work to designs having one
between-subject component and made a similar recommendation. In fact, they showed that
when the Groups are of different sizes and sphericity is violated, familywise error rates can
become very badly distorted. In the simple effects procedures that we have just considered,
I recommended using separate error terms by running one-way repeated-measures analyses
for each of the groups. For subsequent multiple-comparison procedures exploring those
simple effects, especially with unequal sample sizes, it would probably be wise to employ
482 Chapter 14 Repeated-Measures Designs