Confidence Limits on $\mu_1 - \mu_2$
In addition to testing a null hypothesis about population means (i.e., testing $H_0: \mu_1 - \mu_2 = 0$),
and stating an effect size, it is useful to set confidence limits on the difference between
$\mu_1$ and $\mu_2$. The logic for setting these confidence limits is exactly the same as it was for the
one-sample case. The calculations are also exactly the same except that we use the difference
between the means and the standard error of differences between means in place of the mean
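and the standard error of the mean. Thus the 95% confidence limits on $\mu_1 - \mu_2$ are the
difference between the sample means plus or minus $t_{.025}$ times the standard error of the
difference between means. For the homophobia study, this gives limits of 1.46 and 13.54.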
The probability is .95 that an interval computed as we computed this interval encloses
the difference in arousal to homosexual videos between homophobic and nonhomophobic
participants. Although the interval is wide, it does not include 0. This is consistent with our
rejection of the null hypothesis, and allows us to state that homophobic individuals are, in
fact, more sexually aroused by homosexual videos than are nonhomophobic individuals.
However, I think that we would be remiss if we simply ignored the width of this interval.
While the difference between groups is statistically significant, there is still considerable
uncertainty about how large the difference is. In addition, keep in mind that the dependent
variable is the “degree of sexual arousal” on an arbitrary scale. Even if your confidence
interval were quite narrow, it is difficult to know what to make of the result in absolute terms.
To say that the groups differed by 7.5 units in arousal is not particularly informative. Is that a
big difference or a little difference? We have no real way to know, because the units (mm of
penile circumference) are not something that most of us have an intuitive feel for. But when
we standardize the measure, as we will in the next section, it is often more informative.
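The arithmetic behind these limits can be sketched in a few lines of Python. This is only an illustration: the function name `ci_mean_difference` and its parameter names are hypothetical, and the inputs (means of 24.00 and 16.5, a pooled variance of 144.48, sample sizes of 35 and 29, and a critical value of t = 2.00) are the values used in the text's calculation.

```python
import math

def ci_mean_difference(mean1, mean2, pooled_var, n1, n2, t_crit):
    """Return (lower, upper) confidence limits on mu1 - mu2.

    Hypothetical helper: assumes a pooled variance estimate and a
    critical t value supplied by the caller (e.g., from a t table).
    """
    diff = mean1 - mean2
    # Standard error of the difference between two independent means
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
    margin = t_crit * se_diff
    return diff - margin, diff + margin

# Values taken from the homophobia example in the text
lower, upper = ci_mean_difference(24.00, 16.5, 144.48, 35, 29, 2.00)
print(f"{lower:.2f} <= mu1 - mu2 <= {upper:.2f}")
```

Because the width of the interval is twice the margin, anything that shrinks the standard error (larger samples, smaller variance) narrows the interval without changing its center.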
Effect Size
The confidence interval that we just calculated has shown us that we still have considerable
uncertainty about the difference in sexual arousal between groups, even though our statistically
significant difference tells us that the homophobic group actually shows more arousal
than the nonhomophobic group. Again we come to the issue of finding ways to present
information to our readers that conveys the magnitude of the difference between our groups.
We will use an effect size measure based on Cohen’s d. It is very similar to the one that we
used in the case of two dependent samples, where we divide the difference between the
means by a standard deviation. We will again call this statistic d. In this case, however, our
standard deviation will be the estimated standard deviation of either population. More
specifically, we will pool the two variances and take the square root of the result, and that
will give us our best estimate of the standard deviation of the populations from which the
numbers were drawn.^12 (If we had noticeably different variances, we would most likely use
the standard deviation of one sample and note to the reader that this is what we had done.)
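As a rough sketch of this calculation, the following Python pools two variances (weighting each by its degrees of freedom, the usual pooling formula) and divides the mean difference by the square root of the result. The function names are hypothetical. The example call assumes that the value 144.48 appearing in the confidence-limit calculation for the homophobia study is the pooled variance; the individual sample variances are not given here, so the pooling function is shown but not applied to the study data.

```python
import math

def pooled_variance(var1, n1, var2, n2):
    """Pool two sample variances, weighting each by its degrees of freedom."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

def cohens_d(mean1, mean2, pooled_var):
    """Difference between means divided by the pooled standard deviation."""
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Assumption: 144.48 is the pooled variance used in the text's interval
d = cohens_d(24.00, 16.5, 144.48)
print(f"d = {d:.2f}")
```

Under that assumption, the two group means differ by a little more than six tenths of a standard deviation, which is a statement readers can interpret without knowing the original measurement units.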
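Equations for the 95% confidence limits on $\mu_1 - \mu_2$:

$$CI_{.95} = (\bar{X}_1 - \bar{X}_2) \pm t_{.025}\, s_{\bar{X}_1 - \bar{X}_2}$$

For the homophobia study:

$$CI_{.95} = (\bar{X}_1 - \bar{X}_2) \pm t_{.025}\, s_{\bar{X}_1 - \bar{X}_2} = (24.00 - 16.5) \pm 2.00\sqrt{\frac{144.48}{35} + \frac{144.48}{29}}$$

$$= 7.50 \pm 2.00(3.018) = 7.5 \pm 6.04$$

$$1.46 \le (\mu_1 - \mu_2) \le 13.54$$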
^12 Hedges (1982) first recommended stating this formula in terms of statistics, with the pooled
estimate of the standard deviation substituted for the population value. It is sometimes referred to as Hedges’ g.