Basic Statistics

TESTS OF HYPOTHESES ON POPULATION MEANS

out that if one wants to test whether the difference between two point estimates (say,
two population means) is statistically significant, the method of examining overlap
of confidence limits is conservative. That is, it rejects the null hypothesis less often
than the standard method when the null hypothesis is true and fails to reject the null
hypothesis more frequently than the standard method when the null hypothesis is
false.
Confidence intervals are often used when describing the patients at their entrance
to the study. They are useful when the main interest is in determining how different
the population means are. The questions asked in many studies by epidemiologists
are best answered with confidence intervals (see Rothman [1986]).
In practice, tests are reported more often because statistical programs tend to
provide them and readers expect to see them. Programs do provide the means and
standard deviations from which confidence intervals can easily be computed, but
they do not all display the actual confidence intervals. Minitab provides confidence
intervals using either z or t. SAS, SPSS, and Stata will give the confidence intervals
using t.


8.7 CORRECTING FOR MULTIPLE TESTING

When we make multiple tests from a single data set, we know that with each test
we reject, we have an α chance of making a type I error. This leaves us with the
uncomfortable feeling that if we make enough tests, our chance that at least one will
be significant is > α. For example, if we roll a die, our chance of getting a 1 is only
1 in 6. But if we roll the die numerous times, our chance of getting a 1 at some point
increases.
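For independent tests each made at level α, this "at least one" probability can be written as 1 − (1 − α)^m for m tests. A brief sketch of how quickly it grows (the variable names are ours, not from the text):

```python
# Chance of at least one type I error across m independent tests,
# each performed at level alpha: 1 - (1 - alpha)**m.
alpha = 0.05
for m in (1, 4, 10, 20):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:2d}: P(at least one type I error) = {fwer:.3f}")
```

With m = 4 tests at α = .05, the chance of at least one false rejection is already about .185, far above the nominal .05.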
Suppose that we know in advance that we will make m tests and perform two-
sided tests. If we want the overall chance of making a type I error to be
≤ α, we compare the computed t values we obtain from our computations or the
computer output with t[1 - α/2m] using the usual d.f. instead of t[1 - α/2]. For
example, suppose we know that we will make m = 4 tests and want α = .05. Each
test has 20 d.f.’s. In this case, 2m = 8. From a computer program, we obtain
t[1 - .05/8] = t[.9938] = 2.748. We can see from Table A.3 that t[.9938] with 20
d.f.’s lies between 2.528 and 2.845, so the t value of 2.748 obtained from a statistical
program seems reasonable. Suppose our four computed t values were 1.54, 2.50,
2.95, and 3.01. If we correct for multiple testing, only the t tests that had values of
2.95 and 3.01 would be significant, since they exceed 2.748. Without correcting for
multiple testing, we would also reject the second test, with a t value of 2.50 (see Table
A.3). To correct for multiple testing using one-sided tests, we compare the computed
t’s with tabled t[1 - α/m] instead of tabled t[1 - α] values.
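The adjusted critical value in the example above can be reproduced with a t quantile function; a minimal sketch, assuming SciPy is available (the text's own figure came from a statistical program or Table A.3):

```python
# Bonferroni-adjusted critical value for m two-sided t tests:
# compare each computed t with t[1 - alpha/2m] on the usual d.f.
from scipy.stats import t

alpha, m, df = 0.05, 4, 20
t_crit = t.ppf(1 - alpha / (2 * m), df)  # quantile t[1 - .05/8] with 20 d.f.
print(f"adjusted critical value: {t_crit:.3f}")

computed = [1.54, 2.50, 2.95, 3.01]
significant = [tv for tv in computed if tv > t_crit]
print("significant after correction:", significant)
```

Only the tests with t values of 2.95 and 3.01 exceed the adjusted critical value; without the correction, the comparison would instead be against t[1 - .05/2] ≈ 2.086, and 2.50 would also be rejected.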
This method of correcting for multiple testing is perfectly general and works for any
type of test. It is called the Bonferroni correction for multiple testing. The Bonferroni
correction also works for confidence intervals. Here, for a two-sided interval with σ
known, we use z[1 - α/2m] instead of z[1 - α/2] when we compute the m confidence
intervals. For σ unknown, we use t[1 - α/2m] instead of t[1 - α/2] in computing
