Bonferroni approach:
To achieve α* ≤ α₀, set α = α₀/T
EXAMPLE
e.g., α₀ = 0.05, T = 10:
α = 0.05/10 = 0.005
⇓
α* = 1 − (1 − 0.005)^10
= 0.049 ≤ α₀ = 0.05
Problem with Bonferroni:
Over-adjusts: does not reject
enough, so low power (model may
be underfitted)
Bonferroni-type alternatives
available to:
Increase power
Allow for nonindependent tests
Another approach:
Replaces FWER with
False Discovery Rate (FDR) = T₀/T,
where
T₀ = no. of tests incorrectly
rejected, i.e., H₀ᵢ true
T = total no. of tests
Criticisms of multiple testing:
(1) Assuming universal H₀: all
H₀ᵢ true is unrealistic
(2) Paying a “penalty for peeking”
reduces importance of specific
tests of interest
(3) Where do you stop correcting
for multiple-testing?
A popular (Bonferroni) approach for ensuring
that α* never exceeds a desired FWER of, say,
α₀ is to require the significance level (α) for
each test to be α₀/T. To illustrate, if α₀ = 0.05
and T = 10, then α = 0.005, and α* calculates
to 0.049, close to 0.05.
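The arithmetic in this illustration can be checked with a short sketch. The function names below are illustrative only, and the FWER formula α* = 1 − (1 − α)^T assumes that the T tests are independent.

```python
def per_test_alpha(alpha0, n_tests):
    """Bonferroni per-test significance level: alpha0 / T."""
    return alpha0 / n_tests


def familywise_error_rate(alpha, n_tests):
    """FWER for T tests each run at level alpha (assumes independent tests)."""
    return 1.0 - (1.0 - alpha) ** n_tests


alpha0, n_tests = 0.05, 10
alpha = per_test_alpha(alpha0, n_tests)
fwer_adjusted = familywise_error_rate(alpha, n_tests)
fwer_unadjusted = familywise_error_rate(alpha0, n_tests)

print(f"per-test alpha      = {alpha:.3f}")
print(f"FWER with Bonferroni = {fwer_adjusted:.3f}")
print(f"FWER without it      = {fwer_unadjusted:.3f}")
```

With these inputs the adjusted FWER comes out at about 0.049, while running all ten tests at 0.05 without adjustment gives about 0.40.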
A problem, however, with using the Bonferroni
approach is that it “over-adjusts” by making it
more difficult to reject any given H₀ᵢ; that is, its
“power” to reject false null hypotheses is
typically too low.
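The loss of power can be illustrated with a rough sketch for a single two-sided z test. The standardized effect of 2.8 is a hypothetical value chosen for illustration (it gives about 80% power at α = 0.05), and the function name is an assumption, not something taken from the text.

```python
from statistics import NormalDist


def two_sided_z_power(alpha, effect):
    """Power of a two-sided z test at level alpha.

    `effect` is the mean of the test statistic under the alternative,
    in standard-error units (a hypothetical input for illustration).
    """
    z = NormalDist()  # standard normal
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the statistic falls in either rejection region
    return (1.0 - z.cdf(crit - effect)) + z.cdf(-crit - effect)


effect = 2.8  # hypothetical standardized effect
power_unadjusted = two_sided_z_power(0.05, effect)
power_bonferroni = two_sided_z_power(0.05 / 10, effect)

print(f"power at alpha = 0.05 : {power_unadjusted:.2f}")
print(f"power at alpha = 0.005: {power_bonferroni:.2f}")
```

Under these assumptions, moving from α = 0.05 to the Bonferroni level of 0.005 drops the power for this one test from roughly 0.80 to roughly 0.50.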
Alternative formulae for adjusting for multiple-
testing (e.g., Sidak, 1967; Holm, 1979; Hochberg,
1988) have been offered to provide increased
power and to allow for nonindependent significance
tests.
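As a minimal sketch of two of these alternatives, the code below computes the Sidak per-test level (which assumes independent tests) and applies Holm's step-down rule (which does not require independence). The p-values are made up for illustration and the function names are assumptions.

```python
def sidak_alpha(alpha0, n_tests):
    """Sidak per-test level: 1 - (1 - alpha0)^(1/T); assumes independence."""
    return 1.0 - (1.0 - alpha0) ** (1.0 / n_tests)


def bonferroni_reject(p_values, alpha0=0.05):
    """Reject H0i when p_i <= alpha0 / T."""
    m = len(p_values)
    return [p <= alpha0 / m for p in p_values]


def holm_reject(p_values, alpha0=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...)
    with alpha0 / (T - k) and stop at the first non-rejection."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, idx in enumerate(order):
        if p_values[idx] <= alpha0 / (m - k):
            reject[idx] = True
        else:
            break  # all larger p-values are also not rejected
    return reject


# Hypothetical p-values from T = 5 tests
p_values = [0.30, 0.001, 0.035, 0.012, 0.015]
bonf = bonferroni_reject(p_values)
holm = holm_reject(p_values)
sidak = sidak_alpha(0.05, 10)

print("Bonferroni rejects:", bonf)
print("Holm rejects:      ", holm)
print(f"Sidak per-test level for T = 10: {sidak:.5f}")
```

For this made-up set, Bonferroni rejects one null hypothesis and Holm rejects three, which shows the gain in power while the FWER target stays at 0.05.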
Moreover, another adjustment approach
(Benjamini and Hochberg, 1995) changes the
“overall” goal of adjustment from obtaining a
desired “family-wise error rate” (FWER) to
obtaining a desired “false discovery rate”
(FDR), which is defined as the proportion of
significance tests that incorrectly reject
the null (i.e., are truly Type 1 errors).
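A minimal sketch of the Benjamini-Hochberg step-up rule follows. The p-values and the target rate q = 0.05 are hypothetical, and the function name is an assumption. Note that the procedure is usually described as controlling the expected proportion of false rejections among the tests that are rejected.

```python
def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule at target FDR q.

    Find the largest rank k (1-based, p-values sorted ascending) with
    p_(k) <= (k / T) * q, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank  # keep the largest rank that passes
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values from T = 5 tests
p_values = [0.30, 0.001, 0.035, 0.012, 0.015]
bh = benjamini_hochberg_reject(p_values, q=0.05)
print("Benjamini-Hochberg rejects:", bh)
```

On this made-up set the rule rejects four of the five null hypotheses, more than a Bonferroni cutoff of 0.05/5 = 0.01 would, because it targets the FDR rather than the FWER.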
Nevertheless, there remains some controversy
in the methodologic literature (Rothman,
1990) as to whether any attempt to correct for
multiple-testing is even warranted. Criticisms
of “adjustment” include: (1) the assumption of a
“universal” null hypothesis that all H₀ᵢ are
true is unrealistic; (2) paying a “penalty
for peeking” (Light and Pillemer, 1984) reduces
the importance of specific contrasts of interest;
and (3) where does the need for adjustment stop
when considering all the tests that an individual
researcher performs?
Presentation: VI. Multiple Testing