Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)

Bonferroni approach:
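To achieve $\alpha^* \le \alpha_0$, set $\alpha = \alpha_0/T$

EXAMPLE
e.g., $\alpha_0 = 0.05$, $T = 10$:
$\alpha = 0.05/10 = 0.005$
$\Rightarrow \alpha^* = 1 - (1 - 0.005)^{10} = 0.049 \approx \alpha_0 = 0.05$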

Problem with Bonferroni:

Over-adjusts: does not reject enough, so low power
(model may be underfitted)

Bonferroni-type alternatives available to:

Increase power
Allow for nonindependent tests

Another approach:

Replaces FWER with
False Discovery Rate (FDR) $= T_0/T$,
where

$T_0$ = no. of tests incorrectly rejected, i.e., $H_{0i}$ true
$T$ = total no. of tests

Criticisms of multiple testing:

(1) Assuming universal $H_0$: all
$H_{0i}$ true is unrealistic

(2) Paying a “penalty for peeking”
reduces importance of specific
tests of interest

(3) Where do you stop correcting
for multiple-testing?

A popular (Bonferroni) approach for ensuring that $\alpha^*$ never exceeds a desired FWER of, say, $\alpha_0$ is to require the significance level ($\alpha$) for each test to be $\alpha_0/T$. To illustrate, if $\alpha_0 = 0.05$ and $T = 10$, then $\alpha = 0.005$, and $\alpha^*$ calculates to $1 - (1 - 0.005)^{10} = 0.049$, close to 0.05.
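The arithmetic in this illustration can be checked with a short Python sketch. The function names are hypothetical, and the family-wise formula $1 - (1 - \alpha)^T$ assumes the $T$ tests are independent; the unadjusted rate is computed only for comparison.

```python
def bonferroni_alpha(alpha0: float, num_tests: int) -> float:
    """Per-test significance level under the Bonferroni rule: alpha0 / T."""
    return alpha0 / num_tests


def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """FWER for T independent tests, each run at level alpha: 1 - (1 - alpha)^T."""
    return 1.0 - (1.0 - alpha) ** num_tests


alpha0 = 0.05  # desired family-wise error rate
T = 10         # number of significance tests

alpha = bonferroni_alpha(alpha0, T)            # per-test level
alpha_star = familywise_error_rate(alpha, T)   # FWER after adjustment
unadjusted = familywise_error_rate(alpha0, T)  # FWER if every test used alpha0

print(f"per-test alpha  = {alpha:.4f}")        # 0.0050
print(f"adjusted FWER   = {alpha_star:.4f}")   # 0.0489
print(f"unadjusted FWER = {unadjusted:.4f}")   # 0.4013
```

With the adjustment, the family-wise rate stays just under 0.05; without it, ten independent tests at 0.05 each would give a family-wise rate of about 0.40.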

A problem, however, with using the Bonferroni approach is that it “over-adjusts” by making it more difficult to reject any given $H_{0i}$; that is, its “power” to reject a null hypothesis when the alternative is true is typically too low.

Alternative formulae for adjusting for multiple testing (e.g., Sidak, 1967; Holm, 1979; Hochberg, 1988) have been offered to provide increased power and to allow for nonindependent significance tests.
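As a rough illustration of how such alternatives can reject more often than Bonferroni, the Python sketch below implements the Sidak per-test level and the Holm (step-down) and Hochberg (step-up) rules as they are commonly stated. The function names and the p-values are hypothetical, not taken from the text, and the Sidak and Hochberg rules rest on independence-type assumptions that the Holm rule does not need.

```python
def sidak_alpha(alpha0, num_tests):
    """Sidak per-test level: 1 - (1 - alpha0)^(1/T)."""
    return 1.0 - (1.0 - alpha0) ** (1.0 / num_tests)


def holm_reject(p_values, alpha0):
    """Holm step-down: test p-values in ascending order against
    alpha0/T, alpha0/(T-1), ..., and stop at the first failure."""
    T = len(p_values)
    order = sorted(range(T), key=lambda i: p_values[i])
    reject = [False] * T
    for rank, idx in enumerate(order):  # rank 0 is the smallest p-value
        if p_values[idx] <= alpha0 / (T - rank):
            reject[idx] = True
        else:
            break
    return reject


def hochberg_reject(p_values, alpha0):
    """Hochberg step-up: find the largest-ranked p-value that meets its
    threshold alpha0/(T - rank), then reject it and all smaller p-values."""
    T = len(p_values)
    order = sorted(range(T), key=lambda i: p_values[i])
    cutoff = -1
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha0 / (T - rank):
            cutoff = rank
    reject = [False] * T
    for rank, idx in enumerate(order):
        if rank <= cutoff:
            reject[idx] = True
    return reject


# Hypothetical p-values from T = 5 tests, with desired FWER alpha0 = 0.05.
p_values = [0.001, 0.012, 0.014, 0.039, 0.30]
alpha0 = 0.05

bonferroni = [p <= alpha0 / len(p_values) for p in p_values]
holm = holm_reject(p_values, alpha0)
hochberg = hochberg_reject(p_values, alpha0)

print("Sidak alpha (T=10):", round(sidak_alpha(0.05, 10), 4))
print("Bonferroni rejects:", sum(bonferroni))
print("Holm rejects:      ", sum(holm))
print("Hochberg rejects:  ", sum(hochberg))
```

For these made-up p-values, Bonferroni rejects one hypothesis while the Holm and Hochberg rules each reject three, which is the sense in which they offer increased power.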

Moreover, another adjustment approach (Benjamini and Hochberg, 1995) changes the “overall” goal of adjustment from obtaining a desired “family-wise error rate” (FWER) to obtaining a desired “false discovery rate” (FDR), which is defined as the proportion of significance tests that incorrectly reject the null hypothesis (i.e., that are truly Type I errors).
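A minimal Python sketch of the Benjamini-Hochberg step-up rule, as it is commonly stated, is given below. The function name, the target rate `q`, and the p-values are hypothetical choices for illustration, and the rule in this form is usually justified for independent tests.

```python
def benjamini_hochberg_reject(p_values, q):
    """Benjamini-Hochberg step-up rule: with p-values sorted ascending,
    find the largest k such that p_(k) <= (k / T) * q, then reject the
    k hypotheses with the smallest p-values."""
    T = len(p_values)
    order = sorted(range(T), key=lambda i: p_values[i])
    cutoff = -1
    for rank, idx in enumerate(order):  # rank is 0-based, so k = rank + 1
        if p_values[idx] <= (rank + 1) / T * q:
            cutoff = rank
    reject = [False] * T
    for rank, idx in enumerate(order):
        if rank <= cutoff:
            reject[idx] = True
    return reject


# Hypothetical, unsorted p-values from T = 5 tests; target FDR q = 0.05.
p_values = [0.039, 0.30, 0.001, 0.014, 0.012]
q = 0.05

bh = benjamini_hochberg_reject(p_values, q)
bonferroni = [p <= q / len(p_values) for p in p_values]

print("BH rejects:        ", bh)
print("Bonferroni rejects:", bonferroni)
```

Results are returned in the original order of the p-values. For these made-up values the rule rejects four of the five hypotheses, whereas a Bonferroni cutoff of 0.05/5 = 0.01 rejects only one.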

Nevertheless, there remains some controversy in the methodologic literature (Rothman, 1990) as to whether any attempt to correct for multiple testing is even warranted. Criticisms of “adjustment” include: (1) the assumption of a “universal” null hypothesis that all $H_{0i}$ are true is unrealistic; (2) paying a “penalty for peeking” (Light and Pillemer, 1984) reduces the importance of specific contrasts of interest; (3) where does the need for adjustment stop when considering all the tests that an individual researcher performs?

Presentation: VI. Multiple Testing
