Palgrave Handbook of Econometrics: Applied Econometrics


130 Metastatistics for the Non-Bayesian Regression Runner


The usual set-up begins with a “null hypothesis” and an “alternative hypothesis.”
Hypotheses can be simple or composite: an example of a simple hypothesis is
“The population mean of a binomially distributed random variable is 0.5.” That
is, we can completely characterize the distribution of the random variable under
the hypothesis. A “composite” hypothesis is a hypothesis that does not completely
characterize the distribution of the random variable. An example of such a
hypothesis is “The population mean of a binomially distributed variable is greater
than 0.5.” In addition to a set of “maintained hypotheses” (“the experimental
apparatus is working correctly”), the next step is specifying a test statistic. In the usual
hypothesis testing procedure, the distribution of this test statistic under the null
hypothesis is known.
There are many ways to demonstrate that the probabilities that are used in
hypothesis testing do not represent the probability that some hypothesis is true.
One distinction that is sometimes made is between “before trial” and “after trial”
views of power and size. The following example comes from Hacking (1965).
Consider two hypotheses, a null (H_0) and an alternative (H_1), which are the only
two possible states of the world. Let E_1, E_2, E_3, E_4 be the four possible outcomes
and let the following be true about the world:


         P(E_1)   P(E_2)   P(E_3)   P(E_4)
H_0:     0        0.01     0.01     0.98
H_1:     0.01     0.01     0.97     0.01

We are interested in two tests, R and S, and specifically in the power and size of
the tests. Let the size of a test be the probability of incorrectly rejecting the null
when it is true, and let the power of the test be 1 minus the probability of Type II
error (not rejecting H_0 when it is false). For tests of a given size, more powerful
tests are “better.” The caveat about “a given size” is necessary since we can always
maximize power by deciding on a rule that always rejects, at the cost of a size of 1.


Before trial
                                                        Size    Power
Test R   Reject H_0 if and only if E_3 occurs           0.01    0.97
Test S   Reject H_0 if and only if E_1 or E_2 occurs    0.01    0.02
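
The size and power entries can be checked directly from the outcome probabilities
given earlier. The following is a minimal Python sketch, not part of the source: the
names P, size, power, TEST_R and TEST_S are illustrative, and each test is represented
by its rejection region for H_0 (the set of outcomes on which the null is rejected).
Probabilities are stored as exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction

# Outcome probabilities under each hypothesis, copied from the example.
# The labels "H0", "H1", "E1".."E4" are illustrative names for this sketch.
P = {
    "H0": {"E1": Fraction(0), "E2": Fraction(1, 100),
           "E3": Fraction(1, 100), "E4": Fraction(98, 100)},
    "H1": {"E1": Fraction(1, 100), "E2": Fraction(1, 100),
           "E3": Fraction(97, 100), "E4": Fraction(1, 100)},
}


def size(rejection_region):
    """Probability of rejecting H0 when H0 is true."""
    return sum(P["H0"][e] for e in rejection_region)


def power(rejection_region):
    """Probability of rejecting H0 when H1 is true (1 minus the Type II error)."""
    return sum(P["H1"][e] for e in rejection_region)


# Each test is identified with the outcomes on which it rejects H0.
TEST_R = {"E3"}
TEST_S = {"E1", "E2"}

for name, region in [("R", TEST_R), ("S", TEST_S)]:
    print(f"Test {name}: size = {float(size(region)):.2f}, "
          f"power = {float(power(region)):.2f}")
```

Under these assumptions the sketch gives size 0.01 for both tests, power 0.97 for
test R, and power 0.02 for test S.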

If one takes a naive view of the “power” and “size” of tests, the example is
problematic. The sizes of the two tests are the same, but test R is much more
powerful, that is, much less likely to fail to reject the null when it is false. Before
the trial, we would surely pick test R.
What about after the trial? Consider the case when E_1 occurs. In that case test
R instructs us to “accept” the null when, after the trial, we know with complete
certainty that the null is false, because E_1 has probability zero under H_0. The
standard “evasion” of the problem for non-Bayesians is to observe (as Hacking,
1965, and Mayo, 1979, observe) that this is not a test that would usually be
countenanced, since there exist uniformly more powerful tests than R. This
evasion, however, does not get to the heart of the problem.
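
Both halves of this paragraph can be checked by brute force. The sketch below is
illustrative only and not from the source: it records that E_1 is impossible under
H_0 while lying outside R's rejection region, and then enumerates every possible
rejection region to find those whose size is no larger than R's and whose power is
strictly greater. Because the alternative here is simple, “uniformly more powerful”
reduces to “more powerful under H_1.” Probabilities are stored in hundredths as
integers; all names are assumptions made for this sketch.

```python
from itertools import combinations

# Probabilities in hundredths (integers avoid floating-point rounding).
P_H0 = {"E1": 0, "E2": 1, "E3": 1, "E4": 98}
P_H1 = {"E1": 1, "E2": 1, "E3": 97, "E4": 1}
OUTCOMES = list(P_H0)


def size(region):
    """Probability (in hundredths) of rejecting H0 when H0 is true."""
    return sum(P_H0[e] for e in region)


def power(region):
    """Probability (in hundredths) of rejecting H0 when H1 is true."""
    return sum(P_H1[e] for e in region)


test_r = {"E3"}  # test R rejects H0 if and only if E3 occurs

# After trial: E1 cannot occur under H0, so observing E1 rules H0 out,
# yet test R does not reject because E1 is outside its rejection region.
e1_impossible_under_h0 = P_H0["E1"] == 0
r_rejects_on_e1 = "E1" in test_r

# Every rejection region with size no larger than R's and strictly more power.
better = [
    set(region)
    for k in range(len(OUTCOMES) + 1)
    for region in combinations(OUTCOMES, k)
    if size(region) <= size(test_r) and power(region) > power(test_r)
]

print("E1 impossible under H0:", e1_impossible_under_h0)
print("Test R rejects on E1:", r_rejects_on_e1)
for region in better:
    print(sorted(region), "size:", size(region) / 100, "power:", power(region) / 100)
```

Under these assumptions the only such region is {E_1, E_3}: rejecting H_0 when either
E_1 or E_3 occurs keeps the size at 0.01 and raises the power to 0.98.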
