Palgrave Handbook of Econometrics: Applied Econometrics


Metastatistics for the Non-Bayesian Regression Runner


Consider the following two tests:


Test 1    Reject H₀ ⇔ h ∈ {0, 4}    Size = 0.1935    Power = 1 − 0.3438
Test 2    Reject H₀ ⇔ h = 0         Size = 0.1785    Power = 1 − 0.3439
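The sizes and powers in this table can be checked directly. This excerpt does not restate the set-up, so the sketch below assumes the one the reported numbers imply: four independent tosses, H₀: P(heads) = 0.35, and an alternative of P(heads) = 0.10 (under which, for example, P(h = 0) = 0.9⁴ = 0.6561). The rejection regions are taken from the table; the helper names `binom_pmf` and `rejection_prob` are illustrative only.

```python
from math import comb

# Assumed set-up, inferred from the reported numbers (not restated in this excerpt):
# n = 4 independent tosses, H0: P(heads) = 0.35, alternative: P(heads) = 0.10.
N = 4
P_NULL = 0.35
P_ALT = 0.10


def binom_pmf(h: int, n: int, p: float) -> float:
    """Probability of exactly h heads in n independent tosses with P(heads) = p."""
    return comb(n, h) * p**h * (1 - p) ** (n - h)


def rejection_prob(region: set[int], p: float) -> float:
    """Probability that the number of heads falls in the rejection region."""
    return sum(binom_pmf(h, N, p) for h in region)


# Rejection regions from the table: Test 1 rejects on h in {0, 4}, Test 2 on h = 0.
TESTS = {"Test 1": {0, 4}, "Test 2": {0}}

for name, region in TESTS.items():
    size = rejection_prob(region, P_NULL)  # P(reject H0 | H0 true)
    power = rejection_prob(region, P_ALT)  # P(reject H0 | alternative true)
    print(f"{name}: size = {size:.4f}, power = 1 - {1 - power:.4f}")

# Four heads: how much more likely under H0 than under the alternative?
print(binom_pmf(4, N, P_NULL) / binom_pmf(4, N, P_ALT))
```

Under these assumptions the script reproduces the table's figures, and the last line shows that four heads are about 150 times (3.5⁴) more likely under H₀ than under the alternative, which is why rejecting H₀ on seeing all heads is perverse.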

Given the set-up, the most powerful test of size 0.1935 is Test 1: it is (slightly) more
powerful than Test 2. But preferring Test 1 clearly doesn’t make sense: if one sees
all heads, it is surely more likely that H₀ is true, yet Test 1 instructs you to reject.
Mayo’s solution is to observe that Test 1 fails to use an appropriate test statistic,
one that measures how well the data “fit” the hypothesis. Even though one is
searching for the test of size 0.1935 or better with the most “power,” choosing Test 1
comes at the cost of a nonsensical test statistic. The usual sort of test statistic might be
the fraction of heads (F) less 0.35. Such a statistic has the property of punishing
the hypothesis in a sensible way.
In this case, the test statistic takes on the following values:


Number of heads (h)    F − 0.35
0                      −0.35
1                      −0.10
2                       0.15
3                       0.40
4                       0.65

In this account, Test 2 corresponds to the decision rule “Reject if F − 0.35 < −0.1,”
and the outcomes are now ordered by their departure from the null (in the direction
of the alternative). The use of an appropriate sense of “fit” shows that the
probabilities per se are not what matters: they do not directly correspond to a measure
of belief. Rather, they are one step in assessing how good the test is at revealing
an “error” (Mayo, 2003). In most non-trivial cases, however, the theory does not tell you
how to construct a sensible test statistic; that depends on context.
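Ordering the outcomes by the fit statistic can be sketched as follows, under the same assumed set-up as the table of tests (four tosses, H₀: P(heads) = 0.35, inferred rather than stated here). Exact fractions avoid floating-point trouble at h = 1, where F − 0.35 equals −0.1 exactly and Test 2 must not reject. Any rule of the form “reject if F − 0.35 < c” rejects a prefix of the ordered outcomes, so the region {0, 4} used by Test 1 never arises. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

# Assumed set-up, inferred from the reported sizes: n = 4 tosses, H0: P(heads) = 0.35.
N = 4
P_NULL = Fraction(35, 100)


def binom_pmf(h: int, n: int, p: Fraction) -> Fraction:
    """Exact probability of h heads in n independent tosses."""
    return comb(n, h) * p**h * (1 - p) ** (n - h)


def fit_statistic(h: int) -> Fraction:
    """Fraction of heads less 0.35; more negative means further from H0 toward the alternative."""
    return Fraction(h, N) - P_NULL


def reject_test2(h: int) -> bool:
    """Test 2 written as a rule on the fit statistic: reject if F - 0.35 < -0.1."""
    return fit_statistic(h) < Fraction(-1, 10)


# Outcomes ordered by departure from H0 in the direction of the alternative.
ordered = sorted(range(N + 1), key=fit_statistic)

# Every threshold rule "reject if F - 0.35 < c" rejects a prefix of this ordering.
prefixes = [set(ordered[:k]) for k in range(1, N + 2)]
for region in prefixes:
    size = sum(binom_pmf(h, N, P_NULL) for h in region)
    print(sorted(region), float(size))

print("Test 2 rejects:", [h for h in range(N + 1) if reject_test2(h)])
print("{0, 4} is a threshold region:", {0, 4} in prefixes)
```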
While this example is admittedly superficial, it helps explain why, in
constructing a good experiment, the importance of other (possibly not well defined)
alternatives cannot be ignored. The severity of a test is always relative to some other
possible alternatives. Suppose we collect data on unaided eyesight and the use of
corrective glasses or contact lenses. If one proposed to “test” the theory that wearing
eyeglasses caused unaided eyesight to get worse, and found a “significant”
rejection of the null of no correlation in favor of the alternative that the correlation was
negative, the “p-value” might be small, but the test would fail to be a severe one against
the hypothesis that people with poor uncorrected vision are more likely to wear
eyeglasses.


3.5.6 Randomization and severity


One place where Bayesians and non-Bayesians differ is on the usefulness of
randomization. Here, we can only introduce the problem.
