124 Metastatistics for the Non-Bayesian Regression Runner
to make a statement about a population from a “random sample”:
Preamble State the maintained hypothesis [for example, the random
variable $X$ is normally distributed with $\sigma^2$ equal to ...].
Step 1 State the null hypothesis and the alternative hypothesis [for
example, $H_0: \mu = \mu_0$ and $H_A: \mu \neq \mu_0$].
Step 2 Select the test statistic [for example, $\bar{X}$ based on sample size
$n = ...$].
Step 3 Determine the distribution of the test statistic under the null
hypothesis [for example, $\sqrt{n}(\bar{X} - \mu_0)/\sigma$ is distributed $N(0, 1)$,
that is, normal with mean zero and variance 1].
Step 4 Choose the level of significance and determine the acceptance
and the rejection region [for example, “do not reject $H_0$ if
$-1.96 \leq \sqrt{n}(\bar{X} - \mu_0)/\sigma \leq 1.96$; otherwise reject it”].
Step 5 Draw a sample and evaluate the results [for example, “the value
of $\bar{X}$ is ... which lies inside (outside) the acceptance region”].
Step 6 Reach a conclusion [for example, “the sample does (does not) provide
evidence against the null hypothesis”]. To distinguish between
5% and 1% levels of significance we may add the word “strong”
before “evidence” when using the 1% level.
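The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the sample values, and the choices of mu0 = 10.0 and sigma = 0.5 are hypothetical and do not come from Kmenta's text. The point it tries to make concrete is that the critical value (Step 4) is fixed before any data are examined (Step 5).

```python
import math

def z_test_two_sided(sample, mu0, sigma, critical=1.96):
    """Steps 2-6 for H0: mu = mu0 against HA: mu != mu0, with sigma known.

    Hypothetical helper written for illustration; not taken from the source.
    """
    n = len(sample)
    xbar = sum(sample) / n                    # Step 2: the test statistic
    z = math.sqrt(n) * (xbar - mu0) / sigma   # Step 3: distributed N(0, 1) under H0
    reject = abs(z) > critical                # Step 4: region chosen in advance
    return xbar, z, reject                    # Steps 5-6: evaluate and conclude

# Invented data, for illustration only (an assumption, not from the source).
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3, 10.0, 10.6]
xbar, z, reject = z_test_two_sided(sample, mu0=10.0, sigma=0.5)
print(f"xbar = {xbar:.3f}, z = {z:.3f}, reject H0: {reject}")
```

With these invented numbers the statistic falls inside the acceptance region, so the conclusion in Step 6 would be that the sample does not provide evidence against the null hypothesis at the 5% level.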
It is worth noting Kmenta’s observation about the procedure: “According to
the above scheme, the planning of the test and the decision strategy are set before
the actual drawing of the sample observations, which does not occur until step 5.
This prevents rejudging the verdict to suit the investigator’s wishes.”
This observation comes up frequently in non-Bayesian discourse, but less
frequently among Bayesians: does the investigator want to insure him/herself against
“rejudging the verdict”? Perhaps he/she should “rejudge the verdict”? As we will see,
this points to a notion of severity as being primary, as opposed to merely a
concern about the correctness of the various statistical tests (although the two are not
unrelated).
3.5.2 The introductory puzzle revisited
With this in mind, we can now reintroduce the puzzle. Specifically, the puzzle
arises because, by using some variant of the above procedure, under one experiment
observing 9 black of 12 balls allows one to reject the null hypothesis; in the
other, observing 9 black of 12 balls would not permit the researcher to reject the
null. Some Bayesians point to this example as evidence of a flaw in non-Bayesian
reasoning: why should what is “locked up in the head” of the researcher – his/her
intentions about what he/she was going to do – matter? In both cases, he/she has
the same “data.” This problem appears in many guises: in clinical trials there is a
debate about what should be done if, for example, “early” evidence from a trial
suggests that a drug is effective. The non-Bayesian response is that the Bayesian
view misconstrues the purpose of error probabilities.
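To see how the same 9 black balls in 12 draws can lead to different verdicts, the following sketch computes one-sided p-values under two sampling plans. The details are assumptions, since this section does not restate the two experiments: the null hypothesis is taken to be that black and white balls are equally likely (p = 0.5), the first plan fixes the number of draws at 12 in advance, and the second plan keeps drawing until the third white ball appears. The function names are hypothetical.

```python
from math import comb

def p_value_fixed_n(black, n, p=0.5):
    """P(at least `black` black balls in n draws) when n is fixed in advance."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(black, n + 1))

def p_value_stop_at_white(black, white, p=0.5):
    """P(at least `black` black balls are drawn before the `white`-th white ball).

    Equivalent event: at most white - 1 white balls appear in the first
    black + white - 1 draws.
    """
    m = black + white - 1
    return sum(comb(m, j) * (1 - p)**j * p**(m - j) for j in range(white))

# Assumed designs (not stated in this section): 12 draws fixed in advance,
# versus drawing until the 3rd white ball; null hypothesis p = 0.5.
p_fixed = p_value_fixed_n(9, 12)
p_stop = p_value_stop_at_white(9, 3)
print(f"fixed n = 12:       p-value = {p_fixed:.4f}")
print(f"stop at 3rd white:  p-value = {p_stop:.4f}")
```

Under these assumptions the fixed-sample p-value is 299/4096 (about 0.073) and the stop-at-third-white p-value is 67/2048 (about 0.033), so a 5% test rejects the null in the second design but not in the first, even though the observed counts are identical.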