Metastatistics for the Non-Bayesian Regression Runner
There exist many discussions of this problem. Our discussion follows Sober
(2002), who poses a problem involving a deck of cards with 52 different types
of cards. Suppose 5 cards are randomly drawn from a typical 52-card deck. Call the
configuration of cards that results X. We are now going to use data on X to revise
our beliefs about various theories of the world.
Two theories that can "explain" X include:
1. Theory A. The particular 5 cards were randomly drawn from a deck of 52 cards.
2. Theory B. A powerful demon intervened to ensure that the configuration X was
drawn.
The essence of Bayesian analysis requires calculating the likelihood of observing
X if theory A is true and calculating the likelihood of observing X if theory B is
true. Your actual priors aren't particularly important, but assume that P(A) > 0 and
P(B) > 0, although the probability you attach to them can be small.
The problem arises because the likelihood of X under the second (silly) theory is
higher than under the first (true) theory. Since there are
2,598,960 different 5-card hands that can result:
P[X|A] = 1/2,598,960
P[X|B] = 1.
Regardless of your prior beliefs about A or B, whatever you believed before,
equation (3.9) instructs you to increase the "weight" you give to the demon hypoth-
esis! (Of course, your posterior density might assign little weight to B, but our
interest is merely in the fact that the "experiment" induces you to give more weight
than you did before to B.) If we continued drawing 5-card hands, and continued
to elaborate our demon hypothesis after the fact, we could in principle move you
even closer to believing that hypothesis!
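A minimal sketch of the update makes the point concrete. The prior of one in a million on the demon theory is a hypothetical number chosen for illustration; the likelihoods are those given above:

```python
from fractions import Fraction  # exact rational arithmetic, no rounding

# Hypothetical priors (the 1/1,000,000 figure is an assumption for illustration).
p_A = Fraction(999999, 1000000)  # prior weight on "random draw from a fair deck"
p_B = Fraction(1, 1000000)       # tiny prior weight on the demon theory

# Likelihoods of the observed hand X under each theory, as in the text.
lik_A = Fraction(1, 2598960)     # one specific 5-card hand out of 2,598,960
lik_B = Fraction(1)              # the demon guarantees X

# Bayes' rule: posterior weight on the demon theory after seeing X.
post_B = (lik_B * p_B) / (lik_A * p_A + lik_B * p_B)
print(float(post_B))  # roughly 0.72: the demon theory now dominates
```

Even with a one-in-a-million prior, the enormous likelihood ratio (2,598,960 to 1) drags the posterior weight on the demon theory up to roughly 72%, which is exactly the phenomenon the text describes.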
If that example strikes you as fanciful, consider a more familiar example, usually
called the “optional stopping” problem. To fix ideas, imagine being interested in
whether some normally distributed variable (with a known variance of 1) has a
mean of zero or otherwise.
- Take a sample of size 100 and do the usual non-Bayesian hypothesis test in the
manner suggested by Kmenta earlier. In this case compute
z = (∑_{i=1}^{N} X_i)/√N.
- Continue sampling until |z| ≥ k_{0.05} or N = 1000, whichever comes first, where
k_{0.05} is the appropriate 5% critical value.
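The two sampling rules above can be sketched directly. This is a toy illustration under the stated setup (standard normal draws, so the null of mean zero is true; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 1.96  # approximate two-sided 5% critical value

# Procedure 1: fixed sample of N = 100, test once.
x = rng.standard_normal(100)          # draws under the null (mean 0, variance 1)
z_fixed = x.sum() / np.sqrt(len(x))
reject_fixed = abs(z_fixed) >= k

# Procedure 2: keep sampling until |z| >= k or N = 1000.
total, n = 0.0, 0
while n < 1000:
    total += rng.standard_normal()
    n += 1
    if abs(total / np.sqrt(n)) >= k:
        break
reject_optional = abs(total / np.sqrt(n)) >= k
```

The only difference between the procedures is the stopping rule, yet, as the text explains next, that difference alone changes the test's operating characteristics.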
As the non-Bayesian knows, the first procedure provides a far more reliable indi-
cator of whether the mean is zero than the second. With the sample size fixed in
advance, if |z| turns out to be greater than the appropriate critical value, the usual
conclusion is that either the null is false or "something surprising happened."
Under the second DGP, the probability of Type I error is 53% (see Mayo and Kruse,
2002).
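The size distortion is easy to verify by simulation. The sketch below assumes the test is applied after every observation from n = 1 up to 1000 (the setting in which the 53% figure arises); the number of replications and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
reps, n_max, k = 2000, 1000, 1.96

# Draws under the null: mean 0, variance 1.
x = rng.standard_normal((reps, n_max))
n = np.arange(1, n_max + 1)
z = np.cumsum(x, axis=1) / np.sqrt(n)   # z-statistic after each observation

# Procedure 1: test once at a fixed N = 100.
fixed_rate = np.mean(np.abs(z[:, 99]) >= k)

# Procedure 2: reject if |z| ever reaches k before N = 1000.
optional_rate = np.mean((np.abs(z) >= k).any(axis=1))

print(fixed_rate, optional_rate)  # roughly 0.05 versus roughly 0.5
```

The fixed-N procedure rejects a true null about 5% of the time, as advertised, while the optional-stopping procedure rejects it in roughly half of the replications, in line with the 53% figure cited above.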