120 Metastatistics for the Non-Bayesian Regression Runner
3.4.7 Reasoning or estimating with Bayes’ rule?
Not surprisingly, Bayes’ rule is viewed differently by Bayesians: it is a multi- (or all-)
purpose tool of reasoning. Consider first the version given by equation (3.4). To fix
ideas, let us consider one example of “Bayesian inference.” In the above notation,
let $A_i$ be a specific hypothesis about the world and let $B$ refer to some “data” that
has somehow come into our possession. For example, $A_i$ might be the hypothesis
that a coin is fair and $B$ the fact that you observed a single toss of the coin and it
landed “heads.” Your job is to ascertain how you should revise your beliefs in light
of the data.
- The “model” or likelihood for the behavior of $N$ tosses of a coin, $h$ of which
land heads, is given by:
$$L(\theta \mid N, h) = \binom{N}{h}\,\theta^{h}(1-\theta)^{N-h}. \tag{3.6}$$
As described by Poirier (1995), $L$ is a “window” by which to view the world,
perhaps an “approximation” to the truth. We might debate what window is
appropriate but, in the usual context, it isn’t something to be “tested” or
“evaluated.” Moreover, the likelihood is a function which tells us “how likely we
were to have observed the data we did ($N$, $h$)” given the truth of the model and
a specific value of $\theta$. (NB: here the likelihood is a device that tells you, given the
parameter $\theta$, what is the probability of observing the occurrence of $h$ heads in
$N$ tosses of a coin.)
Instead of using the coin toss mechanism to help you randomize, you are
going to study the coin (and the mechanism) and learn about it.
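As a rough illustration of how equation (3.6) is evaluated, the following Python sketch computes the likelihood for a few values of $\theta$. The function name `binomial_likelihood` and its argument names are hypothetical, chosen only for this example; the sketch assumes nothing beyond the formula itself.

```python
from math import comb


def binomial_likelihood(theta: float, n_tosses: int, n_heads: int) -> float:
    """Evaluate L(theta | N, h) as written in equation (3.6).

    The names are illustrative assumptions: theta is the probability of
    heads, n_tosses is N, and n_heads is h.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    if not 0 <= n_heads <= n_tosses:
        raise ValueError("n_heads must lie between 0 and n_tosses")
    # Binomial coefficient times theta^h times (1 - theta)^(N - h).
    return (
        comb(n_tosses, n_heads)
        * theta ** n_heads
        * (1.0 - theta) ** (n_tosses - n_heads)
    )


if __name__ == "__main__":
    # The example in the text: a single toss that landed heads (N = 1, h = 1).
    # In this case the likelihood reduces to theta itself.
    for theta in (0.1, 0.5, 0.9):
        print(theta, binomial_likelihood(theta, n_tosses=1, n_heads=1))
```

For the single observed “heads,” the likelihood under a fair coin ($\theta = 0.5$) is simply 0.5; values of $\theta$ closer to 1 make the observed data more probable.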
- The next step is to specify a prior distribution; one particularly convenient
choice is the beta distribution. Priors are subtle things, but let us consider our
beliefs about the value of $\theta$ to be describable by the following two-parameter
distribution:
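$$f(\theta;\alpha,\delta) = \frac{\Gamma(\alpha+\delta)}{\Gamma(\alpha)\,\Gamma(\delta)}\,\theta^{\alpha-1}(1-\theta)^{\delta-1} = \frac{1}{B(\alpha,\delta)}\,\theta^{\alpha-1}(1-\theta)^{\delta-1}, \tag{3.7}$$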
where $\Gamma(\cdot)$ is the gamma function and $B(\cdot)$ is the beta function. This is a very
flexible distribution which can put weight on all values between 0 and 1. Figure 3.1
displays some of the wide variety of shapes the prior distribution can take for
different values of $\alpha$ and $\delta$.
Different values of $\alpha$ and $\delta$ correspond to different beliefs. One way to get
some intuition about what type of beliefs the parameters correspond to is to
observe, for example, that the mode of the prior distribution (when it exists)
occurs at:
$$\frac{\alpha-1}{\alpha+\delta-2}.$$
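To get a feel for how $\alpha$ and $\delta$ shape the prior, the sketch below evaluates the density in equation (3.7) and the mode formula directly. The function names are hypothetical. The sketch also assumes that “when it exists” means $\alpha > 1$ and $\delta > 1$, the case in which the beta density has a single interior mode; for other values it returns `None` rather than a number.

```python
from math import gamma
from typing import Optional


def beta_prior_density(theta: float, alpha: float, delta: float) -> float:
    """Evaluate f(theta; alpha, delta) as written in equation (3.7)."""
    if alpha <= 0 or delta <= 0:
        raise ValueError("alpha and delta must be positive")
    if not 0.0 < theta < 1.0:
        # Endpoints are excluded here because the density can be
        # unbounded at 0 or 1 when alpha or delta is below 1.
        raise ValueError("theta must lie strictly between 0 and 1")
    # Gamma(alpha + delta) / (Gamma(alpha) * Gamma(delta)) equals 1 / B(alpha, delta).
    normalizer = gamma(alpha + delta) / (gamma(alpha) * gamma(delta))
    return normalizer * theta ** (alpha - 1) * (1.0 - theta) ** (delta - 1)


def beta_prior_mode(alpha: float, delta: float) -> Optional[float]:
    """Return (alpha - 1) / (alpha + delta - 2), or None.

    Assumption for this sketch: an interior mode is reported only when
    both alpha > 1 and delta > 1.
    """
    if alpha <= 1 or delta <= 1:
        return None
    return (alpha - 1) / (alpha + delta - 2)


if __name__ == "__main__":
    # A few illustrative parameter pairs (chosen for this example only).
    for alpha, delta in [(1, 1), (2, 2), (3, 2), (0.5, 0.5)]:
        print(alpha, delta, beta_prior_mode(alpha, delta),
              beta_prior_density(0.5, alpha, delta))
```

With $\alpha = \delta = 2$, for instance, the mode sits at 0.5, which corresponds to a belief centered on a fair coin; $\alpha = \delta = 1$ gives a flat density with no single mode.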