Pattern Recognition and Machine Learning

72 2. PROBABILITY DISTRIBUTIONS

[Figure 2.2: four panels, each plotted against μ (horizontal axis from 0 to 1, vertical axis from 0 to 3), for a = 0.1, b = 0.1; a = 1, b = 1; a = 2, b = 3; and a = 8, b = 4.]

Figure 2.2 Plots of the beta distribution Beta(μ|a, b) given by (2.13) as a function of μ for various values of the hyperparameters a and b.
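The curves in Figure 2.2 can be reproduced numerically. The following is a minimal sketch, assuming (2.13) has the standard form Beta(μ|a, b) = Γ(a+b)/(Γ(a)Γ(b)) μ^(a−1) (1−μ)^(b−1). The function name `beta_pdf` and the evaluation points are illustrative choices, not taken from the text.

```python
import math

def beta_pdf(mu, a, b):
    """Density of Beta(mu | a, b) for 0 < mu < 1 and a, b > 0.

    Assumes the standard form of the beta density; the name is hypothetical.
    """
    # Use log-gamma so the normalization stays stable for large a and b.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    log_density = log_norm + (a - 1) * math.log(mu) + (b - 1) * math.log(1 - mu)
    return math.exp(log_density)

# The four hyperparameter settings shown in the figure panels.
settings = [(0.1, 0.1), (1, 1), (2, 3), (8, 4)]

# Illustrative evaluation points in the open interval (0, 1).
points = (0.25, 0.5, 0.75)

for a, b in settings:
    values = [round(beta_pdf(mu, a, b), 4) for mu in points]
    print(f"a={a}, b={b}: {values}")
```

The endpoints μ = 0 and μ = 1 are deliberately avoided, since the density diverges there when a or b is less than 1, as in the first panel.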

where l = N − m, and therefore corresponds to the number of ‘tails’ in the coin example. We see that (2.17) has the same functional dependence on μ as the prior distribution, reflecting the conjugacy properties of the prior with respect to the likelihood function. Indeed, it is simply another beta distribution, and its normalization coefficient can therefore be obtained by comparison with (2.13) to give

$$
p(\mu \mid m, l, a, b) = \frac{\Gamma(m + a + l + b)}{\Gamma(m + a)\,\Gamma(l + b)} \, \mu^{m + a - 1} (1 - \mu)^{l + b - 1}. \tag{2.18}
$$
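As a rough numerical check of (2.18), the sketch below evaluates the posterior density and confirms that it integrates to approximately one. The function names, the example counts (m = 3, l = 1), the prior hyperparameters (a = 2, b = 2) and the midpoint-rule integration are all illustrative assumptions, not part of the text.

```python
import math

def posterior_pdf(mu, m, l, a, b):
    """Posterior density p(mu | m, l, a, b) as written in (2.18).

    m and l are the counts of x=1 and x=0; a and b are the prior hyperparameters.
    """
    # Exponents in (2.18) are (m + a - 1) and (l + b - 1).
    a_post = m + a
    b_post = l + b
    # Log of the normalization coefficient Gamma(a'+b') / (Gamma(a') Gamma(b')).
    log_norm = (math.lgamma(a_post + b_post)
                - math.lgamma(a_post)
                - math.lgamma(b_post))
    return math.exp(log_norm
                    + (a_post - 1) * math.log(mu)
                    + (b_post - 1) * math.log(1 - mu))

def integrate_midpoint(f, n=10_000):
    """Midpoint-rule integral of f over (0, 1); never evaluates the endpoints."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

# Hypothetical example: prior Beta(2, 2), then 3 'heads' and 1 'tail' observed.
m, l, a, b = 3, 1, 2.0, 2.0
total = integrate_midpoint(lambda mu: posterior_pdf(mu, m, l, a, b))
print(f"integral of posterior over (0, 1): {total:.6f}")
```

The midpoint rule is used only because it avoids the endpoints; any standard quadrature routine would serve equally well for this check.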

We see that the effect of observing a data set of m observations of x = 1 and l observations of x = 0 has been to increase the value of a by m, and the value of b by l, in going from the prior distribution to the posterior distribution. This allows us to provide a simple interpretation of the hyperparameters a and b in the prior as an effective number of observations of x = 1 and x = 0, respectively. Note that a and b need not be integers. Furthermore, the posterior distribution can act as the prior if we subsequently observe additional data. To see this, we can imagine taking observations one at a time and after each observation updating the current posterior
