Pattern Recognition and Machine Learning

Exercises 129

2.7 ( ) Consider a binomial random variable x given by (2.9), with prior distribution for μ given by the beta distribution (2.13), and suppose we have observed m occurrences of x = 1 and l occurrences of x = 0. Show that the posterior mean value of x lies between the prior mean and the maximum likelihood estimate for μ. To do this, show that the posterior mean can be written as λ times the prior mean plus (1 − λ) times the maximum likelihood estimate, where 0 ≤ λ ≤ 1. This illustrates the concept of the posterior distribution being a compromise between the prior distribution and the maximum likelihood solution.
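As a numerical sketch of the result (not part of the exercise): with a Beta(a, b) prior and counts m and l, the posterior is Beta(a + m, b + l), so the decomposition can be checked directly. The hyperparameters a, b and counts m, l below are hypothetical choices.

```python
# Hypothetical prior hyperparameters and observed counts.
a, b = 2.0, 3.0   # Beta(a, b) prior
m, l = 7, 3       # m observations of x = 1, l of x = 0

prior_mean = a / (a + b)                       # prior mean of mu
mle = m / (m + l)                              # maximum likelihood estimate
posterior_mean = (a + m) / (a + b + m + l)     # mean of Beta(a + m, b + l)

# The claimed decomposition holds with lambda = (a + b) / (a + b + m + l).
lam = (a + b) / (a + b + m + l)
assert 0 <= lam <= 1
assert abs(posterior_mean - (lam * prior_mean + (1 - lam) * mle)) < 1e-12
# Hence the posterior mean lies between the prior mean and the MLE.
assert min(prior_mean, mle) <= posterior_mean <= max(prior_mean, mle)
```

Here λ → 1 as the prior counts dominate and λ → 0 as the data dominate, which is the compromise the exercise describes.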

2.8 ( ) Consider two variables x and y with joint distribution p(x, y). Prove the following two results

E[x] = E_y[E_x[x|y]]                                         (2.270)
var[x] = E_y[var_x[x|y]] + var_y[E_x[x|y]].                  (2.271)

Here E_x[x|y] denotes the expectation of x under the conditional distribution p(x|y), with a similar notation for the conditional variance.
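Both identities can be verified exactly on a small discrete joint distribution. The joint table below is a hypothetical example, not taken from the text; the code computes each side of (2.270) and (2.271) by direct summation.

```python
# Hypothetical discrete joint distribution p(x, y); keys are (x, y) pairs.
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3,
     (1, 1): 0.1, (2, 0): 0.1, (2, 1): 0.2}

xs = sorted({x for x, _ in p})
ys = sorted({y for _, y in p})

# Marginal p(y), conditional mean and conditional variance of x given y.
p_y = {y: sum(p[(x, y)] for x in xs) for y in ys}
E_x_y = {y: sum(x * p[(x, y)] for x in xs) / p_y[y] for y in ys}
var_x_y = {y: sum(x**2 * p[(x, y)] for x in xs) / p_y[y] - E_x_y[y]**2
           for y in ys}

# Left-hand sides computed directly from the joint.
E_x = sum(x * q for (x, y), q in p.items())
var_x = sum(x**2 * q for (x, y), q in p.items()) - E_x**2

# Right-hand sides of (2.270) and (2.271).
E_total = sum(p_y[y] * E_x_y[y] for y in ys)
var_total = (sum(p_y[y] * var_x_y[y] for y in ys)                 # E_y[var_x[x|y]]
             + sum(p_y[y] * E_x_y[y]**2 for y in ys) - E_total**2)  # var_y[E_x[x|y]]

assert abs(E_x - E_total) < 1e-12
assert abs(var_x - var_total) < 1e-12
```

The same check works for any finite joint table, since both identities are algebraic consequences of summing over y.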

2.9 ( ) www. In this exercise, we prove the normalization of the Dirichlet distribution (2.38) using induction. We have already shown in Exercise 2.5 that the beta distribution, which is a special case of the Dirichlet for M = 2, is normalized. We now assume that the Dirichlet distribution is normalized for M − 1 variables and prove that it is normalized for M variables. To do this, consider the Dirichlet distribution over M variables, and take account of the constraint ∑_{k=1}^{M} μ_k = 1 by eliminating μ_M, so that the Dirichlet is written

p_M(μ_1, ..., μ_{M−1}) = C_M ∏_{k=1}^{M−1} μ_k^{α_k − 1} (1 − ∑_{j=1}^{M−1} μ_j)^{α_M − 1}      (2.272)

and our goal is to find an expression for C_M. To do this, integrate over μ_{M−1}, taking care over the limits of integration, and then make a change of variable so that this integral has limits 0 and 1. By assuming the correct result for C_{M−1} and making use of (2.265), derive the expression for C_M.
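The result of the induction, C_M = Γ(α_0) / (Γ(α_1) ⋯ Γ(α_M)) with α_0 = ∑_k α_k, can be sanity-checked numerically. The sketch below is my own check, not part of the exercise: for the hypothetical case M = 3 with α = (2, 3, 4), it integrates (2.272) over the simplex by the midpoint rule and confirms the total is close to 1.

```python
import math

# Hypothetical parameters for an M = 3 Dirichlet.
alpha = (2.0, 3.0, 4.0)
alpha0 = sum(alpha)
# Claimed normalization constant C_M = Gamma(alpha_0) / prod_k Gamma(alpha_k).
C = math.gamma(alpha0) / math.prod(math.gamma(a) for a in alpha)

# Midpoint-rule integration of (2.272) over mu_1, mu_2 with mu_1 + mu_2 < 1.
n = 400
h = 1.0 / n
total = 0.0
for i in range(n):
    mu1 = (i + 0.5) * h
    for j in range(n):
        mu2 = (j + 0.5) * h
        mu3 = 1.0 - mu1 - mu2          # eliminated variable mu_M
        if mu3 > 0.0:                  # stay inside the simplex
            total += (mu1 ** (alpha[0] - 1) * mu2 ** (alpha[1] - 1)
                      * mu3 ** (alpha[2] - 1))
total *= C * h * h
# total should be close to 1 if C is the correct normalization constant
```

Since all α_k > 1 here, the integrand vanishes on the simplex boundary and the crude grid is accurate enough for a consistency check.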

2.10 ( ) Using the property Γ(x + 1) = xΓ(x) of the gamma function, derive the following results for the mean, variance, and covariance of the Dirichlet distribution given by (2.38)

E[μ_j] = α_j / α_0                                           (2.273)

var[μ_j] = α_j(α_0 − α_j) / (α_0^2 (α_0 + 1))                (2.274)

cov[μ_j μ_l] = −α_j α_l / (α_0^2 (α_0 + 1)),   j ≠ l         (2.275)

where α_0 is defined by (2.39).
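These moments can be checked by Monte Carlo, using the standard construction of a Dirichlet sample as independent Gamma(α_k, 1) variates normalized by their sum. The sketch below is my own verification with a hypothetical α, not part of the exercise.

```python
import random

random.seed(0)
alpha = [2.0, 3.0, 5.0]    # hypothetical Dirichlet parameters
a0 = sum(alpha)
N = 100_000

# Running sums for the sample moments of mu_0 and mu_1.
s1 = s2 = s_l = s_jl = 0.0
for _ in range(N):
    g = [random.gammavariate(a, 1.0) for a in alpha]
    t = sum(g)
    mu = [x / t for x in g]    # one Dirichlet(alpha) sample
    s1 += mu[0]
    s2 += mu[0] ** 2
    s_l += mu[1]
    s_jl += mu[0] * mu[1]

mean_j = s1 / N
var_j = s2 / N - mean_j ** 2
cov_jl = s_jl / N - mean_j * (s_l / N)

# Compare against (2.273), (2.274), and (2.275) for j = 0, l = 1.
assert abs(mean_j - alpha[0] / a0) < 5e-3
assert abs(var_j - alpha[0] * (a0 - alpha[0]) / (a0**2 * (a0 + 1))) < 5e-3
assert abs(cov_jl + alpha[0] * alpha[1] / (a0**2 * (a0 + 1))) < 5e-3
```

Note that the covariance in (2.275) is negative for j ≠ l, reflecting the simplex constraint: if one component is large, the others must be correspondingly small.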