Pattern Recognition and Machine Learning

Exercises 129

2.7 ( ) Consider a binomial random variable x given by (2.9), with prior distribution for μ given by the beta distribution (2.13), and suppose we have observed m occurrences of x = 1 and l occurrences of x = 0. Show that the posterior mean value of x lies between the prior mean and the maximum likelihood estimate for μ. To do this, show that the posterior mean can be written as λ times the prior mean plus (1 − λ) times the maximum likelihood estimate, where 0 ≤ λ ≤ 1. This illustrates the concept of the posterior distribution being a compromise between the prior distribution and the maximum likelihood solution.
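As a numerical sketch of the result (not part of the exercise): with a Beta(a, b) prior and counts m and l, the posterior is Beta(a + m, b + l), so the decomposition can be checked directly. The hyperparameters a, b and counts m, l below are hypothetical choices.

```python
# Hypothetical prior hyperparameters and observed counts.
a, b = 2.0, 3.0   # Beta(a, b) prior
m, l = 7, 3       # m observations of x = 1, l of x = 0

prior_mean = a / (a + b)                       # prior mean of mu
mle = m / (m + l)                              # maximum likelihood estimate
posterior_mean = (a + m) / (a + b + m + l)     # mean of Beta(a + m, b + l)

# The claimed decomposition holds with lambda = (a + b) / (a + b + m + l).
lam = (a + b) / (a + b + m + l)
assert 0 <= lam <= 1
assert abs(posterior_mean - (lam * prior_mean + (1 - lam) * mle)) < 1e-12
# Hence the posterior mean lies between the prior mean and the MLE.
assert min(prior_mean, mle) <= posterior_mean <= max(prior_mean, mle)
```

Here λ → 1 as the prior counts dominate and λ → 0 as the data dominate, which is the compromise the exercise describes.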

2.8 ( ) Consider two variables x and y with joint distribution p(x, y). Prove the following two results

E[x] = E_y[E_x[x|y]]                                         (2.270)
var[x] = E_y[var_x[x|y]] + var_y[E_x[x|y]].                  (2.271)

Here E_x[x|y] denotes the expectation of x under the conditional distribution p(x|y), with a similar notation for the conditional variance.
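Both identities can be verified exactly on a small discrete joint distribution. The joint table below is a hypothetical example, not taken from the text; the code computes each side of (2.270) and (2.271) by direct summation.

```python
# Hypothetical discrete joint distribution p(x, y); keys are (x, y) pairs.
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3,
     (1, 1): 0.1, (2, 0): 0.1, (2, 1): 0.2}

xs = sorted({x for x, _ in p})
ys = sorted({y for _, y in p})

# Marginal p(y), conditional mean and conditional variance of x given y.
p_y = {y: sum(p[(x, y)] for x in xs) for y in ys}
E_x_y = {y: sum(x * p[(x, y)] for x in xs) / p_y[y] for y in ys}
var_x_y = {y: sum(x**2 * p[(x, y)] for x in xs) / p_y[y] - E_x_y[y]**2
           for y in ys}

# Left-hand sides computed directly from the joint.
E_x = sum(x * q for (x, y), q in p.items())
var_x = sum(x**2 * q for (x, y), q in p.items()) - E_x**2

# Right-hand sides of (2.270) and (2.271).
E_total = sum(p_y[y] * E_x_y[y] for y in ys)
var_total = (sum(p_y[y] * var_x_y[y] for y in ys)                 # E_y[var_x[x|y]]
             + sum(p_y[y] * E_x_y[y]**2 for y in ys) - E_total**2)  # var_y[E_x[x|y]]

assert abs(E_x - E_total) < 1e-12
assert abs(var_x - var_total) < 1e-12
```

The same check works for any finite joint table, since both identities are algebraic consequences of summing over y.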

2.9 ( ) www. In this exercise, we prove the normalization of the Dirichlet distribution (2.38) using induction. We have already shown in Exercise 2.5 that the beta distribution, which is a special case of the Dirichlet for M = 2, is normalized. We now assume that the Dirichlet distribution is normalized for M − 1 variables and prove that it is normalized for M variables. To do this, consider the Dirichlet distribution over M variables, and take account of the constraint ∑_{k=1}^{M} μ_k = 1 by eliminating μ_M, so that the Dirichlet is written

p_M(μ_1, ..., μ_{M−1}) = C_M ∏_{k=1}^{M−1} μ_k^{α_k − 1} (1 − ∑_{j=1}^{M−1} μ_j)^{α_M − 1}      (2.272)

and our goal is to find an expression for C_M. To do this, integrate over μ_{M−1}, taking care over the limits of integration, and then make a change of variable so that this integral has limits 0 and 1. By assuming the correct result for C_{M−1} and making use of (2.265), derive the expression for C_M.
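The result of the induction, C_M = Γ(α_0) / (Γ(α_1) ⋯ Γ(α_M)) with α_0 = ∑_k α_k, can be sanity-checked numerically. The sketch below is my own check, not part of the exercise: for the hypothetical case M = 3 with α = (2, 3, 4), it integrates (2.272) over the simplex by the midpoint rule and confirms the total is close to 1.

```python
import math

# Hypothetical parameters for an M = 3 Dirichlet.
alpha = (2.0, 3.0, 4.0)
alpha0 = sum(alpha)
# Claimed normalization constant C_M = Gamma(alpha_0) / prod_k Gamma(alpha_k).
C = math.gamma(alpha0) / math.prod(math.gamma(a) for a in alpha)

# Midpoint-rule integration of (2.272) over mu_1, mu_2 with mu_1 + mu_2 < 1.
n = 400
h = 1.0 / n
total = 0.0
for i in range(n):
    mu1 = (i + 0.5) * h
    for j in range(n):
        mu2 = (j + 0.5) * h
        mu3 = 1.0 - mu1 - mu2          # eliminated variable mu_M
        if mu3 > 0.0:                  # stay inside the simplex
            total += (mu1 ** (alpha[0] - 1) * mu2 ** (alpha[1] - 1)
                      * mu3 ** (alpha[2] - 1))
total *= C * h * h
# total should be close to 1 if C is the correct normalization constant
```

Since all α_k > 1 here, the integrand vanishes on the simplex boundary and the crude grid is accurate enough for a consistency check.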

2.10 ( ) Using the property Γ(x + 1) = xΓ(x) of the gamma function, derive the following results for the mean, variance, and covariance of the Dirichlet distribution given by (2.38)

E[μ_j] = α_j / α_0                                           (2.273)

var[μ_j] = α_j(α_0 − α_j) / (α_0^2 (α_0 + 1))                (2.274)

cov[μ_j μ_l] = −α_j α_l / (α_0^2 (α_0 + 1)),   j ≠ l         (2.275)

where α_0 is defined by (2.39).
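These moments can be checked by Monte Carlo, using the standard construction of a Dirichlet sample as independent Gamma(α_k, 1) variates normalized by their sum. The sketch below is my own verification with a hypothetical α, not part of the exercise.

```python
import random

random.seed(0)
alpha = [2.0, 3.0, 5.0]    # hypothetical Dirichlet parameters
a0 = sum(alpha)
N = 100_000

# Running sums for the sample moments of mu_0 and mu_1.
s1 = s2 = s_l = s_jl = 0.0
for _ in range(N):
    g = [random.gammavariate(a, 1.0) for a in alpha]
    t = sum(g)
    mu = [x / t for x in g]    # one Dirichlet(alpha) sample
    s1 += mu[0]
    s2 += mu[0] ** 2
    s_l += mu[1]
    s_jl += mu[0] * mu[1]

mean_j = s1 / N
var_j = s2 / N - mean_j ** 2
cov_jl = s_jl / N - mean_j * (s_l / N)

# Compare against (2.273), (2.274), and (2.275) for j = 0, l = 1.
assert abs(mean_j - alpha[0] / a0) < 5e-3
assert abs(var_j - alpha[0] * (a0 - alpha[0]) / (a0**2 * (a0 + 1))) < 5e-3
assert abs(cov_jl + alpha[0] * alpha[1] / (a0**2 * (a0 + 1))) < 5e-3
```

Note that the covariance in (2.275) is negative for j ≠ l, reflecting the simplex constraint: if one component is large, the others must be correspondingly small.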