
We now consider a variational distribution which factorizes between the latent
variables and the parameters so that

q(Z, π, μ, Λ) = q(Z) q(π, μ, Λ).    (10.42)

It is remarkable that this is the only assumption that we need to make in order to
obtain a tractable practical solution to our Bayesian mixture model. In particular, the
functional form of the factors q(Z) and q(π, μ, Λ) will be determined automatically
by optimization of the variational distribution. Note that we are omitting the sub-
scripts on the q distributions, much as we do with the p distributions in (10.41), and
are relying on the arguments to distinguish the different distributions.
The corresponding sequential update equations for these factors can be easily
derived by making use of the general result (10.9). Let us consider the derivation of
the update equation for the factor q(Z). The log of the optimized factor is given by

ln q(Z) = E_{π,μ,Λ}[ln p(X, Z, π, μ, Λ)] + const.    (10.43)
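
For reference, the general result (10.9) states that the log of each optimized factor is the expectation of the log of the joint distribution over all variables, taken with respect to the remaining factors, ln q*_j(Z_j) = E_{i≠j}[ln p(X, Z)] + const. Here the only remaining factor is q(π, μ, Λ), which is why the expectation in (10.43) is taken with respect to π, μ and Λ.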

We now make use of the decomposition (10.41). Note that we are only interested in
the functional dependence of the right-hand side on the variable Z. Thus any terms
that do not depend on Z can be absorbed into the additive normalization constant,
giving

ln q(Z) = E_π[ln p(Z|π)] + E_{μ,Λ}[ln p(X|Z, μ, Λ)] + const.    (10.44)

Substituting for the two conditional distributions on the right-hand side, and again
absorbing any terms that are independent of Z into the additive constant, we have

ln q(Z) = Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk ln ρ_nk + const    (10.45)

where we have defined

ln ρ_nk = E[ln π_k] + (1/2) E[ln |Λ_k|] − (D/2) ln(2π) − (1/2) E_{μ_k,Λ_k}[(x_n − μ_k)^T Λ_k (x_n − μ_k)]    (10.46)

where D is the dimensionality of the data variable x. Taking the exponential of both
sides of (10.45) we obtain

q(Z) ∝ ∏_{n=1}^{N} ∏_{k=1}^{K} ρ_nk^{z_nk}.    (10.47)

Requiring that this distribution be normalized, and noting that for each value of n
the quantities z_nk are binary and sum to 1 over all values of k (Exercise 10.12), we obtain

q(Z) = ∏_{n=1}^{N} ∏_{k=1}^{K} r_nk^{z_nk}    (10.48)

where the quantities r_nk are the ρ_nk normalized over k, that is r_nk = ρ_nk / Σ_{j=1}^{K} ρ_nj, so that Σ_k r_nk = 1 for each value of n.
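
As a concrete illustration of (10.46)-(10.48), the following sketch computes ln ρ_nk and then normalizes over k to obtain the responsibilities r_nk. The array names E_ln_pi, E_ln_det_Lambda and E_quad are assumptions of this sketch: they are taken to hold the expectations E[ln π_k], E[ln |Λ_k|] and E_{μ_k,Λ_k}[(x_n − μ_k)^T Λ_k (x_n − μ_k)] evaluated under the current factor q(π, μ, Λ), which is not derived in this passage.

```python
import numpy as np

def responsibilities(E_ln_pi, E_ln_det_Lambda, E_quad, D):
    """Evaluate r_nk from ln rho_nk, following (10.46)-(10.48).

    Assumed (hypothetical) inputs, computed under the current q(pi, mu, Lambda):
      E_ln_pi         : shape (K,),  E[ln pi_k]
      E_ln_det_Lambda : shape (K,),  E[ln |Lambda_k|]
      E_quad          : shape (N,K), E[(x_n - mu_k)^T Lambda_k (x_n - mu_k)]
      D               : dimensionality of the data variable x
    Returns an (N, K) array of responsibilities with each row summing to 1.
    """
    # ln rho_nk as in (10.46)
    ln_rho = (E_ln_pi[None, :]
              + 0.5 * E_ln_det_Lambda[None, :]
              - 0.5 * D * np.log(2.0 * np.pi)
              - 0.5 * E_quad)
    # Exponentiate and normalize over k, as in (10.47)-(10.48); subtracting the
    # row-wise maximum first avoids overflow without changing the result.
    ln_rho -= ln_rho.max(axis=1, keepdims=True)
    rho = np.exp(ln_rho)
    return rho / rho.sum(axis=1, keepdims=True)
```

Because the normalization divides each ρ_nk by Σ_j ρ_nj, any constant shift of ln ρ_nk across k cancels, which is why subtracting the row-wise maximum before exponentiating is safe.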