
We now consider a variational distribution which factorizes between the latent
variables and the parameters so that

q(Z, π, μ, Λ) = q(Z) q(π, μ, Λ).    (10.42)

It is remarkable that this is the only assumption that we need to make in order to
obtain a tractable practical solution to our Bayesian mixture model. In particular, the
functional form of the factors q(Z) and q(π, μ, Λ) will be determined automatically
by optimization of the variational distribution. Note that we are omitting the sub-
scripts on the q distributions, much as we do with the p distributions in (10.41), and
are relying on the arguments to distinguish the different distributions.
The corresponding sequential update equations for these factors can be easily
derived by making use of the general result (10.9). Let us consider the derivation of
the update equation for the factor q(Z). The log of the optimized factor is given by

ln q(Z) = E_{π,μ,Λ}[ln p(X, Z, π, μ, Λ)] + const.    (10.43)
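
For reference, the general result (10.9) states that the log of each optimized factor is the expectation of the log of the joint distribution over all variables, taken with respect to the remaining factors, ln q*_j(Z_j) = E_{i≠j}[ln p(X, Z)] + const. Here the only remaining factor is q(π, μ, Λ), which is why the expectation in (10.43) is taken with respect to π, μ and Λ.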

We now make use of the decomposition (10.41). Note that we are only interested in
the functional dependence of the right-hand side on the variable Z. Thus any terms
that do not depend on Z can be absorbed into the additive normalization constant,
giving

ln q(Z) = E_π[ln p(Z|π)] + E_{μ,Λ}[ln p(X|Z, μ, Λ)] + const.    (10.44)

Substituting for the two conditional distributions on the right-hand side, and again
absorbing any terms that are independent of Z into the additive constant, we have

ln q(Z) = Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk ln ρ_nk + const    (10.45)

where we have defined

ln ρ_nk = E[ln π_k] + (1/2) E[ln |Λ_k|] − (D/2) ln(2π) − (1/2) E_{μ_k,Λ_k}[(x_n − μ_k)^T Λ_k (x_n − μ_k)]    (10.46)

where D is the dimensionality of the data variable x. Taking the exponential of both
sides of (10.45) we obtain

q(Z) ∝ ∏_{n=1}^{N} ∏_{k=1}^{K} ρ_nk^{z_nk}.    (10.47)

Requiring that this distribution be normalized, and noting that for each value of n
the quantities z_nk are binary and sum to 1 over all values of k (Exercise 10.12), we obtain

q(Z) = ∏_{n=1}^{N} ∏_{k=1}^{K} r_nk^{z_nk}    (10.48)

where the quantities r_nk are the ρ_nk normalized over k, that is r_nk = ρ_nk / Σ_{j=1}^{K} ρ_nj, so that Σ_k r_nk = 1 for each value of n.
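
As a concrete illustration of (10.46)-(10.48), the following sketch computes ln ρ_nk and then normalizes over k to obtain the responsibilities r_nk. The array names E_ln_pi, E_ln_det_Lambda and E_quad are assumptions of this sketch: they are taken to hold the expectations E[ln π_k], E[ln |Λ_k|] and E_{μ_k,Λ_k}[(x_n − μ_k)^T Λ_k (x_n − μ_k)] evaluated under the current factor q(π, μ, Λ), which is not derived in this passage.

```python
import numpy as np

def responsibilities(E_ln_pi, E_ln_det_Lambda, E_quad, D):
    """Evaluate r_nk from ln rho_nk, following (10.46)-(10.48).

    Assumed (hypothetical) inputs, computed under the current q(pi, mu, Lambda):
      E_ln_pi         : shape (K,),  E[ln pi_k]
      E_ln_det_Lambda : shape (K,),  E[ln |Lambda_k|]
      E_quad          : shape (N,K), E[(x_n - mu_k)^T Lambda_k (x_n - mu_k)]
      D               : dimensionality of the data variable x
    Returns an (N, K) array of responsibilities with each row summing to 1.
    """
    # ln rho_nk as in (10.46)
    ln_rho = (E_ln_pi[None, :]
              + 0.5 * E_ln_det_Lambda[None, :]
              - 0.5 * D * np.log(2.0 * np.pi)
              - 0.5 * E_quad)
    # Exponentiate and normalize over k, as in (10.47)-(10.48); subtracting the
    # row-wise maximum first avoids overflow without changing the result.
    ln_rho -= ln_rho.max(axis=1, keepdims=True)
    rho = np.exp(ln_rho)
    return rho / rho.sum(axis=1, keepdims=True)
```

Because the normalization divides each ρ_nk by Σ_j ρ_nj, any constant shift of ln ρ_nk across k cancels, which is why subtracting the row-wise maximum before exponentiating is safe.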