10.2. Illustration: Variational Mixture of Gaussians

where

$$
r_{nk} = \frac{\rho_{nk}}{\sum_{j=1}^{K} \rho_{nj}}.
\tag{10.49}
$$

We see that the optimal solution for the factor $q(\mathbf{Z})$ takes the same functional form as the prior $p(\mathbf{Z}|\boldsymbol{\pi})$. Note that because $\rho_{nk}$ is given by the exponential of a real quantity, the quantities $r_{nk}$ will be nonnegative and will sum to one, as required.
For the discrete distribution $q(\mathbf{Z})$ we have the standard result

$$
\mathbb{E}[z_{nk}] = r_{nk}
\tag{10.50}
$$


from which we see that the quantities $r_{nk}$ are playing the role of responsibilities. Note that the optimal solution for $q(\mathbf{Z})$ depends on moments evaluated with respect to the distributions of other variables, and so again the variational update equations are coupled and must be solved iteratively.
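As a practical aside on (10.49): because the $\rho_{nk}$ arise as exponentials, the normalization is usually carried out in the log domain to avoid numerical underflow. The following NumPy sketch illustrates one way to do this; the function name and the log-sum-exp shift are our own illustrative choices, not part of the text.

```python
import numpy as np

def responsibilities(log_rho):
    """Normalize ln(rho_nk) into responsibilities r_nk as in (10.49).

    log_rho: (N, K) array holding ln(rho_nk).
    Returns an (N, K) array whose rows are nonnegative and sum to one.
    """
    # Shift each row by its maximum before exponentiating (log-sum-exp
    # trick): this leaves the ratio in (10.49) unchanged but prevents
    # underflow when ln(rho_nk) is very negative.
    log_rho = log_rho - log_rho.max(axis=1, keepdims=True)
    rho = np.exp(log_rho)
    return rho / rho.sum(axis=1, keepdims=True)
```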
At this point, we shall find it convenient to define three statistics of the observed
data set evaluated with respect to the responsibilities, given by


$$
N_k = \sum_{n=1}^{N} r_{nk}
\tag{10.51}
$$

$$
\bar{\mathbf{x}}_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} \mathbf{x}_n
\tag{10.52}
$$

$$
\mathbf{S}_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} (\mathbf{x}_n - \bar{\mathbf{x}}_k)(\mathbf{x}_n - \bar{\mathbf{x}}_k)^{\mathrm{T}}.
\tag{10.53}
$$

Note that these are analogous to quantities evaluated in the maximum likelihood EM
algorithm for the Gaussian mixture model.
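For concreteness, all three statistics can be computed directly from the data matrix and the responsibility matrix. Continuing the NumPy sketch above (the function name is ours; in practice one would also guard against $N_k \approx 0$ for components that receive no responsibility):

```python
import numpy as np

def mixture_statistics(X, r):
    """Compute N_k, x_bar_k, S_k of (10.51)-(10.53).

    X: (N, D) data matrix; r: (N, K) responsibility matrix.
    """
    N, D = X.shape
    K = r.shape[1]
    N_k = r.sum(axis=0)               # (10.51): effective number of points per component
    x_bar = (r.T @ X) / N_k[:, None]  # (10.52): responsibility-weighted means, shape (K, D)
    S = np.empty((K, D, D))
    for k in range(K):
        diff = X - x_bar[k]           # deviations from the k-th weighted mean
        # (10.53): responsibility-weighted covariance of component k
        S[k] = (r[:, k, None] * diff).T @ diff / N_k[k]
    return N_k, x_bar, S
```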
Now let us consider the factor $q(\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Lambda})$ in the variational posterior distribution. Again using the general result (10.9) we have


$$
\ln q(\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Lambda}) = \ln p(\boldsymbol{\pi}) + \sum_{k=1}^{K} \ln p(\boldsymbol{\mu}_k,\boldsymbol{\Lambda}_k) + \mathbb{E}_{\mathbf{Z}}[\ln p(\mathbf{Z}|\boldsymbol{\pi})] + \sum_{k=1}^{K} \sum_{n=1}^{N} \mathbb{E}[z_{nk}] \ln \mathcal{N}\!\left(\mathbf{x}_n | \boldsymbol{\mu}_k, \boldsymbol{\Lambda}_k^{-1}\right) + \text{const}.
\tag{10.54}
$$

We observe that the right-hand side of this expression decomposes into a sum of terms involving only $\boldsymbol{\pi}$ together with terms involving only $\boldsymbol{\mu}$ and $\boldsymbol{\Lambda}$, which implies that the variational posterior $q(\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Lambda})$ factorizes to give $q(\boldsymbol{\pi})q(\boldsymbol{\mu},\boldsymbol{\Lambda})$. Furthermore, the terms involving $\boldsymbol{\mu}$ and $\boldsymbol{\Lambda}$ themselves comprise a sum over $k$ of terms involving $\boldsymbol{\mu}_k$ and $\boldsymbol{\Lambda}_k$, leading to the further factorization


$$
q(\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Lambda}) = q(\boldsymbol{\pi}) \prod_{k=1}^{K} q(\boldsymbol{\mu}_k,\boldsymbol{\Lambda}_k).
\tag{10.55}
$$
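To make this factorization explicit, substituting $\mathbb{E}[z_{nk}] = r_{nk}$ into (10.54) and regrouping separates the right-hand side as

$$
\ln q(\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Lambda}) = \underbrace{\ln p(\boldsymbol{\pi}) + \mathbb{E}_{\mathbf{Z}}[\ln p(\mathbf{Z}|\boldsymbol{\pi})]}_{\text{depends on } \boldsymbol{\pi} \text{ only}} + \sum_{k=1}^{K} \underbrace{\left\{ \ln p(\boldsymbol{\mu}_k,\boldsymbol{\Lambda}_k) + \sum_{n=1}^{N} r_{nk} \ln \mathcal{N}\!\left(\mathbf{x}_n | \boldsymbol{\mu}_k, \boldsymbol{\Lambda}_k^{-1}\right) \right\}}_{\text{depends on } (\boldsymbol{\mu}_k,\boldsymbol{\Lambda}_k) \text{ only}} + \text{const},
$$

so that exponentiating yields the product form (10.55).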