
Identifying the terms on the right-hand side of (10.54) that depend on π, we have

\[
\ln q(\boldsymbol{\pi}) = (\alpha_0 - 1)\sum_{k=1}^{K}\ln\pi_k
+ \sum_{k=1}^{K}\sum_{n=1}^{N} r_{nk}\ln\pi_k + \mathrm{const}
\tag{10.56}
\]

where we have used (10.50). Taking the exponential of both sides, we recognize
q(π) as a Dirichlet distribution

\[
q(\boldsymbol{\pi}) = \mathrm{Dir}(\boldsymbol{\pi}\mid\boldsymbol{\alpha})
\tag{10.57}
\]

where α has components αk given by

\[
\alpha_k = \alpha_0 + N_k.
\tag{10.58}
\]

Finally, the variational posterior distribution q(μk, Λk) does not factorize into
the product of the marginals, but we can always use the product rule to write it in
the form q(μk, Λk) = q(μk|Λk) q(Λk). The two factors can be found by inspecting
(10.54) and reading off those terms that involve μk and Λk. The result, as expected,
is a Gaussian-Wishart distribution (Exercise 10.13) and is given by

\[
q(\boldsymbol{\mu}_k,\boldsymbol{\Lambda}_k)
= \mathcal{N}\!\left(\boldsymbol{\mu}_k \mid \mathbf{m}_k,\,(\beta_k\boldsymbol{\Lambda}_k)^{-1}\right)
\mathcal{W}(\boldsymbol{\Lambda}_k \mid \mathbf{W}_k,\nu_k)
\tag{10.59}
\]

where we have defined

\[
\begin{aligned}
\beta_k &= \beta_0 + N_k &\text{(10.60)}\\
\mathbf{m}_k &= \frac{1}{\beta_k}\left(\beta_0\mathbf{m}_0 + N_k\bar{\mathbf{x}}_k\right) &\text{(10.61)}\\
\mathbf{W}_k^{-1} &= \mathbf{W}_0^{-1} + N_k\mathbf{S}_k
+ \frac{\beta_0 N_k}{\beta_0 + N_k}\,(\bar{\mathbf{x}}_k - \mathbf{m}_0)(\bar{\mathbf{x}}_k - \mathbf{m}_0)^{\mathrm{T}} &\text{(10.62)}\\
\nu_k &= \nu_0 + N_k. &\text{(10.63)}
\end{aligned}
\]

These update equations are analogous to the M-step equations of the EM algorithm
for the maximum likelihood solution of the mixture of Gaussians. We see that the
computations that must be performed in order to update the variational posterior
distribution over the model parameters involve evaluation of the same sums over the
data set as arose in the maximum likelihood treatment.
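To make the structure of this variational M step concrete, the following is a minimal NumPy sketch of the updates (10.58) and (10.60)-(10.63). The function name `variational_m_step` and its argument layout are illustrative choices made for this example rather than the book's notation; `Nk`, `xbar` and `Sk` are the responsibility-weighted counts, means and covariances referred to in the text.

```python
import numpy as np

def variational_m_step(X, r, alpha0, beta0, m0, W0, nu0):
    """Variational M-step updates (10.58), (10.60)-(10.63) for a Bayesian
    Gaussian mixture (hypothetical helper, not the book's code).

    X : (N, D) data matrix
    r : (N, K) responsibilities r_nk from the variational E step
    alpha0, beta0, nu0 : scalar prior hyperparameters
    m0 : (D,) prior mean;  W0 : (D, D) prior Wishart scale matrix
    """
    N, D = X.shape
    K = r.shape[1]

    # Responsibility-weighted statistics of the data set
    Nk = r.sum(axis=0) + 1e-10            # effective counts N_k (floored to avoid 0/0)
    xbar = (r.T @ X) / Nk[:, None]        # weighted component means

    alpha = alpha0 + Nk                                       # (10.58)
    beta = beta0 + Nk                                         # (10.60)
    m = (beta0 * m0 + Nk[:, None] * xbar) / beta[:, None]     # (10.61)
    nu = nu0 + Nk                                             # (10.63)

    W = np.empty((K, D, D))
    W0_inv = np.linalg.inv(W0)
    for k in range(K):
        diff = X - xbar[k]                                    # data centred on x̄_k
        Sk = (r[:, k, None] * diff).T @ diff / Nk[k]          # weighted covariance S_k
        dm = (xbar[k] - m0)[:, None]
        Wk_inv = (W0_inv + Nk[k] * Sk
                  + (beta0 * Nk[k] / (beta0 + Nk[k])) * (dm @ dm.T))   # (10.62)
        W[k] = np.linalg.inv(Wk_inv)

    return alpha, beta, m, W, nu
```

In a complete implementation this routine would alternate with a variational E step that recomputes the responsibilities from the expectations given below.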
In order to perform this variational M step, we need the expectations E[znk] = rnk
representing the responsibilities. These are obtained by normalizing the ρnk that
are given by (10.46). We see that this expression involves expectations with respect
to the variational distributions of the parameters, and these are easily evaluated
(Exercise 10.14) to give


\[
\mathbb{E}_{\boldsymbol{\mu}_k,\boldsymbol{\Lambda}_k}\!\left[(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathrm{T}}\boldsymbol{\Lambda}_k(\mathbf{x}_n - \boldsymbol{\mu}_k)\right]
= D\beta_k^{-1} + \nu_k(\mathbf{x}_n - \mathbf{m}_k)^{\mathrm{T}}\mathbf{W}_k(\mathbf{x}_n - \mathbf{m}_k)
\tag{10.64}
\]

\[
\ln\widetilde{\Lambda}_k \equiv \mathbb{E}\bigl[\ln|\boldsymbol{\Lambda}_k|\bigr]
= \sum_{i=1}^{D}\psi\!\left(\frac{\nu_k + 1 - i}{2}\right) + D\ln 2 + \ln|\mathbf{W}_k|
\tag{10.65}
\]

\[
\ln\widetilde{\pi}_k \equiv \mathbb{E}[\ln\pi_k] = \psi(\alpha_k) - \psi(\hat{\alpha})
\tag{10.66}
\]
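These expectations are the quantities needed to evaluate ln ρnk from (10.46) and hence, after normalization, the responsibilities rnk. The following is a small NumPy/SciPy sketch of their computation; the function name `expected_statistics` and the array layout are assumptions made for this example, and α̂ is taken to be Σk αk, the standard Dirichlet result for E[ln πk].

```python
import numpy as np
from scipy.special import digamma

def expected_statistics(X, alpha, beta, m, W, nu):
    """Evaluate the expectations (10.64)-(10.66) used to form the
    responsibilities (hypothetical helper, not the book's code).

    X : (N, D) data;  alpha, beta, nu : (K,);  m : (K, D);  W : (K, D, D)
    """
    N, D = X.shape
    K = alpha.shape[0]

    # (10.64): E[(x_n - mu_k)^T Lambda_k (x_n - mu_k)] for every n, k
    E_quad = np.empty((N, K))
    for k in range(K):
        diff = X - m[k]                                     # (N, D)
        E_quad[:, k] = D / beta[k] + nu[k] * np.einsum('ni,ij,nj->n', diff, W[k], diff)

    # (10.65): ln Lambda~_k = E[ln |Lambda_k|]
    i = np.arange(1, D + 1)
    ln_Lambda_tilde = (digamma((nu[:, None] + 1 - i) / 2.0).sum(axis=1)
                       + D * np.log(2.0)
                       + np.linalg.slogdet(W)[1])

    # (10.66): ln pi~_k = E[ln pi_k], with alpha_hat = sum_k alpha_k
    ln_pi_tilde = digamma(alpha) - digamma(alpha.sum())

    return E_quad, ln_Lambda_tilde, ln_pi_tilde
```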