Pattern Recognition and Machine Learning

(Jeff_L) #1
10.7. Expectation Propagation 505

We also need to optimize the variational parametersξn, and this is also done by
maximizing the lower bound ̃L(q,ξ). Omitting terms that are independent ofξ, and
integrating overα,wehave

̃L(q,ξ)=


q(w)lnh(w,ξ)dw+const. (10.180)

Note that this has precisely the same form as (10.159), and so we can again appeal
to our earlier result (10.163), which can be obtained by direct optimization of the
marginal likelihood function, leading to re-estimation equations of the form

(ξnnew)^2 =φTn

(
ΣN+μNμTN

)
φn. (10.181)

We have obtained re-estimation equations for the three quantitiesq(w),q(α),
andξ, and so after making suitable initializations, we can cycle through these quan-
Appendix B tities, updating each in turn. The required moments are given by


E[α]=

aN
bN

(10.182)

E

[
wTw

]
= ΣN+μTNμN. (10.183)

10.7 Expectation Propagation


We conclude this chapter by discussing an alternative form of deterministic approx-
imate inference, known asexpectation propagationorEP(Minka, 2001a; Minka,
2001b). As with the variational Bayes methods discussed so far, this too is based
on the minimization of a Kullback-Leibler divergence but now of the reverse form,
which gives the approximation rather different properties.
Consider for a moment the problem of minimizingKL(p‖q)with respect toq(z)
whenp(z)is a fixed distribution andq(z)is a member of the exponential family and
so, from (2.194), can be written in the form

q(z)=h(z)g(η)exp

{
ηTu(z)

}

. (10.184)


As a function ofη, the Kullback-Leibler divergence then becomes

KL(p‖q)=−lng(η)−ηTEp(z)[u(z)]+const (10.185)

where the constant terms are independent of the natural parametersη. We can mini-
mizeKL(p‖q)within this family of distributions by setting the gradient with respect
toηto zero, giving
−∇lng(η)=Ep(z)[u(z)]. (10.186)
However, we have already seen in (2.226) that the negative gradient oflng(η)is
given by the expectation ofu(z)under the distributionq(z). Equating these two
results, we obtain
Eq(z)[u(z)] =Ep(z)[u(z)]. (10.187)
Free download pdf