Pattern Recognition and Machine Learning

10.7. Expectation Propagation 505

We also need to optimize the variational parametersξn, and this is also done by maximizing the lower bound ̃L(q,ξ). Omitting terms that are independent ofξ, and integrating overα,wehave

̃L(q,ξ)=

∫ q(w)lnh(w,ξ)dw+const. (10.180)

Note that this has precisely the same form as (10.159), and so we can again appeal to our earlier result (10.163), which can be obtained by direct optimization of the marginal likelihood function, leading to re-estimation equations of the form

(ξnnew)^2 =φTn

( ΣN+μNμTN

) φn. (10.181)

We have obtained re-estimation equations for the three quantitiesq(w),q(α),
andξ, and so after making suitable initializations, we can cycle through these quan-
Appendix B tities, updating each in turn. The required moments are given by

E[α]=

aN bN

(10.182)

E

[ wTw

] = ΣN+μTNμN. (10.183)

10.7 Expectation Propagation

We conclude this chapter by discussing an alternative form of deterministic approx- imate inference, known asexpectation propagationorEP(Minka, 2001a; Minka, 2001b). As with the variational Bayes methods discussed so far, this too is based on the minimization of a Kullback-Leibler divergence but now of the reverse form, which gives the approximation rather different properties. Consider for a moment the problem of minimizingKL(p‖q)with respect toq(z) whenp(z)is a fixed distribution andq(z)is a member of the exponential family and so, from (2.194), can be written in the form

q(z)=h(z)g(η)exp

{ ηTu(z)

}

. (10.184)

As a function ofη, the Kullback-Leibler divergence then becomes

KL(p‖q)=−lng(η)−ηTEp(z)[u(z)]+const (10.185)

where the constant terms are independent of the natural parametersη. We can mini- mizeKL(p‖q)within this family of distributions by setting the gradient with respect toηto zero, giving −∇lng(η)=Ep(z)[u(z)]. (10.186) However, we have already seen in (2.226) that the negative gradient oflng(η)is given by the expectation ofu(z)under the distributionq(z). Equating these two results, we obtain Eq(z)[u(z)] =Ep(z)[u(z)]. (10.187)

Pattern Recognition and Machine Learning

(10.182)

E

10.7 Expectation Propagation

Get our desktop app

Company

Features

Documentation

Resources