Pattern Recognition and Machine Learning

10.1. Variational Inference

It should be emphasized that we are making no further assumptions about the
distribution. In particular, we place no restriction on the functional forms of the
individual factors $q_i(\mathbf{Z}_i)$. This factorized form of variational inference
corresponds to an approximation framework developed in physics called mean field
theory (Parisi, 1988).
Amongst all distributions $q(\mathbf{Z})$ having the form (10.5), we now seek that
distribution for which the lower bound $\mathcal{L}(q)$ is largest. We therefore wish
to make a free form (variational) optimization of $\mathcal{L}(q)$ with respect to all
of the distributions $q_i(\mathbf{Z}_i)$, which we do by optimizing with respect to each
of the factors in turn. To achieve this, we first substitute (10.5) into (10.3) and then
dissect out the dependence on one of the factors $q_j(\mathbf{Z}_j)$. Denoting
$q_j(\mathbf{Z}_j)$ by simply $q_j$ to keep the notation uncluttered, we then obtain

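$$
\begin{aligned}
\mathcal{L}(q) &= \int \prod_i q_i \left\{ \ln p(\mathbf{X}, \mathbf{Z}) - \sum_i \ln q_i \right\} \mathrm{d}\mathbf{Z} \\
&= \int q_j \left\{ \int \ln p(\mathbf{X}, \mathbf{Z}) \prod_{i \neq j} q_i \, \mathrm{d}\mathbf{Z}_i \right\} \mathrm{d}\mathbf{Z}_j - \int q_j \ln q_j \, \mathrm{d}\mathbf{Z}_j + \text{const} \\
&= \int q_j \ln \tilde{p}(\mathbf{X}, \mathbf{Z}_j) \, \mathrm{d}\mathbf{Z}_j - \int q_j \ln q_j \, \mathrm{d}\mathbf{Z}_j + \text{const}
\end{aligned}
\tag{10.6}
$$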

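where we have defined a new distribution $\tilde{p}(\mathbf{X}, \mathbf{Z}_j)$ by the relation

$$
\ln \tilde{p}(\mathbf{X}, \mathbf{Z}_j) = \mathbb{E}_{i \neq j}[\ln p(\mathbf{X}, \mathbf{Z})] + \text{const}. \tag{10.7}
$$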

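Here the notation $\mathbb{E}_{i \neq j}[\cdots]$ denotes an expectation with respect to
the $q$ distributions over all variables $\mathbf{z}_i$ for $i \neq j$, so that

$$
\mathbb{E}_{i \neq j}[\ln p(\mathbf{X}, \mathbf{Z})] = \int \ln p(\mathbf{X}, \mathbf{Z}) \prod_{i \neq j} q_i \, \mathrm{d}\mathbf{Z}_i. \tag{10.8}
$$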
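
The decomposition in (10.6) can be checked numerically on a small discrete example, where the integrals become sums. The sketch below is illustrative only: the two-group toy model, the array `ln_p`, and the helper names `lower_bound` and `expected_ln_p` are assumptions introduced here, not part of the text. It takes j = 1, sets the constant in (10.7) to zero, and uses the fact that the "const" in (10.6) is then the sum of the entropy terms of the factors held fixed (here only the second factor), which follows because each factor is normalized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: two discrete latent groups, Z1 (3 states) and Z2 (4 states).
# ln_p[a, b] stands in for ln p(X, Z1=a, Z2=b) at a fixed observed X.
# The 0.1 offset only keeps the logarithm finite.
ln_p = np.log(rng.random((3, 4)) + 0.1)

# Arbitrary normalized factors q1(Z1) and q2(Z2).
q1 = rng.random(3) + 0.1
q1 /= q1.sum()
q2 = rng.random(4) + 0.1
q2 /= q2.sum()


def lower_bound(ln_p, q1, q2):
    """First line of (10.6): sum over Z of q(Z) * (ln p(X, Z) - sum_i ln q_i)."""
    q = np.outer(q1, q2)
    ln_q = np.log(q1)[:, None] + np.log(q2)[None, :]
    return float(np.sum(q * (ln_p - ln_q)))


def expected_ln_p(ln_p, q2):
    """(10.8) with j = 1: average ln p(X, Z) over q2 only, leaving a function of Z1."""
    return ln_p @ q2


# Last line of (10.6) with j = 1, taking the constant in (10.7) to be zero.
e_ln_p = expected_ln_p(ln_p, q2)
term_j = float(np.sum(q1 * e_ln_p) - np.sum(q1 * np.log(q1)))

# The "const" in (10.6): the entropy term of the factor that is held fixed.
const = float(-np.sum(q2 * np.log(q2)))

direct = lower_bound(ln_p, q1, q2)
decomposed = term_j + const
print(direct, decomposed)  # the two values should agree up to rounding error
```

Because `const` does not involve `q1`, changing `q1` while holding `q2` fixed alters the bound only through `term_j`, which is what makes optimizing one factor at a time tractable.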

Now suppose we keep the $\{q_{i \neq j}\}$ fixed and maximize $\mathcal{L}(q)$ in (10.6)
with respect to all possible forms for the distribution $q_j(\mathbf{Z}_j)$. This is
easily done by recognizing that (10.6) is a negative Kullback-Leibler divergence between
$q_j(\mathbf{Z}_j)$ and $\tilde{p}(\mathbf{X}, \mathbf{Z}_j)$. Thus maximizing (10.6) is
equivalent to minimizing the Kullback-Leibler

Leonhard Euler
1707–1783

Euler was a Swiss mathematician and physicist who worked in St. Petersburg and Berlin
and who is widely considered to be one of the greatest mathematicians of all time. He is
certainly the most prolific, and his collected works fill 75 volumes. Amongst his many
contributions, he formulated the modern theory of the function, he developed (together
with Lagrange) the calculus of variations, and he discovered the formula
$e^{i\pi} = -1$, which relates four of the most important numbers in mathematics. During
the last 17 years of his life, he was almost totally blind, and yet he produced nearly
half of his results during this period.