Pattern Recognition and Machine Learning

10. APPROXIMATE INFERENCE

divergence, and the minimum occurs when $q_j(\mathbf{Z}_j) = \tilde{p}(\mathbf{X}, \mathbf{Z}_j)$. Thus we obtain a general expression for the optimal solution $q_j(\mathbf{Z}_j)$ given by

\[
\ln q_j(\mathbf{Z}_j) = \mathbb{E}_{i \neq j}[\ln p(\mathbf{X}, \mathbf{Z})] + \text{const}. \tag{10.9}
\]

It is worth taking a few moments to study the form of this solution as it provides the basis for applications of variational methods. It says that the log of the optimal solution for factor $q_j$ is obtained simply by considering the log of the joint distribution over all hidden and visible variables and then taking the expectation with respect to all of the other factors $\{q_i\}$ for $i \neq j$.
The additive constant in (10.9) is set by normalizing the distribution $q_j(\mathbf{Z}_j)$. Thus if we take the exponential of both sides and normalize, we have

\[
q_j(\mathbf{Z}_j) = \frac{\exp\left(\mathbb{E}_{i \neq j}[\ln p(\mathbf{X}, \mathbf{Z})]\right)}{\displaystyle\int \exp\left(\mathbb{E}_{i \neq j}[\ln p(\mathbf{X}, \mathbf{Z})]\right) \mathrm{d}\mathbf{Z}_j}.
\]

In practice, we shall find it more convenient to work with the form (10.9) and then reinstate the normalization constant (where required) by inspection. This will become clear from subsequent examples.

The set of equations given by (10.9) for $j = 1, \dots, M$ represents a set of consistency conditions for the maximum of the lower bound subject to the factorization constraint. However, they do not represent an explicit solution because the expression on the right-hand side of (10.9) for the optimum $q_j(\mathbf{Z}_j)$ depends on expectations computed with respect to the other factors $q_i(\mathbf{Z}_i)$ for $i \neq j$. We will therefore seek a consistent solution by first initializing all of the factors $q_i(\mathbf{Z}_i)$ appropriately and then cycling through the factors, replacing each in turn with a revised estimate given by the right-hand side of (10.9) evaluated using the current estimates for all of the other factors. Convergence is guaranteed because the bound is convex with respect to each of the factors $q_i(\mathbf{Z}_i)$ (Boyd and Vandenberghe, 2004).
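As a concrete numerical illustration of this recipe, the update (10.9) can be checked on a small discrete model by exponentiating the expected log joint and then normalizing by inspection. The joint probability table below is an arbitrary example chosen for illustration, not one from the text:

```python
import numpy as np

# Sketch of (10.9) for a toy discrete model (the joint table is an
# arbitrary illustrative choice). Given p(z1, z2) and a current
# factor q2(z2), the optimal q1 satisfies
#   ln q1(z1) = E_{q2}[ln p(z1, z2)] + const,
# and the constant is reinstated by normalizing.

p = np.array([[0.30, 0.10],
              [0.20, 0.40]])            # joint p(z1, z2); rows index z1

q2 = np.array([0.5, 0.5])               # current estimate of q2(z2)

log_q1 = (np.log(p) * q2).sum(axis=1)   # E_{q2}[ln p(z1, z2)], up to const
q1 = np.exp(log_q1 - log_q1.max())      # exponentiate (numerically stable)
q1 /= q1.sum()                          # reinstate the normalization

print(q1)                               # a valid distribution over z1
```

Subtracting `log_q1.max()` before exponentiating changes only the additive constant, which the final normalization absorbs.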
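The cycling scheme just described can be sketched in code for a correlated two-dimensional Gaussian target, the same setting studied in Section 10.1.2 below. The closed-form mean updates used here, $m_1 = \mu_1 - \Lambda_{11}^{-1}\Lambda_{12}(m_2 - \mu_2)$ and its counterpart for $m_2$, are the standard mean-field results for a factorized Gaussian and are assumed rather than derived; the numbers are illustrative:

```python
import numpy as np

# Cycling through the factors, replacing each in turn with the
# revised estimate from (10.9), for a factorized Gaussian
# approximation to p(z) = N(z | mu, inv(Lam)). The closed-form
# updates below are the standard mean-field results for this model
# (assumed here, not derived); the numbers are illustrative.

mu = np.array([1.0, -1.0])              # true mean
Lam = np.array([[2.0, 0.8],
                [0.8, 1.5]])            # true precision (symmetric)

m = np.zeros(2)                         # initialize the factor means
for _ in range(50):                     # cycle until convergence
    # revise each factor using current estimates of the others
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

print(m)                                # converges to the true mean
```

Each full sweep contracts the error in the means by the factor $\Lambda_{12}^2/(\Lambda_{11}\Lambda_{22}) < 1$ (for a positive-definite precision), consistent with the guaranteed convergence noted above.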

10.1.2 Properties of factorized approximations


Our approach to variational inference is based on a factorized approximation to the true posterior distribution. Let us consider for a moment the problem of approximating a general distribution by a factorized distribution. To begin with, we discuss the problem of approximating a Gaussian distribution using a factorized Gaussian, which will provide useful insight into the types of inaccuracy introduced in using factorized approximations. Consider a Gaussian distribution $p(\mathbf{z}) = \mathcal{N}(\mathbf{z} \,|\, \boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1})$ over two correlated variables $\mathbf{z} = (z_1, z_2)$ in which the mean and precision have elements
\[
\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \boldsymbol{\Lambda} = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix} \tag{10.10}
\]

and $\Lambda_{21} = \Lambda_{12}$ due to the symmetry of the precision matrix. Now suppose we wish to approximate this distribution using a factorized Gaussian of the form $q(\mathbf{z}) = q_1(z_1)\, q_2(z_2)$. We first apply the general result (10.9) to find an expression for the