minimization of $\mathrm{KL}(p\|q)$ with respect to $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ leads to the result that $\boldsymbol{\mu}$ is given by the expectation of $\mathbf{x}$ under $p(\mathbf{x})$ and that $\boldsymbol{\Sigma}$ is given by the covariance.
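A minimal sketch of the calculation, assuming $q(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$: only the cross-entropy term of the divergence depends on $q$, so

$$\mathrm{KL}(p\|q) = -\int p(\mathbf{x}) \ln q(\mathbf{x})\,\mathrm{d}\mathbf{x} + \text{const} = \frac{1}{2}\ln|\boldsymbol{\Sigma}| + \frac{1}{2}\,\mathbb{E}_p\!\left[(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] + \text{const}.$$

Setting the derivative with respect to $\boldsymbol{\mu}$ to zero gives $\boldsymbol{\mu} = \mathbb{E}_p[\mathbf{x}]$, and setting the derivative with respect to $\boldsymbol{\Sigma}^{-1}$ to zero gives $\boldsymbol{\Sigma} = \mathbb{E}_p[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}]$, so this direction of the KL divergence performs moment matching.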
10.5 ( ) www Consider a model in which the set of all hidden stochastic variables, denoted collectively by $\mathbf{Z}$, comprises some latent variables $\mathbf{z}$ together with some model parameters $\boldsymbol{\theta}$. Suppose we use a variational distribution that factorizes between latent variables and parameters, so that $q(\mathbf{z}, \boldsymbol{\theta}) = q_{\mathbf{z}}(\mathbf{z})\,q_{\boldsymbol{\theta}}(\boldsymbol{\theta})$, in which the distribution $q_{\boldsymbol{\theta}}(\boldsymbol{\theta})$ is approximated by a point estimate of the form $q_{\boldsymbol{\theta}}(\boldsymbol{\theta}) = \delta(\boldsymbol{\theta} - \boldsymbol{\theta}_0)$, where $\boldsymbol{\theta}_0$ is a vector of free parameters. Show that variational optimization of this factorized distribution is equivalent to an EM algorithm in which the E step optimizes $q_{\mathbf{z}}(\mathbf{z})$ and the M step maximizes the expected complete-data log posterior distribution of $\boldsymbol{\theta}$ with respect to $\boldsymbol{\theta}_0$.
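A sketch of the key step, assuming the standard lower bound $\mathcal{L}(q) = \int q(\mathbf{z}, \boldsymbol{\theta}) \ln\{p(\mathbf{X}, \mathbf{z}, \boldsymbol{\theta})/q(\mathbf{z}, \boldsymbol{\theta})\}\,\mathrm{d}\mathbf{z}\,\mathrm{d}\boldsymbol{\theta}$: substituting the factorized form with the delta-function factor, and treating the (formally infinite) entropy of the delta function as an additive constant independent of $\boldsymbol{\theta}_0$ and $q_{\mathbf{z}}$, gives

$$\mathcal{L} = \int q_{\mathbf{z}}(\mathbf{z}) \ln \frac{p(\mathbf{X}, \mathbf{z}, \boldsymbol{\theta}_0)}{q_{\mathbf{z}}(\mathbf{z})}\,\mathrm{d}\mathbf{z} + \text{const}.$$

Maximizing over $q_{\mathbf{z}}$ with $\boldsymbol{\theta}_0$ fixed gives $q_{\mathbf{z}}(\mathbf{z}) = p(\mathbf{z} \mid \mathbf{X}, \boldsymbol{\theta}_0)$ (the E step), while maximizing over $\boldsymbol{\theta}_0$ with $q_{\mathbf{z}}$ fixed maximizes $\mathbb{E}_{q_{\mathbf{z}}}[\ln p(\mathbf{X}, \mathbf{z}, \boldsymbol{\theta}_0)] = \mathbb{E}_{q_{\mathbf{z}}}[\ln p(\mathbf{z}, \boldsymbol{\theta}_0 \mid \mathbf{X})] + \text{const}$, the expected complete-data log posterior (the M step).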
10.6 ( ) The alpha family of divergences is defined by (10.19). Show that the Kullback-Leibler divergence $\mathrm{KL}(p\|q)$ corresponds to $\alpha \to 1$. This can be done by writing $p^{\epsilon} = \exp(\epsilon \ln p) = 1 + \epsilon \ln p + O(\epsilon^2)$ and then taking $\epsilon \to 0$. Similarly, show that $\mathrm{KL}(q\|p)$ corresponds to $\alpha \to -1$.
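A sketch of the $\alpha \to 1$ limit, assuming (10.19) has the standard form $D_\alpha(p\|q) = \frac{4}{1-\alpha^2}\left(1 - \int p(x)^{(1+\alpha)/2}\, q(x)^{(1-\alpha)/2}\,\mathrm{d}x\right)$: setting $\epsilon = (1-\alpha)/2$, so that $\alpha \to 1$ corresponds to $\epsilon \to 0$, and expanding as suggested,

$$\int p^{1-\epsilon} q^{\epsilon}\,\mathrm{d}x = \int p\,\exp\!\left(\epsilon \ln \frac{q}{p}\right)\mathrm{d}x = 1 - \epsilon\,\mathrm{KL}(p\|q) + O(\epsilon^2),$$

while $\frac{4}{1-\alpha^2} = \frac{1}{\epsilon(1-\epsilon)}$, so that $D_\alpha = \mathrm{KL}(p\|q) + O(\epsilon) \to \mathrm{KL}(p\|q)$. Swapping the roles of $p$ and $q$, with $\epsilon = (1+\alpha)/2$, gives $\mathrm{KL}(q\|p)$ as $\alpha \to -1$.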
10.7 ( ) Consider the problem of inferring the mean and precision of a univariate Gaussian using a factorized variational approximation, as considered in Section 10.1.3. Show that the factor $q_\mu(\mu)$ is a Gaussian of the form $\mathcal{N}(\mu \mid \mu_N, \lambda_N^{-1})$, with mean and precision given by (10.26) and (10.27), respectively. Similarly, show that the factor $q_\tau(\tau)$ is a gamma distribution of the form $\mathrm{Gam}(\tau \mid a_N, b_N)$, with parameters given by (10.29) and (10.30).
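A minimal numerical sketch of the resulting coordinate-ascent updates, assuming the standard forms of (10.26), (10.27), (10.29), and (10.30) for the model of Section 10.1.3, namely $\mu_N = (\lambda_0\mu_0 + N\bar{x})/(\lambda_0 + N)$, $\lambda_N = (\lambda_0 + N)\,\mathbb{E}[\tau]$, $a_N = a_0 + (N+1)/2$, and $b_N = b_0 + \frac{1}{2}\mathbb{E}_\mu[\sum_n (x_n-\mu)^2 + \lambda_0(\mu-\mu_0)^2]$; the hyperparameter values and function name are illustrative only:

```python
import numpy as np

def fit_univariate_gaussian_vb(x, mu0=0.0, lam0=0.0, a0=0.0, b0=0.0,
                               n_iters=100):
    """Coordinate-ascent VB for N(x | mu, 1/tau) with a Gaussian-gamma prior.

    Sketch of the factorized q(mu, tau) = q_mu(mu) q_tau(tau) updates;
    the hyperparameter defaults give the broad-prior limit discussed in the text.
    """
    N = len(x)
    xbar = np.mean(x)
    E_tau = 1.0  # arbitrary initialization of the expected precision

    for _ in range(n_iters):
        # q_mu(mu) = N(mu | mu_N, 1/lam_N)   -- cf. (10.26), (10.27)
        mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
        lam_N = (lam0 + N) * E_tau

        # E_mu[(x_n - mu)^2] needs E[mu] = mu_N and E[mu^2] = mu_N^2 + 1/lam_N
        E_mu, E_mu2 = mu_N, mu_N**2 + 1.0 / lam_N

        # q_tau(tau) = Gam(tau | a_N, b_N)   -- cf. (10.29), (10.30)
        a_N = a0 + (N + 1) / 2.0
        b_N = b0 + 0.5 * (np.sum(x**2) - 2.0 * E_mu * np.sum(x) + N * E_mu2
                          + lam0 * (E_mu2 - 2.0 * mu0 * E_mu + mu0**2))
        E_tau = a_N / b_N  # gamma mean, fed back into the next mu update

    return mu_N, lam_N, a_N, b_N

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)
mu_N, lam_N, a_N, b_N = fit_univariate_gaussian_vb(x)
print(mu_N, a_N / b_N)   # posterior mean of mu, E[tau]
print(1.0 / np.var(x))   # compare with the ML precision
```

Because $b_N$ depends on $\mathbb{E}[\tau]$ through $\lambda_N$, the two updates are iterated to a fixed point; this coupling is also the basis of Exercise 10.9.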
10.8 ( ) Consider the variational posterior distribution for the precision of a univariate Gaussian whose parameters are given by (10.29) and (10.30). By using the standard results for the mean and variance of the gamma distribution given by (B.27) and (B.28), show that if we let $N \to \infty$, this variational posterior distribution has a mean given by the inverse of the maximum likelihood estimator for the variance of the data, and a variance that goes to zero.
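A sketch of the limit, assuming the forms of $a_N$ and $b_N$ above: by (B.27) and (B.28), $\mathbb{E}[\tau] = a_N/b_N$ and $\operatorname{var}[\tau] = a_N/b_N^2$. For large $N$ the prior contributions are $O(1)$ and $q_\mu$ concentrates at $\bar{x}$, so $a_N \simeq N/2$ and $b_N \simeq \frac{N}{2}\sigma_{\mathrm{ML}}^2$, where $\sigma_{\mathrm{ML}}^2 = \frac{1}{N}\sum_n (x_n - \bar{x})^2$, giving

$$\mathbb{E}[\tau] \to \frac{1}{\sigma_{\mathrm{ML}}^2}, \qquad \operatorname{var}[\tau] = \frac{\mathbb{E}[\tau]}{b_N} = O(1/N) \to 0.$$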
10.9 ( ) By making use of the standard result $\mathbb{E}[\tau] = a_N/b_N$ for the mean of a gamma distribution, together with (10.26), (10.27), (10.29), and (10.30), derive the result (10.33) for the reciprocal of the expected precision in the factorized variational treatment of a univariate Gaussian.
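The key observation is that $b_N$ depends on $\mathbb{E}[\tau]$ through $\operatorname{var}[\mu] = 1/\lambda_N$, so $\mathbb{E}[\tau] = a_N/b_N$ is a self-consistency condition that is linear in $1/\mathbb{E}[\tau]$ and can be solved in closed form. As an illustrative special case (assuming the broad-prior limit $\mu_0 = \lambda_0 = a_0 = b_0 = 0$ considered in the text), $\mu_N = \bar{x}$ and $\lambda_N = N\,\mathbb{E}[\tau]$, so

$$b_N = \frac{1}{2}\,\mathbb{E}_\mu\Big[\sum_n (x_n - \mu)^2\Big] = \frac{N}{2}\big(\overline{x^2} - \bar{x}^2\big) + \frac{1}{2\,\mathbb{E}[\tau]},$$

and substituting $1/\mathbb{E}[\tau] = b_N/a_N$ and solving the resulting linear equation gives the closed-form expression of (10.33).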
10.10 ( ) www Derive the decomposition given by (10.34) that is used to find approximate posterior distributions over models using variational inference.
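A sketch of the structure, assuming the variational posterior factorizes as $q(\mathbf{Z}, m) = q(\mathbf{Z} \mid m)\,q(m)$ over latent variables and model index $m$: the usual decomposition of the log marginal likelihood then takes the form

$$\ln p(\mathbf{X}) = \mathcal{L} + \sum_m \sum_{\mathbf{Z}} q(\mathbf{Z} \mid m)\,q(m) \ln \frac{q(\mathbf{Z} \mid m)\,q(m)}{p(\mathbf{Z}, m \mid \mathbf{X})}, \qquad \mathcal{L} = \sum_m \sum_{\mathbf{Z}} q(\mathbf{Z} \mid m)\,q(m) \ln \frac{p(\mathbf{Z}, \mathbf{X}, m)}{q(\mathbf{Z} \mid m)\,q(m)},$$

which is verified by writing $p(\mathbf{Z}, \mathbf{X}, m) = p(\mathbf{Z}, m \mid \mathbf{X})\,p(\mathbf{X})$, so that the two terms sum to $\ln p(\mathbf{X})$.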
10.11 ( ) www By using a Lagrange multiplier to enforce the normalization constraint on the distribution $q(m)$, show that the maximum of the lower bound (10.35) is given by (10.36).
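A sketch of the calculation, assuming (10.35) can be written as $\mathcal{L} = \sum_m q(m)\{\mathcal{L}_m + \ln p(m) - \ln q(m)\}$, where $\mathcal{L}_m$ denotes the model-conditional lower bound: stationarity of the Lagrangian gives

$$\frac{\partial}{\partial q(m)}\Big[\mathcal{L} + \lambda\Big(\sum_m q(m) - 1\Big)\Big] = \mathcal{L}_m + \ln p(m) - \ln q(m) - 1 + \lambda = 0,$$

so $q(m) \propto p(m)\exp(\mathcal{L}_m)$, with $\lambda$ fixed by normalization, which is the form of (10.36).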
10.12 ( ) Starting from the joint distribution (10.41), and applying the general result (10.9), show that the optimal variational distribution $q(\mathbf{Z})$ over the latent variables for the Bayesian mixture of Gaussians is given by (10.48), by verifying the steps given in the text.
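A sketch of the steps, assuming the standard factorization $p(\mathbf{X}, \mathbf{Z}, \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Lambda})$ of (10.41) and the factorized posterior $q(\mathbf{Z})\,q(\boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Lambda})$: applying (10.9), only two terms of the joint depend on $\mathbf{Z}$, giving

$$\ln q^\star(\mathbf{Z}) = \mathbb{E}_{\boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Lambda}}\big[\ln p(\mathbf{Z} \mid \boldsymbol{\pi}) + \ln p(\mathbf{X} \mid \mathbf{Z}, \boldsymbol{\mu}, \boldsymbol{\Lambda})\big] + \text{const} = \sum_{n=1}^N \sum_{k=1}^K z_{nk} \ln \rho_{nk} + \text{const},$$

where $\ln \rho_{nk} = \mathbb{E}[\ln \pi_k] + \frac{1}{2}\mathbb{E}[\ln|\boldsymbol{\Lambda}_k|] - \frac{D}{2}\ln(2\pi) - \frac{1}{2}\mathbb{E}\big[(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathrm{T}} \boldsymbol{\Lambda}_k (\mathbf{x}_n - \boldsymbol{\mu}_k)\big]$. Exponentiating and normalizing over $k$ then gives $q^\star(\mathbf{Z}) = \prod_n \prod_k r_{nk}^{z_{nk}}$ with responsibilities $r_{nk} = \rho_{nk} / \sum_j \rho_{nj}$, the factorized form asserted in (10.48).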