Pattern Recognition and Machine Learning

10. APPROXIMATE INFERENCE

minimization of $\mathrm{KL}(p\|q)$ with respect to $\mu$ and $\Sigma$ leads to the result that $\mu$ is given by the expectation of $x$ under $p(x)$ and that $\Sigma$ is given by the covariance.
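As a hint for this final step (a sketch only; the shorthand $\mathbb{E}_p$ and $\mathrm{cov}_p$ is mine): writing $q(x) = \mathcal{N}(x \mid \mu, \Sigma)$, the divergence depends on $\mu$ and $\Sigma$ only through the cross-entropy term,
\[
\mathrm{KL}(p\|q) = -\int p(x) \ln q(x)\, dx + \text{const}
= \tfrac{1}{2} \ln |\Sigma| + \tfrac{1}{2}\, \mathbb{E}_p\!\left[(x-\mu)^{\mathrm{T}} \Sigma^{-1} (x-\mu)\right] + \text{const},
\]
and setting the derivatives with respect to $\mu$ and $\Sigma^{-1}$ to zero yields $\mu = \mathbb{E}_p[x]$ and $\Sigma = \mathrm{cov}_p[x]$.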

10.5 ( ) www Consider a model in which the set of all hidden stochastic variables, denoted collectively by $Z$, comprises some latent variables $z$ together with some model parameters $\theta$. Suppose we use a variational distribution that factorizes between latent variables and parameters, so that $q(z, \theta) = q_z(z)\, q_\theta(\theta)$, in which the distribution $q_\theta(\theta)$ is approximated by a point estimate of the form $q_\theta(\theta) = \delta(\theta - \theta_0)$, where $\theta_0$ is a vector of free parameters. Show that variational optimization of this factorized distribution is equivalent to an EM algorithm, in which the E step optimizes $q_z(z)$, and the M step maximizes the expected complete-data log posterior distribution of $\theta$ with respect to $\theta_0$.
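One way to begin (a sketch; making the delta function's entropy term rigorous, for example via a limiting sequence of narrow distributions, is part of what the exercise asks for): substituting $q_\theta(\theta) = \delta(\theta - \theta_0)$ into the lower bound (10.3) gives, up to an additive term independent of $q_z$ and $\theta_0$,
\[
\mathcal{L}(q) = \int q_z(z) \ln p(X, z, \theta_0)\, dz - \int q_z(z) \ln q_z(z)\, dz + \text{const}.
\]
For fixed $\theta_0$ the bound is maximized over $q_z$ by $q_z(z) = p(z \mid X, \theta_0)$ (the E step), while for fixed $q_z$ maximizing over $\theta_0$ is equivalent to maximizing $\mathbb{E}_{q_z}[\ln p(z, \theta_0 \mid X)]$, since $\ln p(X, z, \theta_0) = \ln p(z, \theta_0 \mid X) + \ln p(X)$ (the M step).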

10.6 ( ) The alpha family of divergences is defined by (10.19). Show that the Kullback-Leibler divergence $\mathrm{KL}(p\|q)$ corresponds to $\alpha \to 1$. This can be done by writing $p^{\epsilon} = \exp(\epsilon \ln p) = 1 + \epsilon \ln p + O(\epsilon^2)$ and then taking $\epsilon \to 0$. Similarly, show that $\mathrm{KL}(q\|p)$ corresponds to $\alpha \to -1$.
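A sketch of the first limit, assuming (10.19) has the standard form $D_\alpha(p\|q) = \frac{4}{1-\alpha^2}\left(1 - \int p(x)^{(1+\alpha)/2}\, q(x)^{(1-\alpha)/2}\, dx\right)$: set $\alpha = 1 - 2\epsilon$, so that $(1+\alpha)/2 = 1-\epsilon$, $(1-\alpha)/2 = \epsilon$, and $1-\alpha^2 = 4\epsilon(1-\epsilon)$. Then
\[
\int p^{1-\epsilon} q^{\epsilon}\, dx = \int p \exp\!\left(\epsilon \ln \frac{q}{p}\right) dx = 1 + \epsilon \int p \ln \frac{q}{p}\, dx + O(\epsilon^2),
\]
so that $D_\alpha(p\|q) = -\frac{1}{1-\epsilon} \int p \ln \frac{q}{p}\, dx + O(\epsilon) \to \mathrm{KL}(p\|q)$ as $\epsilon \to 0$. The $\alpha \to -1$ case follows by the same argument with the roles of $p$ and $q$ exchanged.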

10.7 ( ) Consider the problem of inferring the mean and precision of a univariate Gaussian using a factorized variational approximation, as considered in Section 10.1.3. Show that the factor $q_\mu(\mu)$ is a Gaussian of the form $\mathcal{N}(\mu \mid \mu_N, \lambda_N^{-1})$, with mean and precision given by (10.26) and (10.27), respectively. Similarly, show that the factor $q_\tau(\tau)$ is a gamma distribution of the form $\mathrm{Gam}(\tau \mid a_N, b_N)$, with parameters given by (10.29) and (10.30).
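For intuition, here is a minimal numerical sketch of the resulting coordinate-ascent updates, assuming NumPy and the Section 10.1.3 setup with priors $p(\mu \mid \tau) = \mathcal{N}(\mu \mid \mu_0, (\lambda_0\tau)^{-1})$ and $p(\tau) = \mathrm{Gam}(\tau \mid a_0, b_0)$; the function name and initialization are my own choices, not from the text:

import numpy as np

def vb_univariate_gaussian(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, iters=100):
    # Coordinate ascent for q(mu) = N(mu | muN, 1/lamN) and q(tau) = Gam(tau | aN, bN).
    N, xbar = len(x), np.mean(x)
    E_tau = a0 / b0                          # initial guess for E[tau]
    aN = a0 + (N + 1) / 2.0                  # (10.29); does not depend on q(mu)
    for _ in range(iters):
        muN = (lam0 * mu0 + N * xbar) / (lam0 + N)   # (10.26)
        lamN = (lam0 + N) * E_tau                    # (10.27)
        E_mu, E_mu2 = muN, muN**2 + 1.0 / lamN       # first two moments of q(mu)
        # (10.30): bN = b0 + (1/2) E_mu[ sum_n (x_n - mu)^2 + lam0 (mu - mu0)^2 ]
        bN = b0 + 0.5 * (np.sum(x**2) - 2.0 * E_mu * N * xbar + N * E_mu2
                         + lam0 * (E_mu2 - 2.0 * mu0 * E_mu + mu0**2))
        E_tau = aN / bN
    return muN, lamN, aN, bN

x = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=500)
muN, lamN, aN, bN = vb_univariate_gaussian(x)
print(muN, aN / bN)   # close to the sample mean and to 1 / (sample variance)

The updates are iterated because they are coupled through $\mathbb{E}[\tau]$; at convergence the parameters satisfy (10.26), (10.27), (10.29), and (10.30) simultaneously.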

10.8 ( ) Consider the variational posterior distribution for the precision of a univariate Gaussian whose parameters are given by (10.29) and (10.30). By using the standard results for the mean and variance of the gamma distribution given by (B.27) and (B.28), show that if we let $N \to \infty$, this variational posterior distribution has a mean given by the inverse of the maximum likelihood estimator for the variance of the data, and a variance that goes to zero.
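As a check on the claimed limit (a sketch; the approximations hold for large $N$, where the prior contributions become negligible): from (B.27) and (B.28), $\mathbb{E}[\tau] = a_N/b_N$ and $\mathrm{var}[\tau] = a_N/b_N^2$. For $N \to \infty$, $a_N \simeq N/2$ and $b_N \simeq (N/2)\,\sigma_{\mathrm{ML}}^2$, where $\sigma_{\mathrm{ML}}^2 = \frac{1}{N}\sum_n (x_n - \bar{x})^2$, so that
\[
\mathbb{E}[\tau] \to \frac{1}{\sigma_{\mathrm{ML}}^2}, \qquad \mathrm{var}[\tau] \simeq \frac{2}{N \sigma_{\mathrm{ML}}^4} \to 0.
\]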

10.9 ( ) By making use of the standard result $\mathbb{E}[\tau] = a_N/b_N$ for the mean of a gamma distribution, together with (10.26), (10.27), (10.29), and (10.30), derive the result (10.33) for the reciprocal of the expected precision in the factorized variational treatment of a univariate Gaussian.
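A sketch of the algebra, assuming (as I read the text leading to (10.33)) the broad-prior limit $a_0 = b_0 = \lambda_0 = 0$: then $\mu_N = \bar{x}$, $\lambda_N = N\,\mathbb{E}[\tau]$, $a_N = (N+1)/2$, and
\[
\frac{1}{\mathbb{E}[\tau]} = \frac{b_N}{a_N}
= \frac{\sum_n x_n^2 - N\bar{x}^2 + N/\lambda_N}{N+1}
= \frac{\sum_n (x_n - \bar{x})^2 + 1/\mathbb{E}[\tau]}{N+1},
\]
and solving for $1/\mathbb{E}[\tau]$ gives $\frac{1}{\mathbb{E}[\tau]} = \frac{1}{N}\sum_n (x_n - \bar{x})^2$.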

10.10 ( ) www Derive the decomposition given by (10.34) that is used to find approximate posterior distributions over models using variational inference.
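The structure mirrors (10.2) through (10.4) (a sketch, writing the variational posterior over latent variables and model index as $q(Z, m) = q(Z \mid m)\, q(m)$, as in the text): one finds
\[
\ln p(X) = \mathcal{L} - \sum_m \sum_{Z} q(Z \mid m)\, q(m) \ln \frac{p(Z, m \mid X)}{q(Z \mid m)\, q(m)},
\qquad
\mathcal{L} = \sum_m \sum_{Z} q(Z \mid m)\, q(m) \ln \frac{p(Z, X, m)}{q(Z \mid m)\, q(m)},
\]
where the second term is a Kullback-Leibler divergence and hence nonnegative, so that $\mathcal{L}$ is a lower bound on $\ln p(X)$.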

10.11 ( ) www By using a Lagrange multiplier to enforce the normalization constraint on the distribution $q(m)$, show that the maximum of the lower bound (10.35) is given by (10.36).
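A sketch of the stationarity condition (keeping only the $q(m)$-dependent part of the bound, and abbreviating the inner sum over $Z$ as $\mathcal{L}_m$):
\[
\frac{\partial}{\partial q(m)} \left[ \sum_{m'} q(m') \left\{ \mathcal{L}_{m'} + \ln p(m') - \ln q(m') \right\} + \lambda \left( \sum_{m'} q(m') - 1 \right) \right]
= \mathcal{L}_m + \ln p(m) - \ln q(m) - 1 + \lambda = 0,
\]
which, after normalizing, gives $q(m) \propto p(m) \exp(\mathcal{L}_m)$, the form that (10.36) should take.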

10.12 ( ) Starting from the joint distribution (10.41), and applying the general result (10.9), show that the optimal variational distribution $q^{\star}(Z)$ over the latent variables for the Bayesian mixture of Gaussians is given by (10.48), by verifying the steps given in the text.
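A sketch of the main step (following the text's notation for the responsibilities): applying (10.9) to the factor $q(Z)$, only the terms of (10.41) that involve $Z$ survive,
\[
\ln q^{\star}(Z) = \mathbb{E}_{\pi}[\ln p(Z \mid \pi)] + \mathbb{E}_{\mu, \Lambda}[\ln p(X \mid Z, \mu, \Lambda)] + \text{const}
= \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \ln \rho_{nk} + \text{const},
\]
where $\ln \rho_{nk} = \mathbb{E}[\ln \pi_k] + \frac{1}{2}\mathbb{E}[\ln |\Lambda_k|] - \frac{D}{2}\ln(2\pi) - \frac{1}{2}\mathbb{E}_{\mu_k, \Lambda_k}\!\left[(x_n - \mu_k)^{\mathrm{T}} \Lambda_k (x_n - \mu_k)\right]$. Exponentiating and normalizing over $k$ then gives $q^{\star}(Z) \propto \prod_n \prod_k \rho_{nk}^{z_{nk}}$, that is, the form (10.48) with responsibilities $r_{nk} = \rho_{nk} / \sum_j \rho_{nj}$.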