\mathbb{E}[\ln p(\boldsymbol{\mu}, \boldsymbol{\Lambda})] = \frac{1}{2} \sum_{k=1}^{K} \Bigl\{ D \ln(\beta_0 / 2\pi) + \ln\widetilde{\Lambda}_k - \frac{D \beta_0}{\beta_k} - \beta_0 \nu_k (\mathbf{m}_k - \mathbf{m}_0)^{\mathrm{T}} \mathbf{W}_k (\mathbf{m}_k - \mathbf{m}_0) \Bigr\}
    + K \ln B(\mathbf{W}_0, \nu_0) + \frac{\nu_0 - D - 1}{2} \sum_{k=1}^{K} \ln\widetilde{\Lambda}_k - \frac{1}{2} \sum_{k=1}^{K} \nu_k \operatorname{Tr}\bigl(\mathbf{W}_0^{-1} \mathbf{W}_k\bigr)    (10.74)

\mathbb{E}[\ln q(\mathbf{Z})] = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \ln r_{nk}    (10.75)

\mathbb{E}[\ln q(\boldsymbol{\pi})] = \sum_{k=1}^{K} (\alpha_k - 1) \ln\widetilde{\pi}_k + \ln C(\boldsymbol{\alpha})    (10.76)

\mathbb{E}[\ln q(\boldsymbol{\mu}, \boldsymbol{\Lambda})] = \sum_{k=1}^{K} \Bigl\{ \frac{1}{2} \ln\widetilde{\Lambda}_k + \frac{D}{2} \ln\Bigl(\frac{\beta_k}{2\pi}\Bigr) - \frac{D}{2} - \mathrm{H}[q(\boldsymbol{\Lambda}_k)] \Bigr\}    (10.77)

where $D$ is the dimensionality of $\mathbf{x}$, $\mathrm{H}[q(\boldsymbol{\Lambda}_k)]$ is the entropy of the Wishart distribution given by (B.82), and the coefficients $C(\boldsymbol{\alpha})$ and $B(\mathbf{W}, \nu)$ are defined by (B.23) and (B.79), respectively. Note that the terms involving expectations of the logs of the $q$ distributions simply represent the negative entropies of those distributions. Some simplifications and combination of terms can be performed when these expressions are summed to give the lower bound. However, we have kept the expressions separate for ease of understanding.
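For concreteness, the two simplest of these terms can be evaluated directly from the variational parameters. The following is a minimal NumPy/SciPy sketch (the variable names r and alpha are illustrative rather than taken from the text) that computes $\mathbb{E}[\ln q(\mathbf{Z})]$ from (10.75) and $\mathbb{E}[\ln q(\boldsymbol{\pi})]$ from (10.76), using the standard Dirichlet results $\ln\widetilde{\pi}_k = \psi(\alpha_k) - \psi\bigl(\sum_j \alpha_j\bigr)$ and $\ln C(\boldsymbol{\alpha}) = \ln\Gamma\bigl(\sum_k \alpha_k\bigr) - \sum_k \ln\Gamma(\alpha_k)$.

import numpy as np
from scipy.special import digamma, gammaln

def E_ln_q_Z(r):
    """E[ln q(Z)] = sum_n sum_k r_nk ln r_nk, equation (10.75).
    r is the (N, K) array of responsibilities; entries equal to zero
    contribute nothing since r ln r -> 0 as r -> 0."""
    r = np.asarray(r, dtype=float)
    safe_r = np.where(r > 0, r, 1.0)          # avoid log(0); those entries are zeroed below
    return float(np.sum(np.where(r > 0, r * np.log(safe_r), 0.0)))

def E_ln_q_pi(alpha):
    """E[ln q(pi)] = sum_k (alpha_k - 1) ln(pi~_k) + ln C(alpha), equation (10.76)."""
    alpha = np.asarray(alpha, dtype=float)
    ln_pi_tilde = digamma(alpha) - digamma(alpha.sum())   # E[ln pi_k] under Dir(alpha)
    ln_C = gammaln(alpha.sum()) - gammaln(alpha).sum()    # log normalizer of Dir(alpha)
    return float(np.sum((alpha - 1.0) * ln_pi_tilde) + ln_C)

# Example with arbitrary (made-up) values:
r = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
alpha = np.array([3.0, 7.0])
print(E_ln_q_Z(r), E_ln_q_pi(alpha))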
Finally, it is worth noting that the lower bound provides an alternative approach for deriving the variational re-estimation equations obtained in Section 10.2.1. To do this we use the fact that, since the model has conjugate priors, the functional form of the factors in the variational posterior distribution is known, namely discrete for $\mathbf{Z}$, Dirichlet for $\boldsymbol{\pi}$, and Gaussian-Wishart for $(\boldsymbol{\mu}_k, \boldsymbol{\Lambda}_k)$. By taking general parametric forms for these distributions we can derive the form of the lower bound as a function of the parameters of the distributions. Maximizing the bound with respect to these parameters then gives the required re-estimation equations (Exercise 10.18).
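As a sketch of this procedure for the simplest factor, consider $q(\mathbf{Z})$ and treat the responsibilities $r_{nk}$ as free parameters, writing $\ln\rho_{nk}$ for the coefficient of $r_{nk}$ contributed by the expectation terms of the bound (this is the notation of Section 10.2.1). The $r_{nk}$-dependent part of the bound is

\mathcal{L}(r) = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \ln\rho_{nk} - \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \ln r_{nk} + \mathrm{const},

and maximizing subject to $\sum_k r_{nk} = 1$ with a Lagrange multiplier $\lambda_n$ for each $n$ gives $\ln\rho_{nk} - \ln r_{nk} - 1 + \lambda_n = 0$, so that

r_{nk} = \frac{\rho_{nk}}{\sum_{j=1}^{K} \rho_{nj}},

recovering the responsibility update of Section 10.2.1.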


10.2.3 Predictive density


In applications of the Bayesian mixture of Gaussians model we will often be interested in the predictive density for a new value $\widehat{\mathbf{x}}$ of the observed variable. Associated with this observation will be a corresponding latent variable $\widehat{\mathbf{z}}$, and the predictive density is then given by

p(\widehat{\mathbf{x}} \mid \mathbf{X}) = \sum_{\widehat{\mathbf{z}}} \iiint p(\widehat{\mathbf{x}} \mid \widehat{\mathbf{z}}, \boldsymbol{\mu}, \boldsymbol{\Lambda}) \, p(\widehat{\mathbf{z}} \mid \boldsymbol{\pi}) \, p(\boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \mathbf{X}) \, \mathrm{d}\boldsymbol{\pi} \, \mathrm{d}\boldsymbol{\mu} \, \mathrm{d}\boldsymbol{\Lambda}    (10.78)
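One simple, illustrative way to evaluate (10.78) numerically is by Monte Carlo, replacing the intractable posterior $p(\boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \mathbf{X})$ with the factorized variational posterior $q(\boldsymbol{\pi})\,q(\boldsymbol{\mu}, \boldsymbol{\Lambda})$. The sketch below is not the treatment given in the text; the function name and the parameter arrays alpha, beta, m, W, nu (the variational parameters of Section 10.2.1) are assumed to be available.

import numpy as np
from scipy.stats import wishart, multivariate_normal

def predictive_density_mc(x_hat, alpha, beta, m, W, nu, n_samples=2000):
    """Monte Carlo estimate of p(x_hat | X), with q(pi) q(mu, Lambda) standing
    in for the true posterior p(pi, mu, Lambda | X) in (10.78)."""
    K, D = m.shape
    total = 0.0
    for _ in range(n_samples):
        pi = np.random.dirichlet(alpha)              # pi ~ q(pi) = Dir(alpha)
        dens = 0.0
        for k in range(K):
            # Lambda_k ~ Wishart(W_k, nu_k), then mu_k | Lambda_k ~ N(m_k, (beta_k Lambda_k)^{-1})
            Lam = wishart.rvs(df=nu[k], scale=W[k])
            mu = np.random.multivariate_normal(m[k], np.linalg.inv(beta[k] * Lam))
            # Summing over z_hat in (10.78) gives sum_k pi_k N(x_hat | mu_k, Lambda_k^{-1})
            dens += pi[k] * multivariate_normal.pdf(x_hat, mean=mu, cov=np.linalg.inv(Lam))
        total += dens
    return total / n_samples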