Pattern Recognition and Machine Learning

10.2. Illustration: Variational Mixture of Gaussians

Figure 10.5 Directed acyclic graph representing the Bayesian mixture of Gaussians model, in which the box (plate) denotes a set of N i.i.d. observations. Here μ denotes {μ_k} and Λ denotes {Λ_k}. [Figure: nodes π, z_n, x_n, μ, Λ, with z_n and x_n inside a plate labelled N.]

by (B.23) (Section 2.2.1). As we have seen, the parameter α_0 can be interpreted as the effective
prior number of observations associated with each component of the mixture. If the
value of α_0 is small, then the posterior distribution will be influenced primarily by
the data rather than by the prior.
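To make the role of α_0 concrete, the following sketch (an illustration of my own, not from the text) uses the conjugacy of the Dirichlet prior: the posterior over π is again Dirichlet with parameters α_0 + N_k, so for small α_0 the posterior mean of π_k stays close to the empirical fraction N_k/N. The counts N_k below are assumed values.

```python
import numpy as np

# Hypothetical illustration: symmetric Dirichlet prior Dir(pi | alpha_0)
# over K mixing coefficients, updated with assumed per-component counts N_k.
K = 3
alpha_0 = 0.1                   # small alpha_0: weak prior influence
N_k = np.array([50, 30, 20])    # assumed effective counts from the data

# Conjugacy: the posterior is Dirichlet with parameters alpha_0 + N_k
alpha_post = alpha_0 + N_k

# Posterior mean of pi_k: (alpha_0 + N_k) / (K * alpha_0 + N)
pi_mean = alpha_post / alpha_post.sum()
# For alpha_0 = 0.1 this is close to the empirical fractions N_k / N
```

With a large α_0 (say 100), the same computation would pull `pi_mean` strongly toward the uniform vector, illustrating the "effective prior observations" reading of α_0.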
Similarly, we introduce an independent Gaussian-Wishart prior governing the
mean and precision of each Gaussian component, given by


p(μ,Λ) = p(μ|Λ) p(Λ) = ∏_{k=1}^{K} N(μ_k | m_0, (β_0 Λ_k)^{-1}) W(Λ_k | W_0, ν_0)    (10.40)
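As a concrete sketch (my own, not from the text), one component's parameters can be drawn from the Gaussian-Wishart prior in (10.40) with scipy: first a precision matrix from the Wishart factor, then a mean from the Gaussian whose precision is β_0 Λ_k. All hyperparameter values here are assumptions.

```python
import numpy as np
from scipy.stats import wishart, multivariate_normal

D = 2
m0 = np.zeros(D)        # m_0 = 0 by symmetry
beta0 = 1.0             # assumed value
nu0 = D + 2.0           # Wishart degrees of freedom; must exceed D - 1
W0 = np.eye(D)          # assumed Wishart scale matrix

# Lambda_k ~ W(Lambda | W_0, nu_0): a random precision matrix
Lambda_k = wishart(df=nu0, scale=W0).rvs(random_state=0)

# mu_k | Lambda_k ~ N(mu | m_0, (beta_0 Lambda_k)^{-1})
mu_k = multivariate_normal(mean=m0,
                           cov=np.linalg.inv(beta0 * Lambda_k)).rvs(random_state=0)
```

Note how the covariance of μ_k depends on the sampled Λ_k, which is exactly the Λ → μ link discussed below for Figure 10.5.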

because this represents the conjugate prior distribution when both the mean and precision are unknown (Section 2.3.6). Typically we would choose m_0 = 0 by symmetry.
The resulting model can be represented as a directed graph as shown in Figure 10.5. Note that there is a link from Λ to μ since the variance of the distribution over μ in (10.40) is a function of Λ.
This example provides a nice illustration of the distinction between latent variables and parameters. Variables such as z_n that appear inside the plate are regarded as latent variables because the number of such variables grows with the size of the data set. By contrast, variables such as μ that are outside the plate are fixed in number independently of the size of the data set, and so are regarded as parameters. From the perspective of graphical models, however, there is really no fundamental difference between them.


10.2.1 Variational distribution
In order to formulate a variational treatment of this model, we next write down
the joint distribution of all of the random variables, which is given by

p(X, Z, π, μ, Λ) = p(X|Z, μ, Λ) p(Z|π) p(π) p(μ|Λ) p(Λ)    (10.41)

in which the various factors are defined above. The reader should take a moment to verify that this decomposition does indeed correspond to the probabilistic graphical model shown in Figure 10.5. Note that only the variables X = {x_1, ..., x_N} are observed.
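One way to check the factorization (10.41) is to evaluate its log term by term for arbitrary settings of all the variables. The sketch below (an assumed illustration, not from the text) does this with scipy distributions, treating Z as one-hot indicator vectors; every numerical value is an assumption chosen for concreteness.

```python
import numpy as np
from scipy.stats import dirichlet, wishart, multivariate_normal

rng = np.random.default_rng(1)
N, K, D = 5, 2, 2
X = rng.normal(size=(N, D))                  # assumed observed data
Z = np.eye(K)[rng.integers(K, size=N)]       # one-hot latent assignments z_n
pi = np.array([0.6, 0.4])                    # assumed mixing coefficients
mu = rng.normal(size=(K, D))                 # assumed component means
Lam = np.stack([np.eye(D)] * K)              # assumed component precisions

alpha0, beta0, nu0 = 1.0, 1.0, D + 1.0       # assumed hyperparameters
m0, W0 = np.zeros(D), np.eye(D)

# log p(X | Z, mu, Lambda): each x_n scored under its assigned component
log_px = sum(
    multivariate_normal(mu[k], np.linalg.inv(Lam[k])).logpdf(X[n])
    for n in range(N) for k in range(K) if Z[n, k] == 1
)
# log p(Z | pi): product of pi_k over the assignments
log_pz = np.sum(Z * np.log(pi))
# log p(pi): Dirichlet prior Dir(pi | alpha_0)
log_ppi = dirichlet(np.full(K, alpha0)).logpdf(pi)
# log p(mu | Lambda) p(Lambda): Gaussian-Wishart prior (10.40), per component
log_pmu_lam = sum(
    multivariate_normal(m0, np.linalg.inv(beta0 * Lam[k])).logpdf(mu[k])
    + wishart(df=nu0, scale=W0).logpdf(Lam[k])
    for k in range(K)
)
log_joint = log_px + log_pz + log_ppi + log_pmu_lam
```

Evaluating the joint this way makes each factor of (10.41) explicit, which is the quantity whose expectations drive the variational updates developed in the rest of the section.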