
variable $\mathbf{z}$ associated with each instance of $\mathbf{x}$. As in the case of the Gaussian mixture, $\mathbf{z} = (z_1, \ldots, z_K)^{\mathrm{T}}$ is a binary $K$-dimensional variable having a single component equal to 1, with all other components equal to 0. We can then write the conditional distribution of $\mathbf{x}$, given the latent variable, as
\[
p(\mathbf{x} \mid \mathbf{z}, \boldsymbol{\mu}) = \prod_{k=1}^{K} p(\mathbf{x} \mid \boldsymbol{\mu}_k)^{z_k}
\tag{9.52}
\]

while the prior distribution for the latent variables is the same as for the mixture of Gaussians model, so that
\[
p(\mathbf{z} \mid \boldsymbol{\pi}) = \prod_{k=1}^{K} \pi_k^{z_k}.
\tag{9.53}
\]

If we form the product of $p(\mathbf{x} \mid \mathbf{z}, \boldsymbol{\mu})$ and $p(\mathbf{z} \mid \boldsymbol{\pi})$ and then marginalize over $\mathbf{z}$, we recover (9.47) (Exercise 9.14).
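As a concrete illustration of this marginalization, the following minimal NumPy sketch evaluates the Bernoulli mixture density of (9.47) by summing the component densities weighted by the mixing coefficients. The function names and the array layout (`mu` as a $K \times D$ matrix of Bernoulli means, `pi` as a length-$K$ vector of mixing coefficients, `x` as a binary $D$-vector) are conventions introduced here for illustration, not taken from the text.

```python
import numpy as np

def log_bernoulli(x, mu_k):
    # log p(x | mu_k) = sum_i [ x_i ln mu_ki + (1 - x_i) ln(1 - mu_ki) ]
    eps = 1e-12  # guard against log(0) when a mean hits exactly 0 or 1
    return np.sum(x * np.log(mu_k + eps) + (1 - x) * np.log(1 - mu_k + eps))

def mixture_density(x, mu, pi):
    # p(x | mu, pi) = sum_k pi_k p(x | mu_k), i.e. the marginal in (9.47)
    return sum(pi_k * np.exp(log_bernoulli(x, mu_k)) for pi_k, mu_k in zip(pi, mu))

# Example with K = 2 components over D = 3 binary variables (illustrative values)
mu = np.array([[0.9, 0.1, 0.8],
               [0.2, 0.7, 0.3]])
pi = np.array([0.6, 0.4])
x = np.array([1, 0, 1])
print(mixture_density(x, mu, pi))
```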
In order to derive the EM algorithm, we first write down the complete-data log likelihood function, which is given by
\[
\ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\mu}, \boldsymbol{\pi}) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left\{ \ln \pi_k + \sum_{i=1}^{D} \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}
\tag{9.54}
\]
where $\mathbf{X} = \{\mathbf{x}_n\}$ and $\mathbf{Z} = \{\mathbf{z}_n\}$.
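As a sketch of how (9.54) might be evaluated, the function below vectorizes the double sum with NumPy. It assumes `X` is an $N \times D$ binary data matrix, `Z` an $N \times K$ matrix of one-hot latent assignments, `mu` a $K \times D$ matrix of Bernoulli means, and `pi` a length-$K$ vector of mixing coefficients; these names and shapes are assumptions adopted here for illustration.

```python
import numpy as np

def complete_data_log_likelihood(X, Z, mu, pi):
    # ln p(X, Z | mu, pi) as in (9.54)
    eps = 1e-12  # avoid log(0)
    # (N, K) matrix with (n, k) entry sum_i [x_ni ln mu_ki + (1 - x_ni) ln(1 - mu_ki)]
    log_bern = X @ np.log(mu + eps).T + (1 - X) @ np.log(1 - mu + eps).T
    # add ln pi_k to each column, weight by z_nk, and sum over n and k
    return np.sum(Z * (np.log(pi + eps) + log_bern))
```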
Next we take the expectation of the complete-data log likelihood with respect to the posterior distribution of the latent variables to give
\[
\mathbb{E}_{\mathbf{Z}}[\ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\mu}, \boldsymbol{\pi})] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \left\{ \ln \pi_k + \sum_{i=1}^{D} \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}
\tag{9.55}
\]

where $\gamma(z_{nk}) = \mathbb{E}[z_{nk}]$ is the posterior probability, or responsibility, of component $k$ given data point $\mathbf{x}_n$. In the E step, these responsibilities are evaluated using Bayes' theorem, which takes the form
\[
\gamma(z_{nk}) = \mathbb{E}[z_{nk}]
= \frac{\displaystyle\sum_{z_{nk}} z_{nk} \left[ \pi_k p(\mathbf{x}_n \mid \boldsymbol{\mu}_k) \right]^{z_{nk}}}{\displaystyle\sum_{z_{nj}} \left[ \pi_j p(\mathbf{x}_n \mid \boldsymbol{\mu}_j) \right]^{z_{nj}}}
= \frac{\pi_k p(\mathbf{x}_n \mid \boldsymbol{\mu}_k)}{\displaystyle\sum_{j=1}^{K} \pi_j p(\mathbf{x}_n \mid \boldsymbol{\mu}_j)}.
\tag{9.56}
\]
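The E step of (9.56) can be sketched in NumPy as follows, computing the responsibilities for all data points at once in log space so that very small component densities do not underflow; the array conventions (`X` as $N \times D$, `mu` as $K \times D$, `pi` of length $K$) are the same illustrative assumptions as above.

```python
import numpy as np

def e_step(X, mu, pi):
    # gamma_nk = pi_k p(x_n | mu_k) / sum_j pi_j p(x_n | mu_j), as in (9.56)
    eps = 1e-12
    # (N, K) matrix of ln pi_k + ln p(x_n | mu_k)
    log_weighted = np.log(pi + eps) + (
        X @ np.log(mu + eps).T + (1 - X) @ np.log(1 - mu + eps).T
    )
    # normalize each row in log space (log-sum-exp) for numerical stability
    m = log_weighted.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.sum(np.exp(log_weighted - m), axis=1, keepdims=True))
    return np.exp(log_weighted - log_norm)  # (N, K) matrix of gamma(z_nk)
```

Substituting the resulting matrix of responsibilities for `Z` in the earlier `complete_data_log_likelihood` sketch then evaluates the expected complete-data log likelihood of (9.55).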