variable $\mathbf{z}$ associated with each instance of $\mathbf{x}$. As in the case of the Gaussian mixture, $\mathbf{z} = (z_1,\ldots,z_K)^{\mathrm{T}}$ is a binary $K$-dimensional variable having a single component equal to 1, with all other components equal to 0. We can then write the conditional distribution of $\mathbf{x}$, given the latent variable, as
\[
p(\mathbf{x}\,|\,\mathbf{z},\boldsymbol{\mu}) = \prod_{k=1}^{K} p(\mathbf{x}\,|\,\boldsymbol{\mu}_k)^{z_k}
\tag{9.52}
\]
while the prior distribution for the latent variables is the same as for the mixture of
Gaussians model, so that
\[
p(\mathbf{z}\,|\,\boldsymbol{\pi}) = \prod_{k=1}^{K} \pi_k^{z_k}.
\tag{9.53}
\]
If we form the product of $p(\mathbf{x}\,|\,\mathbf{z},\boldsymbol{\mu})$ and $p(\mathbf{z}\,|\,\boldsymbol{\pi})$ and then marginalize over $\mathbf{z}$, we recover (9.47) (Exercise 9.14).
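To spell out the marginalization step (this is the content of Exercise 9.14): the sum over $\mathbf{z}$ runs over the $K$ one-of-$K$ vectors, and each such vector picks out exactly one factor from each of the products in (9.52) and (9.53), so that
\[
p(\mathbf{x}\,|\,\boldsymbol{\mu},\boldsymbol{\pi})
= \sum_{\mathbf{z}} p(\mathbf{z}\,|\,\boldsymbol{\pi})\, p(\mathbf{x}\,|\,\mathbf{z},\boldsymbol{\mu})
= \sum_{k=1}^{K} \pi_k\, p(\mathbf{x}\,|\,\boldsymbol{\mu}_k)
\]
which is the Bernoulli mixture distribution of (9.47).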
In order to derive the EM algorithm, we first write down the complete-data log
likelihood function, which is given by
\[
\ln p(\mathbf{X},\mathbf{Z}\,|\,\boldsymbol{\mu},\boldsymbol{\pi}) = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk} \left\{ \ln\pi_k + \sum_{i=1}^{D} \bigl[ x_{ni}\ln\mu_{ki} + (1 - x_{ni})\ln(1 - \mu_{ki}) \bigr] \right\}
\tag{9.54}
\]
where $\mathbf{X} = \{\mathbf{x}_n\}$ and $\mathbf{Z} = \{\mathbf{z}_n\}$. Next we take the expectation of the complete-data log likelihood with respect to the posterior distribution of the latent variables to give
\[
\mathbb{E}_{\mathbf{Z}}[\ln p(\mathbf{X},\mathbf{Z}\,|\,\boldsymbol{\mu},\boldsymbol{\pi})] = \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma(z_{nk}) \left\{ \ln\pi_k + \sum_{i=1}^{D} \bigl[ x_{ni}\ln\mu_{ki} + (1 - x_{ni})\ln(1 - \mu_{ki}) \bigr] \right\}
\tag{9.55}
\]
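The bracketed inner sum over $i$ is just the log of the component-$k$ Bernoulli density evaluated at $\mathbf{x}_n$, and it can be computed for all $n$ and $k$ at once. The following NumPy sketch is ours, not part of the text; the function name and the clipping constant `eps` (which guards against $\ln 0$ when a mean saturates at 0 or 1) are assumptions of the illustration.

```python
import numpy as np

def bernoulli_log_prob(X, mu, eps=1e-10):
    """Inner sum of (9.54)/(9.55): ln p(x_n | mu_k) for every n and k.

    X  : (N, D) binary data matrix whose rows are the x_n
    mu : (K, D) matrix whose rows are the Bernoulli means mu_k
    Returns an (N, K) matrix with entries
        sum_i [ x_ni ln mu_ki + (1 - x_ni) ln(1 - mu_ki) ].
    """
    mu = np.clip(mu, eps, 1.0 - eps)  # keep the logarithms finite
    return X @ np.log(mu).T + (1.0 - X) @ np.log(1.0 - mu).T
```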
In (9.55), $\gamma(z_{nk}) = \mathbb{E}[z_{nk}]$ is the posterior probability, or responsibility, of component $k$ given data point $\mathbf{x}_n$. In the E step, these responsibilities are evaluated using Bayes' theorem, which takes the form
\[
\gamma(z_{nk}) = \mathbb{E}[z_{nk}]
= \frac{\displaystyle\sum_{z_{nk}} z_{nk} \bigl[\pi_k p(\mathbf{x}_n\,|\,\boldsymbol{\mu}_k)\bigr]^{z_{nk}}}
       {\displaystyle\sum_{z_{nj}} \bigl[\pi_j p(\mathbf{x}_n\,|\,\boldsymbol{\mu}_j)\bigr]^{z_{nj}}}
= \frac{\pi_k p(\mathbf{x}_n\,|\,\boldsymbol{\mu}_k)}{\displaystyle\sum_{j=1}^{K} \pi_j p(\mathbf{x}_n\,|\,\boldsymbol{\mu}_j)}.
\tag{9.56}
\]
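Continuing the sketch above, the E step follows directly from (9.56). Again this is only an illustrative implementation under our own naming; the subtraction of the per-row maximum is the standard log-sum-exp stabilization, not something required by the equation itself.

```python
def e_step(X, mu, pi):
    """E step of (9.56): responsibilities gamma(z_nk) for every n and k.

    pi : (K,) vector of mixing coefficients pi_k
    Returns an (N, K) matrix whose rows sum to 1.
    """
    # ln pi_k + ln p(x_n | mu_k), reusing bernoulli_log_prob from above
    log_w = np.log(pi) + bernoulli_log_prob(X, mu)
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)
```

For example, with a binary data matrix `X` of shape (N, D), `gamma = e_step(X, mu, pi)` yields the (N, K) responsibility matrix that serves as the weights $\gamma(z_{nk})$ in (9.55).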