variable z associated with each instance of x. As in the case of the Gaussian mixture, z = (z_1, ..., z_K)^T is a binary K-dimensional variable having a single component equal to 1, with all other components equal to 0. We can then write the conditional distribution of x, given the latent variable, as

p(\mathbf{x} \mid \mathbf{z}, \boldsymbol{\mu}) = \prod_{k=1}^{K} p(\mathbf{x} \mid \boldsymbol{\mu}_k)^{z_k}    (9.52)

while the prior distribution for the latent variables is the same as for the mixture of Gaussians model, so that

p(\mathbf{z} \mid \boldsymbol{\pi}) = \prod_{k=1}^{K} \pi_k^{z_k}.    (9.53)

If we form the product of p(x|z, μ) and p(z|π) and then marginalize over z, then we recover (9.47) (Exercise 9.14).
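As a minimal NumPy sketch (not from the text) of the recovered marginal (9.47), the following evaluates the Bernoulli mixture density for a single binary vector; the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def bernoulli_mixture_density(x, mu, pi):
    """Marginal p(x | mu, pi) of a mixture of Bernoullis, as in (9.47).

    x  : (D,) binary data vector
    mu : (K, D) Bernoulli means, one row per component
    pi : (K,) mixing coefficients summing to 1
    """
    # Per-component likelihoods p(x | mu_k) = prod_i mu_ki^x_i (1 - mu_ki)^(1 - x_i)
    comp = np.prod(mu ** x * (1.0 - mu) ** (1.0 - x), axis=1)  # shape (K,)
    # Mixture: sum_k pi_k p(x | mu_k)
    return float(pi @ comp)
```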
In order to derive the EM algorithm, we first write down the complete-data log
likelihood function, which is given by
\ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\mu}, \boldsymbol{\pi}) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left\{ \ln \pi_k + \sum_{i=1}^{D} \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}    (9.54)

where X = {x_n} and Z = {z_n}.
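As a hedged illustration of how (9.54) might be evaluated with NumPy (the shapes, the small eps guard against log(0), and the function name are assumptions, not part of the text):

```python
import numpy as np

def complete_data_log_likelihood(X, Z, mu, pi, eps=1e-12):
    """ln p(X, Z | mu, pi) as in (9.54).

    X  : (N, D) binary data matrix
    Z  : (N, K) one-hot latent assignments z_nk
    mu : (K, D) Bernoulli means mu_ki
    pi : (K,) mixing coefficients pi_k
    eps: small constant guarding log(0) when a mean reaches 0 or 1
    """
    # Inner bracket of (9.54): sum_i [x_ni ln mu_ki + (1 - x_ni) ln(1 - mu_ki)], shape (N, K)
    log_px = X @ np.log(mu + eps).T + (1.0 - X) @ np.log(1.0 - mu + eps).T
    # Weight each term by z_nk, add ln pi_k, and sum over n and k
    return float(np.sum(Z * (np.log(pi + eps) + log_px)))
```

The same routine also evaluates the expectation discussed next if the soft responsibilities γ(z_nk) are passed in place of the one-hot Z.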
Next we take the expectation of the complete-data log likelihood with respect to the posterior distribution of the latent variables to give

\mathbb{E}_{\mathbf{Z}}[\ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\mu}, \boldsymbol{\pi})] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \left\{ \ln \pi_k + \sum_{i=1}^{D} \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}    (9.55)

where γ(z_nk) = E[z_nk] is the posterior probability, or responsibility, of component k given data point x_n. In the E step, these responsibilities are evaluated using Bayes' theorem, which takes the form
\gamma(z_{nk}) = \mathbb{E}[z_{nk}] = \frac{\sum_{z_{nk}} z_{nk} \left[ \pi_k p(\mathbf{x}_n \mid \boldsymbol{\mu}_k) \right]^{z_{nk}}}{\sum_{z_{nj}} \left[ \pi_j p(\mathbf{x}_n \mid \boldsymbol{\mu}_j) \right]^{z_{nj}}} = \frac{\pi_k p(\mathbf{x}_n \mid \boldsymbol{\mu}_k)}{\sum_{j=1}^{K} \pi_j p(\mathbf{x}_n \mid \boldsymbol{\mu}_j)}    (9.56)
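A numerically convenient way to evaluate (9.56) is to work in log space; the sketch below is an illustration under assumptions (the function name, array shapes, eps guard, and the use of SciPy's logsumexp are not from the text).

```python
import numpy as np
from scipy.special import logsumexp

def e_step_responsibilities(X, mu, pi, eps=1e-12):
    """Responsibilities gamma(z_nk) from (9.56), computed in log space.

    X  : (N, D) binary data matrix
    mu : (K, D) Bernoulli means
    pi : (K,) mixing coefficients
    Returns an (N, K) array whose rows sum to 1.
    """
    # ln[pi_k p(x_n | mu_k)] for every n and k, shape (N, K)
    log_weighted = (np.log(pi + eps)
                    + X @ np.log(mu + eps).T
                    + (1.0 - X) @ np.log(1.0 - mu + eps).T)
    # Normalise over components: subtract ln sum_j pi_j p(x_n | mu_j)
    return np.exp(log_weighted - logsumexp(log_weighted, axis=1, keepdims=True))
```

Working in log space avoids underflow when D is large, since each p(x_n | mu_k) is a product of D factors that can be extremely small.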