variable $\mathbf{z}$ associated with each instance of $\mathbf{x}$. As in the case of the Gaussian mixture, $\mathbf{z} = (z_1,\ldots,z_K)^{\mathrm{T}}$ is a binary $K$-dimensional variable having a single component equal to 1, with all other components equal to 0. We can then write the conditional distribution of $\mathbf{x}$, given the latent variable, as
\[
p(\mathbf{x}\,|\,\mathbf{z},\boldsymbol{\mu}) = \prod_{k=1}^{K} p(\mathbf{x}\,|\,\boldsymbol{\mu}_k)^{z_k}
\tag{9.52}
\]
while the prior distribution for the latent variables is the same as for the mixture of
Gaussians model, so that
\[
p(\mathbf{z}\,|\,\boldsymbol{\pi}) = \prod_{k=1}^{K} \pi_k^{z_k}.
\tag{9.53}
\]
If we form the product of $p(\mathbf{x}\,|\,\mathbf{z},\boldsymbol{\mu})$ and $p(\mathbf{z}\,|\,\boldsymbol{\pi})$ and then marginalize over $\mathbf{z}$, we recover (9.47) (Exercise 9.14).
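To spell out the marginalization step (this is the content of Exercise 9.14): the sum over $\mathbf{z}$ runs over the $K$ one-of-$K$ vectors, and each such vector picks out exactly one factor from each of the products in (9.52) and (9.53), so that
\[
p(\mathbf{x}\,|\,\boldsymbol{\mu},\boldsymbol{\pi})
= \sum_{\mathbf{z}} p(\mathbf{z}\,|\,\boldsymbol{\pi})\, p(\mathbf{x}\,|\,\mathbf{z},\boldsymbol{\mu})
= \sum_{k=1}^{K} \pi_k\, p(\mathbf{x}\,|\,\boldsymbol{\mu}_k)
\]
which is the Bernoulli mixture distribution of (9.47).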
In order to derive the EM algorithm, we first write down the complete-data log
likelihood function, which is given by
\[
\ln p(\mathbf{X},\mathbf{Z}\,|\,\boldsymbol{\mu},\boldsymbol{\pi}) = \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk} \left\{ \ln\pi_k + \sum_{i=1}^{D} \bigl[ x_{ni}\ln\mu_{ki} + (1 - x_{ni})\ln(1 - \mu_{ki}) \bigr] \right\}
\tag{9.54}
\]
where $\mathbf{X} = \{\mathbf{x}_n\}$ and $\mathbf{Z} = \{\mathbf{z}_n\}$. Next we take the expectation of the complete-data log likelihood with respect to the posterior distribution of the latent variables to give
\[
\mathbb{E}_{\mathbf{Z}}[\ln p(\mathbf{X},\mathbf{Z}\,|\,\boldsymbol{\mu},\boldsymbol{\pi})] = \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma(z_{nk}) \left\{ \ln\pi_k + \sum_{i=1}^{D} \bigl[ x_{ni}\ln\mu_{ki} + (1 - x_{ni})\ln(1 - \mu_{ki}) \bigr] \right\}
\tag{9.55}
\]
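The bracketed inner sum over $i$ is just the log of the component-$k$ Bernoulli density evaluated at $\mathbf{x}_n$, and it can be computed for all $n$ and $k$ at once. The following NumPy sketch is ours, not part of the text; the function name and the clipping constant `eps` (which guards against $\ln 0$ when a mean saturates at 0 or 1) are assumptions of the illustration.

```python
import numpy as np

def bernoulli_log_prob(X, mu, eps=1e-10):
    """Inner sum of (9.54)/(9.55): ln p(x_n | mu_k) for every n and k.

    X  : (N, D) binary data matrix whose rows are the x_n
    mu : (K, D) matrix whose rows are the Bernoulli means mu_k
    Returns an (N, K) matrix with entries
        sum_i [ x_ni ln mu_ki + (1 - x_ni) ln(1 - mu_ki) ].
    """
    mu = np.clip(mu, eps, 1.0 - eps)  # keep the logarithms finite
    return X @ np.log(mu).T + (1.0 - X) @ np.log(1.0 - mu).T
```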
In (9.55), $\gamma(z_{nk}) = \mathbb{E}[z_{nk}]$ is the posterior probability, or responsibility, of component $k$ given data point $\mathbf{x}_n$. In the E step, these responsibilities are evaluated using Bayes' theorem, which takes the form
\[
\gamma(z_{nk}) = \mathbb{E}[z_{nk}]
= \frac{\displaystyle\sum_{z_{nk}} z_{nk} \bigl[\pi_k p(\mathbf{x}_n\,|\,\boldsymbol{\mu}_k)\bigr]^{z_{nk}}}
       {\displaystyle\sum_{z_{nj}} \bigl[\pi_j p(\mathbf{x}_n\,|\,\boldsymbol{\mu}_j)\bigr]^{z_{nj}}}
= \frac{\pi_k p(\mathbf{x}_n\,|\,\boldsymbol{\mu}_k)}{\displaystyle\sum_{j=1}^{K} \pi_j p(\mathbf{x}_n\,|\,\boldsymbol{\mu}_j)}.
\tag{9.56}
\]
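Continuing the sketch above, the E step follows directly from (9.56). Again this is only an illustrative implementation under our own naming; the subtraction of the per-row maximum is the standard log-sum-exp stabilization, not something required by the equation itself.

```python
def e_step(X, mu, pi):
    """E step of (9.56): responsibilities gamma(z_nk) for every n and k.

    pi : (K,) vector of mixing coefficients pi_k
    Returns an (N, K) matrix whose rows sum to 1.
    """
    # ln pi_k + ln p(x_n | mu_k), reusing bernoulli_log_prob from above
    log_w = np.log(pi) + bernoulli_log_prob(X, mu)
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)
```

For example, with a binary data matrix `X` of shape (N, D), `gamma = e_step(X, mu, pi)` yields the (N, K) responsibility matrix that serves as the weights $\gamma(z_{nk})$ in (9.55).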