while for $Q_{i,y} = P_\theta[Y=y \mid X=x_i]$ we have
\begin{align*}
G(Q,\theta) &= \sum_{i=1}^{m} \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i]\,\log\!\left(\frac{P_\theta[X=x_i, Y=y]}{P_\theta[Y=y \mid X=x_i]}\right) \\
&= \sum_{i=1}^{m} \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i]\,\log\left(P_\theta[X=x_i]\right) \\
&= \sum_{i=1}^{m} \log\left(P_\theta[X=x_i]\right) \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i] \\
&= \sum_{i=1}^{m} \log\left(P_\theta[X=x_i]\right) \;=\; L(\theta),
\end{align*}
where the second equality uses $P_\theta[X=x_i, Y=y] = P_\theta[Y=y \mid X=x_i]\,P_\theta[X=x_i]$, and the last equality holds because the posterior probabilities sum to $1$ over $y$. This shows that setting $Q_{i,y} = P_\theta[Y=y \mid X=x_i]$ maximizes $G(Q,\theta)$ over $Q \in \mathcal{Q}$
and shows that $G(Q^{(t+1)},\theta^{(t)}) = L(\theta^{(t)})$.

The preceding lemma immediately implies:

Theorem 24.3  The EM procedure never decreases the log-likelihood; namely, for all $t$,
\[
L(\theta^{(t+1)}) \;\ge\; L(\theta^{(t)}).
\]
Proof  By the lemma we have
\[
L(\theta^{(t+1)}) = G(Q^{(t+2)},\theta^{(t+1)}) \;\ge\; G(Q^{(t+1)},\theta^{(t+1)}) \;\ge\; G(Q^{(t+1)},\theta^{(t)}) = L(\theta^{(t)}),
\]
where the first inequality holds because $Q^{(t+2)}$ maximizes $G(\cdot\,,\theta^{(t+1)})$ over $Q$, and the second because $\theta^{(t+1)}$ maximizes $G(Q^{(t+1)},\cdot)$ over $\theta$.
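To make the guarantee concrete, the following is a minimal numerical sketch (not from the text): it runs EM on a toy one-dimensional mixture of two unit-variance Gaussians and checks that the log-likelihood never decreases from one iteration to the next. The update formulas used are the standard soft k-means updates for this model, written out directly here only for the purpose of the check.

    import numpy as np

    rng = np.random.default_rng(0)
    # toy 1-D sample: two unit-variance Gaussian clusters
    X = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
    k = 2

    # initial parameters theta^(0): mixing weights c and means mu
    c = np.full(k, 1.0 / k)
    mu = np.array([-1.0, 1.0])

    def log_likelihood(X, c, mu):
        # L(theta) = sum_i log( sum_y c_y * N(x_i; mu_y, 1) )
        dens = c * np.exp(-0.5 * (X[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
        return np.log(dens.sum(axis=1)).sum()

    prev = log_likelihood(X, c, mu)
    for t in range(50):
        # E-step: Q[i, y] = P_theta[Y = y | X = x_i]
        unnorm = c * np.exp(-0.5 * (X[:, None] - mu) ** 2)
        Q = unnorm / unnorm.sum(axis=1, keepdims=True)
        # M-step: standard soft k-means updates (assumed here, not quoted from the text)
        c = Q.mean(axis=0)
        mu = (Q * X[:, None]).sum(axis=0) / Q.sum(axis=0)
        cur = log_likelihood(X, c, mu)
        assert cur >= prev - 1e-9   # Theorem 24.3: the log-likelihood never decreases
        prev = cur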
24.4.2 EM for Mixture of Gaussians (Soft k-Means)
Consider the case of a mixture of $k$ Gaussians in which $\theta$ is a triplet $(c, \{\mu_1,\ldots,\mu_k\}, \{\Sigma_1,\ldots,\Sigma_k\})$, where $P_\theta[Y=y] = c_y$ and $P_\theta[X=x \mid Y=y]$ is as given in Equation (24.9). For simplicity, we assume that $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_k = I$, where $I$ is the identity matrix. Specifying the EM algorithm for this case we obtain the following:

- Expectation step: For each $i \in [m]$ and $y \in [k]$ we have that
\[
P_{\theta^{(t)}}[Y=y \mid X=x_i] \;=\; \frac{1}{Z_i}\, P_{\theta^{(t)}}[Y=y]\, P_{\theta^{(t)}}[X=x_i \mid Y=y]
\;=\; \frac{1}{Z_i}\, c^{(t)}_{y}\, \exp\!\left(-\tfrac{1}{2}\,\bigl\|x_i - \mu^{(t)}_{y}\bigr\|^{2}\right), \tag{24.12}
\]
where $Z_i$ is a normalization factor which ensures that $\sum_y P_{\theta^{(t)}}[Y=y \mid X=x_i]$ sums to $1$ (a short code sketch of this computation is given below).
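  As an illustration (not taken from the text), the computation in Equation (24.12) can be written in a few lines of vectorized Python; the function name e_step and its argument names below are placeholders chosen for this sketch.

      import numpy as np

      def e_step(X, c, mu):
          """Soft assignments of Equation (24.12) for a mixture of k spherical
          Gaussians with identity covariance.

          X  : (m, d) array of data points x_1, ..., x_m
          c  : (k,)   array of mixing weights, c[y] = P_theta[Y = y]
          mu : (k, d) array of component means mu_1, ..., mu_k
          Returns an (m, k) array Q with Q[i, y] = P_theta[Y = y | X = x_i].
          """
          # squared distances ||x_i - mu_y||^2, shape (m, k)
          sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
          # unnormalized posteriors c_y * exp(-||x_i - mu_y||^2 / 2)
          unnorm = c[None, :] * np.exp(-0.5 * sq_dists)
          # divide each row by Z_i so that the posteriors sum to 1
          return unnorm / unnorm.sum(axis=1, keepdims=True)

  In practice one would compute the posteriors in log space and normalize with a log-sum-exp to avoid underflow when the distances are large; the direct form above simply mirrors Equation (24.12).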
- Maximization step: We need to set $\theta^{(t+1)}$ to be a maximizer of Equation (24.11),
