while for $Q_{i,y} = P_\theta[Y=y \mid X=x_i]$ we have
\begin{align*}
G(Q,\theta) &= \sum_{i=1}^{m} \left( \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i] \,\log\!\left( \frac{P_\theta[X=x_i, Y=y]}{P_\theta[Y=y \mid X=x_i]} \right) \right) \\
&= \sum_{i=1}^{m} \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i] \,\log\left( P_\theta[X=x_i] \right) \\
&= \sum_{i=1}^{m} \log\left( P_\theta[X=x_i] \right) \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i] \\
&= \sum_{i=1}^{m} \log\left( P_\theta[X=x_i] \right) = L(\theta).
\end{align*}
This shows that setting $Q_{i,y} = P_\theta[Y=y \mid X=x_i]$ maximizes $G(Q,\theta)$ over $Q \in \mathcal{Q}$
and shows that $G(Q^{(t+1)},\theta^{(t)}) = L(\theta^{(t)})$.
The preceding lemma immediately implies:
Theorem 24.3  The EM procedure never decreases the log-likelihood; namely,
for all $t$,
\[ L(\theta^{(t+1)}) \;\ge\; L(\theta^{(t)}). \]
Proof  By the lemma we have
\[ L(\theta^{(t+1)}) \;=\; G(Q^{(t+2)},\theta^{(t+1)}) \;\ge\; G(Q^{(t+1)},\theta^{(t)}) \;=\; L(\theta^{(t)}), \]
where the inequality holds because $Q^{(t+2)}$ maximizes $G(\cdot,\theta^{(t+1)})$ over $Q$ and $\theta^{(t+1)}$ maximizes $G(Q^{(t+1)},\cdot)$ over $\theta$.
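To make the theorem concrete, here is a minimal numerical sketch (not from the text) that runs a few EM iterations on hypothetical one-dimensional data, using the mixture-of-Gaussians specialization with $\Sigma_y = I$ derived in the next subsection, and checks that the log-likelihood never decreases. The toy data, the initial parameters, and the helper name log_likelihood are all illustrative assumptions.

```python
import numpy as np

# Hypothetical toy data: two 1-D clusters with unit variance (Sigma_y = I).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 150)])
k = 2

# Illustrative initial parameters theta^(0) = (c, mu).
c = np.full(k, 1.0 / k)
mu = np.array([-1.0, 1.0])

def log_likelihood(x, c, mu):
    # L(theta) = sum_i log( sum_y c_y * N(x_i; mu_y, 1) )
    dens = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) / np.sqrt(2 * np.pi)
    return np.log(dens @ c).sum()

prev = log_likelihood(x, c, mu)
for t in range(50):
    # E-step: Q_{i,y} proportional to c_y * exp(-||x_i - mu_y||^2 / 2)
    w = c[None, :] * np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2)
    Q = w / w.sum(axis=1, keepdims=True)
    # M-step (the standard soft k-means updates): responsibility-weighted
    # means and average responsibilities.
    mu = (Q * x[:, None]).sum(axis=0) / Q.sum(axis=0)
    c = Q.mean(axis=0)
    cur = log_likelihood(x, c, mu)
    assert cur >= prev - 1e-9  # Theorem 24.3: L(theta^(t+1)) >= L(theta^(t))
    prev = cur

print("centers:", mu, "mixing weights:", c)
```

The assertion should hold for any initialization, since each M-step here exactly maximizes the expected complete log-likelihood for the fixed identity covariances.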
24.4.2 EM for Mixture of Gaussians (Soft k-Means)
Consider the case of a mixture of $k$ Gaussians in which $\theta$ is a triplet $(\mathbf{c},\{\boldsymbol{\mu}_1,\ldots,\boldsymbol{\mu}_k\},\{\Sigma_1,\ldots,\Sigma_k\})$,
where $P_\theta[Y=y] = c_y$ and $P_\theta[X=x \mid Y=y]$ is as given in Equation (24.9). For
simplicity, we assume that $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_k = I$, where $I$ is the identity
matrix. Specifying the EM algorithm for this case we obtain the following:
- Expectation step: For each $i \in [m]$ and $y \in [k]$ we have that
\begin{align}
P_{\theta^{(t)}}[Y=y \mid X=x_i] &= \frac{1}{Z_i}\, P_{\theta^{(t)}}[Y=y]\, P_{\theta^{(t)}}[X=x_i \mid Y=y] \nonumber \\
&= \frac{1}{Z_i}\, c^{(t)}_y \exp\!\left( -\tfrac{1}{2}\, \|x_i - \mu^{(t)}_y\|^2 \right), \tag{24.12}
\end{align}
where $Z_i$ is a normalization factor which ensures that $\sum_y P_{\theta^{(t)}}[Y=y \mid X=x_i]$ sums to 1. (A small vectorized sketch of this computation is given below.)
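As a concrete illustration of Equation (24.12), the following is a minimal vectorized sketch of the expectation step; the function name e_step and the argument layout are assumptions for illustration, not notation from the text.

```python
import numpy as np

def e_step(X, c, mu):
    """Responsibilities P_{theta^(t)}[Y=y | X=x_i] as in Eq. (24.12), with Sigma_y = I.

    X  : (m, d) array of the points x_1, ..., x_m
    c  : (k,)   current mixing weights c_y^(t)
    mu : (k, d) current centers mu_y^(t)
    Returns an (m, k) matrix whose i-th row sums to 1 (the 1/Z_i normalization).
    """
    # Squared distances ||x_i - mu_y||^2 for every pair (i, y).
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    w = c[None, :] * np.exp(-0.5 * sq)       # c_y^(t) * exp(-||x_i - mu_y^(t)||^2 / 2)
    return w / w.sum(axis=1, keepdims=True)  # divide each row by Z_i
```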
- Maximization step: We need to set $\theta^{(t+1)}$ to be a maximizer of Equation (24.11),