Understanding Machine Learning: From Theory to Algorithms



while for $Q_{i,y} = P_\theta[Y=y \mid X=x_i]$ we have

\begin{align*}
G(Q,\theta) &= \sum_{i=1}^{m} \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i] \,\log\!\left(\frac{P_\theta[X=x_i, Y=y]}{P_\theta[Y=y \mid X=x_i]}\right) \\
&= \sum_{i=1}^{m} \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i] \,\log\left(P_\theta[X=x_i]\right) \\
&= \sum_{i=1}^{m} \log\left(P_\theta[X=x_i]\right) \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i] \\
&= \sum_{i=1}^{m} \log\left(P_\theta[X=x_i]\right) = L(\theta).
\end{align*}

This shows that setting $Q_{i,y} = P_\theta[Y=y \mid X=x_i]$ maximizes $G(Q,\theta)$ over $Q \in \mathcal{Q}$
and shows that $G(Q^{(t+1)},\theta^{(t)}) = L(\theta^{(t)})$.
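
As a quick sanity check of this identity, the following sketch builds a toy two-component mixture of unit-variance 1-D Gaussians and verifies numerically that plugging the posterior into $G$ recovers the log-likelihood. All parameter values and names here are illustrative assumptions, not from the text.

import numpy as np

# Toy parameters (illustrative, not from the text): k = 2 components, unit variance.
c = np.array([0.3, 0.7])                   # mixing weights c_y = P[Y = y]
mu = np.array([-1.0, 2.0])                 # component means mu_y
x = np.array([-1.5, -0.2, 0.4, 1.1, 2.3])  # m = 5 observed points x_i

# Joint P_theta[X = x_i, Y = y] for 1-D Gaussians with variance 1.
joint = c * np.exp(-0.5 * (x[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
marginal = joint.sum(axis=1)               # P_theta[X = x_i]
Q = joint / marginal[:, None]              # Q_{i,y} = P_theta[Y = y | X = x_i]

G = np.sum(Q * np.log(joint / Q))          # G(Q, theta) with the posterior plugged in
L = np.sum(np.log(marginal))               # log-likelihood L(theta)
assert np.allclose(G, L)                   # the lemma's identity, up to rounding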

The preceding lemma immediately implies:

Theorem 24.3 The EM procedure never decreases the log-likelihood; namely, for all $t$,
\[
L(\theta^{(t+1)}) \;\ge\; L(\theta^{(t)}).
\]

Proof By the lemma we have
\[
L(\theta^{(t+1)}) = G(Q^{(t+2)},\theta^{(t+1)}) \;\ge\; G(Q^{(t+1)},\theta^{(t)}) = L(\theta^{(t)}).
\]
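
The single inequality in this chain compresses two optimality properties of the EM updates; spelled out,
\[
G(Q^{(t+2)},\theta^{(t+1)}) \;\ge\; G(Q^{(t+1)},\theta^{(t+1)}) \;\ge\; G(Q^{(t+1)},\theta^{(t)}),
\]
since $Q^{(t+2)}$ maximizes $G(\cdot,\theta^{(t+1)})$ over $\mathcal{Q}$ (by the lemma) and $\theta^{(t+1)}$ maximizes $G(Q^{(t+1)},\cdot)$ over the parameter space (by the maximization step).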

24.4.2 EM for Mixture of Gaussians (Soft k-Means)


Consider the case of a mixture of $k$ Gaussians in which $\theta$ is a triplet $(c,\{\mu_1,\dots,\mu_k\},\{\Sigma_1,\dots,\Sigma_k\})$,
where $P_\theta[Y=y] = c_y$ and $P_\theta[X=x \mid Y=y]$ is as given in Equation (24.9). For
simplicity, we assume that $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_k = I$, where $I$ is the identity
matrix. Specifying the EM algorithm for this case we obtain the following:


  • Expectation step: For each $i \in [m]$ and $y \in [k]$ we have that
    \begin{align}
    P_{\theta^{(t)}}[Y=y \mid X=x_i] &= \frac{1}{Z_i}\, P_{\theta^{(t)}}[Y=y]\; P_{\theta^{(t)}}[X=x_i \mid Y=y] \nonumber \\
    &= \frac{1}{Z_i}\, c^{(t)}_y \exp\!\left(-\tfrac{1}{2}\,\bigl\|x_i-\mu^{(t)}_y\bigr\|^2\right), \tag{24.12}
    \end{align}
    where $Z_i$ is a normalization factor which ensures that $\sum_y P_{\theta^{(t)}}[Y=y \mid X=x_i]$ sums to 1.


  • Maximization step: We need to set $\theta^{(t+1)}$ to be a maximizer of Equation (24.11),
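
To make both steps concrete, here is a minimal NumPy sketch of this EM loop. The E-step follows Equation (24.12); since the excerpt ends before the maximizer of Equation (24.11) is derived, the M-step below assumes the standard closed-form updates for this model (component frequencies and responsibility-weighted means). The final assertion illustrates Theorem 24.3. All names and toy data are illustrative.

import numpy as np

def em_soft_kmeans(X, k, T=50, seed=0):
    """EM for a mixture of k unit-covariance Gaussians (soft k-means)."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    c = np.full(k, 1.0 / k)                       # mixing weights c_y
    mu = X[rng.choice(m, size=k, replace=False)]  # initialize means at data points
    log_liks = []
    for _ in range(T):
        # E-step, Eq. (24.12): q_{i,y} = c_y exp(-||x_i - mu_y||^2 / 2) / Z_i
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (m, k)
        unnorm = c * np.exp(-0.5 * sq)
        Z = unnorm.sum(axis=1, keepdims=True)     # normalizers Z_i
        q = unnorm / Z
        # Log-likelihood, up to the additive constant -m*d/2 * log(2*pi).
        log_liks.append(np.log(Z).sum())
        # M-step (assumed closed form): frequencies and weighted means.
        Ny = q.sum(axis=0)                        # effective counts per component
        c = Ny / m
        mu = (q.T @ X) / Ny[:, None]
    return c, mu, np.array(log_liks)

# Toy data: two well-separated clusters in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(3, 1, (150, 2))])
c, mu, log_liks = em_soft_kmeans(X, k=2)
assert np.all(np.diff(log_liks) >= -1e-8)         # Theorem 24.3: L never decreases

Because all covariances are fixed to the identity, the responsibilities in the E-step depend only on the squared distances of the points to the current means, which is why this procedure is viewed as a soft version of k-means.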
