while for $Q_{i,y} = P_\theta[Y=y \mid X=x_i]$ we have
\begin{align*}
G(Q,\theta) &= \sum_{i=1}^{m} \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i]\,\log\!\left(\frac{P_\theta[X=x_i, Y=y]}{P_\theta[Y=y \mid X=x_i]}\right) \\
&= \sum_{i=1}^{m} \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i]\,\log\left(P_\theta[X=x_i]\right) \\
&= \sum_{i=1}^{m} \log\left(P_\theta[X=x_i]\right) \sum_{y=1}^{k} P_\theta[Y=y \mid X=x_i] \\
&= \sum_{i=1}^{m} \log\left(P_\theta[X=x_i]\right) \;=\; L(\theta),
\end{align*}
where the second equality uses $P_\theta[X=x_i, Y=y] = P_\theta[Y=y \mid X=x_i]\,P_\theta[X=x_i]$, and the last equality holds because the posterior probabilities sum to $1$ over $y$. This shows that setting $Q_{i,y} = P_\theta[Y=y \mid X=x_i]$ maximizes $G(Q,\theta)$ over $Q \in \mathcal{Q}$
and shows that $G(Q^{(t+1)},\theta^{(t)}) = L(\theta^{(t)})$.

The preceding lemma immediately implies:

Theorem 24.3  The EM procedure never decreases the log-likelihood; namely, for all $t$,
\[
L(\theta^{(t+1)}) \;\ge\; L(\theta^{(t)}).
\]
Proof  By the lemma we have
\[
L(\theta^{(t+1)}) = G(Q^{(t+2)},\theta^{(t+1)}) \;\ge\; G(Q^{(t+1)},\theta^{(t+1)}) \;\ge\; G(Q^{(t+1)},\theta^{(t)}) = L(\theta^{(t)}),
\]
where the first inequality holds because $Q^{(t+2)}$ maximizes $G(\cdot\,,\theta^{(t+1)})$ over $Q$, and the second because $\theta^{(t+1)}$ maximizes $G(Q^{(t+1)},\cdot)$ over $\theta$.
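To make the guarantee concrete, the following is a minimal numerical sketch (not from the text): it runs EM on a toy one-dimensional mixture of two unit-variance Gaussians and checks that the log-likelihood never decreases from one iteration to the next. The update formulas used are the standard soft k-means updates for this model, written out directly here only for the purpose of the check.

    import numpy as np

    rng = np.random.default_rng(0)
    # toy 1-D sample: two unit-variance Gaussian clusters
    X = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
    k = 2

    # initial parameters theta^(0): mixing weights c and means mu
    c = np.full(k, 1.0 / k)
    mu = np.array([-1.0, 1.0])

    def log_likelihood(X, c, mu):
        # L(theta) = sum_i log( sum_y c_y * N(x_i; mu_y, 1) )
        dens = c * np.exp(-0.5 * (X[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
        return np.log(dens.sum(axis=1)).sum()

    prev = log_likelihood(X, c, mu)
    for t in range(50):
        # E-step: Q[i, y] = P_theta[Y = y | X = x_i]
        unnorm = c * np.exp(-0.5 * (X[:, None] - mu) ** 2)
        Q = unnorm / unnorm.sum(axis=1, keepdims=True)
        # M-step: standard soft k-means updates (assumed here, not quoted from the text)
        c = Q.mean(axis=0)
        mu = (Q * X[:, None]).sum(axis=0) / Q.sum(axis=0)
        cur = log_likelihood(X, c, mu)
        assert cur >= prev - 1e-9   # Theorem 24.3: the log-likelihood never decreases
        prev = cur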
24.4.2 EM for Mixture of Gaussians (Soft k-Means)
Consider the case of a mixture of $k$ Gaussians in which $\theta$ is a triplet $(c, \{\mu_1,\ldots,\mu_k\}, \{\Sigma_1,\ldots,\Sigma_k\})$, where $P_\theta[Y=y] = c_y$ and $P_\theta[X=x \mid Y=y]$ is as given in Equation (24.9). For simplicity, we assume that $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_k = I$, where $I$ is the identity matrix. Specifying the EM algorithm for this case we obtain the following:

- Expectation step: For each $i \in [m]$ and $y \in [k]$ we have that
\[
P_{\theta^{(t)}}[Y=y \mid X=x_i] \;=\; \frac{1}{Z_i}\, P_{\theta^{(t)}}[Y=y]\, P_{\theta^{(t)}}[X=x_i \mid Y=y]
\;=\; \frac{1}{Z_i}\, c^{(t)}_{y}\, \exp\!\left(-\tfrac{1}{2}\,\bigl\|x_i - \mu^{(t)}_{y}\bigr\|^{2}\right), \tag{24.12}
\]
where $Z_i$ is a normalization factor which ensures that $\sum_y P_{\theta^{(t)}}[Y=y \mid X=x_i]$ sums to $1$ (a short code sketch of this computation is given below).
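  As an illustration (not taken from the text), the computation in Equation (24.12) can be written in a few lines of vectorized Python; the function name e_step and its argument names below are placeholders chosen for this sketch.

      import numpy as np

      def e_step(X, c, mu):
          """Soft assignments of Equation (24.12) for a mixture of k spherical
          Gaussians with identity covariance.

          X  : (m, d) array of data points x_1, ..., x_m
          c  : (k,)   array of mixing weights, c[y] = P_theta[Y = y]
          mu : (k, d) array of component means mu_1, ..., mu_k
          Returns an (m, k) array Q with Q[i, y] = P_theta[Y = y | X = x_i].
          """
          # squared distances ||x_i - mu_y||^2, shape (m, k)
          sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
          # unnormalized posteriors c_y * exp(-||x_i - mu_y||^2 / 2)
          unnorm = c[None, :] * np.exp(-0.5 * sq_dists)
          # divide each row by Z_i so that the posteriors sum to 1
          return unnorm / unnorm.sum(axis=1, keepdims=True)

  In practice one would compute the posteriors in log space and normalize with a log-sum-exp to avoid underflow when the distances are large; the direct form above simply mirrors Equation (24.12).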
- Maximization step: We need to set $\theta^{(t+1)}$ to be a maximizer of Equation (24.11),
