Pattern Recognition and Machine Learning


Gaussian emission densities we have p(x|φ_k) = N(x|μ_k, Σ_k), and maximization
of the function Q(θ, θ^old) then gives

    \mu_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\, x_n}{\sum_{n=1}^{N} \gamma(z_{nk})}    (13.20)

    \Sigma_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\, (x_n - \mu_k)(x_n - \mu_k)^{\mathrm{T}}}{\sum_{n=1}^{N} \gamma(z_{nk})}.    (13.21)
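As a concrete illustration, here is a minimal sketch in Python/NumPy of these re-estimation formulae, assuming the responsibilities γ(z_nk) are available as an N × K array gamma and the observations as an N × D array X (names chosen here for illustration only):

```python
import numpy as np

def gaussian_m_step(X, gamma):
    """Re-estimate Gaussian emission parameters from responsibilities.

    X     : (N, D) array of observations x_n
    gamma : (N, K) array of responsibilities gamma(z_nk)
    Returns per-state means (K, D) and covariances (K, D, D),
    following (13.20) and (13.21).
    """
    N, D = X.shape
    K = gamma.shape[1]
    Nk = gamma.sum(axis=0)                    # effective counts per state
    mu = (gamma.T @ X) / Nk[:, None]          # (13.20): responsibility-weighted means
    Sigma = np.empty((K, D, D))
    for k in range(K):
        diff = X - mu[k]                      # (N, D) deviations from the state mean
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]   # (13.21)
    return mu, Sigma
```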

For the case of discrete multinomial observed variables, the conditional distribution
of the observations takes the form

    p(x|z) = \prod_{i=1}^{D} \prod_{k=1}^{K} \mu_{ik}^{x_i z_k}    (13.22)

and the corresponding M-step equations (Exercise 13.8) are given by

    \mu_{ik} = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\, x_{ni}}{\sum_{n=1}^{N} \gamma(z_{nk})}.    (13.23)

An analogous result holds for Bernoulli observed variables.
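A minimal sketch of the corresponding update, again assuming (purely for illustration) a responsibility matrix gamma of shape (N, K) and a 1-of-D coded observation matrix X of shape (N, D):

```python
import numpy as np

def multinomial_m_step(X, gamma):
    """Re-estimate multinomial emission probabilities, following (13.23).

    X     : (N, D) array of 1-of-D coded observations x_n
    gamma : (N, K) array of responsibilities gamma(z_nk)
    Returns mu of shape (K, D), where mu[k, i] corresponds to mu_ik.
    """
    Nk = gamma.sum(axis=0)              # denominator: sum_n gamma(z_nk)
    mu = (gamma.T @ X) / Nk[:, None]    # numerator: sum_n gamma(z_nk) x_ni
    return mu
```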
The EM algorithm requires initial values for the parameters of the emission distribution.
One way to set these is first to treat the data as i.i.d. and fit the emission
density by maximum likelihood, and then use the resulting values to initialize the
parameters for EM.
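For Gaussian emissions, one way to realize this initialization (a minimal sketch; the function name and the small perturbation of the means are choices made here, not prescribed by the text) is:

```python
import numpy as np

def init_gaussian_emissions(X, K, rng=None):
    """Initialize K Gaussian emission densities via an i.i.d. ML fit.

    Fits a single Gaussian to all of X by maximum likelihood, then copies
    it to every state, perturbing the means slightly so that the states
    are not identical at the start of EM.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu0 = X.mean(axis=0)                                    # i.i.d. ML mean
    Sigma0 = np.atleast_2d(np.cov(X, rowvar=False, bias=True))  # i.i.d. ML covariance
    mu = mu0 + 0.01 * rng.standard_normal((K, X.shape[1]))
    Sigma = np.repeat(Sigma0[None, :, :], K, axis=0)
    return mu, Sigma
```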

13.2.2 The forward-backward algorithm


Next we seek an efficient procedure for evaluating the quantities γ(z_nk) and
ξ(z_{n−1,j}, z_nk), corresponding to the E step of the EM algorithm. The graph for the
hidden Markov model, shown in Figure 13.5, is a tree, and so we know that the
posterior distribution of the latent variables can be obtained efficiently using a two-stage
message passing algorithm (Section 8.4). In the particular context of the hidden Markov
model, this is known as the forward-backward algorithm (Rabiner, 1989), or the
Baum-Welch algorithm (Baum, 1972). There are in fact several variants of the basic
algorithm, all of which lead to the exact marginals, according to the precise form of
the messages that are propagated along the chain.
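To make the idea concrete, here is a minimal sketch of one such variant, scaled forward-backward recursions for computing γ(z_nk). It assumes hypothetical arrays pi (initial state probabilities), A (transition matrix), and B (precomputed emission likelihoods with B[n, k] = p(x_n | z_nk = 1)); the per-step normalization constants c keep the recursions numerically stable:

```python
import numpy as np

def forward_backward(pi, A, B):
    """Scaled forward-backward recursions for a discrete-state HMM.

    pi : (K,)   initial state probabilities
    A  : (K, K) transition matrix, A[j, k] = p(z_nk = 1 | z_{n-1,j} = 1)
    B  : (N, K) emission likelihoods, B[n, k] = p(x_n | z_nk = 1)
    Returns gamma of shape (N, K) with gamma[n, k] = p(z_nk = 1 | x_1..x_N).
    """
    N, K = B.shape
    alpha = np.zeros((N, K))
    beta = np.ones((N, K))
    c = np.zeros(N)                       # per-step scaling factors

    # Forward pass: scaled alpha messages
    alpha[0] = pi * B[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * B[n]
        c[n] = alpha[n].sum()
        alpha[n] /= c[n]

    # Backward pass: scaled beta messages
    for n in range(N - 2, -1, -1):
        beta[n] = (A @ (B[n + 1] * beta[n + 1])) / c[n + 1]

    gamma = alpha * beta                  # posterior marginals gamma(z_nk)
    return gamma
```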
