618 13. SEQUENTIAL DATA
Gaussian emission densities we have $p(\mathbf{x}|\boldsymbol{\phi}_k) = \mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, and maximization of the function $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}})$ then gives

$$\boldsymbol{\mu}_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\,\mathbf{x}_n}{\sum_{n=1}^{N} \gamma(z_{nk})} \tag{13.20}$$

$$\boldsymbol{\Sigma}_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})(\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathrm{T}}}{\sum_{n=1}^{N} \gamma(z_{nk})}. \tag{13.21}$$
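The updates (13.20) and (13.21) can be sketched in NumPy as follows. This is a minimal illustration, not the book's code: the function name and variable names (`gamma` for the responsibilities $\gamma(z_{nk})$, `X` for the stacked observations) are assumptions for the example.

```python
import numpy as np

def gaussian_m_step(X, gamma):
    """M-step updates (13.20)-(13.21) for Gaussian emission densities.

    X     : (N, D) array of observations x_n.
    gamma : (N, K) array of responsibilities gamma(z_nk) from the E step.
    """
    Nk = gamma.sum(axis=0)                    # effective number of points per state
    mu = (gamma.T @ X) / Nk[:, None]          # (13.20): responsibility-weighted means
    K, D = gamma.shape[1], X.shape[1]
    Sigma = np.empty((K, D, D))
    for k in range(K):
        diff = X - mu[k]                      # (N, D) deviations from mu_k
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]   # (13.21)
    return mu, Sigma
```

Note that, as in the standard Gaussian mixture M step, each mean and covariance is a data average weighted by the responsibilities, normalized by the effective count $\sum_n \gamma(z_{nk})$.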
For the case of discrete multinomial observed variables, the conditional distribution
of the observations takes the form

$$p(\mathbf{x}|\mathbf{z}) = \prod_{i=1}^{D} \prod_{k=1}^{K} \mu_{ik}^{x_i z_k} \tag{13.22}$$
Exercise 13.8 and the corresponding M-step equations are given by

$$\mu_{ik} = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\, x_{ni}}{\sum_{n=1}^{N} \gamma(z_{nk})}. \tag{13.23}$$
An analogous result holds for Bernoulli observed variables.
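The update (13.23) admits an equally compact sketch, again with illustrative names of my own choosing: each row of `X` is a 1-of-$D$ coded observation $\mathbf{x}_n$, and `gamma` holds the responsibilities.

```python
import numpy as np

def multinomial_m_step(X, gamma):
    """M-step update (13.23) for multinomial emission probabilities.

    X     : (N, D) one-hot (1-of-D) coded observations, so x_ni in {0, 1}.
    gamma : (N, K) responsibilities gamma(z_nk).
    Returns mu of shape (K, D); each row sums to 1.
    """
    return (gamma.T @ X) / gamma.sum(axis=0)[:, None]   # (13.23)
```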
The EM algorithm requires initial values for the parameters of the emission dis-
tribution. One way to set these is to treat the data initially as i.i.d., fit the
emission density by maximum likelihood, and then use the resulting values to ini-
tialize the parameters for EM.
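One simple variant of such an initialization, for the Gaussian case, is sketched below. This is an illustrative choice rather than the book's prescription: each $\boldsymbol{\mu}_k$ is set to a randomly chosen data point and each $\boldsymbol{\Sigma}_k$ to the covariance of the pooled data.

```python
import numpy as np

def init_gaussian_emissions(X, K, seed=None):
    """Illustrative initialization for K Gaussian emission densities.

    Treats the data as i.i.d.: means start at K distinct random data points,
    and every covariance starts at the maximum-likelihood (biased) covariance
    of the pooled data set.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu0 = X[rng.choice(N, size=K, replace=False)]            # (K, D)
    Sigma0 = np.tile(np.cov(X, rowvar=False, bias=True), (K, 1, 1))  # (K, D, D)
    return mu0, Sigma0
```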
13.2.2 The forward-backward algorithm
Next we seek an efficient procedure for evaluating the quantities $\gamma(z_{nk})$ and
$\xi(z_{n-1,j}, z_{nk})$, corresponding to the E step of the EM algorithm. The graph for the
hidden Markov model, shown in Figure 13.5, is a tree, and so we know that the
posterior distribution of the latent variables can be obtained efficiently using a two-
Section 8.4 stage message passing algorithm. In the particular context of the hidden Markov
model, this is known as the forward-backward algorithm (Rabiner, 1989), or the
Baum-Welch algorithm (Baum, 1972). There are in fact several variants of the basic
algorithm, all of which lead to the exact marginals, according to the precise form of