Gaussian emission densities we have $p(\mathbf{x}|\boldsymbol{\phi}_k) = \mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, and maximization of the function $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}})$ then gives

\[
\boldsymbol{\mu}_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\,\mathbf{x}_n}{\sum_{n=1}^{N} \gamma(z_{nk})} \tag{13.20}
\]

\[
\boldsymbol{\Sigma}_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})(\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathrm{T}}}{\sum_{n=1}^{N} \gamma(z_{nk})}. \tag{13.21}
\]
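As a concrete illustration, the Gaussian M-step updates (13.20) and (13.21) can be sketched in NumPy, assuming the responsibilities $\gamma(z_{nk})$ have already been computed in the E step; the function name and argument layout here are illustrative, not part of the text.

```python
import numpy as np

def gaussian_m_step(X, gamma):
    """M-step updates (13.20)-(13.21) for Gaussian emission densities.

    X     : (N, D) array of observations x_n
    gamma : (N, K) array of responsibilities gamma(z_nk) from the E step
    """
    Nk = gamma.sum(axis=0)                 # effective number of points per state
    mu = (gamma.T @ X) / Nk[:, None]       # (13.20): responsibility-weighted means
    K, D = mu.shape
    Sigma = np.empty((K, D, D))
    for k in range(K):
        Xc = X - mu[k]                     # centre data around state-k mean
        # (13.21): responsibility-weighted covariance
        Sigma[k] = (gamma[:, k, None] * Xc).T @ Xc / Nk[k]
    return mu, Sigma
```

Note that when all responsibility is assigned to a single state, these updates reduce to the ordinary maximum likelihood mean and covariance, as expected.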
For the case of discrete multinomial observed variables, the conditional distribution of the observations takes the form

\[
p(\mathbf{x}|\mathbf{z}) = \prod_{i=1}^{D} \prod_{k=1}^{K} \mu_{ik}^{x_i z_k} \tag{13.22}
\]

and the corresponding M-step equations are given by (Exercise 13.8)

\[
\mu_{ik} = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\,x_{ni}}{\sum_{n=1}^{N} \gamma(z_{nk})}. \tag{13.23}
\]
An analogous result holds for Bernoulli observed variables.
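The multinomial update (13.23) has an equally compact form in NumPy when the observations are 1-of-$D$ coded; as above, the function name is a hypothetical illustration and the responsibilities are assumed given.

```python
import numpy as np

def discrete_m_step(X, gamma):
    """M-step update (13.23) for multinomial emission distributions.

    X     : (N, D) array of 1-of-D (one-hot) coded observations
    gamma : (N, K) array of responsibilities gamma(z_nk)
    """
    # mu[i, k] = sum_n gamma(z_nk) x_ni / sum_n gamma(z_nk)
    mu = (X.T @ gamma) / gamma.sum(axis=0)
    return mu                              # each column sums to 1
```

Because each row of X sums to one, each column of the resulting mu is automatically a normalized probability vector.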
The EM algorithm requires initial values for the parameters of the emission distribution. One way to set these is to treat the data initially as i.i.d., fit the emission density by maximum likelihood, and then use the resulting values to initialize the parameters for EM.

13.2.2 The forward-backward algorithm
Next we seek an efficient procedure for evaluating the quantities $\gamma(z_{nk})$ and $\xi(z_{n-1,j}, z_{nk})$, corresponding to the E step of the EM algorithm. The graph for the hidden Markov model, shown in Figure 13.5, is a tree, and so we know that the posterior distribution of the latent variables can be obtained efficiently using a two-stage message passing algorithm (Section 8.4). In the particular context of the hidden Markov model, this is known as the forward-backward algorithm (Rabiner, 1989), or the Baum-Welch algorithm (Baum, 1972). There are in fact several variants of the basic algorithm, all of which lead to the exact marginals, according to the precise form of