618 13. SEQUENTIAL DATA
Gaussian emission densities we have $p(\mathbf{x}|\boldsymbol{\phi}_k) = \mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, and maximization of the function $Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}})$ then gives

$$\boldsymbol{\mu}_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\,\mathbf{x}_n}{\sum_{n=1}^{N} \gamma(z_{nk})} \tag{13.20}$$

$$\boldsymbol{\Sigma}_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})(\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathrm{T}}}{\sum_{n=1}^{N} \gamma(z_{nk})}. \tag{13.21}$$
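The updates (13.20) and (13.21) can be sketched in NumPy as follows. This is a minimal illustration, not the book's code: the function name and variable names (`gamma` for the responsibilities $\gamma(z_{nk})$, `X` for the stacked observations) are assumptions for the example.

```python
import numpy as np

def gaussian_m_step(X, gamma):
    """M-step updates (13.20)-(13.21) for Gaussian emission densities.

    X     : (N, D) array of observations x_n.
    gamma : (N, K) array of responsibilities gamma(z_nk) from the E step.
    """
    Nk = gamma.sum(axis=0)                    # effective number of points per state
    mu = (gamma.T @ X) / Nk[:, None]          # (13.20): responsibility-weighted means
    K, D = gamma.shape[1], X.shape[1]
    Sigma = np.empty((K, D, D))
    for k in range(K):
        diff = X - mu[k]                      # (N, D) deviations from mu_k
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]   # (13.21)
    return mu, Sigma
```

Note that, as in the standard Gaussian mixture M step, each mean and covariance is a data average weighted by the responsibilities, normalized by the effective count $\sum_n \gamma(z_{nk})$.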
For the case of discrete multinomial observed variables, the conditional distribution
of the observations takes the form

$$p(\mathbf{x}|\mathbf{z}) = \prod_{i=1}^{D} \prod_{k=1}^{K} \mu_{ik}^{x_i z_k} \tag{13.22}$$
Exercise 13.8 and the corresponding M-step equations are given by

$$\mu_{ik} = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\, x_{ni}}{\sum_{n=1}^{N} \gamma(z_{nk})}. \tag{13.23}$$
An analogous result holds for Bernoulli observed variables.
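The update (13.23) admits an equally compact sketch, again with illustrative names of my own choosing: each row of `X` is a 1-of-$D$ coded observation $\mathbf{x}_n$, and `gamma` holds the responsibilities.

```python
import numpy as np

def multinomial_m_step(X, gamma):
    """M-step update (13.23) for multinomial emission probabilities.

    X     : (N, D) one-hot (1-of-D) coded observations, so x_ni in {0, 1}.
    gamma : (N, K) responsibilities gamma(z_nk).
    Returns mu of shape (K, D); each row sums to 1.
    """
    return (gamma.T @ X) / gamma.sum(axis=0)[:, None]   # (13.23)
```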
The EM algorithm requires initial values for the parameters of the emission dis-
tribution. One way to set these is to treat the data initially as i.i.d., fit the
emission density by maximum likelihood, and then use the resulting values to ini-
tialize the parameters for EM.
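One simple variant of such an initialization, for the Gaussian case, is sketched below. This is an illustrative choice rather than the book's prescription: each $\boldsymbol{\mu}_k$ is set to a randomly chosen data point and each $\boldsymbol{\Sigma}_k$ to the covariance of the pooled data.

```python
import numpy as np

def init_gaussian_emissions(X, K, seed=None):
    """Illustrative initialization for K Gaussian emission densities.

    Treats the data as i.i.d.: means start at K distinct random data points,
    and every covariance starts at the maximum-likelihood (biased) covariance
    of the pooled data set.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu0 = X[rng.choice(N, size=K, replace=False)]            # (K, D)
    Sigma0 = np.tile(np.cov(X, rowvar=False, bias=True), (K, 1, 1))  # (K, D, D)
    return mu0, Sigma0
```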
13.2.2 The forward-backward algorithm
Next we seek an efficient procedure for evaluating the quantities $\gamma(z_{nk})$ and
$\xi(z_{n-1,j}, z_{nk})$, corresponding to the E step of the EM algorithm. The graph for the
hidden Markov model, shown in Figure 13.5, is a tree, and so we know that the
posterior distribution of the latent variables can be obtained efficiently using a two-
Section 8.4 stage message passing algorithm. In the particular context of the hidden Markov
model, this is known as the forward-backward algorithm (Rabiner, 1989), or the
Baum-Welch algorithm (Baum, 1972). There are in fact several variants of the basic
algorithm, all of which lead to the exact marginals, according to the precise form of