Figure 13.7 If we unfold the state transition diagram of Figure 13.6 over time, we obtain a lattice, or trellis, representation of the latent states. Each column of this diagram corresponds to one of the latent variables z_n. [Trellis diagram: rows k = 1, 2, 3; columns n−2, n−1, n, n+1; self-transition probabilities A_11 and A_33 marked along the top and bottom rows.]
We can represent the emission probabilities in the form
$$
p(\mathbf{x}_n \mid \mathbf{z}_n, \boldsymbol{\phi}) = \prod_{k=1}^{K} p(\mathbf{x}_n \mid \boldsymbol{\phi}_k)^{z_{nk}}. \tag{13.9}
$$
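Because z_n is a one-of-K binary vector, the product in (13.9) simply selects the density of the single active component. The following minimal sketch (not part of the text) illustrates this, assuming discrete emission tables and illustrative parameter values:

```python
import numpy as np

def emission_prob(x_n, z_n, phi):
    """Evaluate p(x_n | z_n, phi) as in (13.9) for discrete emissions.

    z_n is a one-of-K binary vector, so the product over k reduces to
    picking out the probability under the single active component.
    phi[k, v] = p(x = v | phi_k) is an assumed emission table.
    """
    probs = phi[:, x_n]                  # p(x_n | phi_k) for each k
    return float(np.prod(probs ** z_n))  # factors with z_nk = 0 contribute 1

# Illustrative numbers: K = 3 states, 2 observation symbols
phi = np.array([[0.9, 0.1],
                [0.5, 0.5],
                [0.2, 0.8]])
z_n = np.array([0, 1, 0])          # state k = 2 active
print(emission_prob(0, z_n, phi))  # 0.5, i.e. phi[1, 0]
```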
We shall focus attention on homogeneous models for which all of the conditional distributions governing the latent variables share the same parameters A, and similarly all of the emission distributions share the same parameters φ (the extension to more general cases is straightforward). Note that a mixture model for an i.i.d. data set corresponds to the special case in which the parameters A_jk are the same for all values of j, so that the conditional distribution p(z_n | z_{n−1}) is independent of z_{n−1}. This corresponds to deleting the horizontal links in the graphical model shown in Figure 13.5.
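For example, making every row of A identical removes the dependence of p(z_n | z_{n−1}) on z_{n−1}; a brief illustration with made-up numbers:

```python
import numpy as np

# Mixture-model special case: every row of A equals the same distribution,
# so the transition p(z_n | z_{n-1}) no longer depends on the previous state
# and the sequence is effectively sampled i.i.d. from that mixing distribution.
mix = np.array([0.5, 0.3, 0.2])   # assumed common mixing distribution
A = np.tile(mix, (3, 1))          # A[j, k] = mix[k] for every previous state j
print(np.all(A == A[0]))          # True: each row is the same distribution
```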
The joint probability distribution over both latent and observed variables is then given by
$$
p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta}) = p(\mathbf{z}_1 \mid \boldsymbol{\pi}) \left[ \prod_{n=2}^{N} p(\mathbf{z}_n \mid \mathbf{z}_{n-1}, \mathbf{A}) \right] \prod_{m=1}^{N} p(\mathbf{x}_m \mid \mathbf{z}_m, \boldsymbol{\phi}) \tag{13.10}
$$
where X = {x_1, ..., x_N}, Z = {z_1, ..., z_N}, and θ = {π, A, φ} denotes the set of parameters governing the model. Most of our discussion of the hidden Markov model will be independent of the particular choice of the emission probabilities. Indeed, the model is tractable for a wide range of emission distributions including discrete tables, Gaussians, and mixtures of Gaussians. It is also possible to exploit discriminative models such as neural networks (Exercise 13.4). These can be used to model the emission density p(x|z) directly, or to provide a representation for p(z|x) that can be converted into the required emission density p(x|z) using Bayes' theorem (Bishop et al., 2004).
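To make (13.10) concrete, the following sketch (an illustration, not from the text) evaluates the log of the joint probability for a homogeneous HMM with discrete emission tables; the parameter values are made up, and the latent states are given as integer indices rather than one-of-K vectors:

```python
import numpy as np

def log_joint(x, z, pi, A, phi):
    """log p(X, Z | theta) as in (13.10) for a discrete-emission HMM.

    x   : observation symbols, shape (N,)
    z   : latent state indices, shape (N,) (integer form of the one-of-K z_n)
    pi  : initial state distribution, shape (K,)
    A   : transition matrix, A[j, k] = p(z_n = k | z_{n-1} = j)
    phi : emission table, phi[k, v] = p(x = v | phi_k)
    """
    lp = np.log(pi[z[0]])                   # p(z_1 | pi)
    lp += np.sum(np.log(A[z[:-1], z[1:]]))  # prod_{n=2}^N p(z_n | z_{n-1}, A)
    lp += np.sum(np.log(phi[z, x]))         # prod_{n=1}^N p(x_n | z_n, phi)
    return lp

# Illustrative parameters: K = 2 states, 2 observation symbols
pi  = np.array([0.6, 0.4])
A   = np.array([[0.7, 0.3],
                [0.2, 0.8]])
phi = np.array([[0.9, 0.1],
                [0.3, 0.7]])
x = np.array([0, 0, 1, 1])
z = np.array([0, 0, 1, 1])
print(np.exp(log_joint(x, z, pi, A, phi)))  # joint probability of this path
```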
We can gain a better understanding of the hidden Markov model by considering
it from a generative point of view. Recall that to generate samples from a mixture of