
the messages that are propagated along the chain (Jordan, 2007). We shall focus on
the most widely used of these, known as the alpha-beta algorithm.
As well as being of great practical importance in its own right, the forward-
backward algorithm provides us with a nice illustration of many of the concepts
introduced in earlier chapters. We shall therefore begin in this section with a ‘con-
ventional’ derivation of the forward-backward equations, making use of the sum
and product rules of probability, and exploiting conditional independence properties
which we shall obtain from the corresponding graphical model using d-separation.
Then in Section 13.2.3, we shall see how the forward-backward algorithm can be
obtained very simply as a specific example of the sum-product algorithm introduced
in Section 8.4.4.
It is worth emphasizing that evaluation of the posterior distributions of the latent
variables is independent of the form of the emission density $p(\mathbf{x}|\mathbf{z})$, or indeed of
whether the observed variables are continuous or discrete. All we require are the
values of the quantities $p(\mathbf{x}_n|\mathbf{z}_n)$ for each value of $\mathbf{z}_n$ for every $n$. Also, in this
section and the next we shall omit the explicit dependence on the model parameters
$\boldsymbol{\theta}^{\text{old}}$ because these are fixed throughout.
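To make this point concrete, here is a minimal sketch (the function name and the choice of Gaussian emissions are assumptions for illustration, not taken from the text): whatever the emission model, the recursions that follow only ever consume the $N \times K$ table of values $p(\mathbf{x}_n|\mathbf{z}_n = k)$, which can be precomputed once.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative sketch: inference needs only the table
# B[n, k] = p(x_n | z_n = k); any emission model can be
# swapped in by replacing this one function.
def emission_table(X, means, covs):
    """Return the N x K table B[n, k] = p(x_n | z_n = k)
    for Gaussian emissions with per-state means and covariances."""
    N, K = X.shape[0], means.shape[0]
    B = np.empty((N, K))
    for k in range(K):
        B[:, k] = multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
    return B
```

Swapping in discrete or any other emission model changes only how this table is filled; the recursions themselves are untouched.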
We therefore begin by writing down the following conditional independence
properties (Jordan, 2007)

\begin{align}
p(\mathbf{X} \mid \mathbf{z}_n) &= p(\mathbf{x}_1,\ldots,\mathbf{x}_n \mid \mathbf{z}_n)\, p(\mathbf{x}_{n+1},\ldots,\mathbf{x}_N \mid \mathbf{z}_n) \tag{13.24}\\
p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{x}_n, \mathbf{z}_n) &= p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_n) \tag{13.25}\\
p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_{n-1}, \mathbf{z}_n) &= p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_{n-1}) \tag{13.26}\\
p(\mathbf{x}_{n+1},\ldots,\mathbf{x}_N \mid \mathbf{z}_n, \mathbf{z}_{n+1}) &= p(\mathbf{x}_{n+1},\ldots,\mathbf{x}_N \mid \mathbf{z}_{n+1}) \tag{13.27}\\
p(\mathbf{x}_{n+2},\ldots,\mathbf{x}_N \mid \mathbf{z}_{n+1}, \mathbf{x}_{n+1}) &= p(\mathbf{x}_{n+2},\ldots,\mathbf{x}_N \mid \mathbf{z}_{n+1}) \tag{13.28}\\
p(\mathbf{X} \mid \mathbf{z}_{n-1}, \mathbf{z}_n) &= p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_{n-1})\, p(\mathbf{x}_n \mid \mathbf{z}_n)\, p(\mathbf{x}_{n+1},\ldots,\mathbf{x}_N \mid \mathbf{z}_n) \tag{13.29}\\
p(\mathbf{x}_{N+1} \mid \mathbf{X}, \mathbf{z}_{N+1}) &= p(\mathbf{x}_{N+1} \mid \mathbf{z}_{N+1}) \tag{13.30}\\
p(\mathbf{z}_{N+1} \mid \mathbf{z}_N, \mathbf{X}) &= p(\mathbf{z}_{N+1} \mid \mathbf{z}_N) \tag{13.31}
\end{align}

where $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$. These relations are most easily proved using d-separation.
For instance in the first of these results, we note that every path from any one of the
nodes $\mathbf{x}_1, \ldots, \mathbf{x}_{n-1}$ to the node $\mathbf{x}_n$ passes through the node $\mathbf{z}_n$, which is observed.
Because all such paths are head-to-tail, it follows that the conditional independence
property must hold. The reader should take a few moments to verify each of these
properties in turn, as an exercise in the application of d-separation. These relations
can also be proved directly, though with significantly greater effort, from the joint
distribution for the hidden Markov model using the sum and product rules of proba-
bility (Exercise 13.10).
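To complement the d-separation argument, here is a small numerical check (a sketch with invented numbers, not from the book) that verifies (13.24) by brute force: it enumerates every state sequence of a tiny discrete HMM and builds each conditional directly from the joint distribution, using only the sum and product rules.

```python
import itertools
import numpy as np

# Brute-force check of (13.24):
#   p(X | z_n) = p(x_1..n | z_n) p(x_{n+1}..N | z_n).
# All distributions below are randomly generated and purely illustrative.
K, N, M = 2, 4, 3                   # states, chain length, emission symbols
rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(K))      # p(z_1)
A = rng.dirichlet(np.ones(K), K)    # A[j, k] = p(z_n = k | z_{n-1} = j)
B = rng.dirichlet(np.ones(M), K)    # B[k, m] = p(x_n = m | z_n = k)
x = [0, 2, 1, 0]                    # a fixed observed sequence
n = 1                               # split point z_n (0-based index)

pz = np.zeros(K)                    # p(z_n = k)
past = np.zeros(K)                  # p(x_1..n, z_n = k)
future = np.zeros(K)                # p(x_{n+1}..N, z_n = k)
full = np.zeros(K)                  # p(X, z_n = k)
for z in itertools.product(range(K), repeat=N):
    # product rule: the joint factorizes into transitions times emissions;
    # sum rule: accumulate over all state sequences with z_n = k
    pseq = pi[z[0]] * np.prod([A[z[i - 1], z[i]] for i in range(1, N)])
    e = [B[z[i], x[i]] for i in range(N)]
    pz[z[n]] += pseq                        # latent marginal p(z_n = k)
    past[z[n]] += pseq * np.prod(e[: n + 1])
    future[z[n]] += pseq * np.prod(e[n + 1:])
    full[z[n]] += pseq * np.prod(e)

# (13.24): p(X | z_n) = p(x_1..n | z_n) * p(x_{n+1}..N | z_n)
assert np.allclose(full / pz, (past / pz) * (future / pz))
```

Changing the slices in the accumulation step gives an analogous check for each of the remaining properties (13.25)–(13.31).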
Let us begin by evaluating $\gamma(z_{nk})$. Recall that for a discrete multinomial random
variable the expected value of one of its components is just the probability of
that component having the value 1, since $\mathbb{E}[z_{nk}] = 0 \cdot p(z_{nk}=0) + 1 \cdot p(z_{nk}=1) = p(z_{nk}=1)$. Thus we are interested in finding the posterior
distribution $p(\mathbf{z}_n \mid \mathbf{x}_1, \ldots, \mathbf{x}_N)$ of $\mathbf{z}_n$ given the observed data set $\mathbf{x}_1, \ldots, \mathbf{x}_N$. This
