Finally, we note that there is an alternative formulation of the forward-backward
algorithm (Jordan, 2007) in which the backward pass is defined by a recursion based
on the quantities γ(z_n) = α̂(z_n)β̂(z_n) instead of using β̂(z_n). This α–γ recursion
requires that the forward pass be completed first so that all the quantities α̂(z_n)
are available for the backward pass, whereas the forward and backward passes of
the α–β algorithm can be done independently. Although these two algorithms have
comparable computational cost, the α–β version is the most commonly encountered
one in the case of hidden Markov models, whereas for linear dynamical systems
(Section 13.3) a recursion analogous to the α–γ form is more usual.
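To see how such a recursion arises, we can substitute β̂(z_n) = γ(z_n)/α̂(z_n) into
the scaled backward recursion, giving a backward pass expressed directly in terms
of γ. The following rearrangement is our own sketch of this substitution, not a
formula quoted from Jordan (2007):

c_{n+1}\,\gamma(\mathbf{z}_n) = \widehat{\alpha}(\mathbf{z}_n) \sum_{\mathbf{z}_{n+1}} \frac{\gamma(\mathbf{z}_{n+1})}{\widehat{\alpha}(\mathbf{z}_{n+1})}\, p(\mathbf{x}_{n+1}|\mathbf{z}_{n+1})\, p(\mathbf{z}_{n+1}|\mathbf{z}_n)

from which the dependence of the backward pass on the stored forward quantities
α̂(z_n) is explicit.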
13.2.5 The Viterbi algorithm
In many applications of hidden Markov models, the latent variables have some
meaningful interpretation, and so it is often of interest to find the most probable
sequence of hidden states for a given observation sequence. For instance, in speech
recognition, we might wish to find the most probable phoneme sequence for a given
series of acoustic observations. Because the graph for the hidden Markov model is
a directed tree, this problem can be solved exactly using the max-sum algorithm.
We recall from our discussion in Section 8.4.5 that the problem of finding the most
probable sequence of latent states is not the same as that of finding the set of states
that are individually the most probable. The latter problem can be solved by first
running the forward-backward (sum-product) algorithm to find the latent variable
marginals γ(z_n) and then maximizing each of these individually (Duda et al., 2001).
However, the set of such states will not, in general, correspond to the most probable
sequence of states. In fact, this set of states might even represent a sequence having
zero probability, if it so happens that two successive states, which in isolation are
individually the most probable, are such that the transition matrix element connecting
them is zero.
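The following small NumPy illustration makes this concrete for a chain of length
two with three states; the numbers are invented for the example, and emissions are
omitted for simplicity:

    import numpy as np

    # Initial distribution and transition matrix (illustrative numbers only).
    pi = np.array([0.4, 0.3, 0.3])
    A = np.array([[0.5, 0.5, 0.0],   # state 0 can never be followed by state 2
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])

    joint = pi[:, None] * A                       # p(z1, z2)
    p_z1, p_z2 = joint.sum(axis=1), joint.sum(axis=0)

    print(p_z1.argmax(), p_z2.argmax())           # individually most probable: 0 and 2
    print(joint[p_z1.argmax(), p_z2.argmax()])    # ...but p(z1=0, z2=2) = 0.0
    print(np.unravel_index(joint.argmax(), joint.shape))  # most probable sequence: (1, 2)

Here states 0 and 2 are individually the most probable at the first and second steps
respectively, yet the sequence (0, 2) has probability zero because the corresponding
transition is forbidden.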
In practice, we are usually interested in finding the most probable sequence of
states, and this can be done efficiently using the max-sum algorithm, which in the
context of hidden Markov models is known as the Viterbi algorithm (Viterbi, 1967).
Note that the max-sum algorithm works with log probabilities and so there is no
need to use re-scaled variables as was done with the forward-backward algorithm.
Figure 13.16 shows a fragment of the hidden Markov model expanded as a lattice
diagram. As we have already noted, the number of possible paths through the lattice
grows exponentially with the length of the chain. The Viterbi algorithm searches this
space of paths efficiently to find the most probable path with a computational cost
that grows only linearly with the length of the chain.
As with the sum-product algorithm, we first represent the hidden Markov model
as a factor graph, as shown in Figure 13.15. Again, we treat the variable node z_N
as the root, and pass messages to the root starting with the leaf nodes. Using the
results (8.93) and (8.94), we see that the messages passed in the max-sum algorithm
are given by
\mu_{\mathbf{z}_n \to f_{n+1}}(\mathbf{z}_n) = \mu_{f_n \to \mathbf{z}_n}(\mathbf{z}_n) \tag{13.66}

\mu_{f_{n+1} \to \mathbf{z}_{n+1}}(\mathbf{z}_{n+1}) = \max_{\mathbf{z}_n} \left\{ \ln f_{n+1}(\mathbf{z}_n, \mathbf{z}_{n+1}) + \mu_{\mathbf{z}_n \to f_{n+1}}(\mathbf{z}_n) \right\} \tag{13.67}
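To make the recursion concrete, here is a minimal NumPy sketch of the Viterbi
algorithm for an HMM with discrete emissions, working throughout in log space as
noted above. The names (viterbi, log_pi, log_A, log_B, omega, psi) are ours, not
from the text, and the back-pointer array psi anticipates the backtracking step
needed to recover the optimal state sequence:

    import numpy as np

    def viterbi(log_pi, log_A, log_B, obs):
        """Most probable state sequence for a discrete-emission HMM, in log space.

        log_pi : (K,)   log initial state probabilities
        log_A  : (K, K) log transitions, log_A[i, j] = ln p(z_{n+1}=j | z_n=i)
        log_B  : (K, V) log emissions,   log_B[k, v] = ln p(x_n=v | z_n=k)
        obs    : (N,)   observed symbol indices
        """
        N, K = len(obs), len(log_pi)
        omega = np.empty((N, K))           # omega[n, k]: best log prob of any path ending in state k
        psi = np.zeros((N, K), dtype=int)  # back-pointers to the maximizing predecessor

        omega[0] = log_pi + log_B[:, obs[0]]
        for n in range(1, N):
            # scores[i, j] = omega[n-1, i] + ln A[i, j]; maximize over predecessor i,
            # which implements the max in (13.67)
            scores = omega[n - 1][:, None] + log_A
            psi[n] = scores.argmax(axis=0)
            omega[n] = scores.max(axis=0) + log_B[:, obs[n]]

        # Backtrack from the most probable final state to recover the path
        path = np.empty(N, dtype=int)
        path[-1] = omega[-1].argmax()
        for n in range(N - 2, -1, -1):
            path[n] = psi[n + 1, path[n + 1]]
        return path, omega[-1].max()

    # Toy usage (all numbers illustrative): 2 states, 2 symbols
    log_pi = np.log([0.6, 0.4])
    log_A = np.log([[0.7, 0.3], [0.4, 0.6]])
    log_B = np.log([[0.9, 0.1], [0.2, 0.8]])
    print(viterbi(log_pi, log_A, log_B, [0, 0, 1, 1]))

Each step of the recursion costs O(K^2) for K states, so the total cost grows only
linearly with the length N of the chain, in agreement with the discussion above.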