Pattern Recognition and Machine Learning

630 13. SEQUENTIAL DATA

Figure 13.16 A fragment of the HMM lattice
showing two possible paths. The Viterbi algorithm
efficiently determines the most probable path from
amongst the exponentially many possibilities. For
any given path, the corresponding probability is
given by the product of the elements of the tran-
sition matrixAjk, corresponding to the probabil-
itiesp(zn+1|zn)for each segment of the path,
along with the emission densitiesp(xn|k)asso-
ciated with each node on the path.

k=1

k=2

k=3 n− 2 n− 1 nn+1

If we eliminateμzn→fn+1(zn)between these two equations, and make use of (13.46), we obtain a recursion for thef→zmessages of the form

ω(zn+1)=lnp(xn+1|zn+1) + max zn {lnp(x+1|zn)+ω(zn)} (13.68)

where we have introduced the notationω(zn)≡μfn→zn(zn). From (8.95) and (8.96), these messages are initialized using

ω(z 1 )=lnp(z 1 )+lnp(x 1 |z 1 ). (13.69)

where we have used (13.45). Note that to keep the notation uncluttered, we omit
the dependence on the model parametersθthat are held fixed when finding the most
probable sequence.
The Viterbi algorithm can also be derived directly from the definition (13.6) of
the joint distribution by taking the logarithm and then exchanging maximizations
Exercise 13.16 and summations. It is easily seen that the quantitiesω(zn)have the probabilistic
interpretation
ω(zn)= max
z 1 ,...,zn− 1
p(x 1 ,...,xn,z 1 ,...,zn). (13.70)

Once we have completed the final maximization overzN, we will obtain the value of the joint distributionp(X,Z)corresponding to the most probable path. We also wish to find the sequence of latent variable values that corresponds to this path. To do this, we simply make use of the back-tracking procedure discussed in Sec- tion 8.4.5. Specifically, we note that the maximization overznmust be performed for each of theKpossible values ofzn+1. Suppose we keep a record of the values ofznthat correspond to the maxima for each value of theKvalues ofzn+1. Let us denote this function byψ(kn)wherek∈{ 1 ,...,K}. Once we have passed messages to the end of the chain and found the most probable state ofzN, we can then use this function to backtrack along the chain by applying it recursively

kmaxn =ψ(kmaxn+1). (13.71)

Pattern Recognition and Machine Learning

630 13. SEQUENTIAL DATA

Get our desktop app

Company

Features

Documentation

Resources