by the lines connecting the nodes. Once we know the most probable value of the final node $x_N$, we can then simply follow the link back to find the most probable state of node $x_{N-1}$, and so on back to the initial node $x_1$. This corresponds to propagating a message back down the chain using

$$x_{n-1}^{\max} = \phi(x_n^{\max}) \tag{8.102}$$

and is known as back-tracking. Note that there could be several values of $x_{n-1}$ all of which give the maximum value in (8.101). Provided we choose one of these values when we do the back-tracking, we are assured of a globally consistent maximizing configuration.
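The back-tracking step can be written compactly. The sketch below, in Python, assumes that a forward max-sum pass has already stored, for each node, a table $\phi$ giving the predecessor state that achieved the maximum; the function name and data layout are illustrative assumptions, not taken from the book.

```python
def backtrack(phi, x_N_max):
    """Recover a globally consistent maximizing configuration by
    following the stored links back down the chain, i.e. iterating (8.102).

    phi      : list of length N; phi[n][k] is the state of x_{n-1} that
               maximized the forward max-sum message when x_n = k
               (phi[0] is unused, since x_1 has no predecessor).
    x_N_max  : maximizing state of the final node x_N.
    """
    N = len(phi)
    x_max = [0] * N
    x_max[N - 1] = x_N_max
    # x_{n-1}^max = phi_n(x_n^max), applied from the end of the chain back to the start.
    for n in range(N - 1, 0, -1):
        x_max[n - 1] = phi[n][x_max[n]]
    return x_max
```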
In Figure 8.53, we have indicated two paths, each of which we shall suppose corresponds to a global maximum of the joint probability distribution. If $k=2$ and $k=3$ each represent possible values of $x_N^{\max}$, then starting from either state and tracing back along the black lines, which corresponds to iterating (8.102), we obtain a valid global maximum configuration. Note that if we had run a forward pass of max-sum message passing followed by a backward pass and then applied (8.98) at each node separately, we could end up selecting some states from one path and some from the other path, giving an overall configuration that is not a global maximizer. We see that it is necessary instead to keep track of the maximizing states during the forward pass using the functions $\phi(x_n)$ and then use back-tracking to find a consistent solution.
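The danger of maximizing each node separately can be seen in a tiny numerical illustration (not from the book): a joint distribution over two binary variables with two global maxima. Per-node maximization cannot tell the two maxima apart, whereas back-tracking from a single choice for the last node stays on one path.

```python
import numpy as np

# Hypothetical joint distribution with two global maxima,
# p(0,0) = p(1,1) = 0.4 and p(0,1) = p(1,0) = 0.1 (cf. the two paths in Figure 8.53).
p = np.array([[0.4, 0.1],
              [0.1, 0.4]])

# Max-marginals obtained by maximizing out the other variable, as in (8.98).
max_marg_x1 = p.max(axis=1)    # [0.4, 0.4]
max_marg_x2 = p.max(axis=0)    # [0.4, 0.4]

# Maximizing each node separately cannot distinguish the two maxima, so an
# arbitrary tie-break may mix states from different paths ...
x1, x2 = 0, 1                  # one admissible per-node choice
print(p[x1, x2])               # 0.1 -- not a global maximum

# ... whereas back-tracking from a single choice for the last node keeps the
# configuration consistent: choosing x2 = 1 forces x1 = argmax_x1 p(x1, 1) = 1.
x2 = 1
x1 = int(p[:, x2].argmax())    # 1
print(p[x1, x2])               # 0.4 -- a global maximum
```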
The extension to a general tree-structured factor graph should now be clear. If a message is sent from a factor node $f$ to a variable node $x$, a maximization is performed over all other variable nodes $x_1,\ldots,x_M$ that are neighbours of that factor node, using (8.93). When we perform this maximization, we keep a record of which values of the variables $x_1,\ldots,x_M$ gave rise to the maximum. Then in the back-tracking step, having found $x^{\max}$, we can use these stored values to assign consistent maximizing states $x_1^{\max},\ldots,x_M^{\max}$. The max-sum algorithm, with back-tracking, gives an exact maximizing configuration for the variables provided the factor graph is a tree. An important application of this technique is for finding the most probable sequence of hidden states in a hidden Markov model, in which case it is known as the Viterbi algorithm (Section 13.2).
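For the hidden Markov model case, the forward max-sum pass and the back-tracking step combine into the Viterbi recursion. The following sketch works in log space and assumes tabulated initial, transition, and emission log-probabilities; the function name, signature, and array layout are illustrative assumptions rather than the book's notation.

```python
import numpy as np

def viterbi(log_pi, log_A, log_emit):
    """Most probable hidden-state sequence of an HMM via max-sum with back-tracking.

    log_pi   : (K,)   log initial-state probabilities
    log_A    : (K, K) log transition probabilities, log_A[i, j] = ln p(z_n = j | z_{n-1} = i)
    log_emit : (N, K) log emission probabilities ln p(x_n | z_n = k) for the observed sequence
    """
    N, K = log_emit.shape
    omega = log_pi + log_emit[0]           # forward max-sum message at n = 1
    phi = np.zeros((N, K), dtype=int)      # stored arg-max links for back-tracking
    for n in range(1, N):
        scores = omega[:, None] + log_A    # (K, K): previous state x current state
        phi[n] = scores.argmax(axis=0)     # best predecessor for each current state
        omega = scores.max(axis=0) + log_emit[n]
    # Back-track from the most probable final state, as in (8.102).
    z = np.zeros(N, dtype=int)
    z[-1] = omega.argmax()
    for n in range(N - 1, 0, -1):
        z[n - 1] = phi[n, z[n]]
    return z, omega.max()                  # state sequence and its log-probability
```

Working with logarithms of the potentials, as the max-sum algorithm does, keeps the recursion numerically stable for long sequences.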
As with the sum-product algorithm, the inclusion of evidence in the form of observed variables is straightforward. The observed variables are clamped to their observed values, and the maximization is performed over the remaining hidden variables. This can be shown formally by including identity functions for the observed variables into the factor functions, as we did for the sum-product algorithm.
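In practice, clamping amounts to adding the log of the identity function for each observed variable to the corresponding unary (for example, emission) log-potential, which is zero at the observed value and minus infinity elsewhere. A minimal sketch, with hypothetical names:

```python
import numpy as np

def clamp(log_unary, observed):
    """Clamp observed variables before running max-sum.

    log_unary : (N, K) log unary potentials over the N variables
    observed  : dict mapping variable index -> observed state

    Adding ln I(x_n, x_n_hat), which is 0 at the observed value and -inf
    elsewhere, forces every maximization to respect the evidence while the
    hidden variables are maximized over as usual.
    """
    clamped = log_unary.copy()
    for n, value in observed.items():
        clamped[n, :] = -np.inf
        clamped[n, value] = log_unary[n, value]
    return clamped
```

The clamped table can then be passed to any max-sum routine, such as the Viterbi sketch above, and the maximization automatically ranges only over the hidden variables.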
It is interesting to compare max-sum with the iterated conditional modes (ICM) algorithm described on page 389. Each step in ICM is computationally simpler because the ‘messages’ that are passed from one node to the next comprise a single value consisting of the new state of the node for which the conditional distribution is maximized. The max-sum algorithm is more complex because the messages are functions of the node variables $x$ and hence comprise a set of $K$ values for each possible state of $x$. Unlike max-sum, however, ICM is not guaranteed to find a global maximum even for tree-structured graphs.
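To make the comparison concrete, here is a minimal ICM sweep for a chain with a shared pairwise potential; this is an illustrative setting with hypothetical names, not an example from the book. Each node is updated to the state maximizing its conditional distribution given the current states of its neighbours, so only a single value per node is communicated, and repeated sweeps converge only to a local maximum.

```python
import numpy as np

def icm_sweep(x, log_unary, log_pair):
    """One coordinate-wise ICM sweep over a chain with pairwise potentials.

    x         : (N,) current integer state of every node (updated in place)
    log_unary : (N, K) log unary potentials
    log_pair  : (K, K) log pairwise potential, log_pair[i, j] for (x_{n-1}=i, x_n=j)
    """
    N, K = log_unary.shape
    for n in range(N):
        cond = log_unary[n].copy()
        if n > 0:
            cond += log_pair[x[n - 1], :]   # contribution from the left neighbour
        if n < N - 1:
            cond += log_pair[:, x[n + 1]]   # contribution from the right neighbour
        x[n] = cond.argmax()                # maximize the conditional at this node
    return x
```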
