Pattern Recognition and Machine Learning

8.4. Inference in Graphical Models 413

the results (8.66) and (8.69) derived earlier for the sum-product algorithm, we can
readily write down the max-sum algorithm in terms of message passing simply by
replacing ‘sum’ with ‘max’ and replacing products with sums of logarithms to give

μf→x(x) = max x 1 ,...,xM

⎡

⎣lnf(x, x 1 ,...,xM)+

∑

m∈ne(fs)\x

μxm→f(xm)

⎤

⎦(8.93)

μx→f(x)=

∑

l∈ne(x)\f

μfl→x(x). (8.94)

The initial messages sent by the leaf nodes are obtained by analogy with (8.70) and
(8.71) and are given by

μx→f(x)=0 (8.95) μf→x(x)=lnf(x) (8.96)

while at the root node the maximum probability can then be computed, by analogy
with (8.63), using

pmax=max x

⎡

⎣

∑

s∈ne(x)

μfs→x(x)

⎤

⎦. (8.97)

So far, we have seen how to find the maximum of the joint distribution by prop-
agating messages from the leaves to an arbitrarily chosen root node. The result will
be the same irrespective of which node is chosen as the root. Now we turn to the
second problem of finding the configuration of the variables for which the joint dis-
tribution attains this maximum value. So far, we have sent messages from the leaves
to the root. The process of evaluating (8.97) will also give the valuexmaxfor the
most probable value of the root node variable, defined by

xmax=argmax x

⎡

⎣

∑

s∈ne(x)

μfs→x(x)

⎤

⎦. (8.98)

At this point, we might be tempted simply to continue with the message passing al-
gorithm and send messages from the root back out to the leaves, using (8.93) and
(8.94), then apply (8.98) to all of the remaining variable nodes. However, because
we are now maximizing rather than summing, it is possible that there may be mul-
tiple configurations ofxall of which give rise to the maximum value forp(x).In
such cases, this strategy can fail because it is possible for the individual variable
values obtained by maximizing the product of messages at each node to belong to
different maximizing configurations, giving an overall configuration that no longer
corresponds to a maximum.
The problem can be resolved by adopting a rather different kind of message
passing from the root node to the leaves. To see how this works, let us return once
again to the simple chain example ofNvariablesx 1 ,...,xNeach havingKstates,

Pattern Recognition and Machine Learning

Get our desktop app

Company

Features

Documentation

Resources