Pattern Recognition and Machine Learning

418 8. GRAPHICAL MODELS

other messages would simply duplicate the previous message on the same link. For
graphs that have a tree structure, any schedule that sends only pending messages
will eventually terminate once a message has passed in each direction across every
link (Exercise 8.29). At this point, there are no pending messages, and the product
of the received messages at every variable gives the exact marginal. In graphs having loops, however,
the algorithm may never terminate because there might always be pending messages,
although in practice it is generally found to converge within a reasonable time for
most applications. Once the algorithm has converged, or once it has been stopped
if convergence is not observed, the (approximate) local marginals can be computed
using the product of the most recently received incoming messages to each variable
node or factor node on every link.
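As an illustration of these updates, the following is a minimal sketch of sum-product message passing under a flooding schedule on a single-loop pairwise Markov random field. The graph, potentials, and all helper names (pot, belief, and so on) are invented for illustration; the potentials here are symmetric, so the exact marginals are uniform and the sketch merely shows the mechanics of convergence and of forming beliefs from incoming messages.

```python
# Loopy sum-product on a pairwise MRF with one loop: x0 - x1 - x2 - x0,
# binary variables. Potentials psi[(i, j)][xi][xj] are mildly attractive.
edges = [(0, 1), (1, 2), (2, 0)]
psi = {e: [[1.2, 0.8], [0.8, 1.2]] for e in edges}

def pot(i, j, xi, xj):
    """Look up psi for an edge regardless of stored orientation."""
    if (i, j) in psi:
        return psi[(i, j)][xi][xj]
    return psi[(j, i)][xj][xi]

nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
# m[(i, j)][xj] is the message from node i to neighbour j, initially uniform.
m = {(i, j): [1.0, 1.0] for i in nbrs for j in nbrs[i]}

for sweep in range(50):                       # flooding schedule
    new = {}
    for (i, j) in m:
        vals = []
        for xj in (0, 1):
            s = 0.0
            for xi in (0, 1):
                prod = pot(i, j, xi, xj)
                for k in nbrs[i]:             # all neighbours of i except j
                    if k != j:
                        prod *= m[(k, i)][xi]
                s += prod
            vals.append(s)
        z = sum(vals)
        new[(i, j)] = [v / z for v in vals]   # normalize for stability
    delta = max(abs(new[e][x] - m[e][x]) for e in m for x in (0, 1))
    m = new
    if delta < 1e-9:                          # no message changed: stop
        break

def belief(i):
    """Approximate marginal: product of incoming messages, normalized."""
    b = [1.0, 1.0]
    for k in nbrs[i]:
        for x in (0, 1):
            b[x] *= m[(k, i)][x]
    z = sum(b)
    return [v / z for v in b]

print(belief(0))
```

On a loop-free graph the same code would give exact marginals; here the loop makes the beliefs approximate in general, although for these symmetric potentials the approximation happens to be exact.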
In some applications, the loopy belief propagation algorithm can give poor re-
sults, whereas in other applications it has proven to be very effective. In particular,
state-of-the-art algorithms for decoding certain kinds of error-correcting codes are
equivalent to loopy belief propagation (Gallager, 1963; Berrou et al., 1993; McEliece
et al., 1998; MacKay and Neal, 1999; Frey, 1998).


8.4.8 Learning the graph structure


In our discussion of inference in graphical models, we have assumed that the
structure of the graph is known and fixed. However, there is also interest in going
beyond the inference problem and learning the graph structure itself from data
(Friedman and Koller, 2003). This requires that we define a space of possible
structures as well as a measure that can be used to score each structure.
From a Bayesian viewpoint, we would ideally like to compute a posterior dis-
tribution over graph structures and to make predictions by averaging with respect
to this distribution. If we have a prior p(m) over graphs indexed by m, then the
posterior distribution is given by

    p(m|D) ∝ p(m) p(D|m)                                    (8.103)

where D is the observed data set. The model evidence p(D|m) then provides the
score for each model. However, evaluation of the evidence involves marginalization
over the latent variables and presents a challenging computational problem for many
models.
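To make the scoring in (8.103) concrete, the following hypothetical sketch compares two structures for a pair of binary variables, x and y independent versus x → y, on an invented data set. With Beta(1,1) priors on each Bernoulli parameter the evidence has the closed form n1! n0!/(n1 + n0 + 1)! per conditional, so no latent-variable marginalization is needed in this special case; all names and the data are ours, not from the text.

```python
from math import exp, lgamma

def log_ml(n1, n0):
    # Beta(1,1) marginal likelihood of Bernoulli counts:
    # integral of th^n1 (1-th)^n0 dth = n1! n0! / (n1 + n0 + 1)!
    return lgamma(n1 + 1) + lgamma(n0 + 1) - lgamma(n1 + n0 + 2)

# Invented data set D of (x, y) pairs in which y mostly copies x.
D = [(0, 0)] * 8 + [(1, 1)] * 8 + [(0, 1)] + [(1, 0)]
n = len(D)

def count(pred):
    return sum(1 for x, y in D if pred(x, y))

nx1 = count(lambda x, y: x == 1)
ny1 = count(lambda x, y: y == 1)

# Model m1: x and y independent; evidence factorizes over the two nodes.
log_ev_m1 = log_ml(nx1, n - nx1) + log_ml(ny1, n - ny1)

# Model m2: x -> y; one Bernoulli for x, one for y given each value of x.
log_ev_m2 = (log_ml(nx1, n - nx1)
             + log_ml(count(lambda x, y: x == 0 and y == 1),
                      count(lambda x, y: x == 0 and y == 0))
             + log_ml(count(lambda x, y: x == 1 and y == 1),
                      count(lambda x, y: x == 1 and y == 0)))

# Posterior (8.103) with a uniform prior p(m) over the two structures.
z = exp(log_ev_m1) + exp(log_ev_m2)
post_m2 = exp(log_ev_m2) / z
print(post_m2)
```

Because the data are strongly correlated, the evidence, and hence the posterior, favours the dependent structure m2.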
Exploring the space of structures can also be problematic. Because the number
of different graph structures grows exponentially with the number of nodes, it is
often necessary to resort to heuristics to find good candidates.
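The growth of the structure space is easy to see by counting labeled directed acyclic graphs. A short sketch using Robinson's recurrence (the function name is ours):

```python
from math import comb

def num_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:
    a(m) = sum_{k=1}^{m} (-1)^(k+1) C(m, k) 2^(k(m-k)) a(m-k), a(0) = 1,
    where k counts the nodes with no incoming edge."""
    a = [1]                       # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            total += (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
        a.append(total)
    return a[n]

print([num_dags(n) for n in range(6)])   # [1, 1, 3, 25, 543, 29281]
```

Already at five nodes there are 29281 candidate structures, which is why exhaustive scoring is infeasible and heuristic search is used instead.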

Exercises


8.1 ( ) www By marginalizing out the variables in order, show that the representation
(8.5) for the joint distribution of a directed graph is correctly normalized, provided
each of the conditional distributions is normalized.
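A quick numerical version of this check, for a hypothetical chain a → b → c with normalized conditional tables (the tables are invented for illustration):

```python
import itertools

# Toy check: for a chain a -> b -> c with normalized CPTs,
# the product p(a) p(b|a) p(c|b) sums to one over all configurations.
p_a = [0.3, 0.7]
p_b_a = [[0.9, 0.1], [0.2, 0.8]]   # rows indexed by a, each row sums to 1
p_c_b = [[0.5, 0.5], [0.6, 0.4]]   # rows indexed by b, each row sums to 1

total = sum(p_a[a] * p_b_a[a][b] * p_c_b[b][c]
            for a, b, c in itertools.product((0, 1), repeat=3))
print(total)
```

Summing the variables out in reverse order (c, then b, then a) collapses each row sum to 1 in turn, which is exactly the argument the exercise asks for.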
8.2 ( ) www Show that the property of there being no directed cycles in a directed
graph follows from the statement that there exists an ordered numbering of the nodes
such that for each node there are no links going to a lower-numbered node.
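An empirical companion to this statement: any graph whose links all go from lower- to higher-numbered nodes passes a standard DFS cycle check, whereas a directed loop does not. The helper name and the random construction are ours.

```python
import random

def has_directed_cycle(n, edges):
    """Standard DFS cycle check (unvisited / on-stack / finished colouring)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
    state = [0] * n               # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and dfs(v)):
                return True       # back edge found: a directed cycle
        state[u] = 2
        return False
    return any(state[v] == 0 and dfs(v) for v in range(n))

random.seed(0)
n = 8
# Only links from lower- to higher-numbered nodes, as in the exercise.
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if random.random() < 0.4]
print(has_directed_cycle(n, edges))   # False: no cycle can arise
```

Intuitively, following any directed path the node numbers strictly increase, so the path can never return to its starting node, which is the core of the required proof.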