Pattern Recognition and Machine Learning

418 8. GRAPHICAL MODELS

other messages would simply duplicate the previous message on the same link. For
graphs that have a tree structure, any schedule that sends only pending messages
will eventually terminate once a message has passed in each direction across every
link (Exercise 8.29). At this point, there are no pending messages, and the product
of the received messages at every variable gives the exact marginal. In graphs having loops, however,
the algorithm may never terminate because there might always be pending messages,
although in practice it is generally found to converge within a reasonable time for
most applications. Once the algorithm has converged, or once it has been stopped
if convergence is not observed, the (approximate) local marginals can be computed
using the product of the most recently received incoming messages to each variable
node or factor node on every link.
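As an illustration of these updates, the following is a minimal sketch of sum-product message passing under a flooding schedule on a single-loop pairwise Markov random field. The graph, potentials, and all helper names (pot, belief, and so on) are invented for illustration; the potentials here are symmetric, so the exact marginals are uniform and the sketch merely shows the mechanics of convergence and of forming beliefs from incoming messages.

```python
# Loopy sum-product on a pairwise MRF with one loop: x0 - x1 - x2 - x0,
# binary variables. Potentials psi[(i, j)][xi][xj] are mildly attractive.
edges = [(0, 1), (1, 2), (2, 0)]
psi = {e: [[1.2, 0.8], [0.8, 1.2]] for e in edges}

def pot(i, j, xi, xj):
    """Look up psi for an edge regardless of stored orientation."""
    if (i, j) in psi:
        return psi[(i, j)][xi][xj]
    return psi[(j, i)][xj][xi]

nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
# m[(i, j)][xj] is the message from node i to neighbour j, initially uniform.
m = {(i, j): [1.0, 1.0] for i in nbrs for j in nbrs[i]}

for sweep in range(50):                       # flooding schedule
    new = {}
    for (i, j) in m:
        vals = []
        for xj in (0, 1):
            s = 0.0
            for xi in (0, 1):
                prod = pot(i, j, xi, xj)
                for k in nbrs[i]:             # all neighbours of i except j
                    if k != j:
                        prod *= m[(k, i)][xi]
                s += prod
            vals.append(s)
        z = sum(vals)
        new[(i, j)] = [v / z for v in vals]   # normalize for stability
    delta = max(abs(new[e][x] - m[e][x]) for e in m for x in (0, 1))
    m = new
    if delta < 1e-9:                          # no message changed: stop
        break

def belief(i):
    """Approximate marginal: product of incoming messages, normalized."""
    b = [1.0, 1.0]
    for k in nbrs[i]:
        for x in (0, 1):
            b[x] *= m[(k, i)][x]
    z = sum(b)
    return [v / z for v in b]

print(belief(0))
```

On a loop-free graph the same code would give exact marginals; here the loop makes the beliefs approximate in general, although for these symmetric potentials the approximation happens to be exact.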
In some applications, the loopy belief propagation algorithm can give poor re-
sults, whereas in other applications it has proven to be very effective. In particular,
state-of-the-art algorithms for decoding certain kinds of error-correcting codes are
equivalent to loopy belief propagation (Gallager, 1963; Berrou et al., 1993; McEliece
et al., 1998; MacKay and Neal, 1999; Frey, 1998).


8.4.8 Learning the graph structure


In our discussion of inference in graphical models, we have assumed that the
structure of the graph is known and fixed. However, there is also interest in going
beyond the inference problem and learning the graph structure itself from data
(Friedman and Koller, 2003). This requires that we define a space of possible
structures as well as a measure that can be used to score each structure.
From a Bayesian viewpoint, we would ideally like to compute a posterior dis-
tribution over graph structures and to make predictions by averaging with respect
to this distribution. If we have a prior p(m) over graphs indexed by m, then the
posterior distribution is given by

    p(m|D) ∝ p(m) p(D|m)                                    (8.103)

where D is the observed data set. The model evidence p(D|m) then provides the
score for each model. However, evaluation of the evidence involves marginalization
over the latent variables and presents a challenging computational problem for many
models.
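To make the scoring in (8.103) concrete, the following hypothetical sketch compares two structures for a pair of binary variables, x and y independent versus x → y, on an invented data set. With Beta(1,1) priors on each Bernoulli parameter the evidence has the closed form n1! n0!/(n1 + n0 + 1)! per conditional, so no latent-variable marginalization is needed in this special case; all names and the data are ours, not from the text.

```python
from math import exp, lgamma

def log_ml(n1, n0):
    # Beta(1,1) marginal likelihood of Bernoulli counts:
    # integral of th^n1 (1-th)^n0 dth = n1! n0! / (n1 + n0 + 1)!
    return lgamma(n1 + 1) + lgamma(n0 + 1) - lgamma(n1 + n0 + 2)

# Invented data set D of (x, y) pairs in which y mostly copies x.
D = [(0, 0)] * 8 + [(1, 1)] * 8 + [(0, 1)] + [(1, 0)]
n = len(D)

def count(pred):
    return sum(1 for x, y in D if pred(x, y))

nx1 = count(lambda x, y: x == 1)
ny1 = count(lambda x, y: y == 1)

# Model m1: x and y independent; evidence factorizes over the two nodes.
log_ev_m1 = log_ml(nx1, n - nx1) + log_ml(ny1, n - ny1)

# Model m2: x -> y; one Bernoulli for x, one for y given each value of x.
log_ev_m2 = (log_ml(nx1, n - nx1)
             + log_ml(count(lambda x, y: x == 0 and y == 1),
                      count(lambda x, y: x == 0 and y == 0))
             + log_ml(count(lambda x, y: x == 1 and y == 1),
                      count(lambda x, y: x == 1 and y == 0)))

# Posterior (8.103) with a uniform prior p(m) over the two structures.
z = exp(log_ev_m1) + exp(log_ev_m2)
post_m2 = exp(log_ev_m2) / z
print(post_m2)
```

Because the data are strongly correlated, the evidence, and hence the posterior, favours the dependent structure m2.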
Exploring the space of structures can also be problematic. Because the number
of different graph structures grows exponentially with the number of nodes, it is
often necessary to resort to heuristics to find good candidates.
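The growth of the structure space is easy to see by counting labeled directed acyclic graphs. A short sketch using Robinson's recurrence (the function name is ours):

```python
from math import comb

def num_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:
    a(m) = sum_{k=1}^{m} (-1)^(k+1) C(m, k) 2^(k(m-k)) a(m-k), a(0) = 1,
    where k counts the nodes with no incoming edge."""
    a = [1]                       # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            total += (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
        a.append(total)
    return a[n]

print([num_dags(n) for n in range(6)])   # [1, 1, 3, 25, 543, 29281]
```

Already at five nodes there are 29281 candidate structures, which is why exhaustive scoring is infeasible and heuristic search is used instead.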

Exercises


8.1 ( ) www By marginalizing out the variables in order, show that the representation
(8.5) for the joint distribution of a directed graph is correctly normalized, provided
each of the conditional distributions is normalized.
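A quick numerical version of this check, for a hypothetical chain a → b → c with normalized conditional tables (the tables are invented for illustration):

```python
import itertools

# Toy check: for a chain a -> b -> c with normalized CPTs,
# the product p(a) p(b|a) p(c|b) sums to one over all configurations.
p_a = [0.3, 0.7]
p_b_a = [[0.9, 0.1], [0.2, 0.8]]   # rows indexed by a, each row sums to 1
p_c_b = [[0.5, 0.5], [0.6, 0.4]]   # rows indexed by b, each row sums to 1

total = sum(p_a[a] * p_b_a[a][b] * p_c_b[b][c]
            for a, b, c in itertools.product((0, 1), repeat=3))
print(total)
```

Summing the variables out in reverse order (c, then b, then a) collapses each row sum to 1 in turn, which is exactly the argument the exercise asks for.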
8.2 ( ) www Show that the property of there being no directed cycles in a directed
graph follows from the statement that there exists an ordered numbering of the nodes
such that for each node there are no links going to a lower-numbered node.
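An empirical companion to this statement: any graph whose links all go from lower- to higher-numbered nodes passes a standard DFS cycle check, whereas a directed loop does not. The helper name and the random construction are ours.

```python
import random

def has_directed_cycle(n, edges):
    """Standard DFS cycle check (unvisited / on-stack / finished colouring)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
    state = [0] * n               # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and dfs(v)):
                return True       # back edge found: a directed cycle
        state[u] = 2
        return False
    return any(state[v] == 0 and dfs(v) for v in range(n))

random.seed(0)
n = 8
# Only links from lower- to higher-numbered nodes, as in the exercise.
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if random.random() < 0.4]
print(has_directed_cycle(n, edges))   # False: no cycle can arise
```

Intuitively, following any directed path the node numbers strictly increase, so the path can never return to its starting node, which is the core of the required proof.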