Pattern Recognition and Machine Learning

(Jeff_L) #1
414 8. GRAPHICAL MODELS

Figure 8.53 A lattice, or trellis, diagram show-
ing explicitly theKpossible states (one per row
of the diagram) for each of the variablesxnin the
chain model. In this illustrationK=3. The ar-
row shows the direction of message passing in the
max-product algorithm. For every statekof each
variablexn(corresponding to columnnof the dia-
gram) the functionφ(xn)defines a unique state at
the previous variable, indicated by the black lines.
The two paths through the lattice correspond to
configurations that give the global maximum of the
joint probability distribution, and either of these
can be found by tracing back along the black lines
in the opposite direction to the arrow.


k=1

k=2

k=3
n− 2 n− 1 nn+1

corresponding to the graph shown in Figure 8.38. Suppose we take nodexNto be
the root node. Then in the first phase, we propagate messages from the leaf nodex 1
to the root node using

μxn→fn,n+1(xn)=μfn− 1 ,n→xn(xn)
μfn− 1 ,n→xn(xn) = max
xn− 1

[
lnfn− 1 ,n(xn− 1 ,xn)+μxn− 1 →fn− 1 ,n(xn)

]

which follow from applying (8.94) and (8.93) to this particular graph. The initial
message sent from the leaf node is simply

μx 1 →f 1 , 2 (x 1 )=0. (8.99)

The most probable value forxNis then given by

xmaxN =argmax
xN

[
μfN− 1 ,N→xN(xN)

]

. (8.100)


Now we need to determine the states of the previous variables that correspond to the
same maximizing configuration. This can be done by keeping track of which values
of the variables gave rise to the maximum state of each variable, in other words by
storing quantities given by

φ(xn) = arg max
xn− 1

[
lnfn− 1 ,n(xn− 1 ,xn)+μxn− 1 →fn− 1 ,n(xn)

]

. (8.101)


To understand better what is happening, it is helpful to represent the chain of vari-
ables in terms of alatticeortrellisdiagram as shown in Figure 8.53. Note that this
is not a probabilistic graphical model because the nodes represent individual states
of variables, while each variable corresponds to a column of such states in the di-
agram. For each state of a given variable, there is a unique state of the previous
variable that maximizes the probability (ties are broken either systematically or at
random), corresponding to the functionφ(xn)given by (8.101), and this is indicated
Free download pdf