Pattern Recognition and Machine Learning

622 13. SEQUENTIAL DATA

Figure 13.13  Illustration of the backward recursion (13.38) for evaluation of the β variables. In this fragment of the lattice, we see that the quantity β(z_{n,1}) is obtained by taking the components β(z_{n+1,k}) of β(z_{n+1}) at step n+1 and summing them up with weights given by the products of A_{1k}, corresponding to the values of p(z_{n+1}|z_n), and the corresponding values of the emission density p(x_{n+1}|z_{n+1,k}).

[Figure: fragment of the lattice showing states k = 1, 2, 3 at steps n and n+1; β(z_{n,1}) is formed from β(z_{n+1,1}), β(z_{n+1,2}), β(z_{n+1,3}) via the transition weights A_{11}, A_{12}, A_{13} and the emission densities p(x_{n+1}|z_{n+1,k}).]

Making use of the definition (13.35) for β(z_n), we then obtain

    β(z_n) = Σ_{z_{n+1}} β(z_{n+1}) p(x_{n+1}|z_{n+1}) p(z_{n+1}|z_n).    (13.38)

Note that in this case we have a backward message passing algorithm that evaluates
β(z_n) in terms of β(z_{n+1}). At each step, we absorb the effect of observation x_{n+1}
through the emission probability p(x_{n+1}|z_{n+1}), multiply by the transition matrix
p(z_{n+1}|z_n), and then marginalize out z_{n+1}. This is illustrated in Figure 13.13.
Again we need a starting condition for the recursion, namely a value for β(z_N).
This can be obtained by setting n = N in (13.33) and replacing α(z_N) with its
definition (13.34) to give

    p(z_N|X) = p(X, z_N) β(z_N) / p(X)    (13.39)

which we see will be correct provided we take β(z_N) = 1 for all settings of z_N.
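The backward recursion (13.38), with its initial condition β(z_N) = 1, can be sketched in code. The following is a minimal pure-Python illustration for a discrete-state HMM; the names A, emit, and backward are illustrative, with A[j][k] standing for p(z_{n+1} = k | z_n = j) and emit[n][k] for the precomputed emission density p(x_n | z_n = k):

```python
def backward(A, emit):
    """Backward pass for a discrete-state HMM.

    A[j][k]   : transition probability p(z_{n+1} = k | z_n = j)
    emit[n][k]: emission density p(x_n | z_n = k), for n = 0, ..., N-1
    Returns beta[n][k], the beta variables of (13.35).
    """
    N, K = len(emit), len(A)
    beta = [[0.0] * K for _ in range(N)]
    beta[N - 1] = [1.0] * K          # initial condition: beta(z_N) = 1
    for n in range(N - 2, -1, -1):   # recurse backwards through the chain
        for j in range(K):
            # (13.38): absorb x_{n+1} via the emission density, weight by
            # the transition probabilities, and marginalize out z_{n+1}
            beta[n][j] = sum(beta[n + 1][k] * emit[n + 1][k] * A[j][k]
                             for k in range(K))
    return beta
```

Note that each β(z_n) depends only on β(z_{n+1}), so the pass runs in O(N K²) time, matching the forward (α) recursion.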
In the M step equations, the quantity p(X) will cancel out, as can be seen, for
instance, in the M-step equation for μ_k given by (13.20), which takes the form

    μ_k = ( Σ_{n=1}^{N} γ(z_{nk}) x_n ) / ( Σ_{n=1}^{N} γ(z_{nk}) )
        = ( Σ_{n=1}^{N} α(z_{nk}) β(z_{nk}) x_n ) / ( Σ_{n=1}^{N} α(z_{nk}) β(z_{nk}) ).    (13.40)
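To make the cancellation of p(X) concrete, the update (13.40) can be computed directly from the unnormalized products α(z_{nk}) β(z_{nk}), without ever evaluating γ(z_{nk}) itself. A minimal sketch, with illustrative names and x_n taken to be scalar for simplicity:

```python
def update_means(alpha, beta, x):
    """M-step update (13.40) for the component means mu_k.

    alpha[n][k], beta[n][k]: forward/backward quantities at step n, state k
    x[n]                   : scalar observation at step n
    The common factor 1/p(X) in gamma(z_nk) cancels between numerator
    and denominator, so unnormalized alpha*beta products suffice.
    """
    N, K = len(x), len(alpha[0])
    mu = []
    for k in range(K):
        num = sum(alpha[n][k] * beta[n][k] * x[n] for n in range(N))
        den = sum(alpha[n][k] * beta[n][k] for n in range(N))
        mu.append(num / den)
    return mu
```

Each updated mean is thus a weighted average of the observations, with weights proportional to the posterior responsibilities of state k.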

However, the quantity p(X) represents the likelihood function, whose value we typically
wish to monitor during the EM optimization, and so it is useful to be able to
evaluate it. If we sum both sides of (13.33) over z_n, and use the fact that the left-hand
side is a normalized distribution, we obtain

    p(X) = Σ_{z_n} α(z_n) β(z_n).    (13.41)
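Because (13.41) holds for every choice of n, it doubles as a numerical sanity check on a forward-backward implementation: the sum Σ_k α(z_{nk}) β(z_{nk}) must give the same value of p(X) at every step. A minimal sketch, with illustrative array names:

```python
def likelihood(alpha, beta, n):
    """Evaluate p(X) via (13.41) from the alpha/beta values at step n.

    The result is independent of n; comparing its value across
    different n is a useful check on the forward-backward passes.
    """
    return sum(a * b for a, b in zip(alpha[n], beta[n]))
```

In practice one often evaluates (13.41) at n = N, where β(z_N) = 1 and the sum reduces to Σ_k α(z_{Nk}).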