represents a vector of length $K$ whose entries correspond to the expected values of $z_{nk}$. Using Bayes' theorem, we have
$$\gamma(\mathbf{z}_n) = p(\mathbf{z}_n \mid \mathbf{X}) = \frac{p(\mathbf{X} \mid \mathbf{z}_n)\,p(\mathbf{z}_n)}{p(\mathbf{X})}. \tag{13.32}$$

Note that the denominator $p(\mathbf{X})$ is implicitly conditioned on the parameters $\boldsymbol{\theta}^{\text{old}}$ of the HMM and hence represents the likelihood function. Using the conditional independence property (13.24), together with the product rule of probability, we obtain
$$\gamma(\mathbf{z}_n) = \frac{p(\mathbf{x}_1,\ldots,\mathbf{x}_n,\mathbf{z}_n)\,p(\mathbf{x}_{n+1},\ldots,\mathbf{x}_N \mid \mathbf{z}_n)}{p(\mathbf{X})} = \frac{\alpha(\mathbf{z}_n)\,\beta(\mathbf{z}_n)}{p(\mathbf{X})} \tag{13.33}$$
where we have defined
$$\alpha(\mathbf{z}_n) \equiv p(\mathbf{x}_1,\ldots,\mathbf{x}_n,\mathbf{z}_n) \tag{13.34}$$
$$\beta(\mathbf{z}_n) \equiv p(\mathbf{x}_{n+1},\ldots,\mathbf{x}_N \mid \mathbf{z}_n). \tag{13.35}$$
The quantity $\alpha(\mathbf{z}_n)$ represents the joint probability of observing all of the given data up to time $n$ together with the value of $\mathbf{z}_n$, whereas $\beta(\mathbf{z}_n)$ represents the conditional probability of all future data from time $n+1$ up to $N$ given the value of $\mathbf{z}_n$. Again, $\alpha(\mathbf{z}_n)$ and $\beta(\mathbf{z}_n)$ each represent a set of $K$ numbers, one for each of the possible settings of the 1-of-$K$ coded binary vector $\mathbf{z}_n$. We shall use the notation $\alpha(z_{nk})$ to denote the value of $\alpha(\mathbf{z}_n)$ when $z_{nk}=1$, with an analogous interpretation of $\beta(z_{nk})$.
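To make the notation concrete, $\alpha$, $\beta$, and $\gamma$ can each be stored as an $N \times K$ array, with one row per time step and one column per state. The following minimal NumPy sketch (an illustration under assumed conventions, not code from the text) computes the posteriors $\gamma(z_{nk})$ from such arrays via (13.33); it uses the fact that $p(\mathbf{X}) = \sum_{\mathbf{z}_N} \alpha(\mathbf{z}_N)$, which follows from marginalizing the joint distribution (13.34) over $\mathbf{z}_N$ at $n = N$.

```python
import numpy as np

def posterior_marginals(alpha, beta):
    """Compute gamma(z_nk) = alpha(z_nk) beta(z_nk) / p(X), per eq. (13.33).

    alpha, beta : (N, K) arrays, with alpha[n, k] holding alpha(z_nk) and
    beta[n, k] holding beta(z_nk) (this array layout is an assumed convention).
    """
    # p(X) = sum_k alpha(z_Nk): marginalize the joint (13.34) over z_N.
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood
    return gamma  # each row is a distribution over the K states and sums to 1
```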
We now derive recursion relations that allow $\alpha(\mathbf{z}_n)$ and $\beta(\mathbf{z}_n)$ to be evaluated efficiently. Again, we shall make use of conditional independence properties, in particular (13.25) and (13.26), together with the sum and product rules, allowing us to express $\alpha(\mathbf{z}_n)$ in terms of $\alpha(\mathbf{z}_{n-1})$.
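For ease of reference, the two properties take the form
$$p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{x}_n, \mathbf{z}_n) = p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_n) \tag{13.25}$$
$$p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_{n-1}, \mathbf{z}_n) = p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_{n-1}) \tag{13.26}$$
both of which follow from d-separation in the directed graph of the HMM: conditioned on the indicated latent variable, the earlier observations are independent of the remaining variables. The derivation then proceeds as follows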
$$
\begin{aligned}
\alpha(\mathbf{z}_n) &= p(\mathbf{x}_1,\ldots,\mathbf{x}_n,\mathbf{z}_n) \\
&= p(\mathbf{x}_1,\ldots,\mathbf{x}_n \mid \mathbf{z}_n)\,p(\mathbf{z}_n) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n)\,p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_n)\,p(\mathbf{z}_n) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n)\,p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1},\mathbf{z}_n) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n) \sum_{\mathbf{z}_{n-1}} p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1},\mathbf{z}_{n-1},\mathbf{z}_n) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n) \sum_{\mathbf{z}_{n-1}} p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1},\mathbf{z}_n \mid \mathbf{z}_{n-1})\,p(\mathbf{z}_{n-1}) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n) \sum_{\mathbf{z}_{n-1}} p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_{n-1})\,p(\mathbf{z}_n \mid \mathbf{z}_{n-1})\,p(\mathbf{z}_{n-1}) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n) \sum_{\mathbf{z}_{n-1}} p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1},\mathbf{z}_{n-1})\,p(\mathbf{z}_n \mid \mathbf{z}_{n-1})
\end{aligned}
$$
Making use of the definition (13.34) for $\alpha(\mathbf{z}_n)$, we then obtain
$$\alpha(\mathbf{z}_n) = p(\mathbf{x}_n \mid \mathbf{z}_n) \sum_{\mathbf{z}_{n-1}} \alpha(\mathbf{z}_{n-1})\,p(\mathbf{z}_n \mid \mathbf{z}_{n-1}). \tag{13.36}$$
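To illustrate the recursion, here is a short NumPy sketch of the full forward pass (an illustrative implementation under assumed conventions, not code from the text). It assumes a discrete-state HMM with $K$ states, an initial distribution `pi` for $p(\mathbf{z}_1)$, a transition matrix `A` with `A[j, k]` $= p(z_{nk}=1 \mid z_{n-1,j}=1)$, and a precomputed array `emission` with `emission[n, k]` $= p(\mathbf{x}_n \mid z_{nk}=1)$; these names and layouts are our own. Note that in practice the $\alpha$ values shrink exponentially with $n$, so real implementations rescale them at each step to avoid numerical underflow.

```python
import numpy as np

def forward(pi, A, emission):
    """Forward pass implementing eq. (13.36).

    pi       : (K,)   initial state distribution p(z_1)
    A        : (K, K) transitions, A[j, k] = p(z_n = k | z_{n-1} = j)
    emission : (N, K) emission[n, k] = p(x_n | z_n = k)
    Returns the (N, K) array of alpha values (unscaled; see note above).
    """
    N, K = emission.shape
    alpha = np.zeros((N, K))
    alpha[0] = pi * emission[0]  # base case: alpha(z_1) = p(z_1) p(x_1|z_1)
    for n in range(1, N):
        # alpha(z_n) = p(x_n|z_n) * sum over z_{n-1} of alpha(z_{n-1}) p(z_n|z_{n-1})
        alpha[n] = emission[n] * (alpha[n - 1] @ A)
    return alpha

# Toy usage with made-up numbers: 2 states, 3 observations.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
emission = np.array([[0.9, 0.2],
                     [0.1, 0.8],
                     [0.9, 0.2]])
alpha = forward(pi, A, emission)
print(alpha[-1].sum())  # the likelihood p(X), obtained by summing alpha(z_N)
```

Each step of the recursion costs $O(K^2)$ operations for the matrix-vector product, so the complete pass is $O(K^2 N)$, which is what makes this evaluation efficient compared with direct summation over all $K^N$ latent paths.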