represents a vector of length $K$ whose entries correspond to the expected values of $z_{nk}$. Using Bayes' theorem, we have
$$\gamma(\mathbf{z}_n) = p(\mathbf{z}_n \mid \mathbf{X}) = \frac{p(\mathbf{X} \mid \mathbf{z}_n)\,p(\mathbf{z}_n)}{p(\mathbf{X})}. \tag{13.32}$$

Note that the denominator $p(\mathbf{X})$ is implicitly conditioned on the parameters $\boldsymbol{\theta}^{\text{old}}$ of the HMM and hence represents the likelihood function. Using the conditional independence property (13.24), together with the product rule of probability, we obtain
$$\gamma(\mathbf{z}_n) = \frac{p(\mathbf{x}_1,\ldots,\mathbf{x}_n,\mathbf{z}_n)\,p(\mathbf{x}_{n+1},\ldots,\mathbf{x}_N \mid \mathbf{z}_n)}{p(\mathbf{X})} = \frac{\alpha(\mathbf{z}_n)\,\beta(\mathbf{z}_n)}{p(\mathbf{X})} \tag{13.33}$$
where we have defined
$$\alpha(\mathbf{z}_n) \equiv p(\mathbf{x}_1,\ldots,\mathbf{x}_n,\mathbf{z}_n) \tag{13.34}$$
$$\beta(\mathbf{z}_n) \equiv p(\mathbf{x}_{n+1},\ldots,\mathbf{x}_N \mid \mathbf{z}_n). \tag{13.35}$$
The quantity $\alpha(\mathbf{z}_n)$ represents the joint probability of observing all of the given data up to time $n$ together with the value of $\mathbf{z}_n$, whereas $\beta(\mathbf{z}_n)$ represents the conditional probability of all future data from time $n+1$ up to $N$ given the value of $\mathbf{z}_n$. Again, $\alpha(\mathbf{z}_n)$ and $\beta(\mathbf{z}_n)$ each represent a set of $K$ numbers, one for each of the possible settings of the 1-of-$K$ coded binary vector $\mathbf{z}_n$. We shall use the notation $\alpha(z_{nk})$ to denote the value of $\alpha(\mathbf{z}_n)$ when $z_{nk}=1$, with an analogous interpretation of $\beta(z_{nk})$.
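To make the notation concrete, $\alpha$, $\beta$, and $\gamma$ can each be stored as an $N \times K$ array, with one row per time step and one column per state. The following minimal NumPy sketch (an illustration under assumed conventions, not code from the text) computes the posteriors $\gamma(z_{nk})$ from such arrays via (13.33); it uses the fact that $p(\mathbf{X}) = \sum_{\mathbf{z}_N} \alpha(\mathbf{z}_N)$, which follows from marginalizing the joint distribution (13.34) over $\mathbf{z}_N$ at $n = N$.

```python
import numpy as np

def posterior_marginals(alpha, beta):
    """Compute gamma(z_nk) = alpha(z_nk) beta(z_nk) / p(X), per eq. (13.33).

    alpha, beta : (N, K) arrays, with alpha[n, k] holding alpha(z_nk) and
    beta[n, k] holding beta(z_nk) (this array layout is an assumed convention).
    """
    # p(X) = sum_k alpha(z_Nk): marginalize the joint (13.34) over z_N.
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood
    return gamma  # each row is a distribution over the K states and sums to 1
```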
We now derive recursion relations that allow $\alpha(\mathbf{z}_n)$ and $\beta(\mathbf{z}_n)$ to be evaluated efficiently. Again, we shall make use of conditional independence properties, in particular (13.25) and (13.26), together with the sum and product rules, allowing us to express $\alpha(\mathbf{z}_n)$ in terms of $\alpha(\mathbf{z}_{n-1})$.
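For ease of reference, the two properties take the form
$$p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{x}_n, \mathbf{z}_n) = p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_n) \tag{13.25}$$
$$p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_{n-1}, \mathbf{z}_n) = p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_{n-1}) \tag{13.26}$$
both of which follow from d-separation in the directed graph of the HMM: conditioned on the indicated latent variable, the earlier observations are independent of the remaining variables. The derivation then proceeds as follows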
$$
\begin{aligned}
\alpha(\mathbf{z}_n) &= p(\mathbf{x}_1,\ldots,\mathbf{x}_n,\mathbf{z}_n) \\
&= p(\mathbf{x}_1,\ldots,\mathbf{x}_n \mid \mathbf{z}_n)\,p(\mathbf{z}_n) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n)\,p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_n)\,p(\mathbf{z}_n) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n)\,p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1},\mathbf{z}_n) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n) \sum_{\mathbf{z}_{n-1}} p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1},\mathbf{z}_{n-1},\mathbf{z}_n) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n) \sum_{\mathbf{z}_{n-1}} p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1},\mathbf{z}_n \mid \mathbf{z}_{n-1})\,p(\mathbf{z}_{n-1}) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n) \sum_{\mathbf{z}_{n-1}} p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1} \mid \mathbf{z}_{n-1})\,p(\mathbf{z}_n \mid \mathbf{z}_{n-1})\,p(\mathbf{z}_{n-1}) \\
&= p(\mathbf{x}_n \mid \mathbf{z}_n) \sum_{\mathbf{z}_{n-1}} p(\mathbf{x}_1,\ldots,\mathbf{x}_{n-1},\mathbf{z}_{n-1})\,p(\mathbf{z}_n \mid \mathbf{z}_{n-1})
\end{aligned}
$$
Making use of the definition (13.34) for $\alpha(\mathbf{z}_n)$, we then obtain
$$\alpha(\mathbf{z}_n) = p(\mathbf{x}_n \mid \mathbf{z}_n) \sum_{\mathbf{z}_{n-1}} \alpha(\mathbf{z}_{n-1})\,p(\mathbf{z}_n \mid \mathbf{z}_{n-1}). \tag{13.36}$$
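To illustrate the recursion, here is a short NumPy sketch of the full forward pass (an illustrative implementation under assumed conventions, not code from the text). It assumes a discrete-state HMM with $K$ states, an initial distribution `pi` for $p(\mathbf{z}_1)$, a transition matrix `A` with `A[j, k]` $= p(z_{nk}=1 \mid z_{n-1,j}=1)$, and a precomputed array `emission` with `emission[n, k]` $= p(\mathbf{x}_n \mid z_{nk}=1)$; these names and layouts are our own. Note that in practice the $\alpha$ values shrink exponentially with $n$, so real implementations rescale them at each step to avoid numerical underflow.

```python
import numpy as np

def forward(pi, A, emission):
    """Forward pass implementing eq. (13.36).

    pi       : (K,)   initial state distribution p(z_1)
    A        : (K, K) transitions, A[j, k] = p(z_n = k | z_{n-1} = j)
    emission : (N, K) emission[n, k] = p(x_n | z_n = k)
    Returns the (N, K) array of alpha values (unscaled; see note above).
    """
    N, K = emission.shape
    alpha = np.zeros((N, K))
    alpha[0] = pi * emission[0]  # base case: alpha(z_1) = p(z_1) p(x_1|z_1)
    for n in range(1, N):
        # alpha(z_n) = p(x_n|z_n) * sum over z_{n-1} of alpha(z_{n-1}) p(z_n|z_{n-1})
        alpha[n] = emission[n] * (alpha[n - 1] @ A)
    return alpha

# Toy usage with made-up numbers: 2 states, 3 observations.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
emission = np.array([[0.9, 0.2],
                     [0.1, 0.8],
                     [0.9, 0.2]])
alpha = forward(pi, A, emission)
print(alpha[-1].sum())  # the likelihood p(X), obtained by summing alpha(z_N)
```

Each step of the recursion costs $O(K^2)$ operations for the matrix-vector product, so the complete pass is $O(K^2 N)$, which is what makes this evaluation efficient compared with direct summation over all $K^N$ latent paths.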