Pattern Recognition and Machine Learning

(Jeff_L) #1
624 13. SEQUENTIAL DATA

This completes the E step, and we use the results to find a revised set of parameters
θnewusing the M-step equations from Section 13.2.1. We then continue to alternate
between E and M steps until some convergence criterion is satisfied, for instance
when the change in the likelihood function is below some threshold.
Note that in these recursion relations the observations enter through conditional
distributions of the formp(xn|zn). The recursions are therefore independent of
the type or dimensionality of the observed variables or the form of this conditional
distribution, so long as its value can be computed for each of theKpossible states
ofzn. Since the observed variables{xn}are fixed, the quantitiesp(xn|zn)can be
pre-computed as functions ofznat the start of the EM algorithm, and remain fixed
throughout.
We have seen in earlier chapters that the maximum likelihood approach is most
effective when the number of data points is large in relation to the number of parame-
ters. Here we note that a hidden Markov model can be trained effectively, using max-
imum likelihood, provided the training sequence is sufficiently long. Alternatively,
we can make use of multiple shorter sequences, which requires a straightforward
Exercise 13.12 modification of the hidden Markov model EM algorithm. In the case of left-to-right
models, this is particularly important because, in a given observation sequence, a
given state transition corresponding to a nondiagonal element ofAwill seen at most
once.
Another quantity of interest is the predictive distribution, in which the observed
data isX={x 1 ,...,xN}and we wish to predictxN+1, which would be important
for real-time applications such as financial forecasting. Again we make use of the
sum and product rules together with the conditional independence properties (13.29)
and (13.31) giving


p(xN+1|X)=


zN+1

p(xN+1,zN+1|X)

=


zN+1

p(xN+1|zN+1)p(zN+1|X)

=


zN+1

p(xN+1|zN+1)


zN

p(zN+1,zN|X)

=


zN+1

p(xN+1|zN+1)


zN

p(zN+1|zN)p(zN|X)

=


zN+1

p(xN+1|zN+1)


zN

p(zN+1|zN)

p(zN,X)
p(X)

=

1

p(X)


zN+1

p(xN+1|zN+1)


zN

p(zN+1|zN)α(zN) (13.44)

which can be evaluated by first running a forwardαrecursion and then computing
the final summations overzNandzN+1. The result of the first summation overzN
can be stored and used once the value ofxN+1is observed in order to run theα
recursion forward to the next step in order to predict the subsequent valuexN+2.
Free download pdf