We have approached parameter learning in the linear dynamical system using
maximum likelihood. Inclusion of priors to give a MAP estimate is straightforward,
and a fully Bayesian treatment can be found by applying the analytical approxima-
tion techniques discussed in Chapter 10, though a detailed treatment is precluded
here due to lack of space.

13.3.3 Extensions of LDS


As with the hidden Markov model, there is considerable interest in extending
the basic linear dynamical system in order to increase its capabilities. Although the
assumption of a linear-Gaussian model leads to efficient algorithms for inference
and learning, it also implies that the marginal distribution of the observed variables
is simply a Gaussian, which represents a significant limitation. One simple extension
of the linear dynamical system is to use a Gaussian mixture as the initial distribution
for $\mathbf{z}_1$. If this mixture has $K$ components, then the forward recursion equations
(13.85) will lead to a mixture of $K$ Gaussians over each hidden variable $\mathbf{z}_n$, and so
the model is again tractable.
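As a concrete illustration, the forward recursion for this model amounts to running a bank of $K$ standard Kalman filters, one per mixture component, whose mixing weights are re-scaled by each component's predictive likelihood. The following is a minimal sketch, assuming the time-invariant parameterization of Section 13.3 (transition matrix $\mathbf{A}$, emission matrix $\mathbf{C}$, noise covariances $\boldsymbol{\Gamma}$ and $\boldsymbol{\Sigma}$); the function and variable names are illustrative rather than taken from the text.

```python
import numpy as np
from scipy.stats import multivariate_normal

def measurement_update(mu_pred, P, x, C, Sigma):
    """Condition the predicted state N(mu_pred, P) on observation x,
    returning the filtered mean/covariance and the predictive
    likelihood of x for this component."""
    x_mean = C @ mu_pred
    S = C @ P @ C.T + Sigma                # innovation covariance
    lik = multivariate_normal.pdf(x, mean=x_mean, cov=S)
    K_gain = P @ C.T @ np.linalg.inv(S)    # Kalman gain
    mu = mu_pred + K_gain @ (x - x_mean)
    V = P - K_gain @ C @ P
    return mu, V, lik

def mixture_forward(X, weights, mus, Vs, A, C, Gamma, Sigma):
    """Forward (alpha) recursion when p(z_1) is a K-component Gaussian
    mixture: one Kalman filter per component, with the mixing weights
    re-scaled by each component's predictive likelihood."""
    K = len(weights)
    w = np.array(weights, dtype=float)
    mus = [np.asarray(m, dtype=float) for m in mus]
    Vs = [np.asarray(V, dtype=float) for V in Vs]
    for n, x in enumerate(X):
        liks = np.empty(K)
        for k in range(K):
            if n == 0:
                # First step: the k-th mixture component is the prior on z_1
                mu_pred, P = mus[k], Vs[k]
            else:
                # Prediction through the linear-Gaussian transition model
                mu_pred, P = A @ mus[k], A @ Vs[k] @ A.T + Gamma
            mus[k], Vs[k], liks[k] = measurement_update(mu_pred, P, x, C, Sigma)
        w *= liks
        w /= w.sum()   # alpha-hat(z_n) remains a K-component Gaussian mixture
    return w, mus, Vs
```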
For many applications, the Gaussian emission density is a poor approximation.
If instead we try to use a mixture of $K$ Gaussians as the emission density, then the
posterior $\widehat{\alpha}(\mathbf{z}_1)$ will also be a mixture of $K$ Gaussians. However, from (13.85) the
posterior $\widehat{\alpha}(\mathbf{z}_2)$ will comprise a mixture of $K^2$ Gaussians, and so on, with $\widehat{\alpha}(\mathbf{z}_n)$
being given by a mixture of $K^n$ Gaussians. Thus the number of components grows
exponentially with the length of the chain, and so this model is impractical.
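To see where this exponential growth comes from, write the forward recursion underlying (13.85) in unscaled form:

```latex
\widehat{\alpha}(\mathbf{z}_n) \propto
  p(\mathbf{x}_n \mid \mathbf{z}_n)
  \int \widehat{\alpha}(\mathbf{z}_{n-1})\,
       p(\mathbf{z}_n \mid \mathbf{z}_{n-1})\,
       \mathrm{d}\mathbf{z}_{n-1}.
```

The integral over the linear-Gaussian transition preserves the number of mixture components, so if $\widehat{\alpha}(\mathbf{z}_{n-1})$ has $K^{n-1}$ components, multiplying by a $K$-component emission mixture yields $K \cdot K^{n-1} = K^n$ components.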
More generally, introducing transition or emission models that depart from the
linear-Gaussian (or other exponential family) model leads to an intractable infer-
ence problem. We can make deterministic approximations, such as the assumed den-
sity filtering or expectation propagation techniques discussed in Chapter 10, or we
can make use of sampling methods, as discussed in Section 13.3.4. One widely used
approach is to make a Gaussian
approximation by linearizing around the mean of the predicted distribution, which
gives rise to the extended Kalman filter (Zarchan and Musoff, 2005).
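As a minimal sketch of the resulting computation, suppose nonlinear functions $f$ and $h$ replace the linear transition and emission, with additive Gaussian noise as before; the extended Kalman filter substitutes the Jacobians of $f$ and $h$, evaluated at the relevant means, for the matrices $\mathbf{A}$ and $\mathbf{C}$ of the linear filter. The functions f, h, F_jac, and H_jac below are illustrative names for user-supplied callables.

```python
import numpy as np

def ekf_step(mu, V, x, f, F_jac, h, H_jac, Gamma, Sigma):
    """One extended Kalman filter step for the model
        z_n = f(z_{n-1}) + w,  w ~ N(0, Gamma)
        x_n = h(z_n)     + v,  v ~ N(0, Sigma),
    obtained by linearizing around the mean of the predicted distribution."""
    # Prediction: propagate the mean through f; linearize f for the covariance
    mu_pred = f(mu)
    F = F_jac(mu)                      # Jacobian of f at the previous mean
    P = F @ V @ F.T + Gamma
    # Update: linearize h around the predicted mean
    H = H_jac(mu_pred)                 # Jacobian of h at the predicted mean
    S = H @ P @ H.T + Sigma            # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    mu_new = mu_pred + K @ (x - h(mu_pred))
    V_new = P - K @ H @ P
    return mu_new, V_new
```

Setting $f(\mathbf{z}) = \mathbf{A}\mathbf{z}$ and $h(\mathbf{z}) = \mathbf{C}\mathbf{z}$, so that the Jacobians are the constant matrices $\mathbf{A}$ and $\mathbf{C}$, recovers one step of the standard Kalman filter recursion.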
As with hidden Markov models, we can develop interesting extensions of the ba-
sic linear dynamical system by expanding its graphical representation. For example,
the switching state space model (Ghahramani and Hinton, 1998) can be viewed as
a combination of the hidden Markov model with a set of linear dynamical systems.
The model has multiple Markov chains of continuous linear-Gaussian latent vari-
ables, each of which is analogous to the latent chain of the linear dynamical system
discussed earlier, together with a Markov chain of discrete variables of the form used
in a hidden Markov model. The output at each time step is determined by stochas-
tically choosing one of the continuous latent chains, using the state of the discrete
latent variable as a switch, and then emitting an observation from the corresponding
conditional output distribution. Exact inference in this model is intractable, but vari-
ational methods lead to an efficient inference scheme involving forward-backward
recursions along each of the continuous and discrete Markov chains independently.
Note that, if we consider multiple chains of discrete latent variables, and use one as
the switch to select from the remainder, we obtain an analogous model having only
discrete latent variables known as the switching hidden Markov model.
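A short generative sketch, with illustrative parameter names, may make this structure concrete: $M$ continuous linear-Gaussian chains evolve independently, while a discrete HMM chain selects which of them emits each observation.

```python
import numpy as np

def sample_switching_ssm(N, pi0, Pi, As, Cs, Gammas, Sigmas, mu0s, V0s, seed=None):
    """Draw N observations from a switching state space model with M
    continuous linear-Gaussian chains and one discrete switch chain.

    pi0 : initial distribution over the M discrete switch states
    Pi  : M x M transition matrix of the discrete (HMM) switch chain
    As, Gammas : per-chain transition matrices and noise covariances
    Cs, Sigmas : per-chain emission matrices and noise covariances
    mu0s, V0s  : per-chain initial state means and covariances
    """
    rng = np.random.default_rng(seed)
    M = len(pi0)
    # Each continuous chain starts from its own Gaussian initial state
    z = [rng.multivariate_normal(mu0s[m], V0s[m]) for m in range(M)]
    s = rng.choice(M, p=pi0)           # initial discrete switch state
    X = []
    for n in range(N):
        if n > 0:
            # Every continuous chain evolves at every time step ...
            z = [rng.multivariate_normal(As[m] @ z[m], Gammas[m]) for m in range(M)]
            # ... and the switch follows its own Markov chain
            s = rng.choice(M, p=Pi[s])
        # The switch selects which continuous chain emits the observation
        X.append(rng.multivariate_normal(Cs[s] @ z[s], Sigmas[s]))
    return np.array(X)
```

Exact filtering in this model would require tracking a posterior whose number of modes grows as $M^n$ over the discrete state histories, which is why exact inference is intractable and the variational scheme mentioned above decouples the discrete and continuous chains.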
