and so we can transform the model into an equivalent standard HMM having a single
chain of latent variables each of which has K^M latent states. We can then run the
standard forward-backward recursions in the E step. This has computational
complexity O(NK^{2M}) that is exponential in the number M of latent chains and so will
be intractable for anything other than small values of M. One solution would be
to use sampling methods (discussed in Chapter 11). As an elegant deterministic
alternative, Ghahramani and Jordan (1997) exploited variational inference techniques
(Section 10.1) to obtain a tractable algorithm for approximate inference. This can be done using
a simple variational posterior distribution that is fully factorized with respect to the
latent variables, or alternatively by using a more powerful approach in which the
variational distribution is described by independent Markov chains corresponding to
the chains of latent variables in the original model. In the latter case, the variational
inference algorithm involves running independent forward and backward recursions
along each chain, which is computationally efficient and yet is also able to capture
correlations between variables within the same chain.
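To get a feel for the scale of the saving, the following sketch (a toy calculation
with illustrative values of K, M, and N, not figures from the text) compares the
O(NK^{2M}) cost of exact forward-backward on the equivalent K^M-state HMM with the
roughly O(NMK^2) per-pass cost of the per-chain recursions used by the structured
variational approximation.

    # Toy cost comparison for a factorial HMM with M chains of K states each.
    # The specific numbers are illustrative assumptions, not from the text.
    K, M, N = 10, 5, 1000             # states per chain, number of chains, sequence length

    joint_states = K ** M             # equivalent single-chain HMM has K^M states
    cost_exact = N * joint_states**2  # exact forward-backward: O(N K^{2M})
    cost_structured = N * M * K**2    # per-chain recursions: roughly O(N M K^2) per pass

    print(f"joint states K^M:        {joint_states:,}")
    print(f"exact forward-backward:  ~{cost_exact:.1e} operations")
    print(f"structured variational:  ~{cost_structured:.1e} operations per pass")

Even at these modest settings, the exact recursions are tens of millions of times
more expensive than a pass of the structured ones.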
Clearly, there are many possible probabilistic structures that can be constructed
according to the needs of particular applications. Graphical models provide a general
technique for motivating, describing, and analysing such structures, and variational
methods provide a powerful framework for performing inference in those models for
which exact solution is intractable.
13.3 Linear Dynamical Systems
In order to motivate the concept of linear dynamical systems, let us consider the
following simple problem, which often arises in practical settings. Suppose we wish
to measure the value of an unknown quantity z using a noisy sensor that returns an
observation x representing the value of z plus zero-mean Gaussian noise. Given a
single measurement, our best guess for z is to assume that z = x. However, we
can improve our estimate for z by taking lots of measurements and averaging them,
because the random noise terms will tend to cancel each other. Now let's make the
situation more complicated by assuming that we wish to measure a quantity z that
is changing over time. We can take regular measurements of x so that at some point
in time we have obtained x_1, ..., x_N and we wish to find the corresponding values
z_1, ..., z_N. If we simply average the measurements, the error due to random noise
will be reduced, but unfortunately we will just obtain a single averaged estimate, in
which we have averaged over the changing value of z, thereby introducing a new
source of error.
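To make this concrete, here is a minimal simulation (an illustrative setup with
assumed parameters, not an example from the text) in which the true quantity z
drifts as a random walk. Averaging every noisy reading does suppress the sensor
noise, but the result estimates the average of z over the whole run rather than
its current value z_N.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed toy model: z drifts as a random walk; each reading x is z
    # plus zero-mean Gaussian sensor noise.
    N, drift_sd, noise_sd = 200, 0.1, 1.0
    z = np.cumsum(rng.normal(0.0, drift_sd, size=N))  # true, changing quantity
    x = z + rng.normal(0.0, noise_sd, size=N)         # noisy sensor readings

    # Averaging all N readings cancels much of the noise, but it tracks the
    # long-run average of z, not its current value z_N.
    print(f"current value z_N:    {z[-1]:.3f}")
    print(f"last reading x_N:     {x[-1]:.3f}")
    print(f"mean of all readings: {x.mean():.3f}")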
Intuitively, we could imagine doing a bit better as follows. To estimate the value
of z_N, we take only the most recent few measurements, say x_{N−L}, ..., x_N, and just
average these. If z is changing slowly, and the random noise level in the sensor is
high, it would make sense to choose a relatively long window of observations to
average. Conversely, if the signal is changing quickly, and the noise levels are small,
we might be better just to use x_N directly as our estimate of z_N.
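A small extension of the same toy simulation illustrates this trade-off (the two
regimes and the window lengths are again assumed for illustration): with a slowly
drifting z and a noisy sensor, a long trailing window gives the smallest error,
whereas with a rapidly changing z and a clean sensor the single most recent
reading, L = 1, wins.

    import numpy as np

    rng = np.random.default_rng(1)

    def window_rmse(drift_sd, noise_sd, N=500, windows=(1, 5, 20, 100)):
        """RMS error of trailing-window averages of length L (window size)."""
        z = np.cumsum(rng.normal(0.0, drift_sd, size=N))  # true quantity
        x = z + rng.normal(0.0, noise_sd, size=N)         # noisy readings
        rmse = {}
        for L in windows:
            est = np.array([x[max(0, n - L + 1):n + 1].mean() for n in range(N)])
            rmse[L] = round(float(np.sqrt(np.mean((est - z) ** 2))), 3)
        return rmse

    # Slow z, high noise: longer windows should give lower error.
    print("slow/noisy:", window_rmse(drift_sd=0.01, noise_sd=1.0))
    # Fast z, low noise: the most recent reading alone (L = 1) should win.
    print("fast/clean:", window_rmse(drift_sd=1.0, noise_sd=0.1))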
do even better if we take a weighted average, in which more recent measurements