and so we can transform the model into an equivalent standard HMM having a single
chain of latent variables each of which has K^M latent states. We can then run the
standard forward-backward recursions in the E step. This has computational
complexity O(NK^{2M}) that is exponential in the number M of latent chains and so will
be intractable for anything other than small values of M. One solution would be
to use sampling methods (discussed in Chapter 11). As an elegant deterministic
alternative, Ghahramani and Jordan (1997) exploited variational inference techniques
(Section 10.1) to obtain a tractable algorithm for approximate inference. This can be done using
a simple variational posterior distribution that is fully factorized with respect to the
latent variables, or alternatively by using a more powerful approach in which the
variational distribution is described by independent Markov chains corresponding to
the chains of latent variables in the original model. In the latter case, the variational
inference algorithm involves running independent forward and backward recursions
along each chain, which is computationally efficient and yet is also able to capture
correlations between variables within the same chain.
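To get a feel for the scale of the saving, the following sketch (a toy calculation
with illustrative values of K, M, and N, not figures from the text) compares the
O(NK^{2M}) cost of exact forward-backward on the equivalent K^M-state HMM with the
roughly O(NMK^2) per-pass cost of the per-chain recursions used by the structured
variational approximation.

    # Toy cost comparison for a factorial HMM with M chains of K states each.
    # The specific numbers are illustrative assumptions, not from the text.
    K, M, N = 10, 5, 1000             # states per chain, number of chains, sequence length

    joint_states = K ** M             # equivalent single-chain HMM has K^M states
    cost_exact = N * joint_states**2  # exact forward-backward: O(N K^{2M})
    cost_structured = N * M * K**2    # per-chain recursions: roughly O(N M K^2) per pass

    print(f"joint states K^M:        {joint_states:,}")
    print(f"exact forward-backward:  ~{cost_exact:.1e} operations")
    print(f"structured variational:  ~{cost_structured:.1e} operations per pass")

Even at these modest settings, the exact recursions are tens of millions of times
more expensive than a pass of the structured ones.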
Clearly, there are many possible probabilistic structures that can be constructed
according to the needs of particular applications. Graphical models provide a general
technique for motivating, describing, and analysing such structures, and variational
methods provide a powerful framework for performing inference in those models for
which exact solution is intractable.
13.3 Linear Dynamical Systems
In order to motivate the concept of linear dynamical systems, let us consider the
following simple problem, which often arises in practical settings. Suppose we wish
to measure the value of an unknown quantity z using a noisy sensor that returns an
observation x representing the value of z plus zero-mean Gaussian noise. Given a
single measurement, our best guess for z is to assume that z = x. However, we
can improve our estimate for z by taking lots of measurements and averaging them,
because the random noise terms will tend to cancel each other. Now let's make the
situation more complicated by assuming that we wish to measure a quantity z that
is changing over time. We can take regular measurements of x so that at some point
in time we have obtained x_1, ..., x_N and we wish to find the corresponding values
z_1, ..., z_N. If we simply average the measurements, the error due to random noise
will be reduced, but unfortunately we will just obtain a single averaged estimate, in
which we have averaged over the changing value of z, thereby introducing a new
source of error.
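To make this concrete, here is a minimal simulation (an illustrative setup with
assumed parameters, not an example from the text) in which the true quantity z
drifts as a random walk. Averaging every noisy reading does suppress the sensor
noise, but the result estimates the average of z over the whole run rather than
its current value z_N.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed toy model: z drifts as a random walk; each reading x is z
    # plus zero-mean Gaussian sensor noise.
    N, drift_sd, noise_sd = 200, 0.1, 1.0
    z = np.cumsum(rng.normal(0.0, drift_sd, size=N))  # true, changing quantity
    x = z + rng.normal(0.0, noise_sd, size=N)         # noisy sensor readings

    # Averaging all N readings cancels much of the noise, but it tracks the
    # long-run average of z, not its current value z_N.
    print(f"current value z_N:    {z[-1]:.3f}")
    print(f"last reading x_N:     {x[-1]:.3f}")
    print(f"mean of all readings: {x.mean():.3f}")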
Intuitively, we could imagine doing a bit better as follows. To estimate the value
of z_N, we take only the most recent few measurements, say x_{N−L}, ..., x_N, and just
average these. If z is changing slowly, and the random noise level in the sensor is
high, it would make sense to choose a relatively long window of observations to
average. Conversely, if the signal is changing quickly, and the noise levels are small,
we might be better just to use x_N directly as our estimate of z_N.
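A small extension of the same toy simulation illustrates this trade-off (the two
regimes and the window lengths are again assumed for illustration): with a slowly
drifting z and a noisy sensor, a long trailing window gives the smallest error,
whereas with a rapidly changing z and a clean sensor the single most recent
reading, L = 1, wins.

    import numpy as np

    rng = np.random.default_rng(1)

    def window_rmse(drift_sd, noise_sd, N=500, windows=(1, 5, 20, 100)):
        """RMS error of trailing-window averages of length L (window size)."""
        z = np.cumsum(rng.normal(0.0, drift_sd, size=N))  # true quantity
        x = z + rng.normal(0.0, noise_sd, size=N)         # noisy readings
        rmse = {}
        for L in windows:
            est = np.array([x[max(0, n - L + 1):n + 1].mean() for n in range(N)])
            rmse[L] = round(float(np.sqrt(np.mean((est - z) ** 2))), 3)
        return rmse

    # Slow z, high noise: longer windows should give lower error.
    print("slow/noisy:", window_rmse(drift_sd=0.01, noise_sd=1.0))
    # Fast z, low noise: the most recent reading alone (L = 1) should win.
    print("fast/clean:", window_rmse(drift_sd=1.0, noise_sd=0.1))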
do even better if we take a weighted average, in which more recent measurements