Figure 13.17  Section of an autoregressive hidden Markov model, in which the distribution of the observation x_n depends on a subset of the previous observations as well as on the hidden state z_n. In this example, the distribution of x_n depends on the two previous observations x_{n−1} and x_{n−2}.
[Diagram: hidden states z_{n−1}, z_n, z_{n+1}, each emitting the corresponding observation x_{n−1}, x_n, x_{n+1}.]
requires that every training sequence be evaluated under each of the models in order to compute the denominator in (13.73). Hidden Markov models, coupled with discriminative training methods, are widely used in speech recognition (Kapadia, 1998).
A significant weakness of the hidden Markov model is the way in which it represents the distribution of times for which the system remains in a given state. To see the problem, note that the probability that a sequence sampled from a given hidden Markov model will spend precisely T steps in state k and then make a transition to a different state is given by

    p(T) = (A_kk)^T (1 − A_kk) ∝ exp{−T ln(1/A_kk)}    (13.74)

and so, since 0 < A_kk < 1, is an exponentially decaying function of T. For many applications, this will
be a very unrealistic model of state duration. The problem can be resolved by modelling state duration directly, in which the diagonal coefficients A_kk are all set to zero, and each state k is explicitly associated with a probability distribution p(T|k) of possible duration times. From a generative point of view, when a state k is entered, a value T representing the number of time steps that the system will remain in state k is then drawn from p(T|k). The model then emits T values of the observed variable x_t, which are generally assumed to be independent so that the corresponding emission density is simply ∏_{t=1}^{T} p(x_t | k). This approach requires some straightforward modifications to the EM optimization procedure (Rabiner, 1989).
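This generative view can be sketched in a few lines of code. The following is a minimal illustration only: the two-state model, the shifted-Poisson choice of p(T|k), and the Gaussian emission parameters are assumptions made for the example, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative explicit-duration HMM with K = 2 states (hypothetical parameters).
# Self-transitions A_kk are set to zero; durations are drawn from p(T | k) instead.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])           # transition probabilities between distinct states
init = np.array([0.5, 0.5])          # initial-state distribution
dur_mean = np.array([5.0, 2.0])      # mean duration per state (assumed 1 + Poisson)
emit_mean = np.array([0.0, 3.0])     # Gaussian emission mean per state
emit_std = np.array([1.0, 0.5])      # Gaussian emission standard deviation per state

def sample(n_steps):
    """Generate (observations, states) of length n_steps from the explicit-duration model."""
    x, states = [], []
    k = rng.choice(2, p=init)
    while len(x) < n_steps:
        # Draw the number of steps T spent in state k from p(T | k), here 1 + Poisson.
        T = 1 + rng.poisson(dur_mean[k] - 1.0)
        # Emit T conditionally independent observations from the emission density p(x_t | k).
        x.extend(rng.normal(emit_mean[k], emit_std[k], size=T))
        states.extend([k] * T)
        # Transition to a different state (diagonal entries of A are zero).
        k = rng.choice(2, p=A[k])
    return np.array(x[:n_steps]), np.array(states[:n_steps])

obs, hidden = sample(20)
print(hidden)        # runs of identical states whose lengths follow p(T | k)
print(obs.round(2))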
Another limitation of the standard HMM is that it is poor at capturing long-
range correlations between the observed variables (i.e., between variables that are
separated by many time steps) because these must be mediated via the first-order
Markov chain of hidden states. Longer-range effects could in principle be included
by adding extra links to the graphical model of Figure 13.5. One way to address this
is to generalize the HMM to give the autoregressive hidden Markov model (Ephraim
et al., 1989), an example of which is shown in Figure 13.17. For discrete observa-
tions, this corresponds to expanded tables of conditional probabilities for the emis-
sion distributions. In the case of a Gaussian emission density, we can use the linear-
Gaussian framework in which the conditional distribution for x_n given the values of the previous observations, and the value of z_n, is a Gaussian whose mean is a linear combination of the values of the conditioning variables.
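For example, with the two-step dependence shown in Figure 13.17, one possible parameterization of such a linear-Gaussian emission (the weight matrices W_{k,1}, W_{k,2}, offset b_k, and covariance Σ_k are notation introduced here for illustration, not taken from the text) is

    p(x_n | x_{n−1}, x_{n−2}, z_n = k) = N(x_n | W_{k,1} x_{n−1} + W_{k,2} x_{n−2} + b_k, Σ_k)

so that each hidden state k carries its own set of autoregressive coefficients, and setting W_{k,1} = W_{k,2} = 0 recovers the standard Gaussian-emission HMM.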
Clearly the number of additional links in the graph must be limited to avoid an excessive number of
free parameters. In the example shown in Figure 13.17, each observation depends on