Pattern Recognition and Machine Learning

(Jeff_L) #1
642 13. SEQUENTIAL DATA

13.3.2 Learning in LDS


So far, we have considered the inference problem for linear dynamical systems,
assuming that the model parametersθ={A,Γ,C,Σ,μ 0 ,V 0 }are known. Next, we
consider the determination of these parameters using maximum likelihood (Ghahra-
mani and Hinton, 1996b). Because the model has latent variables, this can be ad-
dressed using the EM algorithm, which was discussed in general terms in Chapter 9.
We can derive the EM algorithm for the linear dynamical system as follows. Let
us denote the estimated parameter values at some particular cycle of the algorithm
byθold. For these parameter values, we can run the inference algorithm to determine
the posterior distribution of the latent variablesp(Z|X,θold), or more precisely those
local posterior marginals that are required in the M step. In particular, we shall
require the following expectations

E[zn]=μ̂n (13.105)
E

[
znzTn− 1

]
= Jn− 1 V̂n+μ̂nμ̂Tn− 1 (13.106)
E

[
znzTn

]
= V̂n+̂μnμ̂Tn (13.107)

where we have used (13.104).
Now we consider the complete-data log likelihood function, which is obtained
by taking the logarithm of (13.6) and is therefore given by

lnp(X,Z|θ)=lnp(z 1 |μ 0 ,V 0 )+

∑N

n=2

lnp(zn|zn− 1 ,A,Γ)

+

∑N

n=1

lnp(xn|zn,C,Σ) (13.108)

in which we have made the dependence on the parameters explicit. We now take the
expectation of the complete-data log likelihood with respect to the posterior distri-
butionp(Z|X,θold)which defines the function

Q(θ,θold)=EZ|θold[lnp(X,Z|θ)]. (13.109)

In the M step, this function is maximized with respect to the components ofθ.
Consider first the parametersμ 0 andV 0. If we substitute forp(z 1 |μ 0 ,V 0 )in
(13.108) using (13.77), and then take the expectation with respect toZ, we obtain

Q(θ,θold)=−

1

2

ln|V 0 |−EZ|θold

[
1
2

(z 1 −μ 0 )TV− 01 (z 1 −μ 0 )

]
+const

where all terms not dependent onμ 0 orV 0 have been absorbed into the additive
constant. Maximization with respect toμ 0 andV 0 is easily performed by making
use of the maximum likelihood solution for a Gaussian distribution discussed in
Exercise 13.32 Section 2.3.4, giving

Free download pdf