Pattern Recognition and Machine Learning

642 13. SEQUENTIAL DATA

13.3.2 Learning in LDS

So far, we have considered the inference problem for linear dynamical systems, assuming that the model parametersθ={A,Γ,C,Σ,μ 0 ,V 0 }are known. Next, we consider the determination of these parameters using maximum likelihood (Ghahra- mani and Hinton, 1996b). Because the model has latent variables, this can be ad- dressed using the EM algorithm, which was discussed in general terms in Chapter 9. We can derive the EM algorithm for the linear dynamical system as follows. Let us denote the estimated parameter values at some particular cycle of the algorithm byθold. For these parameter values, we can run the inference algorithm to determine the posterior distribution of the latent variablesp(Z|X,θold), or more precisely those local posterior marginals that are required in the M step. In particular, we shall require the following expectations

E[zn]=μ̂n (13.105) E

[ znzTn− 1

] = Jn− 1 V̂n+μ̂nμ̂Tn− 1 (13.106) E

[ znzTn

] = V̂n+̂μnμ̂Tn (13.107)

where we have used (13.104). Now we consider the complete-data log likelihood function, which is obtained by taking the logarithm of (13.6) and is therefore given by

lnp(X,Z|θ)=lnp(z 1 |μ 0 ,V 0 )+

∑N

n=2

lnp(zn|zn− 1 ,A,Γ)

+

∑N

n=1

lnp(xn|zn,C,Σ) (13.108)

in which we have made the dependence on the parameters explicit. We now take the expectation of the complete-data log likelihood with respect to the posterior distri- butionp(Z|X,θold)which defines the function

Q(θ,θold)=EZ|θold[lnp(X,Z|θ)]. (13.109)

In the M step, this function is maximized with respect to the components ofθ. Consider first the parametersμ 0 andV 0. If we substitute forp(z 1 |μ 0 ,V 0 )in (13.108) using (13.77), and then take the expectation with respect toZ, we obtain

Q(θ,θold)=−

1

2

ln|V 0 |−EZ|θold

[ 1 2

(z 1 −μ 0 )TV− 01 (z 1 −μ 0 )

] +const

where all terms not dependent onμ 0 orV 0 have been absorbed into the additive
constant. Maximization with respect toμ 0 andV 0 is easily performed by making
use of the maximum likelihood solution for a Gaussian distribution discussed in
Exercise 13.32 Section 2.3.4, giving

Pattern Recognition and Machine Learning

642 13. SEQUENTIAL DATA

13.3.2 Learning in LDS

+

1

2

Get our desktop app

Company

Features

Documentation

Resources