Pattern Recognition and Machine Learning

13. SEQUENTIAL DATA

13.5 ( ) Verify the M-step equations (13.18) and (13.19) for the initial state probabilities and transition probability parameters of the hidden Markov model by maximization of the expected complete-data log likelihood function (13.17), using appropriate Lagrange multipliers to enforce the summation constraints on the components of π and A.
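
As a rough sketch of the Lagrangian step for π (the case of A is analogous; γ(z_1k) denotes the posterior marginal of Section 13.2.1): keeping only the terms of (13.17) that depend on π and adding a multiplier λ for the constraint Σ_k π_k = 1 gives

    L(π, λ) = Σ_k γ(z_1k) ln π_k + λ (Σ_k π_k − 1),    ∂L/∂π_k = γ(z_1k)/π_k + λ = 0,

and eliminating λ by normalization leaves π_k = γ(z_1k) / Σ_j γ(z_1j), which should match (13.18).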

13.6 ( ) Show that if any elements of the parameters π or A for a hidden Markov model are initially set to zero, then those elements will remain zero in all subsequent updates of the EM algorithm.
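
A brief hint at the structure of the argument (again using the quantities γ and ξ of Section 13.2.1): the numerators of the re-estimation formulas are themselves proportional to the current parameter values, for example

    ξ(z_{n−1,j}, z_nk) ∝ α(z_{n−1,j}) p(x_n | φ_k) A_jk β(z_nk),

so a zero A_jk forces the corresponding numerator in the update of A_jk to be zero, and likewise a zero π_k propagates through α(z_1k) = π_k p(x_1 | φ_k).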

13.7 ( ) Consider a hidden Markov model with Gaussian emission densities. Show that maximization of the function Q(θ, θ^old) with respect to the mean and covariance parameters of the Gaussians gives rise to the M-step equations (13.20) and (13.21).
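
For orientation, a sketch of the stationarity condition for the means (writing the emission densities as N(x | μ_k, Σ_k) and using the notation of (13.17)): only the emission term of Q depends on μ_k, and setting its derivative with respect to μ_k to zero gives

    Σ_n γ(z_nk) (x_n − μ_k) = 0,   so that   μ_k = Σ_n γ(z_nk) x_n / Σ_n γ(z_nk),

which should reproduce (13.20); the update (13.21) for Σ_k follows from the analogous condition on the covariance.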

13.8 ( ) www For a hidden Markov model having discrete observations governed by a multinomial distribution, show that the conditional distribution of the observations given the hidden variables is given by (13.22) and the corresponding M-step equations are given by (13.23). Write down the analogous equations for the conditional distribution and the M-step equations for the case of a hidden Markov model with multiple binary output variables, each of which is governed by a Bernoulli conditional distribution. Hint: refer to Sections 2.1 and 2.2 for a discussion of the corresponding maximum likelihood solutions for i.i.d. data if required.
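
As a hedged indication of the form being asked for in the Bernoulli case (with D binary outputs x_i and emission parameters μ_ik = p(x_i = 1 | z_k = 1), notation introduced here rather than taken from the text), one would expect

    p(x | z) = Π_i Π_k [ μ_ik^{x_i} (1 − μ_ik)^{1 − x_i} ]^{z_k},    μ_ik = Σ_n γ(z_nk) x_ni / Σ_n γ(z_nk),

mirroring the multinomial forms (13.22) and (13.23).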

13.9 ( ) www Use the d-separation criterion to verify that the conditional independence properties (13.24)–(13.31) are satisfied by the joint distribution for the hidden Markov model defined by (13.6).
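
For example, one such property (assumed here to be among (13.24)–(13.31)) is p(x_1, ..., x_N | z_n) = p(x_1, ..., x_n | z_n) p(x_{n+1}, ..., x_N | z_n); any path in the graph from an observation x_m with m ≤ n to an observation x_m' with m' > n must pass through the observed node z_n, at which it is head-to-tail (or tail-to-tail when m = n) and is therefore blocked.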

13.10 ( ) By applying the sum and product rules of probability, verify that the conditional independence properties (13.24)–(13.31) are satisfied by the joint distribution for the hidden Markov model defined by (13.6).

13.11 ( ) Starting from the expression (8.72) for the marginal distribution over the variables of a factor in a factor graph, together with the results for the messages in the sum-product algorithm obtained in Section 13.2.3, derive the result (13.43) for the joint posterior distribution over two successive latent variables in a hidden Markov model.
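
A rough outline, using the notation of Section 13.2.3 in which the factor joining z_{n−1} and z_n is f_n(z_{n−1}, z_n) = p(z_n | z_{n−1}) p(x_n | z_n): by (8.72) the marginal over the variables of a factor is the product of the factor with its incoming messages, which here are α(z_{n−1}) and β(z_n), so that

    ξ(z_{n−1}, z_n) = p(z_{n−1}, z_n | X) = α(z_{n−1}) p(x_n | z_n) p(z_n | z_{n−1}) β(z_n) / p(X),

which should be the form taken by (13.43).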

13.12 ( ) Suppose we wish to train a hidden Markov model by maximum likelihood using data that comprises R independent sequences of observations, which we denote by X^(r) where r = 1, ..., R. Show that in the E step of the EM algorithm, we simply evaluate posterior probabilities for the latent variables by running the α and β recursions independently for each of the sequences. Also show that in the M step, the initial probability and transition probability parameters are re-estimated using modified forms of (13.18) and (13.19) in which the corresponding quantities are summed over all of the sequences.
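
To make the multiple-sequence E step concrete, the following is a minimal numerical sketch (not code from the book; the discrete emission matrix B, the scaling factors c_n, and all function names are illustrative assumptions) in which the scaled α and β recursions are run independently on each sequence and the resulting γ and ξ statistics are pooled before re-estimating π and A.

import numpy as np

def forward_backward(x, pi, A, B):
    # Scaled alpha-beta recursions for one observation sequence x of symbol indices.
    N, K = len(x), len(pi)
    alpha = np.zeros((N, K))
    beta = np.zeros((N, K))
    c = np.zeros(N)                                  # scaling factors
    alpha[0] = pi * B[:, x[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * B[:, x[n]]
        c[n] = alpha[n].sum()
        alpha[n] /= c[n]
    beta[N - 1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = (A @ (B[:, x[n + 1]] * beta[n + 1])) / c[n + 1]
    gamma = alpha * beta                             # gamma[n, k] = p(z_nk = 1 | X)
    xi = np.zeros((N - 1, K, K))                     # xi[n-1, j, k] = p(z_{n-1,j} = 1, z_nk = 1 | X)
    for n in range(1, N):
        xi[n - 1] = (alpha[n - 1][:, None] * A
                     * (B[:, x[n]] * beta[n])[None, :]) / c[n]
    return gamma, xi

def em_step(sequences, pi, A, B):
    # One EM iteration over R independent sequences; M step shown for pi and A only.
    pi_num = np.zeros(len(pi))
    A_num = np.zeros_like(A)
    for x in sequences:                              # E step: each sequence handled on its own
        gamma, xi = forward_backward(np.asarray(x), pi, A, B)
        pi_num += gamma[0]                           # pooled sufficient statistics
        A_num += xi.sum(axis=0)
    return pi_num / pi_num.sum(), A_num / A_num.sum(axis=1, keepdims=True)

# Toy usage: two hidden states, two output symbols, three short sequences.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
seqs = [[0, 0, 1, 0], [1, 1, 0], [0, 1, 1, 1, 0]]
print(em_step(seqs, pi, A, B))

Running em_step once on the toy sequences returns the pooled updates of π and A; the emission parameters would be re-estimated from the same per-sequence γ statistics in an analogous way.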