Pattern Recognition and Machine Learning

13. SEQUENTIAL DATA

13.5 ( ) Verify the M-step equations (13.18) and (13.19) for the initial state probabilities and transition probability parameters of the hidden Markov model by maximization of the expected complete-data log likelihood function (13.17), using appropriate Lagrange multipliers to enforce the summation constraints on the components of π and A.
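
As a rough sketch of the Lagrangian step for π (the case of A is analogous; γ(z_1k) denotes the posterior marginal of Section 13.2.1): keeping only the terms of (13.17) that depend on π and adding a multiplier λ for the constraint Σ_k π_k = 1 gives

    L(π, λ) = Σ_k γ(z_1k) ln π_k + λ (Σ_k π_k − 1),    ∂L/∂π_k = γ(z_1k)/π_k + λ = 0,

and eliminating λ by normalization leaves π_k = γ(z_1k) / Σ_j γ(z_1j), which should match (13.18).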

13.6 ( ) Show that if any elements of the parameters π or A for a hidden Markov model are initially set to zero, then those elements will remain zero in all subsequent updates of the EM algorithm.
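
A brief hint at the structure of the argument (again using the quantities γ and ξ of Section 13.2.1): the numerators of the re-estimation formulas are themselves proportional to the current parameter values, for example

    ξ(z_{n−1,j}, z_nk) ∝ α(z_{n−1,j}) p(x_n | φ_k) A_jk β(z_nk),

so a zero A_jk forces the corresponding numerator in the update of A_jk to be zero, and likewise a zero π_k propagates through α(z_1k) = π_k p(x_1 | φ_k).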

13.7 ( ) Consider a hidden Markov model with Gaussian emission densities. Show that maximization of the function Q(θ, θ^old) with respect to the mean and covariance parameters of the Gaussians gives rise to the M-step equations (13.20) and (13.21).
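
For orientation, a sketch of the stationarity condition for the means (writing the emission densities as N(x | μ_k, Σ_k) and using the notation of (13.17)): only the emission term of Q depends on μ_k, and setting its derivative with respect to μ_k to zero gives

    Σ_n γ(z_nk) (x_n − μ_k) = 0,   so that   μ_k = Σ_n γ(z_nk) x_n / Σ_n γ(z_nk),

which should reproduce (13.20); the update (13.21) for Σ_k follows from the analogous condition on the covariance.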

13.8 ( ) www For a hidden Markov model having discrete observations governed by a multinomial distribution, show that the conditional distribution of the observations given the hidden variables is given by (13.22) and the corresponding M-step equations are given by (13.23). Write down the analogous equations for the conditional distribution and the M-step equations for the case of a hidden Markov model with multiple binary output variables, each of which is governed by a Bernoulli conditional distribution. Hint: refer to Sections 2.1 and 2.2 for a discussion of the corresponding maximum likelihood solutions for i.i.d. data if required.
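
As a hedged indication of the form being asked for in the Bernoulli case (with D binary outputs x_i and emission parameters μ_ik = p(x_i = 1 | z_k = 1), notation introduced here rather than taken from the text), one would expect

    p(x | z) = Π_i Π_k [ μ_ik^{x_i} (1 − μ_ik)^{1 − x_i} ]^{z_k},    μ_ik = Σ_n γ(z_nk) x_ni / Σ_n γ(z_nk),

mirroring the multinomial forms (13.22) and (13.23).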

13.9 ( ) www Use the d-separation criterion to verify that the conditional independence properties (13.24)–(13.31) are satisfied by the joint distribution for the hidden Markov model defined by (13.6).
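
For example, one such property (assumed here to be among (13.24)–(13.31)) is p(x_1, ..., x_N | z_n) = p(x_1, ..., x_n | z_n) p(x_{n+1}, ..., x_N | z_n); any path in the graph from an observation x_m with m ≤ n to an observation x_m' with m' > n must pass through the observed node z_n, at which it is head-to-tail (or tail-to-tail when m = n) and is therefore blocked.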

13.10 ( ) By applying the sum and product rules of probability, verify that the conditional independence properties (13.24)–(13.31) are satisfied by the joint distribution for the hidden Markov model defined by (13.6).

13.11 ( ) Starting from the expression (8.72) for the marginal distribution over the variables of a factor in a factor graph, together with the results for the messages in the sum-product algorithm obtained in Section 13.2.3, derive the result (13.43) for the joint posterior distribution over two successive latent variables in a hidden Markov model.
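
A rough outline, using the notation of Section 13.2.3 in which the factor joining z_{n−1} and z_n is f_n(z_{n−1}, z_n) = p(z_n | z_{n−1}) p(x_n | z_n): by (8.72) the marginal over the variables of a factor is the product of the factor with its incoming messages, which here are α(z_{n−1}) and β(z_n), so that

    ξ(z_{n−1}, z_n) = p(z_{n−1}, z_n | X) = α(z_{n−1}) p(x_n | z_n) p(z_n | z_{n−1}) β(z_n) / p(X),

which should be the form taken by (13.43).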

13.12 ( ) Suppose we wish to train a hidden Markov model by maximum likelihood using data that comprises R independent sequences of observations, which we denote by X^(r) where r = 1, ..., R. Show that in the E step of the EM algorithm, we simply evaluate posterior probabilities for the latent variables by running the α and β recursions independently for each of the sequences. Also show that in the M step, the initial probability and transition probability parameters are re-estimated using modified forms of (13.18) and (13.19) in which the corresponding quantities are summed over all of the sequences.
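
To make the multiple-sequence E step concrete, the following is a minimal numerical sketch (not code from the book; the discrete emission matrix B, the scaling factors c_n, and all function names are illustrative assumptions) in which the scaled α and β recursions are run independently on each sequence and the resulting γ and ξ statistics are pooled before re-estimating π and A.

import numpy as np

def forward_backward(x, pi, A, B):
    # Scaled alpha-beta recursions for one observation sequence x of symbol indices.
    N, K = len(x), len(pi)
    alpha = np.zeros((N, K))
    beta = np.zeros((N, K))
    c = np.zeros(N)                                  # scaling factors
    alpha[0] = pi * B[:, x[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * B[:, x[n]]
        c[n] = alpha[n].sum()
        alpha[n] /= c[n]
    beta[N - 1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = (A @ (B[:, x[n + 1]] * beta[n + 1])) / c[n + 1]
    gamma = alpha * beta                             # gamma[n, k] = p(z_nk = 1 | X)
    xi = np.zeros((N - 1, K, K))                     # xi[n-1, j, k] = p(z_{n-1,j} = 1, z_nk = 1 | X)
    for n in range(1, N):
        xi[n - 1] = (alpha[n - 1][:, None] * A
                     * (B[:, x[n]] * beta[n])[None, :]) / c[n]
    return gamma, xi

def em_step(sequences, pi, A, B):
    # One EM iteration over R independent sequences; M step shown for pi and A only.
    pi_num = np.zeros(len(pi))
    A_num = np.zeros_like(A)
    for x in sequences:                              # E step: each sequence handled on its own
        gamma, xi = forward_backward(np.asarray(x), pi, A, B)
        pi_num += gamma[0]                           # pooled sufficient statistics
        A_num += xi.sum(axis=0)
    return pi_num / pi_num.sum(), A_num / A_num.sum(axis=1, keepdims=True)

# Toy usage: two hidden states, two output symbols, three short sequences.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
seqs = [[0, 0, 1, 0], [1, 1, 0], [0, 1, 1, 1, 0]]
print(em_step(seqs, pi, A, B))

Running em_step once on the toy sequences returns the pooled updates of π and A; the emission parameters would be re-estimated from the same per-sequence γ statistics in an analogous way.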