3.5. The Evidence Approximation 169

Multiplying through by $2\alpha$ and rearranging, we obtain

$$\alpha \mathbf{m}_N^{\mathrm{T}} \mathbf{m}_N = M - \alpha \sum_i \frac{1}{\lambda_i + \alpha} = \gamma. \tag{3.90}$$

Since there are $M$ terms in the sum over $i$, the quantity $\gamma$ can be written

$$\gamma = \sum_i \frac{\lambda_i}{\alpha + \lambda_i}. \tag{3.91}$$

The interpretation of the quantity $\gamma$ will be discussed shortly. From (3.90) we see that the value of $\alpha$ that maximizes the marginal likelihood satisfies (Exercise 3.20)

$$\alpha = \frac{\gamma}{\mathbf{m}_N^{\mathrm{T}} \mathbf{m}_N}. \tag{3.92}$$

Note that this is an implicit solution for $\alpha$, not only because $\gamma$ depends on $\alpha$, but also because the mode $\mathbf{m}_N$ of the posterior distribution itself depends on the choice of $\alpha$. We therefore adopt an iterative procedure in which we make an initial choice for $\alpha$ and use this to find $\mathbf{m}_N$, which is given by (3.53), and also to evaluate $\gamma$, which is given by (3.91). These values are then used to re-estimate $\alpha$ using (3.92), and the process repeated until convergence. Note that because the matrix $\mathbf{\Phi}^{\mathrm{T}}\mathbf{\Phi}$ is fixed, we can compute its eigenvalues once at the start and then simply multiply these by $\beta$ to obtain the $\lambda_i$.
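The iterative procedure just described can be sketched in a few lines of code. The following is a minimal illustration, not taken from the text: it assumes a design matrix `Phi`, target vector `t`, and a fixed noise precision `beta`, and computes the posterior mean from (3.53) as $\mathbf{m}_N = \beta \mathbf{A}^{-1}\mathbf{\Phi}^{\mathrm{T}}\mathbf{t}$ with $\mathbf{A} = \alpha\mathbf{I} + \beta\mathbf{\Phi}^{\mathrm{T}}\mathbf{\Phi}$. The function name and stopping rule are illustrative choices.

```python
import numpy as np

def estimate_alpha(Phi, t, beta, alpha=1.0, n_iters=100, tol=1e-6):
    """Iteratively re-estimate alpha via (3.91)-(3.92), with beta held fixed.

    Phi : (N, M) design matrix, t : (N,) target vector, beta : noise precision.
    """
    M = Phi.shape[1]
    # Eigenvalues of beta * Phi^T Phi need only be computed once (the lambda_i).
    lam = beta * np.linalg.eigvalsh(Phi.T @ Phi)
    for _ in range(n_iters):
        # Posterior mean m_N from (3.53): m_N = beta * A^{-1} Phi^T t
        A = alpha * np.eye(M) + beta * Phi.T @ Phi
        m_N = beta * np.linalg.solve(A, Phi.T @ t)
        # Effective number of well-determined parameters, equation (3.91)
        gamma = np.sum(lam / (alpha + lam))
        # Re-estimate alpha using (3.92)
        alpha_new = gamma / (m_N @ m_N)
        converged = abs(alpha_new - alpha) < tol * alpha
        alpha = alpha_new
        if converged:
            break
    return alpha, m_N
```

Note that $\gamma$ always lies between $0$ and $M$, so the update (3.92) keeps $\alpha$ positive whenever $\mathbf{m}_N \neq \mathbf{0}$.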

It should be emphasized that the value of $\alpha$ has been determined purely by looking at the training data. In contrast to maximum likelihood methods, no independent data set is required in order to optimize the model complexity.

We can similarly maximize the log marginal likelihood (3.86) with respect to $\beta$. To do this, we note that the eigenvalues $\lambda_i$ defined by (3.87) are proportional to $\beta$, and hence $d\lambda_i/d\beta = \lambda_i/\beta$, giving

$$\frac{d}{d\beta} \ln|\mathbf{A}| = \frac{d}{d\beta} \sum_i \ln(\lambda_i + \alpha) = \frac{1}{\beta} \sum_i \frac{\lambda_i}{\lambda_i + \alpha} = \frac{\gamma}{\beta}. \tag{3.93}$$

The stationary point of the marginal likelihood therefore satisfies

$$0 = \frac{N}{2\beta} - \frac{1}{2} \sum_{n=1}^{N} \left\{ t_n - \mathbf{m}_N^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right\}^2 - \frac{\gamma}{2\beta} \tag{3.94}$$

and rearranging we obtain (Exercise 3.22)

$$\frac{1}{\beta} = \frac{1}{N - \gamma} \sum_{n=1}^{N} \left\{ t_n - \mathbf{m}_N^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right\}^2. \tag{3.95}$$

Again, this is an implicit solution for $\beta$, and it can be solved by choosing an initial value for $\beta$, using this to calculate $\mathbf{m}_N$ and $\gamma$, and then re-estimating $\beta$ using (3.95), repeating until convergence. If both $\alpha$ and $\beta$ are to be determined from the data, then their values can be re-estimated together after each update of $\gamma$.
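The joint procedure can be sketched as follows. This is an illustrative implementation, not from the text: it combines the updates (3.92) and (3.95) in a single loop, computing the eigenvalues of $\mathbf{\Phi}^{\mathrm{T}}\mathbf{\Phi}$ once and scaling them by the current $\beta$ to obtain the $\lambda_i$, as noted earlier. The function name, initial values, and convergence test are assumptions for the sketch.

```python
import numpy as np

def evidence_approximation(Phi, t, alpha=1.0, beta=1.0, n_iters=200, tol=1e-6):
    """Jointly re-estimate alpha and beta using (3.91), (3.92), and (3.95)."""
    N, M = Phi.shape
    # Phi^T Phi is fixed; its eigenvalues are computed once, and
    # lambda_i = beta * eig_i is re-evaluated as beta changes.
    eig = np.linalg.eigvalsh(Phi.T @ Phi)
    for _ in range(n_iters):
        lam = beta * eig
        A = alpha * np.eye(M) + beta * Phi.T @ Phi
        m_N = beta * np.linalg.solve(A, Phi.T @ t)               # (3.53)
        gamma = np.sum(lam / (alpha + lam))                      # (3.91)
        alpha_new = gamma / (m_N @ m_N)                          # (3.92)
        beta_new = (N - gamma) / np.sum((t - Phi @ m_N) ** 2)    # (3.95)
        converged = (abs(alpha_new - alpha) < tol * alpha
                     and abs(beta_new - beta) < tol * beta)
        alpha, beta = alpha_new, beta_new
        if converged:
            break
    return alpha, beta, m_N
```

On data generated from a linear model with additive Gaussian noise, the converged $\beta$ is an estimate of the true noise precision, with the $N - \gamma$ factor in (3.95) playing the role of a degrees-of-freedom correction.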