# Pattern Recognition and Machine Learning

(Jeff_L) #1
*3.5. The Evidence Approximation* (p. 169)

Multiplying through by 2α and rearranging, we obtain

$$\alpha \mathbf{m}_N^{\mathrm{T}} \mathbf{m}_N = M - \alpha \sum_i \frac{1}{\lambda_i + \alpha} = \gamma. \tag{3.90}$$

Since there are M terms in the sum over i, the quantity γ can be written

$$\gamma = \sum_i \frac{\lambda_i}{\alpha + \lambda_i}. \tag{3.91}$$

The interpretation of the quantity γ will be discussed shortly. From (3.90) we see that the value of α that maximizes the marginal likelihood satisfies (Exercise 3.20)

$$\alpha = \frac{\gamma}{\mathbf{m}_N^{\mathrm{T}} \mathbf{m}_N}. \tag{3.92}$$

Note that this is an implicit solution for α, not only because γ depends on α, but also because the mode m_N of the posterior distribution itself depends on the choice of α. We therefore adopt an iterative procedure in which we make an initial choice for α and use this to find m_N, which is given by (3.53), and also to evaluate γ, which is given by (3.91). These values are then used to re-estimate α using (3.92), and the process is repeated until convergence. Note that because the matrix Φ^T Φ is fixed, we can compute its eigenvalues once at the start and then simply multiply these by β to obtain the λ_i.

It should be emphasized that the value of α has been determined purely by looking at the training data. In contrast to maximum likelihood methods, no independent data set is required in order to optimize the model complexity.
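The iterative procedure just described can be sketched in code. This is an illustrative implementation only, not taken from the text: the function name, the initial choice α = 1, and the stopping rule are assumptions, and the noise precision β is treated as known here.

```python
import numpy as np

def reestimate_alpha(Phi, t, beta, n_iter=100, tol=1e-8):
    """Fixed-point iteration for alpha via (3.91)-(3.92), with beta known.

    Phi: (N, M) design matrix; t: (N,) targets; beta: noise precision.
    """
    M = Phi.shape[1]
    # Eigenvalues of Phi^T Phi are computed once at the start;
    # multiplying by beta gives the lambda_i for any value of beta.
    eig0 = np.linalg.eigvalsh(Phi.T @ Phi)
    alpha = 1.0  # initial choice (illustrative)
    for _ in range(n_iter):
        lam = beta * eig0
        gamma = np.sum(lam / (alpha + lam))            # (3.91)
        # Posterior mean (3.53): m_N = beta A^{-1} Phi^T t,
        # with A = alpha I + beta Phi^T Phi.
        A = alpha * np.eye(M) + beta * Phi.T @ Phi
        m_N = beta * np.linalg.solve(A, Phi.T @ t)
        alpha_new = gamma / (m_N @ m_N)                # (3.92)
        if abs(alpha_new - alpha) < tol * max(alpha, 1.0):
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha, m_N
```

Solving the linear system with `np.linalg.solve` rather than inverting A explicitly is the usual numerically stabler choice.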
We can similarly maximize the log marginal likelihood (3.86) with respect to β. To do this, we note that the eigenvalues λ_i defined by (3.87) are proportional to β, and hence dλ_i/dβ = λ_i/β, giving

$$\frac{d}{d\beta} \ln|\mathbf{A}| = \frac{d}{d\beta} \sum_i \ln(\lambda_i + \alpha) = \frac{1}{\beta} \sum_i \frac{\lambda_i}{\lambda_i + \alpha} = \frac{\gamma}{\beta}. \tag{3.93}$$

The stationary point of the marginal likelihood therefore satisfies

$$0 = \frac{N}{2\beta} - \frac{1}{2} \sum_{n=1}^{N} \left\{ t_n - \mathbf{m}_N^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right\}^2 - \frac{\gamma}{2\beta} \tag{3.94}$$

and rearranging we obtain (Exercise 3.22)

$$\frac{1}{\beta} = \frac{1}{N - \gamma} \sum_{n=1}^{N} \left\{ t_n - \mathbf{m}_N^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n) \right\}^2. \tag{3.95}$$

Again, this is an implicit solution for β and can be solved by choosing an initial value for β, using this to calculate m_N and γ, and then re-estimating β using (3.95), repeating until convergence. If both α and β are to be determined from the data, then their values can be re-estimated together after each update of γ.