
With this factorization we can appeal to the general result (10.9) to find expressions for the optimal factors. Consider first the distribution q(w). Discarding terms that are independent of w, we have

\begin{align*}
\ln q(\mathbf{w}) &= \mathbb{E}_{\alpha}\bigl[\ln\{h(\mathbf{w},\boldsymbol{\xi})\,p(\mathbf{w}|\alpha)\,p(\alpha)\}\bigr] + \text{const} \\
&= \ln h(\mathbf{w},\boldsymbol{\xi}) + \mathbb{E}_{\alpha}[\ln p(\mathbf{w}|\alpha)] + \text{const}.
\end{align*}
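For reference, the general result (10.9) is the standard mean-field update

\[
\ln q_j^{\star}(\mathbf{Z}_j) = \mathbb{E}_{i \neq j}\bigl[\ln p(\mathbf{X},\mathbf{Z})\bigr] + \text{const},
\]

applied here with the joint h(w, ξ) p(w|α) p(α). For q(w) the expectation is taken with respect to q(α) alone, which is why only the expectation over α appears above.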

We now substitute for ln h(w, ξ) using (10.153), and for ln p(w|α) using (10.165), giving

\[
\ln q(\mathbf{w}) = -\frac{\mathbb{E}[\alpha]}{2}\,\mathbf{w}^{\mathrm{T}}\mathbf{w} + \sum_{n=1}^{N}\Bigl\{(t_n - 1/2)\,\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n - \lambda(\xi_n)\,\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n\boldsymbol{\phi}_n^{\mathrm{T}}\mathbf{w}\Bigr\} + \text{const}.
\]

We see that this is a quadratic function of w, and so the solution for q(w) will be Gaussian. Completing the square in the usual way, by matching the linear and quadratic terms in w against the Gaussian exponent $-\frac{1}{2}(\mathbf{w}-\boldsymbol{\mu}_N)^{\mathrm{T}}\boldsymbol{\Sigma}_N^{-1}(\mathbf{w}-\boldsymbol{\mu}_N)$, we obtain

\[
q(\mathbf{w}) = \mathcal{N}(\mathbf{w}\,|\,\boldsymbol{\mu}_N, \boldsymbol{\Sigma}_N) \tag{10.174}
\]

where we have defined

\begin{align}
\boldsymbol{\Sigma}_N^{-1}\boldsymbol{\mu}_N &= \sum_{n=1}^{N}(t_n - 1/2)\,\boldsymbol{\phi}_n \tag{10.175}\\
\boldsymbol{\Sigma}_N^{-1} &= \mathbb{E}[\alpha]\,\mathbf{I} + 2\sum_{n=1}^{N}\lambda(\xi_n)\,\boldsymbol{\phi}_n\boldsymbol{\phi}_n^{\mathrm{T}}. \tag{10.176}
\end{align}
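As a concrete illustration of (10.175)–(10.176), the following is a minimal NumPy sketch of the q(w) update. It assumes a design matrix Phi of shape (N, M), binary targets t in {0, 1}, current variational parameters xi (one per data point), and the current value of E[α]; the function and variable names are illustrative rather than taken from the text, and λ(ξ) = (σ(ξ) − 1/2)/(2ξ) is the function introduced in Section 10.6.1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lam(xi):
    # lambda(xi) = (sigma(xi) - 1/2) / (2 xi), with the limit 1/8 as xi -> 0.
    # xi: array of shape (N,)
    xi = np.asarray(xi, dtype=float)
    out = np.full_like(xi, 0.125)
    nz = np.abs(xi) > 1e-8
    out[nz] = (sigmoid(xi[nz]) - 0.5) / (2.0 * xi[nz])
    return out

def update_q_w(Phi, t, xi, E_alpha):
    """Update q(w) = N(w | mu_N, Sigma_N) via (10.175)-(10.176)."""
    N, M = Phi.shape
    # Sigma_N^{-1} = E[alpha] I + 2 sum_n lambda(xi_n) phi_n phi_n^T   (10.176)
    Lam = lam(xi)                                        # shape (N,)
    Sigma_inv = E_alpha * np.eye(M) + 2.0 * (Phi.T * Lam) @ Phi
    Sigma_N = np.linalg.inv(Sigma_inv)
    # Sigma_N^{-1} mu_N = sum_n (t_n - 1/2) phi_n        (10.175)
    mu_N = Sigma_N @ (Phi.T @ (t - 0.5))
    return mu_N, Sigma_N
```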

Similarly, the optimal solution for the factor q(α) is obtained from

\[
\ln q(\alpha) = \mathbb{E}_{\mathbf{w}}[\ln p(\mathbf{w}|\alpha)] + \ln p(\alpha) + \text{const}.
\]

Substituting for ln p(w|α) using (10.165), and for ln p(α) using (10.166), we obtain

\[
\ln q(\alpha) = \frac{M}{2}\ln\alpha - \frac{\alpha}{2}\,\mathbb{E}\bigl[\mathbf{w}^{\mathrm{T}}\mathbf{w}\bigr] + (a_0 - 1)\ln\alpha - b_0\alpha + \text{const}.
\]

We recognize this as the log of a gamma distribution, and so we obtain

\[
q(\alpha) = \mathrm{Gam}(\alpha\,|\,a_N, b_N) = \frac{1}{\Gamma(a_N)}\,b_N^{a_N}\,\alpha^{a_N - 1}e^{-b_N\alpha} \tag{10.177}
\]

where

\begin{align}
a_N &= a_0 + \frac{M}{2} \tag{10.178}\\
b_N &= b_0 + \frac{1}{2}\,\mathbb{E}_{\mathbf{w}}\bigl[\mathbf{w}^{\mathrm{T}}\mathbf{w}\bigr]. \tag{10.179}
\end{align}
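To close the loop, the q(α) update (10.178)–(10.179) can be sketched in the same hedged style. It relies on two standard expectations: E[α] = a_N / b_N for the gamma distribution (10.177), and E_w[wᵀw] = μ_Nᵀμ_N + Tr(Σ_N) under the Gaussian (10.174). The resulting E[α] feeds back into (10.176), so in practice this update is alternated with (10.175)–(10.176), together with the re-estimation of the ξ_n, until convergence; again the names below are illustrative.

```python
import numpy as np

def update_q_alpha(a0, b0, mu_N, Sigma_N):
    """Update q(alpha) = Gam(alpha | a_N, b_N) via (10.178)-(10.179)."""
    M = mu_N.shape[0]
    a_N = a0 + 0.5 * M                               # (10.178)
    # E_w[w^T w] = mu_N^T mu_N + Tr(Sigma_N) under q(w) = N(w | mu_N, Sigma_N)
    E_wTw = float(mu_N @ mu_N) + np.trace(Sigma_N)
    b_N = b0 + 0.5 * E_wTw                           # (10.179)
    E_alpha = a_N / b_N          # gamma mean; feeds back into (10.176)
    return a_N, b_N, E_alpha
```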
