we reproduce here for convenience
\[
\sigma(z) \geqslant \sigma(\xi)\exp\left\{(z-\xi)/2 - \lambda(\xi)(z^2 - \xi^2)\right\}
\tag{10.149}
\]
where
\[
\lambda(\xi) = \frac{1}{2\xi}\left[\sigma(\xi) - \frac{1}{2}\right].
\tag{10.150}
\]
We can therefore write
\[
p(t|\mathbf{w}) = e^{at}\sigma(-a) \geqslant e^{at}\sigma(\xi)\exp\left\{-(a+\xi)/2 - \lambda(\xi)(a^2 - \xi^2)\right\}.
\tag{10.151}
\]
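As a quick numerical sanity check, the following sketch (a minimal illustration assuming NumPy; the helper names are ours, not the book's) evaluates $\lambda(\xi)$ from (10.150), confirms that the right-hand side of (10.149) never exceeds $\sigma(z)$ with equality at $z = \pm\xi$, and checks the likelihood bound (10.151).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lam(xi):
    # lambda(xi) = (1/(2 xi)) [sigma(xi) - 1/2], equation (10.150); requires xi != 0
    return (sigmoid(xi) - 0.5) / (2.0 * xi)

def sigmoid_bound(z, xi):
    # right-hand side of the lower bound (10.149)
    return sigmoid(xi) * np.exp((z - xi) / 2.0 - lam(xi) * (z**2 - xi**2))

z = np.linspace(-6.0, 6.0, 121)
for xi in (0.5, 2.0, 4.0):
    assert np.all(sigmoid_bound(z, xi) <= sigmoid(z) + 1e-12)  # (10.149) holds
    assert np.isclose(sigmoid_bound(xi, xi), sigmoid(xi))      # tight at z = +xi
    assert np.isclose(sigmoid_bound(-xi, xi), sigmoid(-xi))    # tight at z = -xi

# likelihood of one observation: sigma(a)^t (1 - sigma(a))^(1-t) = e^{at} sigma(-a),
# and (10.151) is just (10.149) applied to sigma(-a), i.e. z = -a (so z^2 = a^2)
a = 1.3
for t in (0, 1):
    p = sigmoid(a)**t * (1.0 - sigmoid(a))**(1 - t)
    assert np.isclose(p, np.exp(a * t) * sigmoid(-a))
    assert p >= np.exp(a * t) * sigmoid_bound(-a, 2.0) - 1e-12  # (10.151)
```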
Note that because this bound is applied to each of the terms in the likelihood function separately, there is a variational parameter $\xi_n$ corresponding to each training set observation $(\boldsymbol{\phi}_n, t_n)$. Using $a = \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}$, and multiplying by the prior distribution, we obtain the following bound on the joint distribution of $\mathbf{t}$ and $\mathbf{w}$
\[
p(\mathbf{t}, \mathbf{w}) = p(\mathbf{t}|\mathbf{w})p(\mathbf{w}) \geqslant h(\mathbf{w}, \boldsymbol{\xi})p(\mathbf{w})
\tag{10.152}
\]
where $\boldsymbol{\xi}$ denotes the set $\{\xi_n\}$ of variational parameters, and
\[
h(\mathbf{w}, \boldsymbol{\xi}) = \prod_{n=1}^{N} \sigma(\xi_n)\exp\left\{\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n t_n - (\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n + \xi_n)/2 - \lambda(\xi_n)\left([\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n]^2 - \xi_n^2\right)\right\}.
\tag{10.153}
\]
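In code, the logarithm of $h(\mathbf{w}, \boldsymbol{\xi})$ is a single sum over the $N$ observations. The sketch below continues the snippet above, reusing sigmoid and lam; Phi, t, w, and xi are hypothetical stand-ins for the design matrix, targets, weights, and variational parameters.

```python
def log_h(w, Phi, t, xi):
    # ln h(w, xi) from (10.153): sum of the N per-observation bound terms
    a = Phi @ w                        # a_n = w^T phi_n, one entry per observation
    return np.sum(np.log(sigmoid(xi)) + a * t - (a + xi) / 2.0
                  - lam(xi) * (a**2 - xi**2))

# a small synthetic running example
rng = np.random.default_rng(0)
N, M = 20, 3
Phi = rng.normal(size=(N, M))          # design matrix, rows phi_n^T
t = rng.integers(0, 2, size=N).astype(float)
w = rng.normal(size=M)
xi = np.ones(N)                        # one variational parameter per observation
```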
Evaluation of the exact posterior distribution would require normalization of the left-
hand side of this inequality. Because this is intractable, we work instead with the
right-hand side. Note that the function on the right-hand side cannot be interpreted
as a probability density because it is not normalized. Once it is normalized to give a
variational posterior distribution $q(\mathbf{w})$, however, it no longer represents a bound.
Because the logarithm function is monotonically increasing, the inequality $A \geqslant B$ implies $\ln A \geqslant \ln B$. This gives a lower bound on the log of the joint distribution of $\mathbf{t}$ and $\mathbf{w}$ of the form
\[
\ln\{p(\mathbf{t}|\mathbf{w})p(\mathbf{w})\} \geqslant \ln p(\mathbf{w}) + \sum_{n=1}^{N}\left\{\ln\sigma(\xi_n) + \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n t_n - (\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n + \xi_n)/2 - \lambda(\xi_n)\left([\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n]^2 - \xi_n^2\right)\right\}.
\tag{10.154}
\]
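Each term of the sum in (10.154) is a valid lower bound for any choice of $\xi_n$, so the whole right-hand side is, and it touches the left-hand side when $\xi_n = |\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n|$. Continuing the running example above (assuming SciPy for the Gaussian log-density, and a hypothetical prior $\mathcal{N}(\mathbf{w}|\mathbf{m}_0, \mathbf{S}_0)$):

```python
from scipy.stats import multivariate_normal

m0, S0 = np.zeros(M), np.eye(M)        # hypothetical Gaussian prior N(w | m0, S0)

def log_joint(w):
    # exact ln{p(t|w) p(w)}, using ln p(t_n|w) = a_n t_n + ln sigma(-a_n)
    a = Phi @ w
    return (np.sum(a * t + np.log(sigmoid(-a)))
            + multivariate_normal.logpdf(w, mean=m0, cov=S0))

def log_bound(w, xi):
    # right-hand side of (10.154)
    return log_h(w, Phi, t, xi) + multivariate_normal.logpdf(w, mean=m0, cov=S0)

assert log_bound(w, xi) <= log_joint(w) + 1e-10                 # bound holds
assert np.isclose(log_bound(w, np.abs(Phi @ w)), log_joint(w))  # tight at xi_n = |w^T phi_n|
```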
Substituting for the prior $p(\mathbf{w})$, the right-hand side of this inequality becomes, as a function of $\mathbf{w}$,
\[
-\frac{1}{2}(\mathbf{w} - \mathbf{m}_0)^{\mathrm{T}}\mathbf{S}_0^{-1}(\mathbf{w} - \mathbf{m}_0) + \sum_{n=1}^{N}\left\{\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n(t_n - 1/2) - \lambda(\xi_n)\mathbf{w}^{\mathrm{T}}\left(\boldsymbol{\phi}_n\boldsymbol{\phi}_n^{\mathrm{T}}\right)\mathbf{w}\right\} + \text{const}.
\tag{10.155}
\]
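Because (10.155) is a quadratic function of $\mathbf{w}$, the normalized variational posterior $q(\mathbf{w})$ is Gaussian, and completing the square reads off its precision from the quadratic term and its mean from the linear term. A sketch of this step, with the same hypothetical names as above:

```python
def variational_posterior(Phi, t, m0, S0, xi):
    # complete the square in (10.155):
    #   quadratic term: -(1/2) w^T [S0^{-1} + 2 sum_n lambda(xi_n) phi_n phi_n^T] w
    #   linear term:      w^T [S0^{-1} m0 + sum_n (t_n - 1/2) phi_n]
    S0_inv = np.linalg.inv(S0)
    SN_inv = S0_inv + 2.0 * (Phi.T * lam(xi)) @ Phi    # posterior precision
    SN = np.linalg.inv(SN_inv)
    mN = SN @ (S0_inv @ m0 + Phi.T @ (t - 0.5))        # posterior mean
    return mN, SN                                      # q(w) = N(w | mN, SN)

mN, SN = variational_posterior(Phi, t, m0, S0, xi)
```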