Pattern Recognition and Machine Learning

7.2. Relevance Vector Machines

basis vectors $\boldsymbol{\varphi}_1, \ldots, \boldsymbol{\varphi}_M$ a similar intuition holds, namely that if a particular basis
vector is poorly aligned with the data vector $\mathbf{t}$, then it is likely to be pruned from the
model.
We now investigate the mechanism for sparsity from a more mathematical
perspective, for a general case involving $M$ basis functions. To motivate this analysis
we first note that, in the result (7.87) for re-estimating the parameter $\alpha_i$, the terms on
the right-hand side are themselves also functions of $\alpha_i$. These results therefore
represent implicit solutions, and iteration would be required even to determine a single
$\alpha_i$ with all other $\alpha_j$ for $j \neq i$ fixed.
This suggests a different approach to solving the optimization problem for the
RVM, in which we make explicit all of the dependence of the marginal likelihood
(7.85) on a particular $\alpha_i$ and then determine its stationary points explicitly (Faul and
Tipping, 2002; Tipping and Faul, 2003). To do this, we first pull out the contribution
from $\alpha_i$ in the matrix $\mathbf{C}$ defined by (7.86) to give

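$$
\begin{aligned}
\mathbf{C} &= \beta^{-1}\mathbf{I} + \sum_{j \neq i} \alpha_j^{-1} \boldsymbol{\varphi}_j \boldsymbol{\varphi}_j^{\mathrm{T}} + \alpha_i^{-1} \boldsymbol{\varphi}_i \boldsymbol{\varphi}_i^{\mathrm{T}} \\
&= \mathbf{C}_{-i} + \alpha_i^{-1} \boldsymbol{\varphi}_i \boldsymbol{\varphi}_i^{\mathrm{T}}
\end{aligned}
\tag{7.93}
$$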

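where $\boldsymbol{\varphi}_i$ denotes the $i$th column of $\boldsymbol{\Phi}$, in other words the $N$-dimensional vector with
elements $(\phi_i(\mathbf{x}_1), \ldots, \phi_i(\mathbf{x}_N))$, in contrast to $\boldsymbol{\phi}_n$, which denotes the $n$th row of $\boldsymbol{\Phi}$.
The matrix $\mathbf{C}_{-i}$ represents the matrix $\mathbf{C}$ with the contribution from basis function $i$
removed. Using the matrix identities (C.7) and (C.15), the determinant and inverse
of $\mathbf{C}$ can then be written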

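$$
|\mathbf{C}| = |\mathbf{C}_{-i}| \, \bigl| 1 + \alpha_i^{-1} \boldsymbol{\varphi}_i^{\mathrm{T}} \mathbf{C}_{-i}^{-1} \boldsymbol{\varphi}_i \bigr|
\tag{7.94}
$$

$$
\mathbf{C}^{-1} = \mathbf{C}_{-i}^{-1} - \frac{\mathbf{C}_{-i}^{-1} \boldsymbol{\varphi}_i \boldsymbol{\varphi}_i^{\mathrm{T}} \mathbf{C}_{-i}^{-1}}{\alpha_i + \boldsymbol{\varphi}_i^{\mathrm{T}} \mathbf{C}_{-i}^{-1} \boldsymbol{\varphi}_i}.
\tag{7.95}
$$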
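The rank-one results (7.93) to (7.95) can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated data; the names `Phi`, `alpha`, `beta`, `build_C` and the problem sizes are hypothetical choices made for illustration and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 4                            # hypothetical problem sizes
Phi = rng.normal(size=(N, M))          # design matrix; column i is the basis vector for function i
alpha = rng.uniform(0.5, 2.0, size=M)  # hyperparameters alpha_1, ..., alpha_M (illustrative values)
beta = 2.0                             # noise precision (illustrative value)
i = 1                                  # index of the basis function being singled out


def build_C(Phi, alpha, beta, skip=None):
    """C = beta^{-1} I + sum_j alpha_j^{-1} phi_j phi_j^T, optionally omitting column `skip`."""
    n_points, n_basis = Phi.shape
    C = np.eye(n_points) / beta
    for j in range(n_basis):
        if j != skip:
            C += np.outer(Phi[:, j], Phi[:, j]) / alpha[j]
    return C


C = build_C(Phi, alpha, beta)
C_minus_i = build_C(Phi, alpha, beta, skip=i)
phi_i = Phi[:, i]

# (7.93): C equals C_{-i} plus a rank-one term.
rank_one_ok = np.allclose(C, C_minus_i + np.outer(phi_i, phi_i) / alpha[i])

C_minus_i_inv = np.linalg.inv(C_minus_i)
quad = phi_i @ C_minus_i_inv @ phi_i   # phi_i^T C_{-i}^{-1} phi_i

# (7.94): determinant of the rank-one update.
det_ok = np.isclose(np.linalg.det(C),
                    np.linalg.det(C_minus_i) * abs(1.0 + quad / alpha[i]))

# (7.95): inverse of the rank-one update (C_{-i} is symmetric, so u u^T matches the numerator).
u = C_minus_i_inv @ phi_i
inv_ok = np.allclose(np.linalg.inv(C),
                     C_minus_i_inv - np.outer(u, u) / (alpha[i] + quad))

print(rank_one_ok, det_ok, inv_ok)
```

Explicit inverses and determinants are used here only because the matrices are tiny; for realistic sizes a Cholesky-based solve would be the more stable choice.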

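Using these results, we can then write the log marginal likelihood function (7.85) in
the form (Exercise 7.15)

$$
L(\boldsymbol{\alpha}) = L(\boldsymbol{\alpha}_{-i}) + \lambda(\alpha_i)
\tag{7.96}
$$

where $L(\boldsymbol{\alpha}_{-i})$ is simply the log marginal likelihood with basis function $\boldsymbol{\varphi}_i$ omitted,
and the quantity $\lambda(\alpha_i)$ is defined by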


$$
\lambda(\alpha_i) = \frac{1}{2} \left[ \ln \alpha_i - \ln(\alpha_i + s_i) + \frac{q_i^2}{\alpha_i + s_i} \right]
\tag{7.97}
$$

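and contains all of the dependence on $\alpha_i$. Here we have introduced the two quantities

$$
s_i = \boldsymbol{\varphi}_i^{\mathrm{T}} \mathbf{C}_{-i}^{-1} \boldsymbol{\varphi}_i
\tag{7.98}
$$

$$
q_i = \boldsymbol{\varphi}_i^{\mathrm{T}} \mathbf{C}_{-i}^{-1} \mathbf{t}.
\tag{7.99}
$$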
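As a numerical illustration of the decomposition (7.96), the sketch below computes $s_i$ and $q_i$ from (7.98) and (7.99) and compares $L(\boldsymbol{\alpha})$ with $L(\boldsymbol{\alpha}_{-i}) + \lambda(\alpha_i)$. It assumes that (7.85) has the form $-\tfrac{1}{2}\{N \ln(2\pi) + \ln|\mathbf{C}| + \mathbf{t}^{\mathrm{T}}\mathbf{C}^{-1}\mathbf{t}\}$, which is not reproduced in this section; the function names, random data and sizes are hypothetical.

```python
import numpy as np


def build_C(Phi, alpha, beta):
    """C = beta^{-1} I + sum_j alpha_j^{-1} phi_j phi_j^T (columns of Phi are the phi_j)."""
    # Dividing Phi by alpha scales column j by 1 / alpha_j via broadcasting.
    return np.eye(Phi.shape[0]) / beta + (Phi / alpha) @ Phi.T


def log_marginal(Phi, alpha, beta, t):
    """Assumed form of (7.85): -1/2 [N ln(2 pi) + ln|C| + t^T C^{-1} t]."""
    C = build_C(Phi, alpha, beta)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (len(t) * np.log(2.0 * np.pi) + logdet + t @ np.linalg.solve(C, t))


def sparsity_quality(Phi, alpha, beta, t, i):
    """Return (s_i, q_i) as defined in (7.98) and (7.99)."""
    keep = np.arange(Phi.shape[1]) != i
    C_minus_i = build_C(Phi[:, keep], alpha[keep], beta)
    phi_i = Phi[:, i]
    v = np.linalg.solve(C_minus_i, phi_i)   # C_{-i}^{-1} phi_i
    return phi_i @ v, v @ t                 # C_{-i} is symmetric, so v @ t equals phi_i^T C_{-i}^{-1} t


def lam(alpha_i, s_i, q_i):
    """lambda(alpha_i) from (7.97)."""
    return 0.5 * (np.log(alpha_i) - np.log(alpha_i + s_i) + q_i**2 / (alpha_i + s_i))


rng = np.random.default_rng(1)
N, M = 8, 5                                 # hypothetical problem sizes
Phi = rng.normal(size=(N, M))
alpha = rng.uniform(0.5, 2.0, size=M)
beta = 1.5
t = rng.normal(size=N)
i = 2

keep = np.arange(M) != i
L_full = log_marginal(Phi, alpha, beta, t)
L_without_i = log_marginal(Phi[:, keep], alpha[keep], beta, t)
s_i, q_i = sparsity_quality(Phi, alpha, beta, t, i)

# (7.96): the full log marginal likelihood splits into L(alpha_{-i}) + lambda(alpha_i).
decomposition_ok = np.isclose(L_full, L_without_i + lam(alpha[i], s_i, q_i))
print(decomposition_ok)
```

Because $\lambda(\alpha_i)$ depends only on the scalars $\alpha_i$, $s_i$ and $q_i$, it can be evaluated for many candidate values of $\alpha_i$ without rebuilding $\mathbf{C}$.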

Here $s_i$ is called the sparsity and $q_i$ is known as the quality of $\boldsymbol{\varphi}_i$, and as we shall
see, a large value of $s_i$ relative to the value of $q_i$ means that the basis function $\boldsymbol{\varphi}_i$