7.2. Relevance Vector Machines 351
basis vectors $\boldsymbol{\varphi}_1, \ldots, \boldsymbol{\varphi}_M$ a similar intuition holds, namely that if a particular basis
vector is poorly aligned with the data vector $\mathbf{t}$, then it is likely to be pruned from the
model.
We now investigate the mechanism for sparsity from a more mathematical
perspective, for a general case involving $M$ basis functions. To motivate this analysis
we first note that, in the result (7.87) for re-estimating the parameter $\alpha_i$, the terms on
the right-hand side are themselves also functions of $\alpha_i$. These results therefore
represent implicit solutions, and iteration would be required even to determine a single
$\alpha_i$ with all other $\alpha_j$ for $j \neq i$ fixed.
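The implicit character of the re-estimation can be sketched numerically. The snippet below is only an illustration: it assumes the expressions from earlier in this section, namely posterior covariance $\boldsymbol{\Sigma} = (\mathbf{A} + \beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})^{-1}$, posterior mean $\mathbf{m} = \beta\boldsymbol{\Sigma}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}$, and $\gamma_i = 1 - \alpha_i\Sigma_{ii}$, with the update $\alpha_i \leftarrow \gamma_i / m_i^2$. The synthetic data, the fixed value of $\beta$, and the helper name `reestimate` are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 20, 3
Phi = rng.normal(size=(N, M))                    # hypothetical design matrix
t = 3.0 * Phi[:, 0] + 0.1 * rng.normal(size=N)   # hypothetical targets, aligned with column 0
beta = 100.0                                     # noise precision, held fixed here
alpha = np.ones(M)                               # every alpha_j starts at 1
i = 0                                            # the single alpha_i being re-estimated


def reestimate(alpha, i):
    """Apply alpha_i <- gamma_i / m_i^2 once, leaving the other alpha_j untouched."""
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)  # posterior covariance
    m = beta * Sigma @ Phi.T @ t                                # posterior mean
    gamma_i = 1.0 - alpha[i] * Sigma[i, i]                      # depends on alpha_i itself
    return gamma_i / m[i] ** 2


# Because the right-hand side depends on alpha_i, one application is not
# enough; the update has to be repeated until the value stops changing.
history = [alpha[i]]
for _ in range(20):
    alpha[i] = reestimate(alpha, i)
    history.append(alpha[i])

print(history[:4])
```

Successive entries of `history` differ until the update reaches a fixed point, which is the sense in which the re-estimation equation is an implicit solution.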
This suggests a different approach to solving the optimization problem for the
RVM, in which we make explicit all of the dependence of the marginal likelihood
(7.85) on a particular $\alpha_i$ and then determine its stationary points explicitly (Faul and
Tipping, 2002; Tipping and Faul, 2003). To do this, we first pull out the contribution
from $\alpha_i$ in the matrix $\mathbf{C}$ defined by (7.86) to give
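$$
\begin{aligned}
\mathbf{C} &= \beta^{-1}\mathbf{I} + \sum_{j \neq i} \alpha_j^{-1} \boldsymbol{\varphi}_j \boldsymbol{\varphi}_j^{\mathrm{T}} + \alpha_i^{-1} \boldsymbol{\varphi}_i \boldsymbol{\varphi}_i^{\mathrm{T}} \\
&= \mathbf{C}_{-i} + \alpha_i^{-1} \boldsymbol{\varphi}_i \boldsymbol{\varphi}_i^{\mathrm{T}}
\end{aligned}
\tag{7.93}
$$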
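where $\boldsymbol{\varphi}_i$ denotes the $i$th column of $\boldsymbol{\Phi}$, in other words the $N$-dimensional vector with
elements $(\phi_i(\mathbf{x}_1), \ldots, \phi_i(\mathbf{x}_N))$, in contrast to $\boldsymbol{\phi}_n$, which denotes the $n$th row of $\boldsymbol{\Phi}$.
The matrix $\mathbf{C}_{-i}$ represents the matrix $\mathbf{C}$ with the contribution from basis function $i$
removed. Using the matrix identities (C.7) and (C.15), the determinant and inverse
of $\mathbf{C}$ can then be written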
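$$
|\mathbf{C}| = |\mathbf{C}_{-i}| \, \bigl| 1 + \alpha_i^{-1} \boldsymbol{\varphi}_i^{\mathrm{T}} \mathbf{C}_{-i}^{-1} \boldsymbol{\varphi}_i \bigr|
\tag{7.94}
$$
$$
\mathbf{C}^{-1} = \mathbf{C}_{-i}^{-1} - \frac{\mathbf{C}_{-i}^{-1} \boldsymbol{\varphi}_i \boldsymbol{\varphi}_i^{\mathrm{T}} \mathbf{C}_{-i}^{-1}}{\alpha_i + \boldsymbol{\varphi}_i^{\mathrm{T}} \mathbf{C}_{-i}^{-1} \boldsymbol{\varphi}_i}.
\tag{7.95}
$$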
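As a sanity check, the decomposition (7.93) and the rank-one update formulas (7.94) and (7.95) can be verified numerically. The following is a minimal sketch using NumPy; the random design matrix, the hyperparameter values, and the variable names (`C_minus_i`, `quad`, and so on) are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 4
Phi = rng.normal(size=(N, M))           # hypothetical design matrix; columns are basis vectors
alpha = rng.uniform(0.5, 2.0, size=M)   # hypothetical hyperparameters alpha_j
beta = 4.0                              # hypothetical noise precision
i = 2                                   # basis function whose contribution is pulled out

phi_i = Phi[:, i]
keep = np.arange(M) != i                # mask selecting every j != i

# C from all basis functions, and C_{-i} built from the remaining ones only.
C = np.eye(N) / beta + (Phi / alpha) @ Phi.T
C_minus_i = np.eye(N) / beta + (Phi[:, keep] / alpha[keep]) @ Phi[:, keep].T

# (7.93): C equals C_{-i} plus the rank-one term for basis function i.
ok_split = np.allclose(C, C_minus_i + np.outer(phi_i, phi_i) / alpha[i])

C_minus_i_inv = np.linalg.inv(C_minus_i)
v = C_minus_i_inv @ phi_i               # C_{-i}^{-1} phi_i
quad = phi_i @ v                        # phi_i^T C_{-i}^{-1} phi_i

# (7.94): determinant of the rank-one update.
ok_det = np.isclose(np.linalg.det(C),
                    np.linalg.det(C_minus_i) * (1.0 + quad / alpha[i]))

# (7.95): inverse of the rank-one update (C_{-i}^{-1} is symmetric, so the
# numerator is the outer product of v with itself).
ok_inv = np.allclose(np.linalg.inv(C),
                     C_minus_i_inv - np.outer(v, v) / (alpha[i] + quad))

print(ok_split, ok_det, ok_inv)
```

In practice the point of (7.94) and (7.95) is that, once $\mathbf{C}_{-i}^{-1}$ is available, the effect of changing $\alpha_i$ can be evaluated without refactorizing the full $N \times N$ matrix.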
Using these results, we can then write the log marginal likelihood function (7.85) in
the form (see Exercise 7.15)
$$
L(\boldsymbol{\alpha}) = L(\boldsymbol{\alpha}_{-i}) + \lambda(\alpha_i)
\tag{7.96}
$$
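where $L(\boldsymbol{\alpha}_{-i})$ is simply the log marginal likelihood with basis function $\boldsymbol{\varphi}_i$ omitted,
and the quantity $\lambda(\alpha_i)$ is defined by
$$
\lambda(\alpha_i) = \frac{1}{2} \left[ \ln \alpha_i - \ln(\alpha_i + s_i) + \frac{q_i^2}{\alpha_i + s_i} \right]
\tag{7.97}
$$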
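and contains all of the dependence on $\alpha_i$. Here we have introduced the two quantities
$$
s_i = \boldsymbol{\varphi}_i^{\mathrm{T}} \mathbf{C}_{-i}^{-1} \boldsymbol{\varphi}_i
\tag{7.98}
$$
$$
q_i = \boldsymbol{\varphi}_i^{\mathrm{T}} \mathbf{C}_{-i}^{-1} \mathbf{t}.
\tag{7.99}
$$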
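The decomposition (7.96) follows by substituting (7.94) into the $\ln|\mathbf{C}|$ term and (7.95) into the $\mathbf{t}^{\mathrm{T}}\mathbf{C}^{-1}\mathbf{t}$ term of the log marginal likelihood. A minimal numerical sketch of this is given below, assuming the form $L = -\tfrac{1}{2}\{N\ln(2\pi) + \ln|\mathbf{C}| + \mathbf{t}^{\mathrm{T}}\mathbf{C}^{-1}\mathbf{t}\}$ for (7.85). The random data and the helper name `log_marginal` are hypothetical.

```python
import numpy as np


def log_marginal(Phi, t, alpha, beta):
    """Log marginal likelihood -0.5 * (N ln(2 pi) + ln|C| + t^T C^{-1} t)."""
    N = Phi.shape[0]
    C = np.eye(N) / beta + (Phi / alpha) @ Phi.T
    _, logdet = np.linalg.slogdet(C)    # C is positive definite, so the sign is +1
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + t @ np.linalg.solve(C, t))


rng = np.random.default_rng(2)
N, M = 8, 3
Phi = rng.normal(size=(N, M))           # hypothetical design matrix
t = rng.normal(size=N)                  # hypothetical target vector
alpha = rng.uniform(0.5, 2.0, size=M)   # hypothetical hyperparameters
beta = 4.0                              # hypothetical noise precision
i = 1                                   # basis function being isolated

keep = np.arange(M) != i
phi_i = Phi[:, i]
C_minus_i = np.eye(N) / beta + (Phi[:, keep] / alpha[keep]) @ Phi[:, keep].T

s_i = phi_i @ np.linalg.solve(C_minus_i, phi_i)   # sparsity, (7.98)
q_i = phi_i @ np.linalg.solve(C_minus_i, t)       # quality, (7.99)

# lambda(alpha_i) from (7.97)
lam = 0.5 * (np.log(alpha[i]) - np.log(alpha[i] + s_i) + q_i**2 / (alpha[i] + s_i))

L_full = log_marginal(Phi, t, alpha, beta)
L_without_i = log_marginal(Phi[:, keep], t, alpha[keep], beta)

# (7.96): the full objective splits into a part without basis function i
# plus a term that depends on alpha_i only through s_i and q_i.
print(np.isclose(L_full, L_without_i + lam))
```

Note that `s_i` and `q_i` are computed from `C_minus_i` alone, so they do not change as $\alpha_i$ is varied; this is what makes $\lambda(\alpha_i)$ an explicit function of the single hyperparameter.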
Here $s_i$ is called the sparsity and $q_i$ is known as the quality of $\boldsymbol{\varphi}_i$, and as we shall
see, a large value of $s_i$ relative to the value of $q_i$ means that the basis function $\boldsymbol{\varphi}_i$