Pattern Recognition and Machine Learning

7.2. Relevance Vector Machines

basis vectors $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_M$ a similar intuition holds, namely that if a particular basis vector is poorly aligned with the data vector $\mathbf{t}$, then it is likely to be pruned from the model.

We now investigate the mechanism for sparsity from a more mathematical perspective, for a general case involving $M$ basis functions. To motivate this analysis, we first note that in the result (7.87) for re-estimating the parameter $\alpha_i$, the terms on the right-hand side are themselves also functions of $\alpha_i$. These results therefore represent implicit solutions, and iteration would be required even to determine a single $\alpha_i$ with all other $\alpha_j$ for $j \neq i$ fixed. This suggests a different approach to solving the optimization problem for the RVM, in which we make explicit all of the dependence of the marginal likelihood (7.85) on a particular $\alpha_i$ and then determine its stationary points explicitly (Faul and Tipping, 2002; Tipping and Faul, 2003). To do this, we first pull out the contribution from $\alpha_i$ in the matrix $\mathbf{C}$ defined by (7.86) to give

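$$
\begin{aligned}
\mathbf{C} &= \beta^{-1}\mathbf{I} + \sum_{j \neq i} \alpha_j^{-1}\boldsymbol{\phi}_j\boldsymbol{\phi}_j^{\mathrm{T}} + \alpha_i^{-1}\boldsymbol{\phi}_i\boldsymbol{\phi}_i^{\mathrm{T}} \\
&= \mathbf{C}_{-i} + \alpha_i^{-1}\boldsymbol{\phi}_i\boldsymbol{\phi}_i^{\mathrm{T}}
\end{aligned}
\tag{7.93}
$$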

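where $\boldsymbol{\phi}_i$ denotes the $i$th column of $\boldsymbol{\Phi}$, in other words the $N$-dimensional vector with elements $(\phi_i(\mathbf{x}_1), \ldots, \phi_i(\mathbf{x}_N))$, in contrast to $\boldsymbol{\phi}_n$, which denotes the $n$th row of $\boldsymbol{\Phi}$. The matrix $\mathbf{C}_{-i}$ represents the matrix $\mathbf{C}$ with the contribution from basis function $i$ removed. Using the matrix identities (C.7) and (C.15), the determinant and inverse of $\mathbf{C}$ can then be written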

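$$
|\mathbf{C}| = |\mathbf{C}_{-i}|\,\left|1 + \alpha_i^{-1}\boldsymbol{\phi}_i^{\mathrm{T}}\mathbf{C}_{-i}^{-1}\boldsymbol{\phi}_i\right| \tag{7.94}
$$

$$
\mathbf{C}^{-1} = \mathbf{C}_{-i}^{-1} - \frac{\mathbf{C}_{-i}^{-1}\boldsymbol{\phi}_i\boldsymbol{\phi}_i^{\mathrm{T}}\mathbf{C}_{-i}^{-1}}{\alpha_i + \boldsymbol{\phi}_i^{\mathrm{T}}\mathbf{C}_{-i}^{-1}\boldsymbol{\phi}_i}. \tag{7.95}
$$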
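The decomposition (7.93) and the identities (7.94) and (7.95) are exact, so they can be sanity-checked numerically. The sketch below assumes $\mathbf{C} = \beta^{-1}\mathbf{I} + \boldsymbol{\Phi}\mathbf{A}^{-1}\boldsymbol{\Phi}^{\mathrm{T}}$ with $\mathbf{A} = \mathrm{diag}(\alpha_j)$ as in (7.86); the design matrix `Phi`, the precisions `alpha` and `beta`, and the choice of column `i` are hypothetical random test values, not anything from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, i = 6, 4, 2                        # data points, basis functions, column to isolate
Phi = rng.normal(size=(N, M))            # hypothetical design matrix; column j is phi_j
alpha = rng.uniform(0.5, 2.0, size=M)    # hypothetical prior precisions alpha_j
beta = 3.0                               # hypothetical noise precision

# C = beta^{-1} I + sum_j alpha_j^{-1} phi_j phi_j^T  (each column of Phi scaled by 1/alpha_j)
C = np.eye(N) / beta + (Phi / alpha) @ Phi.T

# C_{-i}: the same sum with basis function i left out
phi_i = Phi[:, i]
mask = np.arange(M) != i
C_minus_i = np.eye(N) / beta + (Phi[:, mask] / alpha[mask]) @ Phi[:, mask].T

# (7.93): add the rank-one contribution of phi_i back in
C_rebuilt = C_minus_i + np.outer(phi_i, phi_i) / alpha[i]

Cmi_inv = np.linalg.inv(C_minus_i)
s = phi_i @ Cmi_inv @ phi_i              # scalar phi_i^T C_{-i}^{-1} phi_i

# (7.94): determinant via the rank-one factor
det_rebuilt = np.linalg.det(C_minus_i) * abs(1.0 + s / alpha[i])

# (7.95): rank-one update of the inverse; v = C_{-i}^{-1} phi_i, and C_{-i} is symmetric
v = Cmi_inv @ phi_i
C_inv_rebuilt = Cmi_inv - np.outer(v, v) / (alpha[i] + s)

print(np.allclose(C, C_rebuilt),
      np.isclose(np.linalg.det(C), det_rebuilt),
      np.allclose(np.linalg.inv(C), C_inv_rebuilt))   # → True True True
```

The point of (7.95) in practice is that once $\mathbf{C}_{-i}^{-1}$ is known, the effect of $\alpha_i$ on $\mathbf{C}^{-1}$ needs only a vector outer product rather than a fresh $N \times N$ inversion.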

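Using these results, we can then write the log marginal likelihood function (7.85) in the form (Exercise 7.15)

$$
L(\boldsymbol{\alpha}) = L(\boldsymbol{\alpha}_{-i}) + \lambda(\alpha_i) \tag{7.96}
$$

where $L(\boldsymbol{\alpha}_{-i})$ is simply the log marginal likelihood with basis function $\boldsymbol{\phi}_i$ omitted, and the quantity $\lambda(\alpha_i)$ is defined by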

$$
\lambda(\alpha_i) = \frac{1}{2}\left[\ln\alpha_i - \ln(\alpha_i + s_i) + \frac{q_i^2}{\alpha_i + s_i}\right] \tag{7.97}
$$

and contains all of the dependence on $\alpha_i$. Here we have introduced the two quantities

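$$
s_i = \boldsymbol{\phi}_i^{\mathrm{T}}\mathbf{C}_{-i}^{-1}\boldsymbol{\phi}_i \tag{7.98}
$$

$$
q_i = \boldsymbol{\phi}_i^{\mathrm{T}}\mathbf{C}_{-i}^{-1}\mathbf{t}. \tag{7.99}
$$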
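The split (7.96) can be checked the same way: compute $s_i$ and $q_i$ from $\mathbf{C}_{-i}$, evaluate $\lambda(\alpha_i)$ from (7.97), and compare against the full log marginal likelihood. This is a minimal sketch that assumes (7.85) has the standard Gaussian form $L = -\tfrac{1}{2}\{N\ln 2\pi + \ln|\mathbf{C}| + \mathbf{t}^{\mathrm{T}}\mathbf{C}^{-1}\mathbf{t}\}$; the helper names `log_marginal`, `sparsity_quality` and `lam`, and all of the random test data, are hypothetical.

```python
import numpy as np

def log_marginal(C, t):
    """Gaussian log marginal likelihood: -0.5 * (N ln 2pi + ln|C| + t^T C^{-1} t)."""
    N = t.shape[0]
    _, logdet = np.linalg.slogdet(C)                 # C is positive definite, so sign is +1
    return -0.5 * (N * np.log(2 * np.pi) + logdet + t @ np.linalg.solve(C, t))

def sparsity_quality(C_minus_i, phi_i, t):
    """Sparsity s_i (7.98) and quality q_i (7.99) of basis vector phi_i."""
    Cinv_phi = np.linalg.solve(C_minus_i, phi_i)     # C_{-i}^{-1} phi_i
    return phi_i @ Cinv_phi, Cinv_phi @ t            # second term uses symmetry of C_{-i}

def lam(alpha_i, s_i, q_i):
    """lambda(alpha_i) from (7.97): the part of L that depends on alpha_i."""
    return 0.5 * (np.log(alpha_i) - np.log(alpha_i + s_i) + q_i**2 / (alpha_i + s_i))

rng = np.random.default_rng(1)
N, M, i, beta = 8, 3, 0, 4.0                         # hypothetical sizes and noise precision
Phi = rng.normal(size=(N, M))
alpha = rng.uniform(0.5, 2.0, size=M)
t = rng.normal(size=N)

mask = np.arange(M) != i
C_minus_i = np.eye(N) / beta + (Phi[:, mask] / alpha[mask]) @ Phi[:, mask].T
C = C_minus_i + np.outer(Phi[:, i], Phi[:, i]) / alpha[i]   # (7.93)

s_i, q_i = sparsity_quality(C_minus_i, Phi[:, i], t)
lhs = log_marginal(C, t)                                    # L(alpha)
rhs = log_marginal(C_minus_i, t) + lam(alpha[i], s_i, q_i)  # L(alpha_{-i}) + lambda(alpha_i)
print(np.isclose(lhs, rhs))   # → True
```

Note that as $\alpha_i \to \infty$ both $\ln\alpha_i - \ln(\alpha_i + s_i)$ and $q_i^2/(\alpha_i + s_i)$ tend to zero, so $\lambda(\alpha_i) \to 0$ and $L(\boldsymbol{\alpha})$ reduces to $L(\boldsymbol{\alpha}_{-i})$, which is exactly the likelihood with $\boldsymbol{\phi}_i$ pruned.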

Here $s_i$ is called the sparsity and $q_i$ is known as the quality of $\boldsymbol{\phi}_i$, and as we shall see, a large value of $s_i$ relative to the value of $q_i$ means that the basis function $\boldsymbol{\phi}_i$
