  1. Evaluate $\boldsymbol{\Sigma}$ and $\mathbf{m}$, along with $q_i$ and $s_i$ for all basis functions.

  2. Select a candidate basis function $\boldsymbol{\phi}_i$.

  3. If $q_i^2 > s_i$ and $\alpha_i < \infty$, so that the basis vector $\boldsymbol{\phi}_i$ is already included in
     the model, then update $\alpha_i$ using (7.101).

  4. If $q_i^2 > s_i$ and $\alpha_i = \infty$, then add $\boldsymbol{\phi}_i$ to the model, and evaluate the hyperparameter $\alpha_i$ using (7.101).

  5. If $q_i^2 \leqslant s_i$ and $\alpha_i < \infty$, then remove the basis function $\boldsymbol{\phi}_i$ from the model,
     and set $\alpha_i = \infty$.

  6. If solving a regression problem, update $\beta$.

  7. If converged, terminate; otherwise go to 1.


Note that if $q_i^2 \leqslant s_i$ and $\alpha_i = \infty$, then the basis function $\boldsymbol{\phi}_i$ is already excluded
from the model, and no action is required.
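
As a concrete illustration, the decision logic of steps 3–5 can be sketched in a few lines of Python. This is a minimal sketch rather than a full implementation: the arrays q, s, and alpha are assumed to be available, $\alpha_i = \infty$ is represented by np.inf, and the hyperparameter update is the stationary point $\alpha_i = s_i^2/(q_i^2 - s_i)$ given by (7.101).

```python
import numpy as np

def update_candidate(i, q, s, alpha):
    """Apply steps 3-5 to the candidate basis function i, in place.

    q, s  : arrays of quality and sparsity quantities q_i, s_i
    alpha : array of hyperparameters; np.inf marks an excluded function
    """
    if q[i] ** 2 > s[i]:
        # Steps 3 and 4: update alpha_i (or add phi_i to the model)
        # using the stationary point of the marginal likelihood, (7.101)
        alpha[i] = s[i] ** 2 / (q[i] ** 2 - s[i])
    elif np.isfinite(alpha[i]):
        # Step 5: q_i^2 <= s_i, so prune phi_i from the model
        alpha[i] = np.inf
    # Otherwise q_i^2 <= s_i and alpha_i = inf already: no action needed
```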
In practice, it is convenient to evaluate the quantities

$$Q_i = \boldsymbol{\phi}_i^{\mathrm{T}} \mathbf{C}^{-1} \mathbf{t} \tag{7.102}$$
$$S_i = \boldsymbol{\phi}_i^{\mathrm{T}} \mathbf{C}^{-1} \boldsymbol{\phi}_i \tag{7.103}$$

The quality and sparseness variables can then be expressed in the form

$$q_i = \frac{\alpha_i Q_i}{\alpha_i - S_i} \tag{7.104}$$
$$s_i = \frac{\alpha_i S_i}{\alpha_i - S_i} \tag{7.105}$$
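
As a small vectorized sketch of (7.104) and (7.105), the conversion from $(Q_i, S_i)$ to $(q_i, s_i)$ might look as follows; the function name and the use of np.inf to encode $\alpha_i = \infty$ are choices made for this example.

```python
import numpy as np

def quality_sparsity(Q, S, alpha):
    """Convert Q_i, S_i into q_i, s_i using (7.104) and (7.105)."""
    ratio = np.ones_like(Q)      # limit of alpha/(alpha - S) as alpha -> inf
    active = np.isfinite(alpha)
    ratio[active] = alpha[active] / (alpha[active] - S[active])
    return ratio * Q, ratio * S  # q_i and s_i
```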

Note that when $\alpha_i = \infty$, we have $q_i = Q_i$ and $s_i = S_i$ (Exercise 7.17). Using (C.7), we can write

$$Q_i = \beta \boldsymbol{\phi}_i^{\mathrm{T}} \mathbf{t} - \beta^2 \boldsymbol{\phi}_i^{\mathrm{T}} \boldsymbol{\Phi} \boldsymbol{\Sigma} \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{t} \tag{7.106}$$
$$S_i = \beta \boldsymbol{\phi}_i^{\mathrm{T}} \boldsymbol{\phi}_i - \beta^2 \boldsymbol{\phi}_i^{\mathrm{T}} \boldsymbol{\Phi} \boldsymbol{\Sigma} \boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\phi}_i \tag{7.107}$$

where $\boldsymbol{\Phi}$ and $\boldsymbol{\Sigma}$ involve only those basis vectors that correspond to finite hyperparameters $\alpha_i$. At each stage the required computations therefore scale like $O(M^3)$,
where $M$ is the number of active basis vectors in the model, which is typically much
smaller than the number $N$ of training patterns.
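
The following is a minimal NumPy sketch of (7.106) and (7.107) for the regression case. The variable names are illustrative rather than taken from the text: phi_all holds every candidate basis vector as a column, while Phi and Sigma contain only the $M$ active basis vectors and the corresponding posterior covariance, so the $N \times N$ matrix $\mathbf{C}^{-1}$ is never formed explicitly.

```python
import numpy as np

def compute_Q_S(phi_all, Phi, Sigma, t, beta):
    """Evaluate Q_i of (7.106) and S_i of (7.107) for all basis functions.

    phi_all : (N, M_total) matrix of all candidate basis vectors
    Phi     : (N, M) design matrix of the active basis vectors only
    Sigma   : (M, M) posterior covariance of the active model
    t       : (N,) target vector
    beta    : noise precision
    """
    PS = Phi @ Sigma                                  # (N, M), costs O(N M^2)
    proj = PS @ (Phi.T @ phi_all)                     # Phi Sigma Phi^T phi_i
    Q = beta * (phi_all.T @ t) - beta**2 * (phi_all.T @ (PS @ (Phi.T @ t)))
    S = beta * np.einsum('ni,ni->i', phi_all, phi_all) \
        - beta**2 * np.einsum('ni,ni->i', phi_all, proj)
    return Q, S
```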

7.2.3 RVM for classification


We can extend the relevance vector machine framework to classification problems by applying the ARD prior over weights to a probabilistic linear classification
model of the kind studied in Chapter 4. To start with, we consider two-class problems with a binary target variable $t \in \{0, 1\}$. The model now takes the form of a
linear combination of basis functions transformed by a logistic sigmoid function

$$y(\mathbf{x}, \mathbf{w}) = \sigma\!\left(\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})\right) \tag{7.108}$$
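
As a brief illustration of (7.108), the sketch below evaluates $y(\mathbf{x}, \mathbf{w})$ for Gaussian basis functions centred on the training inputs; both the choice of basis and the length scale are assumptions made for this example, not requirements of the model.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid, sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def rvm_predict(x, centres, w, length_scale=1.0):
    """Evaluate y(x, w) = sigma(w^T phi(x)) as in (7.108).

    Gaussian basis functions centred on the training inputs are one
    common choice for the RVM; any other basis could be substituted.
    """
    phi = np.exp(-np.sum((x - centres) ** 2, axis=-1)
                 / (2.0 * length_scale ** 2))          # phi(x), shape (M,)
    return sigmoid(w @ phi)                            # probability of t = 1
```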