
model (7.13). The remaining data points constitute the support vectors. These have $a_n > 0$ and hence from (7.25) must satisfy
$$
t_n y(\mathbf{x}_n) = 1 - \xi_n. \tag{7.35}
$$
If $a_n < C$, then (7.31) implies that $\mu_n > 0$, which from (7.28) requires $\xi_n = 0$ and hence such points lie on the margin. Points with $a_n = C$ can lie inside the margin and can either be correctly classified if $\xi_n \leqslant 1$ or misclassified if $\xi_n > 1$.
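As a concrete illustration of these conditions, the following minimal Python sketch labels each training point from its dual variable $a_n$ and slack variable $\xi_n$; the function name and its arguments are illustrative assumptions, not part of the text.

```python
import numpy as np

def label_points(a, xi, C, tol=1e-8):
    """Classify points by the conditions above (illustrative only):
    a_n = 0             -> not a support vector
    0 < a_n < C         -> support vector on the margin (xi_n = 0)
    a_n = C, xi_n <= 1  -> inside the margin but correctly classified
    a_n = C, xi_n > 1   -> misclassified
    """
    labels = []
    for a_n, xi_n in zip(np.asarray(a), np.asarray(xi)):
        if a_n <= tol:
            labels.append("not a support vector")
        elif a_n < C - tol:
            labels.append("on the margin")
        elif xi_n <= 1.0:
            labels.append("inside margin, correctly classified")
        else:
            labels.append("misclassified")
    return labels
```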
To determine the parameter $b$ in (7.1), we note that those support vectors for which $0 < a_n < C$ have $\xi_n = 0$ so that $t_n y(\mathbf{x}_n) = 1$ and hence will satisfy
$$
t_n \left( \sum_{m \in \mathcal{S}} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m) + b \right) = 1. \tag{7.36}
$$

Again, a numerically stable solution is obtained by averaging to give
$$
b = \frac{1}{N_{\mathcal{M}}} \sum_{n \in \mathcal{M}} \left( t_n - \sum_{m \in \mathcal{S}} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m) \right) \tag{7.37}
$$
where $\mathcal{M}$ denotes the set of indices of data points having $0 < a_n < C$.
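The averaging in (7.37) is straightforward to express numerically. The sketch below assumes the dual variables $a_n$, the targets $t_n \in \{-1, +1\}$, and the full $N \times N$ kernel matrix are already available; the names `bias_by_averaging`, `K`, and so on are illustrative rather than taken from the text.

```python
import numpy as np

def bias_by_averaging(a, t, K, C, tol=1e-8):
    """Compute b as in (7.37): average over the set M of points with
    0 < a_n < C, summing over the support-vector set S (a_n > 0).
    a : (N,) dual variables, t : (N,) targets in {-1, +1},
    K : (N, N) kernel matrix with K[n, m] = k(x_n, x_m)."""
    S = np.where(a > tol)[0]                     # support vectors: a_n > 0
    M = np.where((a > tol) & (a < C - tol))[0]   # margin points: 0 < a_n < C
    # y(x_n) without the bias term, evaluated for each n in M
    partial = K[np.ix_(M, S)] @ (a[S] * t[S])
    return np.mean(t[M] - partial)               # equation (7.37)
```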
An alternative, equivalent formulation of the support vector machine, known as the $\nu$-SVM, has been proposed by Schölkopf et al. (2000). This involves maximizing
$$
\widetilde{L}(\mathbf{a}) = -\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m) \tag{7.38}
$$
subject to the constraints
$$
0 \leqslant a_n \leqslant 1/N \tag{7.39}
$$
$$
\sum_{n=1}^{N} a_n t_n = 0 \tag{7.40}
$$
$$
\sum_{n=1}^{N} a_n \geqslant \nu. \tag{7.41}
$$

This approach has the advantage that the parameter $\nu$, which replaces $C$, can be interpreted as both an upper bound on the fraction of margin errors (points for which $\xi_n > 0$ and hence which lie on the wrong side of the margin boundary and which may or may not be misclassified) and a lower bound on the fraction of support vectors. An example of the $\nu$-SVM applied to a synthetic data set is shown in Figure 7.4. Here Gaussian kernels of the form $\exp(-\gamma \|\mathbf{x} - \mathbf{x}'\|^2)$ have been used, with $\gamma = 0.45$.
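For readers who wish to experiment with this formulation, scikit-learn's `NuSVC` implements a $\nu$-SVM. The snippet below is a minimal sketch on made-up synthetic data (not the data set of Figure 7.4), using a Gaussian (RBF) kernel with the same width $\gamma = 0.45$ quoted above; the choice $\nu = 0.25$ is arbitrary.

```python
import numpy as np
from sklearn.svm import NuSVC

# Two made-up Gaussian clusters as a stand-in for a synthetic data set.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
t = np.hstack([-np.ones(50), np.ones(50)])

# nu upper-bounds the fraction of margin errors and lower-bounds the
# fraction of support vectors; gamma is the Gaussian kernel width.
clf = NuSVC(nu=0.25, kernel="rbf", gamma=0.45).fit(X, t)
print("fraction of support vectors:", len(clf.support_) / len(X))
```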
Although predictions for new inputs are made using only the support vectors, the training phase (i.e., the determination of the parameters $\mathbf{a}$ and $b$) makes use of the whole data set, and so it is important to have efficient algorithms for solving