
model (7.13). The remaining data points constitute the support vectors. These have $a_n > 0$ and hence from (7.25) must satisfy
$$
t_n y(\mathbf{x}_n) = 1 - \xi_n. \tag{7.35}
$$
If $a_n < C$, then (7.31) implies that $\mu_n > 0$, which from (7.28) requires $\xi_n = 0$ and hence such points lie on the margin. Points with $a_n = C$ can lie inside the margin and can either be correctly classified if $\xi_n \leqslant 1$ or misclassified if $\xi_n > 1$.
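As a concrete illustration of these conditions, the following minimal Python sketch labels each training point from its dual variable $a_n$ and slack variable $\xi_n$; the function name and its arguments are illustrative assumptions, not part of the text.

```python
import numpy as np

def label_points(a, xi, C, tol=1e-8):
    """Classify points by the conditions above (illustrative only):
    a_n = 0             -> not a support vector
    0 < a_n < C         -> support vector on the margin (xi_n = 0)
    a_n = C, xi_n <= 1  -> inside the margin but correctly classified
    a_n = C, xi_n > 1   -> misclassified
    """
    labels = []
    for a_n, xi_n in zip(np.asarray(a), np.asarray(xi)):
        if a_n <= tol:
            labels.append("not a support vector")
        elif a_n < C - tol:
            labels.append("on the margin")
        elif xi_n <= 1.0:
            labels.append("inside margin, correctly classified")
        else:
            labels.append("misclassified")
    return labels
```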
To determine the parameter $b$ in (7.1), we note that those support vectors for which $0 < a_n < C$ have $\xi_n = 0$ so that $t_n y(\mathbf{x}_n) = 1$ and hence will satisfy
$$
t_n \left( \sum_{m \in \mathcal{S}} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m) + b \right) = 1. \tag{7.36}
$$

Again, a numerically stable solution is obtained by averaging to give
$$
b = \frac{1}{N_{\mathcal{M}}} \sum_{n \in \mathcal{M}} \left( t_n - \sum_{m \in \mathcal{S}} a_m t_m k(\mathbf{x}_n, \mathbf{x}_m) \right) \tag{7.37}
$$
where $\mathcal{M}$ denotes the set of indices of data points having $0 < a_n < C$.
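The averaging in (7.37) is straightforward to express numerically. The sketch below assumes the dual variables $a_n$, the targets $t_n \in \{-1, +1\}$, and the full $N \times N$ kernel matrix are already available; the names `bias_by_averaging`, `K`, and so on are illustrative rather than taken from the text.

```python
import numpy as np

def bias_by_averaging(a, t, K, C, tol=1e-8):
    """Compute b as in (7.37): average over the set M of points with
    0 < a_n < C, summing over the support-vector set S (a_n > 0).
    a : (N,) dual variables, t : (N,) targets in {-1, +1},
    K : (N, N) kernel matrix with K[n, m] = k(x_n, x_m)."""
    S = np.where(a > tol)[0]                     # support vectors: a_n > 0
    M = np.where((a > tol) & (a < C - tol))[0]   # margin points: 0 < a_n < C
    # y(x_n) without the bias term, evaluated for each n in M
    partial = K[np.ix_(M, S)] @ (a[S] * t[S])
    return np.mean(t[M] - partial)               # equation (7.37)
```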
An alternative, equivalent formulation of the support vector machine, known as the $\nu$-SVM, has been proposed by Schölkopf et al. (2000). This involves maximizing
$$
\widetilde{L}(\mathbf{a}) = -\frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m) \tag{7.38}
$$
subject to the constraints
$$
0 \leqslant a_n \leqslant 1/N \tag{7.39}
$$
$$
\sum_{n=1}^{N} a_n t_n = 0 \tag{7.40}
$$
$$
\sum_{n=1}^{N} a_n \geqslant \nu. \tag{7.41}
$$

This approach has the advantage that the parameter $\nu$, which replaces $C$, can be interpreted as both an upper bound on the fraction of margin errors (points for which $\xi_n > 0$ and hence which lie on the wrong side of the margin boundary and which may or may not be misclassified) and a lower bound on the fraction of support vectors. An example of the $\nu$-SVM applied to a synthetic data set is shown in Figure 7.4. Here Gaussian kernels of the form $\exp(-\gamma \|\mathbf{x} - \mathbf{x}'\|^2)$ have been used, with $\gamma = 0.45$.
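For readers who wish to experiment with this formulation, scikit-learn's `NuSVC` implements a $\nu$-SVM. The snippet below is a minimal sketch on made-up synthetic data (not the data set of Figure 7.4), using a Gaussian (RBF) kernel with the same width $\gamma = 0.45$ quoted above; the choice $\nu = 0.25$ is arbitrary.

```python
import numpy as np
from sklearn.svm import NuSVC

# Two made-up Gaussian clusters as a stand-in for a synthetic data set.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
t = np.hstack([-np.ones(50), np.ones(50)])

# nu upper-bounds the fraction of margin errors and lower-bounds the
# fraction of support vectors; gamma is the Gaussian kernel width.
clf = NuSVC(nu=0.25, kernel="rbf", gamma=0.45).fit(X, t)
print("fraction of support vectors:", len(clf.support_) / len(X))
```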
Although predictions for new inputs are made using only the support vectors, the training phase (i.e., the determination of the parameters $\mathbf{a}$ and $b$) makes use of the whole data set, and so it is important to have efficient algorithms for solving