7. SPARSE KERNEL MACHINES

Figure 7.6 Plot of an $\epsilon$-insensitive error function $E(z)$ (in red) in which the error increases linearly with distance beyond the insensitive region $|z| < \epsilon$. Also shown for comparison is the quadratic error function (in green).

minimize a regularized error function given by

$$\frac{1}{2}\sum_{n=1}^{N}\{y_n - t_n\}^2 + \frac{\lambda}{2}\|\mathbf{w}\|^2. \tag{7.50}$$

To obtain sparse solutions, the quadratic error function is replaced by an $\epsilon$-insensitive error function (Vapnik, 1995), which gives zero error if the absolute difference between the prediction $y(\mathbf{x})$ and the target $t$ is less than $\epsilon$, where $\epsilon > 0$. A simple example of an $\epsilon$-insensitive error function, having a linear cost associated with errors outside the insensitive region, is given by

$$E_\epsilon(y(\mathbf{x}) - t) = \begin{cases} 0, & \text{if } |y(\mathbf{x}) - t| < \epsilon; \\ |y(\mathbf{x}) - t| - \epsilon, & \text{otherwise} \end{cases} \tag{7.51}$$

and is illustrated in Figure 7.6.
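As a concrete check on (7.51), here is a minimal NumPy sketch that evaluates the $\epsilon$-insensitive error alongside the quadratic error of Figure 7.6; the function name and the value $\epsilon = 0.1$ are illustrative, not from the text.

```python
import numpy as np

def eps_insensitive_error(z, eps=0.1):
    """E_eps(z) as in (7.51): zero inside the insensitive region, then linear in |z|."""
    return np.maximum(0.0, np.abs(z) - eps)

z = np.array([-0.5, -0.05, 0.0, 0.05, 0.5])
print(eps_insensitive_error(z))  # [0.4 0.  0.  0.  0.4] -- zero inside the tube
print(0.5 * z ** 2)              # the quadratic error is nonzero for every z != 0
```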
We therefore minimize a regularized error function given by

$$C \sum_{n=1}^{N} E_\epsilon(y(\mathbf{x}_n) - t_n) + \frac{1}{2}\|\mathbf{w}\|^2 \tag{7.52}$$

where $y(\mathbf{x})$ is given by (7.1). By convention the (inverse) regularization parameter, denoted $C$, appears in front of the error term.
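A minimal sketch of evaluating (7.52), assuming for illustration a plain linear model $y(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b$ in place of the more general model of (7.1); the name svr_objective, the toy data, and the default values of $C$ and $\epsilon$ are all hypothetical.

```python
import numpy as np

def svr_objective(w, b, X, t, C=1.0, eps=0.1):
    """Regularized error (7.52) for an illustrative linear model y(x) = w.x + b."""
    residuals = X @ w + b - t                     # y(x_n) - t_n
    E = np.maximum(0.0, np.abs(residuals) - eps)  # eps-insensitive errors E_eps
    return C * E.sum() + 0.5 * w @ w              # C * sum_n E_eps + (1/2)||w||^2

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
w_true, b_true = np.array([1.0, -2.0]), 0.3
t = X @ w_true + b_true
# All residuals vanish, so only the regularizer contributes: 0.5 * (1 + 4) = 2.5
print(svr_objective(w_true, b_true, X, t))
```

Because $C$ multiplies the error term rather than the regularizer, a larger $C$ penalizes points outside the tube more heavily, while a smaller $C$ gives more weight to $\|\mathbf{w}\|^2$.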
As before, we can re-express the optimization problem by introducing slack variables. For each data point $\mathbf{x}_n$, we now need two slack variables $\xi_n \geq 0$ and $\hat{\xi}_n \geq 0$, where $\xi_n > 0$ corresponds to a point for which $t_n > y(\mathbf{x}_n) + \epsilon$, and $\hat{\xi}_n > 0$ corresponds to a point for which $t_n < y(\mathbf{x}_n) - \epsilon$, as illustrated in Figure 7.7.
The condition for a target point to lie inside the $\epsilon$-tube is that $y_n - \epsilon \leq t_n \leq y_n + \epsilon$, where $y_n = y(\mathbf{x}_n)$. Introducing the slack variables allows points to lie outside the tube provided the slack variables are nonzero, and the corresponding conditions
are

$$t_n \leq y(\mathbf{x}_n) + \epsilon + \xi_n \tag{7.53}$$
$$t_n \geq y(\mathbf{x}_n) - \epsilon - \hat{\xi}_n. \tag{7.54}$$
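The slack variables can be read off directly from (7.53) and (7.54): the smallest nonnegative values satisfying the constraints are $\xi_n = \max(0,\, t_n - y_n - \epsilon)$ and $\hat{\xi}_n = \max(0,\, y_n - t_n - \epsilon)$. A small sketch, with illustrative names and data:

```python
import numpy as np

def slack_variables(y, t, eps=0.1):
    """Smallest xi_n, xi_hat_n >= 0 consistent with (7.53) and (7.54)."""
    xi = np.maximum(0.0, t - y - eps)      # active when t_n > y_n + eps (above the tube)
    xi_hat = np.maximum(0.0, y - t - eps)  # active when t_n < y_n - eps (below the tube)
    return xi, xi_hat

y = np.zeros(3)                            # predictions y(x_n)
t = np.array([0.05, 0.4, -0.3])            # inside, above, and below the eps-tube
print(slack_variables(y, t))               # (array([0. , 0.3, 0. ]), array([0. , 0. , 0.2]))
```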