Figure 7.6 Plot of an $\epsilon$-insensitive error function (in red) in which the error increases linearly with distance beyond the insensitive region. Also shown for comparison is the quadratic error function (in green).
In simple linear regression, we minimize a regularized error function given by
$$\frac{1}{2}\sum_{n=1}^{N}\{y_n - t_n\}^2 + \frac{\lambda}{2}\|\mathbf{w}\|^2. \tag{7.50}$$
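As a small numerical sketch of (7.50), the following evaluates this objective for a linear model $y(\mathbf{x}) = \mathbf{w}^{\mathrm{T}}\mathbf{x} + b$; the function name, the use of NumPy, and the identity feature map are illustrative assumptions, not part of the text.

```python
import numpy as np

def sum_of_squares_error(w, b, X, t, lam):
    """Regularized sum-of-squares error of (7.50) for a linear
    model y(x) = w @ x + b (identity features, an assumption
    made here for simplicity)."""
    y = X @ w + b
    return 0.5 * np.sum((y - t) ** 2) + 0.5 * lam * np.dot(w, w)
```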
To obtain sparse solutions, the quadratic error function is replaced by an $\epsilon$-insensitive error function (Vapnik, 1995), which gives zero error if the absolute difference between the prediction $y(\mathbf{x})$ and the target $t$ is less than $\epsilon$ where $\epsilon > 0$. A simple example of an $\epsilon$-insensitive error function, having a linear cost associated with errors outside the insensitive region, is given by
$$E_\epsilon\bigl(y(\mathbf{x}) - t\bigr) =
\begin{cases}
0, & \text{if } |y(\mathbf{x}) - t| < \epsilon; \\
|y(\mathbf{x}) - t| - \epsilon, & \text{otherwise}
\end{cases} \tag{7.51}$$
and is illustrated in Figure 7.6.
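A minimal sketch of (7.51), again in NumPy with illustrative names, which is zero inside the tube and grows linearly outside it, in contrast to the quadratic error:

```python
import numpy as np

def eps_insensitive_error(residual, eps):
    """Epsilon-insensitive error of (7.51): zero inside the tube
    |y(x) - t| < eps, growing linearly beyond it."""
    r = np.abs(residual)
    return np.where(r < eps, 0.0, r - eps)

# Zero inside the tube, linear outside; compare 0.5 * r**2 (green curve).
print(eps_insensitive_error(np.linspace(-2.0, 2.0, 9), eps=0.5))
```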
We therefore minimize a regularized error function given by
$$C\sum_{n=1}^{N} E_\epsilon\bigl(y(\mathbf{x}_n) - t_n\bigr) + \frac{1}{2}\|\mathbf{w}\|^2 \tag{7.52}$$
where $y(\mathbf{x})$ is given by (7.1). By convention, the (inverse) regularization parameter, denoted $C$, appears in front of the error term.
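Continuing the sketch above, the objective (7.52) can be evaluated as follows; as before, the identity feature map is a simplification of the model (7.1), adopted here only for illustration.

```python
def regularized_error(w, b, X, t, C, eps):
    """Objective (7.52): C times the summed epsilon-insensitive
    errors plus the regularizer (1/2)||w||^2.  Uses
    eps_insensitive_error from the previous sketch and assumes
    y(x) = w @ x + b (identity features, a simplification)."""
    y = X @ w + b
    return C * np.sum(eps_insensitive_error(y - t, eps)) + 0.5 * np.dot(w, w)
```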
As before, we can re-express the optimization problem by introducing slack variables. For each data point $\mathbf{x}_n$, we now need two slack variables $\xi_n \geq 0$ and $\widehat{\xi}_n \geq 0$, where $\xi_n > 0$ corresponds to a point for which $t_n > y(\mathbf{x}_n) + \epsilon$, and $\widehat{\xi}_n > 0$ corresponds to a point for which $t_n < y(\mathbf{x}_n) - \epsilon$, as illustrated in Figure 7.7.
The condition for a target point to lie inside the $\epsilon$-tube is that $y_n - \epsilon \leq t_n \leq y_n + \epsilon$, where $y_n = y(\mathbf{x}_n)$. Introducing the slack variables allows points to lie outside the tube provided the slack variables are nonzero, and the corresponding conditions are
$$t_n \leq y(\mathbf{x}_n) + \epsilon + \xi_n \tag{7.53}$$
$$t_n \geq y(\mathbf{x}_n) - \epsilon - \widehat{\xi}_n. \tag{7.54}$$
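If the smallest slack consistent with (7.53) and (7.54) is taken, each slack variable follows directly from the residual, as in the following sketch (a hypothetical helper continuing the code above, not from the text):

```python
def slack_variables(y, t, eps):
    """Smallest xi_n, xi_hat_n >= 0 consistent with (7.53)-(7.54):
    xi_n > 0 only for points above the tube (t_n > y_n + eps),
    xi_hat_n > 0 only for points below it (t_n < y_n - eps)."""
    xi = np.maximum(0.0, t - (y + eps))      # rearranging (7.53)
    xi_hat = np.maximum(0.0, (y - eps) - t)  # rearranging (7.54)
    return xi, xi_hat
```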