7. SPARSE KERNEL MACHINES

Figure 7.6 Plot of an $\epsilon$-insensitive error function $E(z)$ (in red) in which the error increases linearly with distance beyond the insensitive region $|z| < \epsilon$. Also shown for comparison is the quadratic error function (in green).

minimize a regularized error function given by

$$\frac{1}{2}\sum_{n=1}^{N}\{y_n - t_n\}^2 + \frac{\lambda}{2}\|\mathbf{w}\|^2. \tag{7.50}$$

To obtain sparse solutions, the quadratic error function is replaced by an $\epsilon$-insensitive error function (Vapnik, 1995), which gives zero error if the absolute difference between the prediction $y(\mathbf{x})$ and the target $t$ is less than $\epsilon$, where $\epsilon > 0$. A simple example of an $\epsilon$-insensitive error function, having a linear cost associated with errors outside the insensitive region, is given by

$$E_\epsilon(y(\mathbf{x}) - t) = \begin{cases} 0, & \text{if } |y(\mathbf{x}) - t| < \epsilon; \\ |y(\mathbf{x}) - t| - \epsilon, & \text{otherwise} \end{cases} \tag{7.51}$$

and is illustrated in Figure 7.6.
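As a concrete check on (7.51), here is a minimal NumPy sketch that evaluates the $\epsilon$-insensitive error alongside the quadratic error of Figure 7.6; the function name and the value $\epsilon = 0.1$ are illustrative, not from the text.

```python
import numpy as np

def eps_insensitive_error(z, eps=0.1):
    """E_eps(z) as in (7.51): zero inside the insensitive region, then linear in |z|."""
    return np.maximum(0.0, np.abs(z) - eps)

z = np.array([-0.5, -0.05, 0.0, 0.05, 0.5])
print(eps_insensitive_error(z))  # [0.4 0.  0.  0.  0.4] -- zero inside the tube
print(0.5 * z ** 2)              # the quadratic error is nonzero for every z != 0
```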
We therefore minimize a regularized error function given by

$$C \sum_{n=1}^{N} E_\epsilon(y(\mathbf{x}_n) - t_n) + \frac{1}{2}\|\mathbf{w}\|^2 \tag{7.52}$$

where $y(\mathbf{x})$ is given by (7.1). By convention the (inverse) regularization parameter, denoted $C$, appears in front of the error term.
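A minimal sketch of evaluating (7.52), assuming for illustration a plain linear model $y(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b$ in place of the more general model of (7.1); the name svr_objective, the toy data, and the default values of $C$ and $\epsilon$ are all hypothetical.

```python
import numpy as np

def svr_objective(w, b, X, t, C=1.0, eps=0.1):
    """Regularized error (7.52) for an illustrative linear model y(x) = w.x + b."""
    residuals = X @ w + b - t                     # y(x_n) - t_n
    E = np.maximum(0.0, np.abs(residuals) - eps)  # eps-insensitive errors E_eps
    return C * E.sum() + 0.5 * w @ w              # C * sum_n E_eps + (1/2)||w||^2

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
w_true, b_true = np.array([1.0, -2.0]), 0.3
t = X @ w_true + b_true
# All residuals vanish, so only the regularizer contributes: 0.5 * (1 + 4) = 2.5
print(svr_objective(w_true, b_true, X, t))
```

Because $C$ multiplies the error term rather than the regularizer, a larger $C$ penalizes points outside the tube more heavily, while a smaller $C$ gives more weight to $\|\mathbf{w}\|^2$.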
As before, we can re-express the optimization problem by introducing slack variables. For each data point $\mathbf{x}_n$, we now need two slack variables $\xi_n \geq 0$ and $\hat{\xi}_n \geq 0$, where $\xi_n > 0$ corresponds to a point for which $t_n > y(\mathbf{x}_n) + \epsilon$, and $\hat{\xi}_n > 0$ corresponds to a point for which $t_n < y(\mathbf{x}_n) - \epsilon$, as illustrated in Figure 7.7.
The condition for a target point to lie inside the $\epsilon$-tube is that $y_n - \epsilon \leq t_n \leq y_n + \epsilon$, where $y_n = y(\mathbf{x}_n)$. Introducing the slack variables allows points to lie outside the tube provided the slack variables are nonzero, and the corresponding conditions
are

$$t_n \leq y(\mathbf{x}_n) + \epsilon + \xi_n \tag{7.53}$$
$$t_n \geq y(\mathbf{x}_n) - \epsilon - \hat{\xi}_n. \tag{7.54}$$
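The slack variables can be read off directly from (7.53) and (7.54): the smallest nonnegative values satisfying the constraints are $\xi_n = \max(0,\, t_n - y_n - \epsilon)$ and $\hat{\xi}_n = \max(0,\, y_n - t_n - \epsilon)$. A small sketch, with illustrative names and data:

```python
import numpy as np

def slack_variables(y, t, eps=0.1):
    """Smallest xi_n, xi_hat_n >= 0 consistent with (7.53) and (7.54)."""
    xi = np.maximum(0.0, t - y - eps)      # active when t_n > y_n + eps (above the tube)
    xi_hat = np.maximum(0.0, y - t - eps)  # active when t_n < y_n - eps (below the tube)
    return xi, xi_hat

y = np.zeros(3)                            # predictions y(x_n)
t = np.array([0.05, 0.4, -0.3])            # inside, above, and below the eps-tube
print(slack_variables(y, t))               # (array([0. , 0.3, 0. ]), array([0. , 0. , 0.2]))
```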