Understanding Machine Learning: From Theory to Algorithms

15.1 Margin and Hard-SVM

(Figure: a training set of four examples, separated by both a dashed-black hyperplane and a solid-green hyperplane.)
While both the dashed-black and solid-green hyperplanes separate the four examples, our intuition would probably lead us to prefer the black hyperplane over the green one. One way to formalize this intuition is using the concept of margin. The margin of a hyperplane with respect to a training set is defined to be the minimal distance between a point in the training set and the hyperplane. If a hyperplane has a large margin, then it will still separate the training set even if we slightly perturb each instance.
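
To make the definition concrete, here is a minimal NumPy sketch (an illustration added here, not taken from the text) that computes the margin of a hyperplane $(\mathbf{w}, b)$ over a training set; the points and parameters are made-up examples.

import numpy as np

def margin(w, b, X):
    # The distance from a point x to the hyperplane {v : <w, v> + b = 0}
    # is |<w, x> + b| / ||w||; with ||w|| = 1 this matches Claim 15.1.
    # The margin is the minimum of these distances over the training set.
    return np.min(np.abs(X @ w + b)) / np.linalg.norm(w)

# Four made-up separable points (not the ones in the figure)
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-2.0, -0.5]])
w = np.array([1.0, 0.0])   # unit-norm normal vector
b = 0.0
print(margin(w, b, X))     # -> 1.0: the closest point is at distance 1

Dividing by $\|\mathbf{w}\|$ keeps the function correct even when $\mathbf{w}$ is not normalized.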
We will see later on that the true error of a halfspace can be bounded in terms
of the margin it has over the training sample (the larger the margin, the smaller
the error), regardless of the Euclidean dimension in which this halfspace resides.
Hard-SVM is the learning rule in which we return an ERM hyperplane that separates the training set with the largest possible margin. To define Hard-SVM formally, we first express the distance from a point $\mathbf{x}$ to a hyperplane using the parameters defining the halfspace.
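
For orientation (this anticipates the formal definition given after the claim below), once distances are expressed in terms of $(\mathbf{w}, b)$, the Hard-SVM rule can be written as the optimization problem

\[
\operatorname*{argmax}_{(\mathbf{w},b)\,:\,\|\mathbf{w}\|=1} \;\; \min_{i \in [m]} \, |\langle \mathbf{w}, \mathbf{x}_i \rangle + b| \quad \text{s.t.} \quad \forall i,\; y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b) > 0 .
\]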


Claim 15.1  The distance between a point $\mathbf{x}$ and the hyperplane defined by $(\mathbf{w}, b)$, where $\|\mathbf{w}\| = 1$, is $|\langle \mathbf{w}, \mathbf{x} \rangle + b|$.


Proof  The distance between a point $\mathbf{x}$ and the hyperplane is defined as

\[
\min \{ \|\mathbf{x} - \mathbf{v}\| : \langle \mathbf{w}, \mathbf{v} \rangle + b = 0 \}.
\]

Taking $\mathbf{v} = \mathbf{x} - (\langle \mathbf{w}, \mathbf{x} \rangle + b)\,\mathbf{w}$, we have that

\[
\langle \mathbf{w}, \mathbf{v} \rangle + b = \langle \mathbf{w}, \mathbf{x} \rangle - (\langle \mathbf{w}, \mathbf{x} \rangle + b)\,\|\mathbf{w}\|^2 + b = 0,
\]

and

\[
\|\mathbf{x} - \mathbf{v}\| = |\langle \mathbf{w}, \mathbf{x} \rangle + b| \, \|\mathbf{w}\| = |\langle \mathbf{w}, \mathbf{x} \rangle + b|.
\]

Hence, the distance is at most $|\langle \mathbf{w}, \mathbf{x} \rangle + b|$. Next, take any other point $\mathbf{u}$ on the hyperplane, thus $\langle \mathbf{w}, \mathbf{u} \rangle + b = 0$. We have


\[
\begin{aligned}
\|\mathbf{x} - \mathbf{u}\|^2 &= \|\mathbf{x} - \mathbf{v} + \mathbf{v} - \mathbf{u}\|^2 \\
&= \|\mathbf{x} - \mathbf{v}\|^2 + \|\mathbf{v} - \mathbf{u}\|^2 + 2\,\langle \mathbf{x} - \mathbf{v}, \mathbf{v} - \mathbf{u} \rangle \\
&\ge \|\mathbf{x} - \mathbf{v}\|^2 + 2\,\langle \mathbf{x} - \mathbf{v}, \mathbf{v} - \mathbf{u} \rangle \\
&= \|\mathbf{x} - \mathbf{v}\|^2 + 2\,(\langle \mathbf{w}, \mathbf{x} \rangle + b)\,\langle \mathbf{w}, \mathbf{v} - \mathbf{u} \rangle \\
&= \|\mathbf{x} - \mathbf{v}\|^2,
\end{aligned}
\]

where the fourth line uses $\mathbf{x} - \mathbf{v} = (\langle \mathbf{w}, \mathbf{x} \rangle + b)\,\mathbf{w}$, and the last equality holds because $\langle \mathbf{w}, \mathbf{v} \rangle = \langle \mathbf{w}, \mathbf{u} \rangle = -b$, so $\langle \mathbf{w}, \mathbf{v} - \mathbf{u} \rangle = 0$. Hence, the distance from $\mathbf{x}$ to any point $\mathbf{u}$ on the hyperplane is at least $\|\mathbf{x} - \mathbf{v}\| = |\langle \mathbf{w}, \mathbf{x} \rangle + b|$, which concludes the proof.
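
As a quick numerical sanity check of Claim 15.1 (an added illustration with made-up values, not part of the text), the following NumPy snippet constructs the projection $\mathbf{v} = \mathbf{x} - (\langle \mathbf{w}, \mathbf{x} \rangle + b)\,\mathbf{w}$ from the proof and confirms that $\mathbf{v}$ lies on the hyperplane and that $\|\mathbf{x} - \mathbf{v}\| = |\langle \mathbf{w}, \mathbf{x} \rangle + b|$.

import numpy as np

rng = np.random.default_rng(0)

# Made-up unit-norm normal vector w, offset b, and point x
w = rng.normal(size=3)
w /= np.linalg.norm(w)        # enforce ||w|| = 1, as required by the claim
b = 0.7
x = rng.normal(size=3)

# The closest point on the hyperplane, as constructed in the proof
v = x - (np.dot(w, x) + b) * w

print(np.dot(w, v) + b)       # ~0: v indeed lies on the hyperplane
print(np.linalg.norm(x - v))  # equals ...
print(abs(np.dot(w, x) + b))  # ... this value, as the claim asserts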
