Understanding Machine Learning: From Theory to Algorithms

Support Vector Machines


between $x$ and $u$ is at least the distance between $x$ and $v$, which concludes our
proof.

On the basis of the preceding claim, the distance between the separating hyperplane and the closest point in the training set is $\min_{i\in[m]} |\langle w, x_i\rangle + b|$. Therefore, the Hard-SVM rule is

\[
\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_{i\in[m]} \bigl|\langle w, x_i\rangle + b\bigr| \quad \text{s.t.}\quad \forall i,\ y_i(\langle w, x_i\rangle + b) > 0.
\]

Whenever there is a solution to the preceding problem (i.e., we are in the separable case), we can write an equivalent problem as follows (see Exercise 1):

\[
\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_{i\in[m]} y_i\bigl(\langle w, x_i\rangle + b\bigr). \tag{15.1}
\]
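As a quick illustration (not from the book's text), the inner minimum in Equation (15.1) is easy to evaluate numerically for a given candidate halfspace. The following NumPy sketch does exactly that; the function name margin_objective and the toy data set are our own illustrative choices.

import numpy as np

def margin_objective(w, b, X, y):
    # Inner minimum of Equation (15.1) for a candidate (w, b) with ||w|| = 1:
    # the smallest signed margin y_i * (<w, x_i> + b) over the sample.
    # X has shape (m, d); y has entries in {-1, +1}.
    return np.min(y * (X @ w + b))

# Toy separable sample in the plane: positives on the right, negatives on the left.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
print(margin_objective(np.array([1.0, 0.0]), 0.0, X, y))  # prints 2.0

The Hard-SVM rule then asks for the unit-norm pair $(w, b)$ that maximizes this quantity over all halfspaces that classify the sample correctly.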

Next, we give another equivalent formulation of the Hard-SVM rule as a quadratic
optimization problem.¹

Hard-SVM

input: $(x_1, y_1), \ldots, (x_m, y_m)$
solve:

\[
(w_0, b_0) = \operatorname*{argmin}_{(w,b)} \|w\|^2 \quad \text{s.t.}\quad \forall i,\ y_i(\langle w, x_i\rangle + b) \ge 1 \tag{15.2}
\]

output: $\hat{w} = \frac{w_0}{\|w_0\|}, \quad \hat{b} = \frac{b_0}{\|w_0\|}$
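The quadratic program in Equation (15.2) can be handed to any off-the-shelf convex solver. The sketch below uses the cvxpy library (our choice of tool, not one used in the book); the function name hard_svm is illustrative, and the sample is assumed to be linearly separable, since otherwise the constraints are infeasible.

import numpy as np
import cvxpy as cp

def hard_svm(X, y):
    # Solve Equation (15.2): minimize ||w||^2 subject to y_i * (<w, x_i> + b) >= 1,
    # then return the normalized output (w_hat, b_hat) = (w_0/||w_0||, b_0/||w_0||).
    m, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    w0, b0 = w.value, b.value
    norm = np.linalg.norm(w0)
    return w0 / norm, b0 / norm

By the lemma that follows, the returned pair attains the maximum in Equation (15.1), i.e., it is a largest-margin separating halfspace.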

The lemma that follows shows that the output of Hard-SVM is indeed the
separating hyperplane with the largest margin. Intuitively, Hard-SVM searches
for $w$ of minimal norm among all the vectors that separate the data and for
which $|\langle w, x_i\rangle + b| \ge 1$ for all $i$. In other words, we enforce the margin to be 1,
but now the units in which we measure the margin scale with the norm of $w$.
Therefore, finding the largest-margin halfspace boils down to finding $w$ whose
norm is minimal. Formally:

Lemma 15.2 The output of Hard-SVM is a solution of Equation (15.1).
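Before turning to the proof, the rescaling intuition can be restated as a one-line computation (our paraphrase, in the notation of Equation (15.2)): the normalized output $(\hat{w}, \hat{b})$ satisfies

\[
\min_{i\in[m]} y_i\bigl(\langle \hat{w}, x_i\rangle + \hat{b}\bigr)
= \frac{1}{\|w_0\|}\,\min_{i\in[m]} y_i\bigl(\langle w_0, x_i\rangle + b_0\bigr)
\;\ge\; \frac{1}{\|w_0\|},
\]

so the smaller the norm of $w_0$, the larger the margin that the unit-norm halfspace $(\hat{w}, \hat{b})$ is guaranteed to achieve.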

Proof Let $(w^\star, b^\star)$ be a solution of Equation (15.1) and define the margin
achieved by $(w^\star, b^\star)$ to be $\gamma^\star = \min_{i\in[m]} y_i(\langle w^\star, x_i\rangle + b^\star)$. Therefore, for all
$i$ we have

\[
y_i(\langle w^\star, x_i\rangle + b^\star) \ge \gamma^\star,
\]

or equivalently,

\[
y_i\Bigl(\Bigl\langle \frac{w^\star}{\gamma^\star}, x_i \Bigr\rangle + \frac{b^\star}{\gamma^\star}\Bigr) \ge 1.
\]

Hence, the pair $\bigl(\frac{w^\star}{\gamma^\star}, \frac{b^\star}{\gamma^\star}\bigr)$ satisfies the conditions of the quadratic optimization problem given in Equation (15.2).

¹ A quadratic optimization problem is an optimization problem in which the objective is a
convex quadratic function and the constraints are linear inequalities.
