Understanding Machine Learning: From Theory to Algorithms

Support Vector Machines


between $x$ and $u$ is at least the distance between $x$ and $v$, which concludes our
proof.

On the basis of the preceding claim, the distance between the separating hyperplane and the closest point in the training set is $\min_{i\in[m]} |\langle w, x_i\rangle + b|$. Therefore, the Hard-SVM rule is

\[
\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_{i\in[m]} \bigl|\langle w, x_i\rangle + b\bigr| \quad \text{s.t.}\quad \forall i,\ y_i(\langle w, x_i\rangle + b) > 0.
\]

Whenever there is a solution to the preceding problem (i.e., we are in the separable case), we can write an equivalent problem as follows (see Exercise 1):

\[
\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_{i\in[m]} y_i\bigl(\langle w, x_i\rangle + b\bigr). \tag{15.1}
\]
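As a quick illustration (not from the book's text), the inner minimum in Equation (15.1) is easy to evaluate numerically for a given candidate halfspace. The following NumPy sketch does exactly that; the function name margin_objective and the toy data set are our own illustrative choices.

import numpy as np

def margin_objective(w, b, X, y):
    # Inner minimum of Equation (15.1) for a candidate (w, b) with ||w|| = 1:
    # the smallest signed margin y_i * (<w, x_i> + b) over the sample.
    # X has shape (m, d); y has entries in {-1, +1}.
    return np.min(y * (X @ w + b))

# Toy separable sample in the plane: positives on the right, negatives on the left.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
print(margin_objective(np.array([1.0, 0.0]), 0.0, X, y))  # prints 2.0

The Hard-SVM rule then asks for the unit-norm pair $(w, b)$ that maximizes this quantity over all halfspaces that classify the sample correctly.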

Next, we give another equivalent formulation of the Hard-SVM rule as a quadratic
optimization problem.¹

Hard-SVM

input: $(x_1, y_1), \ldots, (x_m, y_m)$
solve:

\[
(w_0, b_0) = \operatorname*{argmin}_{(w,b)} \|w\|^2 \quad \text{s.t.}\quad \forall i,\ y_i(\langle w, x_i\rangle + b) \ge 1 \tag{15.2}
\]

output: $\hat{w} = \frac{w_0}{\|w_0\|}, \quad \hat{b} = \frac{b_0}{\|w_0\|}$
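The quadratic program in Equation (15.2) can be handed to any off-the-shelf convex solver. The sketch below uses the cvxpy library (our choice of tool, not one used in the book); the function name hard_svm is illustrative, and the sample is assumed to be linearly separable, since otherwise the constraints are infeasible.

import numpy as np
import cvxpy as cp

def hard_svm(X, y):
    # Solve Equation (15.2): minimize ||w||^2 subject to y_i * (<w, x_i> + b) >= 1,
    # then return the normalized output (w_hat, b_hat) = (w_0/||w_0||, b_0/||w_0||).
    m, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    w0, b0 = w.value, b.value
    norm = np.linalg.norm(w0)
    return w0 / norm, b0 / norm

By the lemma that follows, the returned pair attains the maximum in Equation (15.1), i.e., it is a largest-margin separating halfspace.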

The lemma that follows shows that the output of Hard-SVM is indeed the
separating hyperplane with the largest margin. Intuitively, Hard-SVM searches
for $w$ of minimal norm among all the vectors that separate the data and for
which $|\langle w, x_i\rangle + b| \ge 1$ for all $i$. In other words, we enforce the margin to be 1,
but now the units in which we measure the margin scale with the norm of $w$.
Therefore, finding the largest-margin halfspace boils down to finding $w$ whose
norm is minimal. Formally:

Lemma 15.2 The output of Hard-SVM is a solution of Equation (15.1).
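Before turning to the proof, the rescaling intuition can be restated as a one-line computation (our paraphrase, in the notation of Equation (15.2)): the normalized output $(\hat{w}, \hat{b})$ satisfies

\[
\min_{i\in[m]} y_i\bigl(\langle \hat{w}, x_i\rangle + \hat{b}\bigr)
= \frac{1}{\|w_0\|}\,\min_{i\in[m]} y_i\bigl(\langle w_0, x_i\rangle + b_0\bigr)
\;\ge\; \frac{1}{\|w_0\|},
\]

so the smaller the norm of $w_0$, the larger the margin that the unit-norm halfspace $(\hat{w}, \hat{b})$ is guaranteed to achieve.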

Proof Let $(w^\star, b^\star)$ be a solution of Equation (15.1) and define the margin
achieved by $(w^\star, b^\star)$ to be $\gamma^\star = \min_{i\in[m]} y_i(\langle w^\star, x_i\rangle + b^\star)$. Therefore, for all
$i$ we have

\[
y_i(\langle w^\star, x_i\rangle + b^\star) \ge \gamma^\star,
\]

or equivalently,

\[
y_i\Bigl(\Bigl\langle \frac{w^\star}{\gamma^\star}, x_i \Bigr\rangle + \frac{b^\star}{\gamma^\star}\Bigr) \ge 1.
\]

Hence, the pair $\bigl(\frac{w^\star}{\gamma^\star}, \frac{b^\star}{\gamma^\star}\bigr)$ satisfies the conditions of the quadratic optimization problem given in Equation (15.2).

¹ A quadratic optimization problem is an optimization problem in which the objective is a
convex quadratic function and the constraints are linear inequalities.
