Understanding Machine Learning: From Theory to Algorithms

15.4 Duality* 211

lemma15.9 (Fritz John) Suppose that

w?∈argmin w

f(w) s.t. ∀i∈[m], gi(w)≤ 0 ,

wheref,g 1 ,...,gm are differentiable. Then, there exists α∈ Rm such that ∇f(w?) +

∑

i∈Iαi∇gi(w?) =^0 , whereI={i:gi(w?) = 0}.

15.4 Duality*

Historically, many of the properties of SVM have been obtained by considering thedualof Equation (15.3). Our presentation of SVM does not rely on duality. For completeness, we present in the following how to derive the dual of Equa- tion (15.3). We start by rewriting the problem in an equivalent form as follows. Consider the function

g(w) = max α∈Rm:α≥ 0

∑m

i=1

αi(1−yi〈w,xi〉) =

{

0 if∀i,yi〈w,xi〉≥ 1 ∞ otherwise

.

We can therefore rewrite Equation (15.3) as

minw

(

‖w‖^2 +g(w)

)

. (15.7)

Rearranging the preceding we obtain that Equation (15.3) can be rewritten as the problem

minw max α∈Rm:α≥ 0

(

1

2

‖w‖^2 +

∑m

i=1

αi(1−yi〈w,xi〉)

)

. (15.8)

Now suppose that we flip the order of min and max in the above equation. This can only decrease the objective value (see Exercise 4), and we have

minw α∈Rmaxm:α≥ 0

(

1

2 ‖w‖

(^2) +
∑m
i=1
αi(1−yi〈w,xi〉)

)

≥α∈Rmaxm:α≥ 0 minw

(

1

2

‖w‖^2 +

∑m

i=1

αi(1−yi〈w,xi〉)

)

.

The preceding inequality is calledweak duality. It turns out that in our case, strong dualityalso holds; namely, the inequality holds with equality. Therefore, thedualproblem is

α∈maxRm:α≥ 0 minw

(

1

2 ‖w‖

(^2) +
∑m
i=1
αi(1−yi〈w,xi〉)

)

. (15.9)

We can simplify the dual problem by noting that onceαis fixed, the optimization

Understanding Machine Learning: From Theory to Algorithms

∑

15.4 Duality*

{

.

(

)

. (15.7)

(

1

2

)

. (15.8)

(

1

)

(

1

2

)

.

(

1

)

. (15.9)

Get our desktop app

Company

Features

Documentation

Resources