Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1
15.4 Duality* 211

lemma15.9 (Fritz John) Suppose that

w?∈argmin
w

f(w) s.t. ∀i∈[m], gi(w)≤ 0 ,

wheref,g 1 ,...,gm are differentiable. Then, there exists α∈ Rm such that
∇f(w?) +


i∈Iαi∇gi(w?) =^0 , whereI={i:gi(w?) = 0}.

15.4 Duality*


Historically, many of the properties of SVM have been obtained by considering
thedualof Equation (15.3). Our presentation of SVM does not rely on duality.
For completeness, we present in the following how to derive the dual of Equa-
tion (15.3).
We start by rewriting the problem in an equivalent form as follows. Consider
the function

g(w) = max
α∈Rm:α≥ 0

∑m

i=1

αi(1−yi〈w,xi〉) =

{

0 if∀i,yi〈w,xi〉≥ 1
∞ otherwise

.

We can therefore rewrite Equation (15.3) as

minw

(

‖w‖^2 +g(w)

)

. (15.7)

Rearranging the preceding we obtain that Equation (15.3) can be rewritten as
the problem

minw max
α∈Rm:α≥ 0

(

1

2

‖w‖^2 +

∑m

i=1

αi(1−yi〈w,xi〉)

)

. (15.8)

Now suppose that we flip the order of min and max in the above equation. This
can only decrease the objective value (see Exercise 4), and we have

minw α∈Rmaxm:α≥ 0

(

1

2 ‖w‖

(^2) +
∑m
i=1
αi(1−yi〈w,xi〉)


)

≥α∈Rmaxm:α≥ 0 minw

(

1

2

‖w‖^2 +

∑m

i=1

αi(1−yi〈w,xi〉)

)

.

The preceding inequality is calledweak duality. It turns out that in our case,
strong dualityalso holds; namely, the inequality holds with equality. Therefore,
thedualproblem is

α∈maxRm:α≥ 0 minw

(

1

2 ‖w‖

(^2) +
∑m
i=1
αi(1−yi〈w,xi〉)


)

. (15.9)

We can simplify the dual problem by noting that onceαis fixed, the optimization
Free download pdf