Understanding Machine Learning: From Theory to Algorithms


Finally, since the variables $\sigma_1,\ldots,\sigma_m$ are independent we have
\[
\begin{aligned}
\mathbb{E}_\sigma\left[\Big\|\sum_{i=1}^{m}\sigma_i x_i\Big\|_2^2\right]
&= \mathbb{E}_\sigma\left[\sum_{i,j}\sigma_i\sigma_j\,\langle x_i,x_j\rangle\right] \\
&= \sum_{i\neq j}\langle x_i,x_j\rangle\,\mathbb{E}_\sigma[\sigma_i\sigma_j]
 + \sum_{i=1}^{m}\langle x_i,x_i\rangle\,\mathbb{E}_\sigma\big[\sigma_i^2\big] \\
&= \sum_{i=1}^{m}\|x_i\|_2^2 \;\le\; m\,\max_i \|x_i\|_2^2,
\end{aligned}
\]
where the cross terms vanish because $\mathbb{E}_\sigma[\sigma_i\sigma_j] = \mathbb{E}_\sigma[\sigma_i]\,\mathbb{E}_\sigma[\sigma_j] = 0$ for $i \neq j$, while $\sigma_i^2 = 1$.

Combining this with Equation (26.15) and Equation (26.16) we conclude our
proof.
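Before moving on, the identity above is easy to sanity-check numerically. Below is a minimal Monte Carlo sketch in Python/NumPy; the data matrix, dimensions, and trial count are arbitrary illustrative choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 5
X = rng.normal(size=(m, n))  # rows are x_1, ..., x_m (arbitrary illustrative data)

# Monte Carlo estimate of E_sigma ||sum_i sigma_i x_i||_2^2
num_trials = 100_000
sigma = rng.choice([-1.0, 1.0], size=(num_trials, m))  # i.i.d. Rademacher signs
sums = sigma @ X                                       # row t is sum_i sigma_{t,i} x_i
estimate = np.mean(np.sum(sums ** 2, axis=1))

row_norms_sq = np.sum(X ** 2, axis=1)
exact = row_norms_sq.sum()        # sum_i ||x_i||_2^2 (the closed form above)
bound = m * row_norms_sq.max()    # m * max_i ||x_i||_2^2

print(f"Monte Carlo: {estimate:.3f}  exact: {exact:.3f}  bound: {bound:.3f}")
```

With enough trials the estimate matches the closed form up to sampling noise, and both sit below the bound $m\max_i\|x_i\|_2^2$.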
Next we bound the Rademacher complexity of $\mathcal{H}_1 \circ S$.

Lemma 26.11 Let $S = (x_1,\ldots,x_m)$ be vectors in $\mathbb{R}^n$. Then,
\[
R(\mathcal{H}_1 \circ S) \;\le\; \max_i \|x_i\|_\infty \sqrt{\frac{2\log(2n)}{m}}.
\]
Proof Using Hölder's inequality we know that for any vectors $w, v$ we have
$\langle w, v\rangle \le \|w\|_1 \|v\|_\infty$. Therefore,

\[
\begin{aligned}
m R(\mathcal{H}_1 \circ S)
&= \mathbb{E}_\sigma\left[\sup_{a \in \mathcal{H}_1 \circ S}\;\sum_{i=1}^{m}\sigma_i a_i\right]
= \mathbb{E}_\sigma\left[\sup_{w:\|w\|_1 \le 1}\;\sum_{i=1}^{m}\sigma_i\langle w, x_i\rangle\right] \\
&= \mathbb{E}_\sigma\left[\sup_{w:\|w\|_1 \le 1}\;\Big\langle w, \sum_{i=1}^{m}\sigma_i x_i\Big\rangle\right]
\le \mathbb{E}_\sigma\left[\Big\|\sum_{i=1}^{m}\sigma_i x_i\Big\|_\infty\right]. \qquad (26.17)
\end{aligned}
\]

For each $j \in [n]$, let $v_j = (x_{1,j},\ldots,x_{m,j}) \in \mathbb{R}^m$. Note that $\|v_j\|_2 \le \sqrt{m}\,\max_i \|x_i\|_\infty$. Let $V = \{v_1,\ldots,v_n,-v_1,\ldots,-v_n\}$. The right-hand side of Equation (26.17) is $m R(V)$. Using Massart's lemma (Lemma 26.8) we have that
\[
R(V) \le \max_i \|x_i\|_\infty \sqrt{2\log(2n)/m},
\]
which concludes our proof.
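This lemma can also be checked empirically. Since the supremum over the $\ell_1$ ball in the proof is attained (Hölder's inequality is tight there), $R(\mathcal{H}_1 \circ S) = \frac{1}{m}\,\mathbb{E}_\sigma\big\|\sum_i \sigma_i x_i\big\|_\infty$, which is straightforward to estimate by sampling signs. A minimal sketch, again with arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 20
X = rng.uniform(-1.0, 1.0, size=(m, n))  # rows are x_1, ..., x_m (arbitrary data)

# Estimate R(H_1 o S) = (1/m) E_sigma ||sum_i sigma_i x_i||_inf by sampling.
num_trials = 100_000
sigma = rng.choice([-1.0, 1.0], size=(num_trials, m))  # i.i.d. Rademacher signs
complexity = np.mean(np.max(np.abs(sigma @ X), axis=1)) / m

# Lemma 26.11 bound: max_i ||x_i||_inf * sqrt(2 log(2n) / m)
bound = np.max(np.abs(X)) * np.sqrt(2 * np.log(2 * n) / m)

print(f"estimated R(H_1 o S): {complexity:.4f}  bound: {bound:.4f}")
assert complexity <= bound
```

The gap between the estimate and the bound reflects the slack in Massart's lemma and in the crude bound $\|v_j\|_2 \le \sqrt{m}\,\max_i\|x_i\|_\infty$.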

26.3 Generalization Bounds for SVM


In this section we use Rademacher complexity to derive generalization bounds
for generalized linear predictors with a Euclidean norm constraint. We will show
how this leads to generalization bounds for hard-SVM and soft-SVM.