Understanding Machine Learning: From Theory to Algorithms

380 Rademacher Complexities

lemma26.7 LetA be a subset ofRm and letA′ ={

∑N

j=1αja

(j) :N ∈ N,∀j,a(j)∈A,αj≥ 0 ,‖α‖ 1 = 1}. Then,R(A′) =R(A). Proof The main idea follows from the fact that for any vectorvwe have

sup α≥ 0 :‖α‖ 1 =1

∑N

j=1

αjvj = maxj vj.

Therefore,

mR(A′) =Eσ sup α≥ 0 :‖α‖ 1 =1

sup a(1),...,a(N)

∑m

i=1

σi

∑N

j=1

αja(ij)

=Eσ sup α≥ 0 :‖α‖ 1 =1

∑N

j=1

αjsup a(j)

∑m

i=1

σia(ij)

=Eσsup a∈A

∑m

i=1

σiai

=mR(A), and we conclude our proof. The next lemma, due to Massart, states that the Rademacher complexity of a finite set grows logarithmically with the size of the set. lemma26.8 (Massart lemma) LetA={a 1 ,...,aN}be a finite set of vectors inRm. Define ̄a=N^1

∑N

i=1ai. Then,

R(A) ≤ max a∈A

‖a−a ̄‖

√

2 log(N) m

.

Proof Based on Lemma 26.6, we can assume without loss of generality that ̄a= 0. Letλ >0 and letA′={λa 1 ,...,λaN}. We upper bound the Rademacher complexity as follows:

mR(A′) =Eσ

[

max a∈A′

〈σ,a〉

]

= Eσ

[

log

(

max a∈A′

e〈σ,a〉

)]

≤Eσ

[

log

(

∑

a∈A′

e〈σ,a〉

)]

≤log

(

Eσ

[

∑

a∈A′

e〈σ,a〉

])

// Jensen’s inequality

= log

(

∑

a∈A′

∏m

i=1

Eσ i

[eσiai]

)

,

where the last equality occurs because the Rademacher variables are indepen- dent. Next, using Lemma A.6 we have that for allai∈R,

σE i

eσiai=

exp(ai) + exp(−ai) 2

≤exp(a^2 i/2),

Understanding Machine Learning: From Theory to Algorithms

∑N

∑N

∑N

∑N

∑N

√

.

[

]

[

(

)]

[

(

∑

)]

(

[

∑

])

(

∑

)

,

Get our desktop app

Company

Features

Documentation

Resources