Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1

380 Rademacher Complexities


lemma26.7 LetA be a subset ofRm and letA′ ={

∑N

j=1αja

(j) :N ∈
N,∀j,a(j)∈A,αj≥ 0 ,‖α‖ 1 = 1}. Then,R(A′) =R(A).
Proof The main idea follows from the fact that for any vectorvwe have

sup
α≥ 0 :‖α‖ 1 =1

∑N

j=1

αjvj = maxj vj.

Therefore,

mR(A′) =Eσ sup
α≥ 0 :‖α‖ 1 =1

sup
a(1),...,a(N)

∑m

i=1

σi

∑N

j=1

αja(ij)

=Eσ sup
α≥ 0 :‖α‖ 1 =1

∑N

j=1

αjsup
a(j)

∑m

i=1

σia(ij)

=Eσsup
a∈A

∑m

i=1

σiai

=mR(A),
and we conclude our proof.
The next lemma, due to Massart, states that the Rademacher complexity of
a finite set grows logarithmically with the size of the set.
lemma26.8 (Massart lemma) LetA={a 1 ,...,aN}be a finite set of vectors
inRm. Define ̄a=N^1

∑N

i=1ai. Then,

R(A) ≤ max
a∈A

‖a−a ̄‖


2 log(N)
m

.

Proof Based on Lemma 26.6, we can assume without loss of generality that
̄a= 0. Letλ >0 and letA′={λa 1 ,...,λaN}. We upper bound the Rademacher
complexity as follows:

mR(A′) =Eσ

[

max
a∈A′

〈σ,a〉

]

= Eσ

[

log

(

max
a∈A′

e〈σ,a〉

)]

≤Eσ

[

log

(


a∈A′

e〈σ,a〉

)]

≤log

(


[


a∈A′

e〈σ,a〉

])

// Jensen’s inequality

= log

(


a∈A′

∏m

i=1


i

[eσiai]

)

,

where the last equality occurs because the Rademacher variables are indepen-
dent. Next, using Lemma A.6 we have that for allai∈R,

σE
i

eσiai=

exp(ai) + exp(−ai)
2

≤exp(a^2 i/2),
Free download pdf