Understanding Machine Learning: From Theory to Algorithms

Rademacher Complexities


This can be written more compactly by defining $\sigma = (\sigma_1,\ldots,\sigma_m) \in \{\pm 1\}^m$ to be a vector such that $S_1 = \{z_i : \sigma_i = 1\}$ and $S_2 = \{z_i : \sigma_i = -1\}$. Then, if we further assume that $|S_1| = |S_2|$, Equation (26.2) can be rewritten as
\[
\frac{2}{m}\,\sup_{f\in\mathcal{F}}\,\sum_{i=1}^{m} \sigma_i f(z_i). \tag{26.3}
\]

The Rademacher complexity measure captures this idea by considering the expectation of the above with respect to a random choice of $\sigma$. Formally, let $\mathcal{F}\circ S$ be the set of all possible evaluations a function $f\in\mathcal{F}$ can achieve on a sample $S$, namely,
\[
\mathcal{F}\circ S = \{(f(z_1),\ldots,f(z_m)) : f\in\mathcal{F}\}.
\]

Let the variables in $\sigma$ be distributed i.i.d. according to $\mathbb{P}[\sigma_i = 1] = \mathbb{P}[\sigma_i = -1] = \frac{1}{2}$. Then, the Rademacher complexity of $\mathcal{F}$ with respect to $S$ is defined as follows:
\[
R(\mathcal{F}\circ S) \;\stackrel{\mathrm{def}}{=}\; \frac{1}{m}\,\mathop{\mathbb{E}}_{\sigma\sim\{\pm 1\}^m}\left[\sup_{f\in\mathcal{F}}\,\sum_{i=1}^{m} \sigma_i f(z_i)\right]. \tag{26.4}
\]
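For a finite class, $\mathcal{F}\circ S$ is just a finite list of evaluation vectors, and the expectation in Equation (26.4) can be approximated by sampling sign vectors $\sigma$. A minimal Monte Carlo sketch (the function name, sample count, and seed are our own choices, not from the text):

```python
import random

def rademacher_complexity(evaluations, num_samples=20000, seed=0):
    """Monte Carlo estimate of R(F o S) as in Equation (26.4).

    `evaluations` is a finite list of vectors, each vector being
    (f(z_1), ..., f(z_m)) for one function f in the class F.
    """
    rng = random.Random(seed)
    m = len(evaluations[0])
    total = 0.0
    for _ in range(num_samples):
        # draw sigma uniformly from {-1, +1}^m
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # sup over the (finite) class of the correlation with sigma
        total += max(sum(s * a for s, a in zip(sigma, vec))
                     for vec in evaluations)
    return total / (num_samples * m)
```

Two sanity checks: if the evaluation vectors cover all of $\{\pm 1\}^m$ (the class "shatters" $S$), the supremum equals $m$ for every $\sigma$ and the estimate is exactly $1$; for a singleton class the estimate concentrates near $0$, since $\mathbb{E}[\sigma_i] = 0$.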

More generally, given a set of vectors, $A\subset\mathbb{R}^m$, we define
\[
R(A) \;\stackrel{\mathrm{def}}{=}\; \frac{1}{m}\,\mathop{\mathbb{E}}_{\sigma}\left[\sup_{a\in A}\,\sum_{i=1}^{m} \sigma_i a_i\right]. \tag{26.5}
\]
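When $A$ is finite and $m$ is small, the expectation in Equation (26.5) can be computed exactly by enumerating all $2^m$ sign vectors rather than sampling them. A sketch (again with a hypothetical function name of our own):

```python
from itertools import product

def rademacher_exact(A):
    """Compute R(A) from Equation (26.5) exactly for a finite set A of
    m-dimensional vectors, by enumerating all 2^m sign vectors.
    Feasible only for small m."""
    m = len(A[0])
    total = 0.0
    for sigma in product((-1, 1), repeat=m):
        total += max(sum(s * a for s, a in zip(sigma, vec)) for vec in A)
    return total / (2 ** m * m)
```

For a singleton $A=\{a\}$ this returns $0$, because averaging $\sum_i \sigma_i a_i$ over all sign vectors cancels every coordinate ($\mathbb{E}[\sigma_i]=0$).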

The following lemma bounds the expected value of the representativeness of $S$ by twice the expected Rademacher complexity.

Lemma 26.2
\[
\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\big[\mathrm{Rep}_{\mathcal{D}}(\mathcal{F},S)\big] \;\le\; 2\,\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m} R(\mathcal{F}\circ S).
\]
Proof Let $S' = \{z'_1,\ldots,z'_m\}$ be another i.i.d. sample. Clearly, for all $f\in\mathcal{F}$, $L_{\mathcal{D}}(f) = \mathbb{E}_{S'}[L_{S'}(f)]$. Therefore, for every $f\in\mathcal{F}$ we have
\[
L_{\mathcal{D}}(f) - L_S(f) = \mathbb{E}_{S'}[L_{S'}(f)] - L_S(f) = \mathbb{E}_{S'}\big[L_{S'}(f) - L_S(f)\big].
\]

Taking the supremum over $f\in\mathcal{F}$ on both sides, and using the fact that the supremum of an expectation is smaller than the expectation of the supremum, we obtain
\[
\sup_{f\in\mathcal{F}}\big(L_{\mathcal{D}}(f) - L_S(f)\big)
= \sup_{f\in\mathcal{F}} \mathbb{E}_{S'}\big[L_{S'}(f) - L_S(f)\big]
\le \mathbb{E}_{S'}\Big[\sup_{f\in\mathcal{F}}\big(L_{S'}(f) - L_S(f)\big)\Big].
\]

Taking expectation over $S$ on both sides we obtain
\[
\mathop{\mathbb{E}}_{S}\Big[\sup_{f\in\mathcal{F}}\big(L_{\mathcal{D}}(f) - L_S(f)\big)\Big]
\le \mathop{\mathbb{E}}_{S,S'}\Big[\sup_{f\in\mathcal{F}}\big(L_{S'}(f) - L_S(f)\big)\Big]
= \frac{1}{m}\,\mathop{\mathbb{E}}_{S,S'}\Big[\sup_{f\in\mathcal{F}}\,\sum_{i=1}^{m}\big(f(z'_i) - f(z_i)\big)\Big]. \tag{26.6}
\]