Understanding Machine Learning: From Theory to Algorithms

376 Rademacher Complexities

This can be written more compactly by definingσ= (σ 1 ,...,σm)∈ {± 1 }mto be a vector such thatS 1 ={zi:σi= 1}andS 2 ={zi:σi=− 1 }. Then, if we further assume that|S 1 |=|S 2 |then Equation (26.2) can be rewritten as 2 m

sup f∈F

∑m

i=1

σif(zi). (26.3)

The Rademacher complexity measure captures this idea by considering the expectation of the above with respect to a random choice ofσ. Formally, letF◦S be the set of all possible evaluations a functionf∈ Fcan achieve on a sample S, namely, F ◦S={(f(z 1 ),...,f(zm)) :f∈F}.

Let the variables inσbe distributed i.i.d. according toP[σi= 1] =P[σi=−1] = 1 2. Then, the Rademacher complexity ofFwith respect toSis defined as follows:

R(F ◦S) def=^1 m

E

σ∼{± 1 }m

[

sup f∈F

∑m

i=1

σif(zi)

]

. (26.4)

More generally, given a set of vectors,A⊂Rm, we define

R(A) def=

1

mEσ

[

sup a∈A

∑m

i=1

σiai

]

. (26.5)

The following lemma bounds the expected value of the representativeness of Sby twice the expected Rademacher complexity. lemma26.2

S∼DEm[ RepD(F,S)] ≤^2 S∼DEmR(F ◦S). Proof LetS′={z′ 1 ,...,z′m}be another i.i.d. sample. Clearly, for allf∈ F, LD(f) =ES′[LS′(f)]. Therefore, for everyf∈Fwe have

LD(f)−LS(f) = ES′[LS′(f)]−LS(f) = SE′[LS′(f)−LS(f)].

Taking supremum overf∈Fof both sides, and using the fact that the supremum of expectation is smaller than expectation of the supremum we obtain sup f∈F

(

LD(f)−LS(f)

)

= sup f∈F

ES′[LS′(f)−LS(f)]

≤ E

S′

[

sup f∈F

(

LS′(f)−LS(f)

)

]

.

Taking expectation overSon both sides we obtain

ES

[

sup f∈F

(

LD(f)−LS(f)

)

]

≤ S,SE′

[

sup f∈F

(

LS′(f)−LS(f)

)

]

=

1

m

S,SE′

[

sup f∈F

∑m

i=1

(f(zi′)−f(zi))

]

.

(26.6)

Understanding Machine Learning: From Theory to Algorithms

E

[

]

. (26.4)

1

[

]

. (26.5)

(

)

≤ E

[

(

)

]

.

[

(

)

]

≤ S,SE′

[

(

)

]

=

1

S,SE′

[

]

.

(26.6)

Get our desktop app

Company

Features

Documentation

Resources