376 Rademacher Complexities
This can be written more compactly by definingσ= (σ 1 ,...,σm)∈ {± 1 }mto
be a vector such thatS 1 ={zi:σi= 1}andS 2 ={zi:σi=− 1 }. Then, if we
further assume that|S 1 |=|S 2 |then Equation (26.2) can be rewritten as
2
m
sup
f∈F
∑m
i=1
σif(zi). (26.3)
The Rademacher complexity measure captures this idea by considering the ex-
pectation of the above with respect to a random choice ofσ. Formally, letF◦S
be the set of all possible evaluations a functionf∈ Fcan achieve on a sample
S, namely,
F ◦S={(f(z 1 ),...,f(zm)) :f∈F}.
Let the variables inσbe distributed i.i.d. according toP[σi= 1] =P[σi=−1] =
1
2. Then, the Rademacher complexity ofFwith respect toSis defined as follows:
R(F ◦S) def=^1
m
E
σ∼{± 1 }m
[
sup
f∈F
∑m
i=1
σif(zi)
]
. (26.4)
More generally, given a set of vectors,A⊂Rm, we define
R(A) def=
1
mEσ
[
sup
a∈A
∑m
i=1
σiai
]
. (26.5)
The following lemma bounds the expected value of the representativeness of
Sby twice the expected Rademacher complexity.
lemma26.2
S∼DEm[ RepD(F,S)] ≤^2 S∼DEmR(F ◦S).
Proof LetS′={z′ 1 ,...,z′m}be another i.i.d. sample. Clearly, for allf∈ F,
LD(f) =ES′[LS′(f)]. Therefore, for everyf∈Fwe have
LD(f)−LS(f) = ES′[LS′(f)]−LS(f) = SE′[LS′(f)−LS(f)].
Taking supremum overf∈Fof both sides, and using the fact that the supremum
of expectation is smaller than expectation of the supremum we obtain
sup
f∈F
(
LD(f)−LS(f)
)
= sup
f∈F
ES′[LS′(f)−LS(f)]
≤ E
S′
[
sup
f∈F
(
LS′(f)−LS(f)
)
]
.
Taking expectation overSon both sides we obtain
ES
[
sup
f∈F
(
LD(f)−LS(f)
)
]
≤ S,SE′
[
sup
f∈F
(
LS′(f)−LS(f)
)
]
=
1
m
S,SE′
[
sup
f∈F
∑m
i=1
(f(zi′)−f(zi))