26.1 The Rademacher Complexity 377
Next, we note that for eachj,zjandz′jare i.i.d. variables. Therefore, we can
replace them without affecting the expectation:
S,SE′
sup
f∈F
(f(z′j)−f(zj)) +
∑
i 6 =j
(f(zi′)−f(zi))
=
S,SE′
sup
f∈F
(f(zj)−f(z′j)) +
∑
i 6 =j
(f(zi′)−f(zi))
.
(26.7)
Letσjbe a random variable such thatP[σj= 1] =P[σj=−1] = 1/2. From
Equation (26.7) we obtain that
S,SE′,σ
j
sup
f∈F
σj(f(z′j)−f(zj)) +
∑
i 6 =j
(f(zi′)−f(zi))
=^1
2
(l.h.s. of Equation (26.7)) +^1
2
(r.h.s. of Equation (26.7))
= E
S,S′
sup
f∈F
(f(zj′)−f(zj)) +
∑
i 6 =j
(f(z′i)−f(zi))
.
(26.8)
Repeating this for alljwe obtain that
S,SE′
[
sup
f∈F
∑m
i=1
(f(z′i)−f(zi))
]
=S,SE′,σ
[
sup
f∈F
∑m
i=1
σi(f(zi′)−f(zi))
]
. (26.9)
Finally,
sup
f∈F
∑
i
σi(f(z′i)−f(zi))≤sup
f∈F
∑
i
σif(z′i) + sup
f∈F
∑
i
−σif(zi)
and since the probability ofσis the same as the probability of−σ, the right-hand
side of Equation (26.9) can be bounded by
E
S,S′,σ
[
sup
f∈F
∑
i
σif(z′i) + sup
f∈F
∑
i
σif(zi)
]
= mE
S′
[R(F ◦S′)] +mE
S
[R(F ◦S)] = 2mE
S
[R(F ◦S)].
The lemma immediately yields that, in expectation, the ERM rule finds a
hypothesis which is close to the optimal hypothesis inH.
theorem26.3 We have
S∼DEm[LD(ERMH(S))−LS(ERMH(S))]≤^2 S∼DEmR(`◦H◦S).
Furthermore, for anyh?∈H
E
S∼Dm
[LD(ERMH(S))−LD(h?)] ≤ 2 E
S∼Dm