Understanding Machine Learning: From Theory to Algorithms

26.1 The Rademacher Complexity 377

Next, we note that for eachj,zjandz′jare i.i.d. variables. Therefore, we can
replace them without affecting the expectation:

S,SE′



 sup f∈F



(f(z′j)−f(zj)) +

∑

i 6 =j

(f(zi′)−f(zi))







 =

S,SE′



 sup f∈F



(f(zj)−f(z′j)) +

∑

i 6 =j

(f(zi′)−f(zi))







.

(26.7)

Letσjbe a random variable such thatP[σj= 1] =P[σj=−1] = 1/2. From
Equation (26.7) we obtain that

S,SE′,σ j



sup f∈F



σj(f(z′j)−f(zj)) +

∑

i 6 =j

(f(zi′)−f(zi))









=^1

2

(l.h.s. of Equation (26.7)) +^1 2

(r.h.s. of Equation (26.7))

= E

S,S′



sup f∈F



(f(zj′)−f(zj)) +

∑

i 6 =j

(f(z′i)−f(zi))







.

(26.8)

Repeating this for alljwe obtain that

S,SE′

[

sup f∈F

∑m

i=1

(f(z′i)−f(zi))

]

=S,SE′,σ

[

sup f∈F

∑m

i=1

σi(f(zi′)−f(zi))

]

. (26.9)

Finally,

sup f∈F

∑

i

σi(f(z′i)−f(zi))≤sup f∈F

∑

i

σif(z′i) + sup f∈F

∑

i

−σif(zi)

and since the probability ofσis the same as the probability of−σ, the right-hand
side of Equation (26.9) can be bounded by

E

S,S′,σ

[

sup f∈F

∑

i

σif(z′i) + sup f∈F

∑

i

σif(zi)

]

= mE S′

[R(F ◦S′)] +mE S

[R(F ◦S)] = 2mE S

[R(F ◦S)].

The lemma immediately yields that, in expectation, the ERM rule finds a
hypothesis which is close to the optimal hypothesis inH.

theorem26.3 We have

S∼DEm[LD(ERMH(S))−LS(ERMH(S))]≤^2 S∼DEmR(`◦H◦S).

Furthermore, for anyh?∈H

E S∼Dm

[LD(ERMH(S))−LD(h?)] ≤ 2 E S∼Dm

R(`◦H◦S).

Understanding Machine Learning: From Theory to Algorithms

S,SE′





∑







 =

S,SE′





∑







.

(26.7)





∑









=^1

2

= E





∑







.

(26.8)

S,SE′

[

]

[

]

. (26.9)

∑

∑

∑

E

[

∑

∑

]

[R(F ◦S)].

R(`◦H◦S).

Get our desktop app

Company

Features

Documentation

Resources