Understanding Machine Learning: From Theory to Algorithms

26 Rademacher Complexities


In Chapter 4 we have shown that uniform convergence is a sufficient condition
for learnability. In this chapter we study the Rademacher complexity, which
measures the rate of uniform convergence. We will provide generalization bounds
based on this measure.

26.1 The Rademacher Complexity


Recall the definition of an ε-representative sample from Chapter 4, repeated here
for convenience.

definition 26.1 (ε-Representative Sample) A training set S is called ε-representative
(w.r.t. domain Z, hypothesis class H, loss function ℓ, and distribution D) if

\[
\sup_{h \in \mathcal{H}} \, |L_{\mathcal{D}}(h) - L_S(h)| \le \epsilon .
\]

We have shown that if S is an ε/2-representative sample then the ERM rule
is ε-consistent, namely, \(L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) \le \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) + \epsilon\).
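The argument from Chapter 4 can be summarized in one chain of inequalities. Letting \(h^\star \in \operatorname{argmin}_{h \in \mathcal{H}} L_{\mathcal{D}}(h)\) and writing \(h_S = \mathrm{ERM}_{\mathcal{H}}(S)\),

\[
L_{\mathcal{D}}(h_S) \;\le\; L_S(h_S) + \epsilon/2 \;\le\; L_S(h^\star) + \epsilon/2 \;\le\; L_{\mathcal{D}}(h^\star) + \epsilon ,
\]

where the first and last inequalities use ε/2-representativeness and the middle one uses the fact that the ERM rule minimizes the empirical risk.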
To simplify our notation, let us denote

\[
\mathcal{F} \stackrel{\text{def}}{=} \ell \circ \mathcal{H} \stackrel{\text{def}}{=} \{ z \mapsto \ell(h,z) : h \in \mathcal{H} \},
\]

and given f ∈ F, we define

\[
L_{\mathcal{D}}(f) = \mathop{\mathbb{E}}_{z \sim \mathcal{D}} [f(z)] , \qquad L_S(f) = \frac{1}{m} \sum_{i=1}^{m} f(z_i).
\]
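As a concrete (and hypothetical, not from the text) illustration of the empirical risk \(L_S(f)\): for a binary classification task with the 0-1 loss, \(f(z) = \ell(h, z) = \mathbb{1}[h(x) \ne y]\) for \(z = (x, y)\), and \(L_S(f)\) is simply the fraction of misclassified examples. A minimal sketch:

```python
import numpy as np

# Hypothetical illustration: L_S(f) = (1/m) * sum_i f(z_i), where
# f(z) = 1[h(x) != y] is the 0-1 loss of hypothesis h on example z = (x, y).
def empirical_loss(h, S):
    """Average 0-1 loss of h over the sample S."""
    return np.mean([float(h(x) != y) for (x, y) in S])

# A toy threshold hypothesis on the real line (chosen only for illustration).
h = lambda x: int(x >= 0.5)
S = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 0)]
print(empirical_loss(h, S))  # h misclassifies only (0.9, 0) -> 0.25
```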

We define the representativeness of S with respect to F as the largest gap between
the true error of a function f and its empirical error, namely,

\[
\mathrm{Rep}_{\mathcal{D}}(\mathcal{F},S) \stackrel{\text{def}}{=} \sup_{f \in \mathcal{F}} \bigl( L_{\mathcal{D}}(f) - L_S(f) \bigr). \tag{26.1}
\]

Now, suppose we would like to estimate the representativeness of S using the
sample S only. One simple idea is to split S into two disjoint sets, S = S₁ ∪ S₂;
refer to S₁ as a validation set and to S₂ as a training set. We can then estimate
the representativeness of S by

\[
\sup_{f \in \mathcal{F}} \bigl( L_{S_1}(f) - L_{S_2}(f) \bigr). \tag{26.2}
\]
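The split-based estimate (26.2) is easy to compute when the class is finite: evaluate each hypothesis on both halves and take the largest gap. The sketch below uses a toy finite class of threshold classifiers and synthetic data, neither of which appears in the text; they are assumptions made only to make the quantity concrete.

```python
import numpy as np

def zero_one(h, S):
    """Empirical 0-1 loss of h over the sample S."""
    return np.mean([float(h(x) != y) for (x, y) in S])

def split_representativeness(H, S1, S2):
    """Eq. (26.2): sup over f in F of L_{S1}(f) - L_{S2}(f),
    for F the 0-1 loss composed with a finite class H."""
    return max(zero_one(h, S1) - zero_one(h, S2) for h in H)

# Toy finite class: threshold classifiers h_t(x) = 1[x >= t] on [0, 1].
# (The t=t default freezes each threshold inside its lambda.)
H = [lambda x, t=t: int(x >= t) for t in np.linspace(0.0, 1.0, 11)]

rng = np.random.default_rng(0)
xs = rng.random(20)
S = [(x, int(x >= 0.5)) for x in xs]   # noiseless labels from the 0.5 threshold
S1, S2 = S[:10], S[10:]                # disjoint validation / training halves
gap = split_representativeness(H, S1, S2)
print(gap)
```

Because the labels are noiseless, the true threshold t = 0.5 achieves zero loss on both halves, so the estimated gap is always between 0 and 1 here.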

Understanding Machine Learning, © 2014 by Shai Shalev-Shwartz and Shai Ben-David.
Published 2014 by Cambridge University Press.
Personal use only. Not for distribution. Do not post.
Please link to http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning