Understanding Machine Learning: From Theory to Algorithms

5.1 The No-Free-Lunch Theorem

function $h : C \to \{0,1\}$ and every $i$ we have
\[
L_{\mathcal{D}_i}(h) \;=\; \frac{1}{2m} \sum_{x \in C} \mathbb{1}_{[h(x) \neq f_i(x)]}
\;\ge\; \frac{1}{2m} \sum_{r=1}^{p} \mathbb{1}_{[h(v_r) \neq f_i(v_r)]}
\;\ge\; \frac{1}{2p} \sum_{r=1}^{p} \mathbb{1}_{[h(v_r) \neq f_i(v_r)]}. \tag{5.5}
\]
Hence,
\[
\frac{1}{T} \sum_{i=1}^{T} L_{\mathcal{D}_i}\bigl(A(S^i_j)\bigr)
\;\ge\; \frac{1}{T} \sum_{i=1}^{T} \frac{1}{2p} \sum_{r=1}^{p} \mathbb{1}_{[A(S^i_j)(v_r) \neq f_i(v_r)]}
\;=\; \frac{1}{2p} \sum_{r=1}^{p} \frac{1}{T} \sum_{i=1}^{T} \mathbb{1}_{[A(S^i_j)(v_r) \neq f_i(v_r)]}
\;\ge\; \frac{1}{2} \cdot \min_{r \in [p]} \frac{1}{T} \sum_{i=1}^{T} \mathbb{1}_{[A(S^i_j)(v_r) \neq f_i(v_r)]}. \tag{5.6}
\]
Next, fix some $r \in [p]$. We can partition all the functions in $f_1, \ldots, f_T$ into $T/2$
disjoint pairs, where for a pair $(f_i, f_{i'})$ we have that for every $c \in C$, $f_i(c) \neq f_{i'}(c)$
if and only if $c = v_r$. Since for such a pair we must have $S^i_j = S^{i'}_j$, the algorithm receives
the same training sequence in both cases and therefore outputs the same predictor,
$A(S^i_j) = A(S^{i'}_j)$; as $f_i$ and $f_{i'}$ disagree only at $v_r$, that predictor errs at $v_r$
on exactly one of the two functions, so it follows that
\[
\mathbb{1}_{[A(S^i_j)(v_r) \neq f_i(v_r)]} + \mathbb{1}_{[A(S^{i'}_j)(v_r) \neq f_{i'}(v_r)]} \;=\; 1,
\]
which yields
\[
\frac{1}{T} \sum_{i=1}^{T} \mathbb{1}_{[A(S^i_j)(v_r) \neq f_i(v_r)]} \;=\; \frac{1}{2}.
\]
Combining this with Equation (5.6), Equation (5.4), and Equation (5.3), we
obtain that Equation (5.1) holds, which concludes our proof.
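
The counting behind Equation (5.6) and the pairing argument can be checked numerically. The sketch
below is only an illustration, not part of the original text: the domain size, the memorizing
learner, and the names (erm_memorizer, C, train, unseen) are hypothetical choices. It enumerates
every labeling function on a small set $C$, fixes one training half $S_j$, and confirms that the
disagreement on the unseen points, averaged over all targets, is exactly $1/2$; the same average is
obtained whatever learner is plugged in, which is the content of the pairing argument.

    from itertools import product

    # Minimal numerical check of the averaging/pairing argument (illustrative only).
    # C has 2m points; the first m form the fixed training sequence S_j, and the
    # remaining p = m points v_1, ..., v_p never appear in S_j.
    m = 3
    C = list(range(2 * m))
    train, unseen = C[:m], C[m:]

    def erm_memorizer(sample):
        # A hypothetical learner: memorize the labeled sample, predict 0 elsewhere.
        lookup = dict(sample)
        return lambda x: lookup.get(x, 0)

    # Enumerate all T = 2^(2m) labeling functions f_i : C -> {0, 1}.
    errors_on_unseen = []
    for labels in product([0, 1], repeat=len(C)):
        f = dict(zip(C, labels))
        h = erm_memorizer([(x, f[x]) for x in train])      # plays the role of A(S_j^i)
        disagreement = sum(h(v) != f[v] for v in unseen) / len(unseen)
        errors_on_unseen.append(disagreement)

    print(sum(errors_on_unseen) / len(errors_on_unseen))   # prints 0.5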

5.1.1 No-Free-Lunch and Prior Knowledge


How does the No-Free-Lunch result relate to the need for prior knowledge? Let us
consider an ERM predictor over the hypothesis class $\mathcal{H}$ of all the functions $f$ from
$\mathcal{X}$ to $\{0,1\}$. This class represents lack of prior knowledge: every possible function
from the domain to the label set is considered a good candidate. According to the
No-Free-Lunch theorem, any algorithm that chooses its output from hypotheses
in $\mathcal{H}$, and in particular the ERM predictor, will fail on some learning task.
Therefore, this class is not PAC learnable, as formalized in the following corollary:

Corollary 5.2  Let $\mathcal{X}$ be an infinite domain set and let $\mathcal{H}$ be the set of all
functions from $\mathcal{X}$ to $\{0,1\}$. Then, $\mathcal{H}$ is not PAC learnable.
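
As a concrete and purely illustrative companion to Corollary 5.2, the following sketch, with a toy
domain and hypothetical names, shows why ERM over the class of all functions cannot generalize: a
memorizing hypothesis attains zero empirical error, yet some target in $\mathcal{H}$ agrees with
every training label while flipping every unseen one, so the true error under a uniform
distribution over the domain is $1/2$.

    # Toy illustration of Corollary 5.2 (hypothetical setup, not from the text):
    # over the class of ALL functions, zero training error says nothing about
    # behavior on unseen points.
    domain = list(range(8))
    train_points = domain[:4]                    # the sample the learner sees
    train_labels = {x: x % 2 for x in train_points}

    def erm_over_all_functions(labels):
        # One valid ERM output in H: memorize the sample, predict 0 elsewhere.
        return lambda x: labels.get(x, 0)

    h = erm_over_all_functions(train_labels)

    # An adversarial target in H: agrees with every training label and
    # disagrees with h on every unseen point.
    target = {x: train_labels[x] if x in train_labels else 1 - h(x) for x in domain}

    empirical_error = sum(h(x) != target[x] for x in train_points) / len(train_points)
    true_error = sum(h(x) != target[x] for x in domain) / len(domain)
    print(empirical_error, true_error)           # 0.0 on the sample, 0.5 overall

Since that adversarial target itself belongs to $\mathcal{H}$, a perfect fit on the sample gives
ERM no way to rule it out, which is exactly the lack of prior knowledge the corollary formalizes.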