Understanding Machine Learning: From Theory to Algorithms

5.1 The No-Free-Lunch Theorem

function $h : C \to \{0,1\}$ and every $i$ we have
\[
L_{\mathcal{D}_i}(h) \;=\; \frac{1}{2m} \sum_{x \in C} \mathbb{1}_{[h(x) \neq f_i(x)]}
\;\ge\; \frac{1}{2m} \sum_{r=1}^{p} \mathbb{1}_{[h(v_r) \neq f_i(v_r)]}
\;\ge\; \frac{1}{2p} \sum_{r=1}^{p} \mathbb{1}_{[h(v_r) \neq f_i(v_r)]}. \tag{5.5}
\]
Hence,
\[
\frac{1}{T} \sum_{i=1}^{T} L_{\mathcal{D}_i}\bigl(A(S^i_j)\bigr)
\;\ge\; \frac{1}{T} \sum_{i=1}^{T} \frac{1}{2p} \sum_{r=1}^{p} \mathbb{1}_{[A(S^i_j)(v_r) \neq f_i(v_r)]}
\;=\; \frac{1}{2p} \sum_{r=1}^{p} \frac{1}{T} \sum_{i=1}^{T} \mathbb{1}_{[A(S^i_j)(v_r) \neq f_i(v_r)]}
\;\ge\; \frac{1}{2} \cdot \min_{r \in [p]} \frac{1}{T} \sum_{i=1}^{T} \mathbb{1}_{[A(S^i_j)(v_r) \neq f_i(v_r)]}. \tag{5.6}
\]
Next, fix some $r \in [p]$. We can partition all the functions in $f_1, \ldots, f_T$ into $T/2$
disjoint pairs, where for a pair $(f_i, f_{i'})$ we have that for every $c \in C$, $f_i(c) \neq f_{i'}(c)$
if and only if $c = v_r$. Since for such a pair we must have $S^i_j = S^{i'}_j$, the algorithm receives
the same training sequence in both cases and therefore outputs the same predictor,
$A(S^i_j) = A(S^{i'}_j)$; as $f_i$ and $f_{i'}$ disagree only at $v_r$, that predictor errs at $v_r$
on exactly one of the two functions, so it follows that
\[
\mathbb{1}_{[A(S^i_j)(v_r) \neq f_i(v_r)]} + \mathbb{1}_{[A(S^{i'}_j)(v_r) \neq f_{i'}(v_r)]} \;=\; 1,
\]
which yields
\[
\frac{1}{T} \sum_{i=1}^{T} \mathbb{1}_{[A(S^i_j)(v_r) \neq f_i(v_r)]} \;=\; \frac{1}{2}.
\]
Combining this with Equation (5.6), Equation (5.4), and Equation (5.3), we
obtain that Equation (5.1) holds, which concludes our proof.
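
The counting behind Equation (5.6) and the pairing argument can be checked numerically. The sketch
below is only an illustration, not part of the original text: the domain size, the memorizing
learner, and the names (erm_memorizer, C, train, unseen) are hypothetical choices. It enumerates
every labeling function on a small set $C$, fixes one training half $S_j$, and confirms that the
disagreement on the unseen points, averaged over all targets, is exactly $1/2$; the same average is
obtained whatever learner is plugged in, which is the content of the pairing argument.

    from itertools import product

    # Minimal numerical check of the averaging/pairing argument (illustrative only).
    # C has 2m points; the first m form the fixed training sequence S_j, and the
    # remaining p = m points v_1, ..., v_p never appear in S_j.
    m = 3
    C = list(range(2 * m))
    train, unseen = C[:m], C[m:]

    def erm_memorizer(sample):
        # A hypothetical learner: memorize the labeled sample, predict 0 elsewhere.
        lookup = dict(sample)
        return lambda x: lookup.get(x, 0)

    # Enumerate all T = 2^(2m) labeling functions f_i : C -> {0, 1}.
    errors_on_unseen = []
    for labels in product([0, 1], repeat=len(C)):
        f = dict(zip(C, labels))
        h = erm_memorizer([(x, f[x]) for x in train])      # plays the role of A(S_j^i)
        disagreement = sum(h(v) != f[v] for v in unseen) / len(unseen)
        errors_on_unseen.append(disagreement)

    print(sum(errors_on_unseen) / len(errors_on_unseen))   # prints 0.5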

5.1.1 No-Free-Lunch and Prior Knowledge


How does the No-Free-Lunch result relate to the need for prior knowledge? Let us
consider an ERM predictor over the hypothesis class $\mathcal{H}$ of all the functions $f$ from
$\mathcal{X}$ to $\{0,1\}$. This class represents lack of prior knowledge: every possible function
from the domain to the label set is considered a good candidate. According to the
No-Free-Lunch theorem, any algorithm that chooses its output from hypotheses
in $\mathcal{H}$, and in particular the ERM predictor, will fail on some learning task.
Therefore, this class is not PAC learnable, as formalized in the following corollary:

Corollary 5.2  Let $\mathcal{X}$ be an infinite domain set and let $\mathcal{H}$ be the set of all
functions from $\mathcal{X}$ to $\{0,1\}$. Then, $\mathcal{H}$ is not PAC learnable.
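
As a concrete and purely illustrative companion to Corollary 5.2, the following sketch, with a toy
domain and hypothetical names, shows why ERM over the class of all functions cannot generalize: a
memorizing hypothesis attains zero empirical error, yet some target in $\mathcal{H}$ agrees with
every training label while flipping every unseen one, so the true error under a uniform
distribution over the domain is $1/2$.

    # Toy illustration of Corollary 5.2 (hypothetical setup, not from the text):
    # over the class of ALL functions, zero training error says nothing about
    # behavior on unseen points.
    domain = list(range(8))
    train_points = domain[:4]                    # the sample the learner sees
    train_labels = {x: x % 2 for x in train_points}

    def erm_over_all_functions(labels):
        # One valid ERM output in H: memorize the sample, predict 0 elsewhere.
        return lambda x: labels.get(x, 0)

    h = erm_over_all_functions(train_labels)

    # An adversarial target in H: agrees with every training label and
    # disagrees with h on every unseen point.
    target = {x: train_labels[x] if x in train_labels else 1 - h(x) for x in domain}

    empirical_error = sum(h(x) != target[x] for x in train_points) / len(train_points)
    true_error = sum(h(x) != target[x] for x in domain) / len(domain)
    print(empirical_error, true_error)           # 0.0 on the sample, 0.5 overall

Since that adversarial target itself belongs to $\mathcal{H}$, a perfect fit on the sample gives
ERM no way to rule it out, which is exactly the lack of prior knowledge the corollary formalizes.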