Understanding Machine Learning: From Theory to Algorithms

28.3 The Upper Bound for the Realizable Case

28.3.1 From ε-Nets to PAC Learnability


Theorem 28.4  Let H be a hypothesis class over X with VCdim(H) = d. Let D be a distribution over X and let c ∈ H be a target hypothesis. Fix ε, δ ∈ (0,1) and let m be as defined in Theorem 28.3. Then, with probability of at least 1 − δ over a choice of m i.i.d. instances from X with labels according to c, any ERM hypothesis has a true error of at most ε.
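As an illustration of the theorem (not from the text), the following sketch simulates the realizable case for threshold classifiers on [0,1], a class with VC dimension 1. The target c, the ERM rule `erm_threshold`, and all parameter choices are illustrative assumptions:

```python
import random

def true_error(h_thresh, c_thresh):
    # Under the uniform distribution on [0,1], thresholds h(x) = 1[x >= t]
    # disagree exactly on the interval between the two thresholds,
    # so the true error is its length.
    return abs(h_thresh - c_thresh)

def erm_threshold(sample, labels):
    # A simple ERM rule for thresholds (illustrative): the smallest
    # positively labeled point is consistent with a realizable sample.
    positives = [x for x, y in zip(sample, labels) if y == 1]
    return min(positives) if positives else 1.0

def trial(m, c_thresh=0.5, rng=random):
    sample = [rng.random() for _ in range(m)]
    labels = [1 if x >= c_thresh else 0 for x in sample]
    h = erm_threshold(sample, labels)
    return true_error(h, c_thresh)

# With m large enough relative to eps, the ERM error should exceed eps
# only rarely, in line with the theorem's high-probability guarantee.
random.seed(0)
eps = 0.1
m = 200
failure_rate = sum(trial(m) > eps for _ in range(1000)) / 1000
print(failure_rate)
```

With these numbers the chance that no sample point falls in the ε-interval above the target threshold is about (1 − ε)^m, which is vanishingly small, so the observed failure rate should be essentially zero.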
Proof  Define the class H_c = {c △ h : h ∈ H}, where c △ h = (h \ c) ∪ (c \ h) denotes the symmetric difference. It is easy to verify that if some A ⊂ X is shattered by H then it is also shattered by H_c, and vice versa. Hence, VCdim(H) = VCdim(H_c). Therefore, using Theorem 28.3 we know that with probability of at least 1 − δ, the sample S is an ε-net for H_c. Note that L_D(h) = D(h △ c). Therefore, for any h ∈ H with L_D(h) ≥ ε we have that |(h △ c) ∩ S| > 0; that is, h disagrees with c on at least one point of S, so its empirical error is positive, whereas c itself has zero empirical error in the realizable setting. Hence h cannot be an ERM hypothesis, which concludes our proof.
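The key identity behind the last step is that the empirical error of h on S equals the fraction of sample points landing in h △ c. A minimal finite-domain check (the concrete sets below are illustrative assumptions, not from the text):

```python
# Verify that the empirical error of h on a sample S equals the
# fraction of sample points in the symmetric difference h triangle c.
def sym_diff(h, c):
    # (h \ c) union (c \ h)
    return (h - c) | (c - h)

c = {0, 1, 2, 3}      # target concept, as a subset of a finite domain
h = {2, 3, 4, 5}      # candidate hypothesis
S = [0, 2, 4, 6, 8]   # a sample of domain points

# Fraction of sample points where h and c disagree (empirical error).
empirical_error = sum((x in h) != (x in c) for x in S) / len(S)
# Fraction of sample points captured by the symmetric difference.
hits = sum(x in sym_diff(h, c) for x in S) / len(S)
print(empirical_error, hits)
```

So if S is an ε-net for H_c, every h with true error at least ε is "hit" by S, forcing its empirical error above zero and ruling it out as an ERM hypothesis.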