Understanding Machine Learning: From Theory to Algorithms

28.3 The Upper Bound for the Realizable Case

Claim 1:
\[
P[S \in B] \;\le\; 2\,P[(S,T) \in B'].
\]

Proof of Claim 1: Since $S$ and $T$ are chosen independently, we can write
\[
P[(S,T) \in B'] \;=\; \mathbb{E}_{(S,T) \sim D^{2m}}\!\left[\mathbb{1}_{[(S,T) \in B']}\right] \;=\; \mathbb{E}_{S \sim D^m}\!\left[\,\mathbb{E}_{T \sim D^m}\!\left[\mathbb{1}_{[(S,T) \in B']}\right]\right].
\]

Note that $(S,T) \in B'$ implies $S \in B$, and therefore $\mathbb{1}_{[(S,T)\in B']} = \mathbb{1}_{[(S,T)\in B']}\,\mathbb{1}_{[S\in B]}$, which gives
\[
P[(S,T) \in B'] \;=\; \mathbb{E}_{S \sim D^m}\,\mathbb{E}_{T \sim D^m}\!\left[\mathbb{1}_{[(S,T)\in B']}\,\mathbb{1}_{[S\in B]}\right] \;=\; \mathbb{E}_{S \sim D^m}\!\left[\mathbb{1}_{[S\in B]}\;\mathbb{E}_{T \sim D^m}\!\left[\mathbb{1}_{[(S,T)\in B']}\right]\right].
\]
Fix some $S$. Then either $\mathbb{1}_{[S\in B]} = 0$, or $S \in B$ and then there exists $h_S$ such that $D(h_S) \ge \epsilon$ and $|h_S \cap S| = 0$. It follows that a sufficient condition for $(S,T) \in B'$ is that $|T \cap h_S| > \epsilon m/2$. Therefore, whenever $S \in B$ we have
\[
\mathbb{E}_{T \sim D^m}\!\left[\mathbb{1}_{[(S,T)\in B']}\right] \;\ge\; P_{T \sim D^m}\!\left[|T \cap h_S| > \tfrac{\epsilon m}{2}\right].
\]
But, since we now assume $S \in B$, we know that $D(h_S) = \rho \ge \epsilon$. Therefore, $|T \cap h_S|$ is a binomial random variable with parameters $\rho$ (probability of success for a single try) and $m$ (number of tries). Chernoff's inequality implies
\[
P\!\left[|T \cap h_S| \le \tfrac{\rho m}{2}\right] \;\le\; e^{-\frac{2}{m\rho}\left(m\rho - \frac{m\rho}{2}\right)^2} \;=\; e^{-m\rho/2} \;\le\; e^{-m\epsilon/2} \;\le\; e^{-d\log(1/\delta)/2} \;=\; \delta^{d/2} \;\le\; 1/2,
\]
where the last three steps use $\rho \ge \epsilon$, the theorem's assumption on $m$, and the constraints on $\delta$ and $d$.
Thus,
\[
P\!\left[|T \cap h_S| > \tfrac{\epsilon m}{2}\right] \;=\; 1 - P\!\left[|T \cap h_S| \le \tfrac{\epsilon m}{2}\right] \;\ge\; 1 - P\!\left[|T \cap h_S| \le \tfrac{\rho m}{2}\right] \;\ge\; 1/2.
\]
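As a quick numerical aside (an illustration of the tail step above, not part of the original proof; the values of $m$ and $\rho$ below are arbitrary), the binomial probability can be evaluated directly:

    # Sanity check of the tail step: P[Bin(m, rho) <= rho*m/2] <= 1/2,
    # hence P[|T ∩ h_S| > eps*m/2] >= 1/2 whenever rho*m is not too small.
    from scipy.stats import binom

    for m, rho in [(50, 0.2), (200, 0.1), (1000, 0.05)]:  # arbitrary illustrative values
        tail = binom.cdf(rho * m / 2, m, rho)  # P[Bin(m, rho) <= rho*m/2]
        print(f"m={m:5d}, rho={rho:.2f}: P = {tail:.4f} (below 1/2: {tail <= 0.5})")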
Combining all the preceding, we conclude the proof of Claim 1.
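Claim 1 can also be seen empirically. The following is a minimal Monte Carlo sketch, assuming a hypothetical toy setting of our own choosing (threshold sets $h_a = [a, 1]$ under the uniform distribution on $[0,1]$, so that $S \in B$ iff $\max(S) < 1-\epsilon$, and $(S,T) \in B'$ iff additionally at least $\epsilon m/2$ points of $T$ exceed $\max(S)$); it is not the book's construction, only an illustration:

    # Monte Carlo illustration of Claim 1: P[S in B] <= 2 P[(S,T) in B'].
    # Toy class (our assumption, not from the text): h_a = [a, 1], D = Uniform[0, 1].
    import numpy as np

    rng = np.random.default_rng(0)
    m, eps, trials = 30, 0.1, 100_000  # arbitrary illustrative parameters

    hits_B = hits_Bprime = 0
    for _ in range(trials):
        S, T = rng.random(m), rng.random(m)
        s_max = S.max()
        in_B = s_max < 1 - eps          # some h_a with D(h_a) >= eps misses S
        hits_B += in_B
        # a threshold just above max(S) maximizes |T ∩ h_a| among valid choices
        hits_Bprime += in_B and (np.sum(T > s_max) >= eps * m / 2)
    print(f"P[S in B]        ~ {hits_B / trials:.4f}")
    print(f"2 P[(S,T) in B'] ~ {2 * hits_Bprime / trials:.4f}")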


Claim 2 (Symmetrization):
\[
P[(S,T) \in B'] \;\le\; e^{-\epsilon m/4}\,\tau_H(2m).
\]
Proof of Claim 2: To simplify notation, let $\alpha = \epsilon m/2$, and for a sequence $A = (x_1,\ldots,x_{2m})$ let $A_0 = (x_1,\ldots,x_m)$. Using the definition of $B'$ we get that
\[
P[A \in B'] \;=\; \mathbb{E}_{A \sim D^{2m}}\!\left[\max_{h \in H}\,\mathbb{1}_{[D(h)\ge\epsilon]}\,\mathbb{1}_{[|h\cap A_0|=0]}\,\mathbb{1}_{[|h\cap A|\ge\alpha]}\right] \;\le\; \mathbb{E}_{A \sim D^{2m}}\!\left[\max_{h \in H}\,\mathbb{1}_{[|h\cap A_0|=0]}\,\mathbb{1}_{[|h\cap A|\ge\alpha]}\right].
\]
Now, let us define by $H_A$ the set of restrictions of hypotheses to $A$ (whose cardinality is the effective number of different hypotheses on $A$), namely, $H_A = \{h \cap A : h \in H\}$. It follows that
\[
P[A \in B'] \;\le\; \mathbb{E}_{A \sim D^{2m}}\!\left[\max_{h \in H_A}\,\mathbb{1}_{[|h\cap A_0|=0]}\,\mathbb{1}_{[|h\cap A|\ge\alpha]}\right] \;\le\; \mathbb{E}_{A \sim D^{2m}}\!\left[\sum_{h \in H_A}\mathbb{1}_{[|h\cap A_0|=0]}\,\mathbb{1}_{[|h\cap A|\ge\alpha]}\right].
\]
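To make the reduction to $H_A$ concrete, here is a small sketch (same hypothetical threshold class as above, not the book's example) that enumerates $H_A$ on a random double sample and compares its size to the growth function $\tau_H(2m) = 2m + 1$ of that class:

    # Effective hypotheses on a double sample A: H_A = {h ∩ A : h in H}.
    # For the threshold class h_a = [a, 1] (VC dim 1), |H_A| <= 2m + 1 = tau_H(2m).
    import numpy as np

    rng = np.random.default_rng(1)
    m = 20
    A = rng.random(2 * m)

    # Sweeping the threshold across the points of A (plus one value below and
    # one above all of them) enumerates every distinct restriction h_a ∩ A.
    thresholds = np.append(A, [0.0, 1.1])
    restrictions = {frozenset(np.flatnonzero(A >= a)) for a in thresholds}
    print(f"|H_A| = {len(restrictions)}, bound tau_H(2m) = {2 * m + 1}")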
Let $J = \{j \subset [2m] : |j| = m\}$. For any $j \in J$ and $A = (x_1,\ldots,x_{2m})$, define $A_j = (x_{j_1},\ldots,x_{j_m})$. Since the elements of $A$ are chosen i.i.d., we have that for any $j \in J$ and any function $f(A, A_0)$ it holds that
\[
\mathbb{E}_{A \sim D^{2m}}\!\left[f(A, A_0)\right] \;=\; \mathbb{E}_{A \sim D^{2m}}\!\left[f(A, A_j)\right].
\]
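This exchangeability can be checked empirically; a minimal sketch with an arbitrary test function $f$ (our choice, purely for illustration):

    # Exchangeability: for i.i.d. A ~ D^{2m} and any fixed j with |j| = m,
    # E[f(A, A_0)] = E[f(A, A_j)].  The test function f below is arbitrary.
    import numpy as np

    rng = np.random.default_rng(2)
    m, trials = 10, 200_000

    def f(A, A_half):
        return float(np.max(A_half) < 0.8)  # arbitrary function of the chosen half

    j = rng.choice(2 * m, size=m, replace=False)  # one fixed j in J
    est_0 = est_j = 0.0
    for _ in range(trials):
        A = rng.random(2 * m)
        est_0 += f(A, A[:m])
        est_j += f(A, A[j])
    print(f"E[f(A, A_0)] ~ {est_0 / trials:.4f}, E[f(A, A_j)] ~ {est_j / trials:.4f}")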
