28.2 The Lower Bound for the Agnostic Case 395Therefore,
max
b∈{± 1 }P[LDb(A(y))−LDb(hb) =]= max
b∈{± 1 }∑
yPb[y] (^1) [A(y) 6 =b]
≥
1
2
∑
yP+[y] (^1) [A(y) 6 =+]+
1
2
∑
yP−[y] (^1) [A(y) 6 =−]
=
1
2
∑
y∈N+(P+[y] (^1) [A(y) 6 =+]+P−[y] (^1) [A(y) 6 =−]) +
1
2
∑
y∈N−(P+[y] (^1) [A(y) 6 =+]+P−[y] (^1) [A(y) 6 =−])
≥
1
2
∑
y∈N+(P−[y] (^1) [A(y) 6 =+]+P−[y] (^1) [A(y) 6 =−]) +
1
2
∑
y∈N−(P+[y] (^1) [A(y) 6 =+]+P+[y] (^1) [A(y) 6 =−])
=
1
2
∑
y∈N+P−[y] +1
2
∑
y∈N−P+[y].Next note that
∑
y∈N+P−[y] =∑
y∈N−P+[y], and both values are the prob-
ability that a Binomial (m,(1−)/2) random variable will have value greater
thanm/2. Using Lemma B.11, this probability is lower bounded by
1
2(
1 −
√
1 −exp(−m^2 /(1−^2 )))
≥^1
2
(
1 −
√
1 −exp(− 2 m^2 ))
,
where we used the assumption that^2 ≤ 1 /2. It follows that ifm≤ 0 .5 log(1/(4δ))/^2
then there existsbsuch that
P[LDb(A(y))−LDb(hb) =]≥^1
2(
1 −
√
1 −
√
4 δ)
≥δ,where the last inequality follows by standard algebraic manipulations. This con-
cludes our proof.
28.2.2 Showing Thatm(, 1 /8)≥ 8 d/
We shall now prove that for every < 1 /(8
√
2) we have thatm(,δ)≥^8 d 2.
Letρ= 8and note thatρ∈(0, 1 /√
2). We will construct a family of distri-
butions as follows. First, letC={c 1 ,...,cd}be a set ofdinstances which are
shattered byH. Second, for each vector (b 1 ,...,bd)∈ {± 1 }d, define a distribu-
tionDbsuch that
Db({(x,y)}) ={ 1
d·1+ybiρ
2 if∃i:x=ci
0 otherwise.That is, to sample an example according toDb, we first sample an elementci∈C
uniformly at random, and then set the label to bebiwith probability (1 +ρ)/ 2
or−biwith probability (1−ρ)/2.
It is easy to verify that the Bayes optimal predictor forDbis the hypothesis