28.2 The Lower Bound for the Agnostic Case 395
Therefore,
max
b∈{± 1 }
P[LDb(A(y))−LDb(hb) =]
= max
b∈{± 1 }
∑
y
Pb[y] (^1) [A(y) 6 =b]
≥
1
2
∑
y
P+[y] (^1) [A(y) 6 =+]+
1
2
∑
y
P−[y] (^1) [A(y) 6 =−]
=
1
2
∑
y∈N+
(P+[y] (^1) [A(y) 6 =+]+P−[y] (^1) [A(y) 6 =−]) +
1
2
∑
y∈N−
(P+[y] (^1) [A(y) 6 =+]+P−[y] (^1) [A(y) 6 =−])
≥
1
2
∑
y∈N+
(P−[y] (^1) [A(y) 6 =+]+P−[y] (^1) [A(y) 6 =−]) +
1
2
∑
y∈N−
(P+[y] (^1) [A(y) 6 =+]+P+[y] (^1) [A(y) 6 =−])
=
1
2
∑
y∈N+
P−[y] +
1
2
∑
y∈N−
P+[y].
Next note that
∑
y∈N+P−[y] =
∑
y∈N−P+[y], and both values are the prob-
ability that a Binomial (m,(1−)/2) random variable will have value greater
thanm/2. Using Lemma B.11, this probability is lower bounded by
1
2
(
1 −
√
1 −exp(−m^2 /(1−^2 ))
)
≥^1
2
(
1 −
√
1 −exp(− 2 m^2 )
)
,
where we used the assumption that^2 ≤ 1 /2. It follows that ifm≤ 0 .5 log(1/(4δ))/^2
then there existsbsuch that
P[LDb(A(y))−LDb(hb) =]
≥^1
2
(
1 −
√
1 −
√
4 δ
)
≥δ,
where the last inequality follows by standard algebraic manipulations. This con-
cludes our proof.
28.2.2 Showing Thatm(, 1 /8)≥ 8 d/
We shall now prove that for every < 1 /(8
√
2) we have thatm(,δ)≥^8 d 2.
Letρ= 8and note thatρ∈(0, 1 /
√
2). We will construct a family of distri-
butions as follows. First, letC={c 1 ,...,cd}be a set ofdinstances which are
shattered byH. Second, for each vector (b 1 ,...,bd)∈ {± 1 }d, define a distribu-
tionDbsuch that
Db({(x,y)}) =
{ 1
d·
1+ybiρ
2 if∃i:x=ci
0 otherwise.
That is, to sample an example according toDb, we first sample an elementci∈C
uniformly at random, and then set the label to bebiwith probability (1 +ρ)/ 2
or−biwith probability (1−ρ)/2.
It is easy to verify that the Bayes optimal predictor forDbis the hypothesis