24.2 Unit ball 274∑t
s=1A(^2) si≥n/d}. Then,
Rn(A,θ) = ∆Eθ
[n
∑
t=1
∑d
i=1
(
√^1
d−Atisign(θi))]
≥
∆
√
d
2 Eθ[n
∑t=1∑di=1(
√^1
d−Atisign(θi)) 2 ]
≥
∆
√
d
2∑di=1Eθ[τi
∑t=1(
√^1
d−Atisign(θi)) 2 ]
,
where the second inequality uses that‖At‖^22 ≤1. Fixi∈[d]. Forx∈{± 1 }, define
Ui(x) =
∑τi
t=1(1/√
d−Atix)^2 and letθ′∈{±∆}dbe another parameter vector
such thatθj=θj′forj 6 =iandθi′=−θi. Assume without loss of generality that
θi>0. LetPandP′be the laws ofUi(1) with respect to the bandit/learner
interaction measure induced byθandθ′, respectively. Then,Eθ[Ui(1)]≥Eθ′[Ui(1)]−(
4 n
d+ 2
)√
1
2
D(P,P′)
≥Eθ′[Ui(1)]−∆
2
(
4 n
d+ 2
)
√√
√√
E
[τi
∑t=1A^2 ti]
(24.3)
≥Eθ′[Ui(1)]−∆
2
(
4 n
d+ 2
)√
n
d+ 1 (24.4)
≥Eθ′[Ui(1)]−4
√
3∆n
d√
n
d, (24.5)
where in the first inequality we used Pinsker’s inequality (Eq. (14.11)), the result
in Exercise 14.4, the bound
Ui(1) =∑τit=1(1/
√
d−Ati)^2 ≤ 2∑τit=11
d+ 2
∑τit=1A^2 ti≤4 n
d+ 2,
and the assumption thatd≤ 2 n. The inequality in Eq. (24.3) follows from the
chain rule for the relative entropy up to a stopping time (Exercise 15.7). Eq. (24.4)
is true by the definition ofτiand Eq. (24.5) by the assumption thatd≤ 2 n.
Then,
Eθ[Ui(1)] +Eθ′[Ui(−1)]≥Eθ′[Ui(1) +Ui(−1)]−4
√
3 n∆
d√
n
d= 2Eθ′[
τi
d+
∑τit=1A^2 ti]
−
4
√
3 n∆
d√
n
d≥
2 n
d−
4
√
3 n∆
d√
n
d=
n
d