24.2 Unit ball 274
∑t
s=1A
(^2) si≥n/d}. Then,
Rn(A,θ) = ∆Eθ
[n
∑
t=1
∑d
i=1
(
√^1
d
−Atisign(θi)
)]
≥
∆
√
d
2 Eθ
[n
∑
t=1
∑d
i=1
(
√^1
d
−Atisign(θi)
) 2 ]
≥
∆
√
d
2
∑d
i=1
Eθ
[τi
∑
t=1
(
√^1
d
−Atisign(θi)
) 2 ]
,
where the second inequality uses that‖At‖^22 ≤1. Fixi∈[d]. Forx∈{± 1 }, define
Ui(x) =
∑τi
t=1(1/
√
d−Atix)^2 and letθ′∈{±∆}dbe another parameter vector
such thatθj=θj′forj 6 =iandθi′=−θi. Assume without loss of generality that
θi>0. LetPandP′be the laws ofUi(1) with respect to the bandit/learner
interaction measure induced byθandθ′, respectively. Then,
Eθ[Ui(1)]≥Eθ′[Ui(1)]−
(
4 n
d
+ 2
)√
1
2
D(P,P′)
≥Eθ′[Ui(1)]−
∆
2
(
4 n
d
+ 2
)
√√
√√
E
[τi
∑
t=1
A^2 ti
]
(24.3)
≥Eθ′[Ui(1)]−
∆
2
(
4 n
d
+ 2
)√
n
d
+ 1 (24.4)
≥Eθ′[Ui(1)]−
4
√
3∆n
d
√
n
d
, (24.5)
where in the first inequality we used Pinsker’s inequality (Eq. (14.11)), the result
in Exercise 14.4, the bound
Ui(1) =
∑τi
t=1
(1/
√
d−Ati)^2 ≤ 2
∑τi
t=1
1
d
+ 2
∑τi
t=1
A^2 ti≤
4 n
d
+ 2,
and the assumption thatd≤ 2 n. The inequality in Eq. (24.3) follows from the
chain rule for the relative entropy up to a stopping time (Exercise 15.7). Eq. (24.4)
is true by the definition ofτiand Eq. (24.5) by the assumption thatd≤ 2 n.
Then,
Eθ[Ui(1)] +Eθ′[Ui(−1)]≥Eθ′[Ui(1) +Ui(−1)]−
4
√
3 n∆
d
√
n
d
= 2Eθ′
[
τi
d
+
∑τi
t=1
A^2 ti
]
−
4
√
3 n∆
d
√
n
d
≥
2 n
d
−
4
√
3 n∆
d
√
n
d
=
n
d