38.7 Proof of lower bound 493
The first sum is bounded using a version of Hoeffding–Azuma (Exercise 20.8):
P
(
Fcand
∑n
t=1
(Et[vKt(St+1)]−vKt(St+1))≥D
√
nlog(2/δ)
2
)
≤δ
2
For the second term we note thatT(k)(s,a)/
√
1 ∨Tτk− 1 (s,a)cannot be large
too often. A continuous approximation often provides intuition for the correct
form. Recalling the thousands of integrals you did at school, for any differentiable
f: [0,∞)→Rwe have
∫K
0
√f′(k)
f(k)
dk= 2
√
f(K)− 2
√
f(0). (38.21)
Here we are thinking off(k) as the continuous approximation ofTτk− 1 (s,a) and
its derivative asT(k)(s,a). In Exercise 38.22 we ask you to make this argument
rigorous by showing that
∑K
k=1
T(k)(s,a)
√
1 ∨Tτk− 1 (s,a)
≤
(√
2 + 1
)√
Tn(s,a).
Then by Cauchy-Schwarz and the fact that
∑
s,a∈S×ATn(s,a) =n,
∑
s∈S
∑
a∈A
√
Tn(s,a)≤
√
SAn.
It remains to bound the number of phases. A new phase starts when the visit
count for some state-action pair doubles. HenceKcannot be more than the
number of times the counters double in total for each of the states. It is easy to
see that 1 +log 2 Tn(s,a) gives an upper bound on how many times the counter
for this pair may double (the constant 1 is there to account for the counter
changing from zero to one). ThusK≤K′=
∑
s,a1 +log^2 Tn(s,a). Noting that
0 ≤Tn(s,a) and
∑
s,aTn(s,a) =nand relaxingTn(s,a) to take real values we
find that the value ofK′is the largest whenTn(s,a) =n/(SA), which shows that
K≤SA
(
1 + log 2
(n
SA
))
Putting everything together gives the desired result.
38.7 Proof of lower bound
The lower bound is proven by crafting a difficult MDP that models a bandit
with approximately SA arms. This a cumbersome endeavour, but intuitively
straightforward and the explanations that follow should be made clear in Fig. 38.3.
Given S and A, the first step is to construct a tree of minimum depth with at
most A children for each node using exactly S−2 states. The root of the tree is
denoted bys◦and transitions within the tree are deterministic, so in any given
node the learner can simply select which child to transition to. LetLbe the