10.2 The KL-UCB algorithm 133Proof Letε∈(0,∆) andu=a/d(μ+ε,μ+ ∆). ThenE[κ] =∑ns=1P
(
d(ˆμs,μ+ ∆)≤a
s)
≤
∑ns=1P
(
μˆs≥μ+εord(μ+ε,μ+ ∆)≤a
s)
(d(·,μ+ ∆) is decreasing on [0,μ+ ∆])≤u+∑ns=dueP(ˆμs≥μ+ε)≤u+∑∞
s=1exp (−sd(μ+ε,μ)) (Lemma 10.3)≤
a
d(μ+ε,μ+ ∆)+1
d(μ+ε,μ)≤a
d(μ+ε,μ+ ∆)+
1
2 ε^2(Pinsker’s inequality/Lemma 10.2(b))as required.Proof of Theorem 10.6 As in other proofs we assume without loss of generality
thatμ 1 =μ∗and boundE[Ti(n)] for suboptimal armsi. To this end, fix a
suboptimal armiand letε 1 +ε 2 ∈(0,∆i) with bothε 1 andε 2 positive. Defineτ= min{
t: max 1 ≤s≤nd(ˆμ 1 s,μ 1 −ε 2 )−logf(t)
s≤ 0
}
,andκ=∑ns=1I
{
d(ˆμis,μi+ ∆i−ε 2 )≤logf(n)
s}
.
By a similar reasoning as in the proof of Theorem 8.1,E[Ti(n)] =E[n
∑t=1I{At=i}]
≤E[τ] +E[ n
∑t=τ+1I{At=i}]
≤E[τ] +E[n
∑t=1I
{
At=iandd(ˆμi,Ti(t−1),μ 1 −ε 2 )≤logf(t)
Ti(t−1)}]
≤E[τ] +E[κ]≤2
ε^22+
f(n)
d(μi+ε 1 ,μ∗−ε 2 )+
1
2 ε^21,
where the second inequality follows since by the definition ofτ, ift > τ, then
the index of the optimal arm is at least as large asμ 1 −ε 2. The third inequality
follows from the definition ofκas in the proof of Theorem 8.1. The final inequality
