10.2 The KL-UCB algorithm 133
Proof Letε∈(0,∆) andu=a/d(μ+ε,μ+ ∆). Then
E[κ] =
∑n
s=1
P
(
d(ˆμs,μ+ ∆)≤a
s
)
≤
∑n
s=1
P
(
μˆs≥μ+εord(μ+ε,μ+ ∆)≤a
s
)
(d(·,μ+ ∆) is decreasing on [0,μ+ ∆])
≤u+
∑n
s=due
P(ˆμs≥μ+ε)
≤u+
∑∞
s=1
exp (−sd(μ+ε,μ)) (Lemma 10.3)
≤
a
d(μ+ε,μ+ ∆)+
1
d(μ+ε,μ)
≤
a
d(μ+ε,μ+ ∆)
+
1
2 ε^2
(Pinsker’s inequality/Lemma 10.2(b))
as required.
Proof of Theorem 10.6 As in other proofs we assume without loss of generality
thatμ 1 =μ∗and boundE[Ti(n)] for suboptimal armsi. To this end, fix a
suboptimal armiand letε 1 +ε 2 ∈(0,∆i) with bothε 1 andε 2 positive. Define
τ= min
{
t: max 1 ≤s≤nd(ˆμ 1 s,μ 1 −ε 2 )−
logf(t)
s
≤ 0
}
,and
κ=
∑n
s=1
I
{
d(ˆμis,μi+ ∆i−ε 2 )≤
logf(n)
s
}
.
By a similar reasoning as in the proof of Theorem 8.1,
E[Ti(n)] =E
[n
∑
t=1
I{At=i}
]
≤E[τ] +E
[ n
∑
t=τ+1
I{At=i}
]
≤E[τ] +E
[n
∑
t=1
I
{
At=iandd(ˆμi,Ti(t−1),μ 1 −ε 2 )≤
logf(t)
Ti(t−1)
}]
≤E[τ] +E[κ]
≤
2
ε^22
+
f(n)
d(μi+ε 1 ,μ∗−ε 2 )
+
1
2 ε^21
,
where the second inequality follows since by the definition ofτ, ift > τ, then
the index of the optimal arm is at least as large asμ 1 −ε 2. The third inequality
follows from the definition ofκas in the proof of Theorem 8.1. The final inequality