33.2 Best-arm identification with a fixed confidence

and $\hat\mu \approx \mu$ and
\[
Z_t = \inf_{\tilde\nu \in E_{\mathrm{alt}}(\hat\nu(t))} \sum_{i=1}^k \frac{T_i(t)\,(\hat\mu_i(t) - \mu_i(\tilde\nu))^2}{2}
\approx t \inf_{\tilde\nu \in E_{\mathrm{alt}}(\nu)} \sum_{i=1}^k \frac{\alpha_i^*(\nu)\,(\mu_i(\nu) - \mu_i(\tilde\nu))^2}{2}
= \frac{t}{c^*(\nu)}\,.
\]
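For Gaussian arms with unit variance, a standard computation shows the infimum over alternatives does not require a search over all of $E_{\mathrm{alt}}$: the nearest alternative swaps the empirical best arm $i^*$ with a single challenger $j$ by moving both means to their count-weighted average, which gives cost $\frac{T_{i^*} T_j}{T_{i^*} + T_j}\frac{(\hat\mu_{i^*} - \hat\mu_j)^2}{2}$, minimized over $j \neq i^*$. A minimal sketch of computing $Z_t$ this way (function name and example values are illustrative, not from the text):

```python
import numpy as np

def glr_statistic(mu_hat, counts):
    """Compute Z_t for unit-variance Gaussian arms.

    For each challenger j, the closest alternative moves arms i* and j
    to their count-weighted mean, giving the closed form
    T_i* T_j / (T_i* + T_j) * (mu_hat[i*] - mu_hat[j])^2 / 2,
    and Z_t is the minimum of this cost over challengers j != i*.
    """
    mu_hat = np.asarray(mu_hat, dtype=float)
    counts = np.asarray(counts, dtype=float)
    istar = int(np.argmax(mu_hat))
    best = np.inf
    for j in range(len(mu_hat)):
        if j == istar:
            continue
        gap = mu_hat[istar] - mu_hat[j]
        z = counts[istar] * counts[j] / (counts[istar] + counts[j]) * gap**2 / 2
        best = min(best, z)
    return best

# With well-separated means, Z_t grows roughly linearly with the counts.
print(glr_statistic([0.9, 0.5, 0.2], [40, 35, 25]))
```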
Provided the approximation is reasonably accurate, the algorithm should halt once
\[
\frac{t}{c^*(\nu)} \geq \beta_t(\delta) = (1 + o(1)) \log(1/\delta)\,,
\]
which occurs once $t \geq (1 + o(1))\, c^*(\nu) \log(1/\delta)$.

33.2.3 Concentration
The first concentration theorem follows from Corollary 5.5 and a union bound.

Lemma 33.8. Let $(X_t)_{t=1}^\infty$ be a sequence of independent Gaussian random variables with mean $\mu$ and unit variance. Let $\hat\mu_n = \frac{1}{n} \sum_{t=1}^n X_t$. Then
\[
P\left(\exists n \in \mathbb{N}^+ : \frac{n}{2}(\hat\mu_n - \mu)^2 \geq \log(1/\delta) + \log(n(n+1))\right) \leq \delta\,.
\]

Proposition 33.9. Let $g : \mathbb{N} \to \mathbb{R}$ be increasing and for each $i \in [k]$ let
$S_{i1}, S_{i2}, \ldots$ be an infinite sequence of random variables such that for all $\delta \in (0,1)$,
\[
P\left(\exists s \in \mathbb{N} : S_{is} \geq g(s) + \log(1/\delta)\right) \leq \delta\,.
\]
Then provided that $(S_i)_{i=1}^k$ are independent and $x \geq 0$,
\[
P\left(\exists s \in \mathbb{N}^k : \sum_{i=1}^k S_{is_i} \geq k\, g\!\left(\sum_{i=1}^k s_i\right) + x\right) \leq \left(\frac{x}{k}\right)^k \exp(k - x)\,.
\]

Proof  For $i \in [k]$ let $W_i = \max\{w \in [0,1] : S_{is} < g(s) + \log(1/w) \text{ for all } s \in \mathbb{N}\}$,
where we define $\log(1/0) = \infty$. Note that the $W_i$ are well defined. Then, for any $s \in \mathbb{N}^k$,
\[
\sum_{i=1}^k S_{is_i} \leq \sum_{i=1}^k g(s_i) + \sum_{i=1}^k \log(1/W_i) \leq k\, g\!\left(\sum_{i=1}^k s_i\right) + \sum_{i=1}^k \log(1/W_i)\,,
\]
where the second inequality holds because $g$ is increasing and $s_i \leq \sum_{j=1}^k s_j$ for each $i$. By assumption $(W_i)_{i=1}^k$ are independent and satisfy $P(W_i \leq x) \leq x$ for all $x \in [0,1]$. The proof is completed by using Exercise 5.16.
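Lemma 33.8 can be sanity checked by simulation. The sketch below (all parameters are illustrative; the supremum over $n$ is truncated at a finite horizon, which only shrinks the event, so the estimated frequency should still sit below $\delta$) estimates the probability that the running deviation ever crosses the lemma's threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
delta, horizon, runs = 0.1, 200, 2000
mu = 0.0  # true mean of the unit-variance Gaussian samples

violations = 0
for _ in range(runs):
    x = rng.normal(mu, 1.0, size=horizon)
    n = np.arange(1, horizon + 1)
    mu_hat = np.cumsum(x) / n  # running sample means
    # Event of Lemma 33.8, truncated to n <= horizon.
    if np.any(n / 2 * (mu_hat - mu) ** 2 >= np.log(1 / delta) + np.log(n * (n + 1))):
        violations += 1

# Empirical crossing frequency; the lemma guarantees it is at most delta.
print(violations / runs)
```

The extra $\log(n(n+1))$ term is what pays for the union bound over $n$, so the empirical frequency is typically well below $\delta$.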