Bandit Algorithms

(Jeff_L) #1
33.2 Best-arm identification with a fixed confidence 379

The objective in fixed confidence best-arm identification is to find a sound
learner for whichEνπ[τ] is minimized over environmentsν∈ E. Since this is a
multi-objective criteria there is a priori no reason to believe that a single optimal
learner should exist. Conveniently, however, the condition that the learner must
satisfy Eq. (33.2) plays the role of the consistency assumption in the asymptotic
lower bounds in Chapter 16, which allows for a sense of instance-dependent
asymptotic optimality. The situation in finite-time is more complicated, as we
discuss in Note 6.

IfEis sufficiently rich andνhas multiple optimal arms, then no sound learner
can stop in finite time with positive probability. The reason is that there is no
way to reject the hypothesis that one optimal arm is fractionally better than
another. You will investigate this in Exercise 33.11. Also note that in our
definitionI{τ=t}is a deterministic function ofA 1 ,X 1 ,...,At,Xt,At+1.
None of the results that follow would change if you allowedτto also depend
on some exogenous source of randomness.

33.2.1 Lower bound


We start with the lower bound, which serves as a target for the upper bound to
follow. LetEbe an arbitrary set ofk-armed stochastic bandit environments and
forν∈Edefine

i∗(ν) = argmaxi∈[k]μi(ν) and Ealt(ν) ={ν′∈E:i∗(ν′)∩i∗(ν) =∅},
which is the set of bandits inEwith different optimal arms thanν.

Theorem33.5. Assume that(π,τ)is sound forEat confidence levelδ∈(0,1)
and letν∈E. ThenEνπ[τ]≥c∗(ν) log

( 4


δ

)


, where

c∗(ν)−^1 = sup
α∈Pk− 1

(


inf
ν′∈Ealt(ν)

(k

i=1

αiD(νi,νi′)

))


(33.3)


withc∗(ν) =∞whenc∗(ν)−^1 = 0.

Proof The result is trivial whenEνπ[τ] =∞. For the remainder assume that
Eνπ[τ]<∞, which implies thatPνπ(τ=∞) = 0. Next letν′∈Ealt(ν) and define
eventE={τ <∞andAτ+1∈/i∗(ν′)}∈Fτ. Then,
2 δ≥Pνπ(τ <∞andAτ+1∈/i∗(ν)) +Pν′π(τ <∞andAτ+1∈/i∗(ν′))
≥Pνπ(Ec) +Pν′π(E)

≥^1
2

exp

(



∑k

i=1

Eνπ[Ti(τ)] D(νi,ν′i)

)


, (33.4)


where the first inequality follows from the definition of soundness and the last
Free download pdf