Bandit Algorithms


In finite time the situation is a little messy, but if one pushes these ideas to
the limit, then for many classes of bandits one can define a precise notion of
instance-dependent optimality.

16.1 Asymptotic bounds


We need to define exactly what is meant by a reasonable policy. If one is only
concerned with asymptotics, then a rather conservative definition suffices.

Definition 16.1. A policy $\pi$ is called consistent over a class of bandits $\mathcal{E}$ if for
all $\nu \in \mathcal{E}$ and $p > 0$ it holds that
$$
\lim_{n \to \infty} \frac{R_n(\pi, \nu)}{n^p} = 0. \tag{16.1}
$$

The class of consistent policies over $\mathcal{E}$ is denoted by $\Pi_{\mathrm{cons}}(\mathcal{E})$.

Theorem 7.1 shows that UCB is consistent over $\mathcal{E}^k_{\mathrm{SG}}(1)$. The strategy that
always chooses the first action is not consistent on any class $\mathcal{E}$ unless the first
arm is optimal for every $\nu \in \mathcal{E}$.
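
To see why logarithmic regret implies consistency, here is a one-line sketch, where
$C_\nu > 0$ is a generic instance-dependent constant standing in for the explicit
bound of Theorem 7.1: if $R_n(\pi, \nu) \le C_\nu \log(n)$ for all $n$, then
$$
\frac{R_n(\pi, \nu)}{n^p} \le \frac{C_\nu \log(n)}{n^p} \longrightarrow 0
\quad \text{as } n \to \infty \text{ for every } p > 0,
$$
so (16.1) holds on every instance.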

Consistency is an asymptotic notion. A policy could be consistent and yet
play $A_t = 1$ for all $t \le 10^{100}$. For this reason an assumption of consistency
is insufficient to derive nonasymptotic lower bounds. In Section 16.2 we
introduce a finite-time version of consistency that allows us to prove finite-
time instance-dependent lower bounds.

Recall that a class $\mathcal{E}$ of stochastic bandits is unstructured if $\mathcal{E} = \mathcal{M}_1 \times \cdots \times \mathcal{M}_k$
with $\mathcal{M}_1, \ldots, \mathcal{M}_k$ sets of distributions. The main theorem of this chapter is a
generic lower bound that applies to any unstructured class of stochastic bandits.
After the proof we will see some applications to specific classes. Let $\mathcal{M}$ be a set
of distributions with finite means and let $\mu : \mathcal{M} \to \mathbb{R}$ be the function that maps
$P \in \mathcal{M}$ to its mean. Let $\mu^* \in \mathbb{R}$ and $P \in \mathcal{M}$ have $\mu(P) < \mu^*$ and define
$$
d_{\inf}(P, \mu^*, \mathcal{M}) = \inf_{P' \in \mathcal{M}} \left\{ D(P, P') : \mu(P') > \mu^* \right\}.
$$
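
For intuition, consider a worked example, taking $\mathcal{M}$ to be the set of unit-variance
Gaussians (a choice made here purely for illustration). Since
$D(\mathcal{N}(\mu, 1), \mathcal{N}(\mu', 1)) = (\mu - \mu')^2/2$,
$$
d_{\inf}(\mathcal{N}(\mu, 1), \mu^*, \mathcal{M})
= \inf_{\mu' > \mu^*} \frac{(\mu' - \mu)^2}{2}
= \frac{(\mu^* - \mu)^2}{2},
$$
with the infimum approached, but not attained, as $\mu' \downarrow \mu^*$.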

Theorem 16.2. Let $\mathcal{E} = \mathcal{M}_1 \times \cdots \times \mathcal{M}_k$ and $\pi \in \Pi_{\mathrm{cons}}(\mathcal{E})$ be a consistent policy
over $\mathcal{E}$. Then for all $\nu = (P_i)_{i=1}^k \in \mathcal{E}$ it holds that
$$
\liminf_{n \to \infty} \frac{R_n}{\log(n)} \ge c^*(\nu, \mathcal{E})
= \sum_{i : \Delta_i > 0} \frac{\Delta_i}{d_{\inf}(P_i, \mu^*, \mathcal{M}_i)}, \tag{16.2}
$$

where $\Delta_i$ is the suboptimality gap of the $i$th arm in $\nu$ and $\mu^*$ is the mean of the
optimal arm.
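
To make the constant in (16.2) tangible, here is a minimal sketch (not from the
book; the function name and the restriction to unit-variance Gaussian bandits are
illustrative assumptions) that evaluates $c^*(\nu, \mathcal{E})$ using the closed form
$d_{\inf}(P_i, \mu^*, \mathcal{M}_i) = \Delta_i^2/2$ derived above, so each suboptimal arm
contributes $\Delta_i / (\Delta_i^2/2) = 2/\Delta_i$ to the sum:

def gaussian_lower_bound_constant(means):
    # c*(nu, E) for a unit-variance Gaussian bandit with the given arm means:
    # each suboptimal arm contributes Delta_i / d_inf(P_i, mu*, M_i)
    # = Delta_i / (Delta_i^2 / 2) = 2 / Delta_i to the sum in (16.2).
    mu_star = max(means)
    gaps = (mu_star - mu for mu in means)
    return sum(2.0 / gap for gap in gaps if gap > 0)

# Example: means (0.5, 0.0) give Delta_2 = 0.5, so any consistent policy has
# liminf_n R_n / log(n) >= 2 / 0.5 = 4.
print(gaussian_lower_bound_constant([0.5, 0.0]))  # prints 4.0

Note how the constant blows up as the gaps shrink: arms whose means are close to
optimal are hard to distinguish, so a consistent policy must spend more rounds on
them.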

Proof. Let $\mu_i$ be the mean of the $i$th arm in $\nu$ and $d_i = d_{\inf}(P_i, \mu^*, \mathcal{M}_i)$. The