Bandit Algorithms

This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This pre-publication version is free to view and download for personal use only. Not for re-distribution, sale, or use in derivative works.©Tor Lattimore and Csaba Szepesvari 2017. The latest version is available athttp://banditalgs.com. Feedback on any aspect is very welcome: [email protected]

17 High Probability Lower Bounds

The lower bounds proven in the last two chapters were for stochastic bandits. In this chapter we prove high probability lower bounds for both stochastic and adversarial bandits. Recall that for adversarial banditx∈[0,1]n×kthe random regret is

Rˆn= max i∈[k]

∑n

t=1

xti−xtAt

and the (expected) regret isRn=E[Rˆn]. To set expectations, remember that in Chapter 12 we proved two high probability upper bounds on the regret of Exp3-IX. In the first we showed there exists a policyπsuch that for all adversarial banditsx∈[0,1]n×kandδ∈(0,1) it holds with probability at least 1−δthat

Rˆn=O

(

√

knlog(k) +

√

kn log(k)

log

(

1

δ

))

. (17.1)

We also gave a version of the algorithm that depended onδ∈(0,1) for which with probability at least 1−δ,

Rˆn=O

(√

knlog

(

k δ

))

. (17.2)

The important difference is the order of quantifiers. In the first we have a single algorithm and a high-probability guarantee that holds simultaneously for any confidence level. The second algorithm needs the confidence level to be specified in advance. The price for using the generic algorithm appears to be√ log(1/δ)/log(k), which is usually quite small but not totally insignificant. We will see that both bounds are tight up to constant factors, which implies that knowing the desired confidence level in advance really does help. One reason why choosing the confidence level in advance is not ideal is that the resulting high-probability bound cannot be integrated to prove a bound in expectation. For algorithms satisfying (17.1) the expected regret can be bounded by

Rn≤

∫∞

0

P

(

Rˆn≥u

)

du=O(

√

knlog(k)). (17.3)

Bandit Algorithms

17 High Probability Lower Bounds

(

√

√

(

1

))

. (17.1)

(√

(

))

. (17.2)

∫∞

P

(

)

√

Get our desktop app

Company

Features

Documentation

Resources