Bandit Algorithms


12 The Exp3-IX Algorithm


In the last chapter we proved a sublinear bound on the expected regret of Exp3,
but with a dishearteningly large variance. The objective of this chapter is to
modify Exp3 so that the regret stays small in expectation and is simultaneously
well concentrated about its mean. Such results are called high probability
bounds. By slightly modifying the algorithm we show that for each $\delta \in (0,1)$
there exists an algorithm such that with probability at least $1 - \delta$,

\[
\hat R_n = \max_{a \in \mathcal{A}} \sum_{t=1}^n \left( y_{t A_t} - y_{ta} \right) = O\left( \sqrt{nk \log\left( \frac{k}{\delta} \right)} \right).
\]

One way to make Exp3 more robust is to ensure that $P_{ti}$ is never too small.
The first idea that comes to mind is to mix $P_t$ with the uniform distribution.
This is an explicit way of forcing exploration, which after further modification
can be made to work. The resulting algorithm is called Exp3.P, which you will
analyze in Exercise 12.1. We explore a similar approach for which the analysis
is a little more straightforward. The idea is to change the reward estimates to
control the variance at the price of introducing some bias.
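
As a minimal sketch, and assuming losses in $[0,1]$ and a fixed exploration
parameter $\gamma > 0$ (the function names and parameterization are ours, not
the book's), the two ideas look as follows; the biased estimator shown is of
the kind developed in the rest of this chapter:

```python
import numpy as np

def mix_with_uniform(p, gamma):
    """Exp3.P-style forced exploration: mix the exponential-weights
    distribution p with the uniform distribution over len(p) arms,
    so every arm is played with probability at least gamma / len(p)."""
    return (1 - gamma) * np.asarray(p) + gamma / len(p)

def biased_loss_estimate(observed_loss, played_arm, p, gamma):
    """A biased loss estimate: divide the observed loss by P_ti + gamma
    instead of P_ti.  The estimate is then at most 1 / gamma, which
    controls the variance at the price of a (downward) bias."""
    y_hat = np.zeros(len(p))
    y_hat[played_arm] = observed_loss / (p[played_arm] + gamma)
    return y_hat
```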

12.1 The Exp3-IX algorithm


We start by summarizing what we know about the behavior of the random regret
of Exp3. Because we want to use the loss-based estimator, it is more convenient
to switch to losses, which we do for the remainder of the chapter. Rewriting
Eq. (11.15) in terms of losses,

\[
\hat L_n - \hat L_{ni} \le \frac{\log(k)}{\eta} + \frac{\eta}{2} \sum_{j=1}^k \hat L_{nj}\,, \tag{12.1}
\]
where $\hat L_n$ and $\hat L_{ni}$ are defined using the loss estimator $\hat Y_{tj}$ by
\[
\hat L_n = \sum_{t=1}^n \sum_{j=1}^k P_{tj} \hat Y_{tj} \qquad \text{and} \qquad \hat L_{ni} = \sum_{t=1}^n \hat Y_{ti}\,.
\]
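
To make these quantities concrete, here is a minimal sketch, assuming losses
in $[0,1]$, of loss-based Exp3 with the importance-weighted loss estimator
$\hat Y_{tj} = \mathbb{1}\{A_t = j\}\, y_{tj} / P_{tj}$; the function
get_loss and all names are our own illustration, not the book's notation.
The loop accumulates exactly the $\hat L_n$ and $\hat L_{ni}$ defined above:

```python
import numpy as np

def exp3_with_losses(get_loss, n, k, eta, rng=None):
    """Loss-based Exp3 for n rounds over k arms with learning rate eta.
    get_loss(t, arm) must return the loss y in [0, 1] of the played arm.
    Returns (L_hat_n, L_hat_ni) as defined in the text."""
    rng = np.random.default_rng() if rng is None else rng
    S = np.zeros(k)                       # S[j] accumulates the estimates Y_hat_tj
    L_hat_n = 0.0
    for t in range(n):
        w = np.exp(-eta * (S - S.min()))  # shift by the minimum for numerical stability
        P = w / w.sum()                   # exponential-weights distribution P_t
        A = rng.choice(k, p=P)            # play A_t ~ P_t
        y = get_loss(t, A)
        Y_hat = np.zeros(k)
        Y_hat[A] = y / P[A]               # importance-weighted loss estimate
        L_hat_n += P @ Y_hat              # adds sum_j P_tj * Y_hat_tj for this round
        S += Y_hat                        # after round n, S[i] equals L_hat_ni
    return L_hat_n, S
```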