29 The Relation Between Adversarial and Stochastic Linear Bandits
The purpose of this chapter is to highlight some of the differences and connections between adversarial and stochastic linear bandits. As it turns out, the connections between the two settings are not as straightforward as for finite-armed bandits. We focus on three topics:
(a) For fixed action sets there is a reduction from stochastic linear bandits to adversarial linear bandits. This does not come entirely for free. The action set needs to be augmented for things to work (Section 29.2).
(b) The adversarial and stochastic settings make different assumptions about the variability of the losses/rewards. This will explain the apparently contradictory result that the upper bound for adversarial bandits on the unit ball is $O(\sqrt{dn\log(n)})$ (Theorem 28.11) while the lower bound for stochastic bandits is $\Omega(d\sqrt{n})$ (Theorem 24.2); the two rates are compared in the display following this list.
(c) When the action set is changing, the notion of regret in the adversarial setting must be carefully chosen, and for the ‘right’ choice we do not yet have effective algorithms (Section 29.4).
We start with a unified view of the two settings.
29.1 Unified view
To make the notation consistent, we present the stochastic and adversarial linear bandit frameworks again, using losses for both. Let $\mathcal{A} \subset \mathbb{R}^d$ be the action set. In each round the learner chooses $A_t \in \mathcal{A}$ and receives the loss $Y_t$, where
\begin{align}
Y_t &= \langle A_t, \theta \rangle + \eta_t\,, && \text{(Stochastic setting)} \tag{29.1} \\
Y_t &= \langle A_t, \theta_t \rangle\,, && \text{(Adversarial setting)} \tag{29.2}
\end{align}
and $(\eta_t)_{t=1}^n$ is a sequence of independent and identically distributed 1-subgaussian random variables and $(\theta_t)_{t=1}^n$ is a sequence of loss vectors chosen by the adversary.
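The following minimal sketch (not from the book) contrasts how the loss $Y_t$ is generated under (29.1) and (29.2). The Gaussian noise is one example of a 1-subgaussian distribution, and the action set, horizon, and parameter vectors are illustrative assumptions; a real learner would of course choose $A_t$ adaptively rather than uniformly at random.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5
actions = np.eye(d)  # a toy action set: the standard basis vectors in R^d

# Stochastic setting (29.1): a single fixed theta, losses perturbed by noise.
theta = rng.standard_normal(d)
for t in range(n):
    A_t = actions[rng.integers(d)]   # learner's choice (here: uniform at random)
    eta_t = rng.standard_normal()    # i.i.d. noise; N(0, 1) is 1-subgaussian
    Y_t = A_t @ theta + eta_t        # observed loss <A_t, theta> + eta_t

# Adversarial setting (29.2): a sequence (theta_t) chosen by the adversary,
# and the observed loss is exact, with no additive noise term.
thetas = rng.standard_normal((n, d))  # stands in for the adversary's choices
for t in range(n):
    A_t = actions[rng.integers(d)]
    Y_t = A_t @ thetas[t]            # observed loss <A_t, theta_t>
```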
As noted earlier, the assumptions on the noise can be relaxed significantly. For example, if $\mathcal{F}_t = \sigma(A_1, Y_1, \ldots, A_t, Y_t, A_{t+1})$, then the results of the previous chapters hold as soon as $\eta_t$ is 1-subgaussian conditioned on $\mathcal{F}_{t-1}$. The expected