Bandit Algorithms

This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This pre-publication version is free to view and download for personal use only. Not for re-distribution, sale, or use in derivative works.©Tor Lattimore and Csaba Szepesvari 2017. The latest version is available athttp://banditalgs.com. Feedback on any aspect is very welcome: [email protected]

27 Exp3 for Adversarial Linear Bandits

The model for adversarial linear bandits is as follows. The learner is given an action setA⊂Rdand the number of roundsn. As usual in the adversarial setting, it is convenient to switch to losses. An instance of the adversarial problem is a sequence of loss vectorsy 1 ,...,yntaking values inRd. In each roundt∈[n] the learner selects a possibly random actionAt∈Aand observes a lossYt=〈At,yt〉. The learner does not observe the loss vectoryt. The regret of the learner aftern rounds is

Rn=E

[n ∑

t=1

Yt

]

−min a∈A

∑n

t=1

〈a,yt〉.

Clearly, the finite-armed adversarial bandits discussed in Chapter 11 is a special case of adversarial linear bandits corresponding to the choiceA={e 1 ,...,ed} wheree 1 ,...,edare the unit vectors of thed-dimensional standard Euclidean basis.

For this chapter we assume that (a) For allt∈[n] the loss satisfiesyt∈L=

{

x∈Rd: supa∈A|〈a,x〉|≤ 1

}

.

(b) The action setAspansRd.

The latter assumption is for convenience only and may be relaxed with a little care (Exercise 27.7).

27.1 Exponential weights for linear bandits

We adapt the exponential weighting algorithm of Chapter 11. Like in that setting we need a way to estimate the individual losses for each action, but now we make use of the linear structure to share information between the arms and decrease the variance of our estimators. For now we assume thatAis finite, which we relax in Section 27.3. Lett∈[n] be the index of the current round. Assuming the loss estimate for actiona∈Ain rounds∈[n] isYˆs(a), then the probability distribution proposed by exponential weights is given by the probability mass

Bandit Algorithms

27 Exp3 for Adversarial Linear Bandits

]

{

}

.

27.1 Exponential weights for linear bandits

Get our desktop app

Company

Features

Documentation

Resources