27 Exp3 for Adversarial Linear Bandits
The model for adversarial linear bandits is as follows. The learner is given an
action set A ⊂ R^d and the number of rounds n. As usual in the adversarial setting,
it is convenient to switch to losses. An instance of the adversarial problem is a
sequence of loss vectors y_1, ..., y_n taking values in R^d. In each round t ∈ [n] the
learner selects a possibly random action A_t ∈ A and observes a loss Y_t = ⟨A_t, y_t⟩.
The learner does not observe the loss vector y_t. The regret of the learner after n
rounds is
R_n = E[ ∑_{t=1}^n Y_t ] − min_{a ∈ A} ∑_{t=1}^n ⟨a, y_t⟩.
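The following sketch illustrates the interaction protocol and the regret just defined for a finite action set. It is only an illustration under assumptions not made in the text: the actions are stored as rows of a numpy array, the loss sequence is fixed in advance (oblivious adversary), and the learner plays a placeholder uniformly random policy rather than the algorithm of this chapter. The regret computed is the realised regret of a single run rather than the expectation.

```python
# Sketch of the adversarial linear bandit protocol with a placeholder
# (uniformly random) learner; names and setup are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

d, n, k = 3, 1000, 5
actions = rng.standard_normal((k, d))         # finite action set A, one action per row
losses = rng.uniform(-1, 1, size=(n, d))      # loss vectors y_1, ..., y_n
losses /= np.max(np.abs(actions @ losses.T))  # rescale so that |<a, y_t>| <= 1 for all a, t

total_loss = 0.0
for t in range(n):
    a_idx = rng.integers(k)                   # placeholder policy: pick an action uniformly
    total_loss += actions[a_idx] @ losses[t]  # observed loss Y_t = <A_t, y_t>

best_in_hindsight = np.min(actions @ losses.sum(axis=0))  # min_a sum_t <a, y_t>
print(f"realised regret of the uniform policy: {total_loss - best_in_hindsight:.2f}")
```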
Clearly, the finite-armed adversarial bandit problem discussed in Chapter 11 is a special
case of adversarial linear bandits corresponding to the choice A = {e_1, ..., e_d},
where e_1, ..., e_d are the unit vectors of the d-dimensional standard Euclidean
basis.
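The reduction is immediate because the inner product with a standard basis vector reads off a single coordinate, so coordinate i of the loss vector plays the role of the loss of arm i. A minimal check of this fact, with an arbitrarily chosen loss vector for illustration:

```python
# With A = {e_1, ..., e_d}, the observed loss <e_i, y_t> is just the i-th
# coordinate of y_t, recovering the finite-armed setting of Chapter 11.
import numpy as np

d = 4
actions = np.eye(d)                       # A = {e_1, ..., e_d}
y_t = np.array([0.2, -0.5, 0.1, 0.9])     # an example loss vector
assert np.allclose(actions @ y_t, y_t)    # <e_i, y_t> = y_t[i] for every arm i
```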
For this chapter we assume that
(a) For all t ∈ [n] the loss satisfies y_t ∈ L = {x ∈ R^d : sup_{a ∈ A} |⟨a, x⟩| ≤ 1}.
(b) The action set A spans R^d.
The latter assumption is for convenience only and may be relaxed with a
little care (Exercise 27.7).
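For a finite action set, both assumptions are easy to verify numerically. The sketch below is illustrative only; the helper names are not from the text, and the actions are again assumed to be stored as rows of a numpy array.

```python
# Hedged numerical checks of assumptions (a) and (b) for a finite action set.
import numpy as np

def in_loss_set(actions: np.ndarray, x: np.ndarray) -> bool:
    """Assumption (a): x lies in L iff sup_{a in A} |<a, x>| <= 1."""
    return bool(np.max(np.abs(actions @ x)) <= 1.0)

def spans_rd(actions: np.ndarray) -> bool:
    """Assumption (b): the action set spans R^d iff its rank equals d."""
    return np.linalg.matrix_rank(actions) == actions.shape[1]

actions = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(in_loss_set(actions, np.array([0.4, 0.6])), spans_rd(actions))  # True True
```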
27.1 Exponential weights for linear bandits
We adapt the exponential weighting algorithm of Chapter 11. As in that setting,
we need a way to estimate the individual losses for each action, but now we make
use of the linear structure to share information between the arms and decrease
the variance of our estimators. For now we assume that A is finite, an assumption
we relax in Section 27.3. Let t ∈ [n] be the index of the current round. Assuming
the loss estimate for action a ∈ A in round s ∈ [n] is Ŷ_s(a), the probability
distribution proposed by exponential weights is given by the probability mass