29 The Relation Between Adversarial and Stochastic Linear Bandits
The purpose of this chapter is to highlight some of the differences and connections between adversarial and stochastic linear bandits. As it turns out, the connections between the two settings are not as straightforward as for finite-armed bandits. We focus on three topics:
(a) For fixed action sets there is a reduction from stochastic linear bandits to adversarial linear bandits. This does not come entirely for free. The action set needs to be augmented for things to work (Section 29.2).
(b) The adversarial and stochastic settings make different assumptions about the variability of the losses/rewards. This will explain the apparently contradictory result that the upper bound for adversarial bandits on the unit ball is $O(\sqrt{dn\log(n)})$ (Theorem 28.11) while the lower bound for stochastic bandits is $\Omega(d\sqrt{n})$ (Theorem 24.2); the two rates are compared in the display following this list.
(c) When the action set is changing, the notion of regret in the adversarial setting must be carefully chosen, and for the ‘right’ choice we do not yet have effective algorithms (Section 29.4).
We start with a unified view of the two settings.
29.1 Unified view
To make the notation consistent, we present the stochastic and adversarial linear bandit frameworks again, using losses for both. Let $\mathcal{A} \subset \mathbb{R}^d$ be the action set. In each round the learner chooses $A_t \in \mathcal{A}$ and receives the loss $Y_t$, where
\begin{align}
Y_t &= \langle A_t, \theta \rangle + \eta_t\,, && \text{(Stochastic setting)} \tag{29.1} \\
Y_t &= \langle A_t, \theta_t \rangle\,, && \text{(Adversarial setting)} \tag{29.2}
\end{align}
and $(\eta_t)_{t=1}^n$ is a sequence of independent and identically distributed 1-subgaussian random variables and $(\theta_t)_{t=1}^n$ is a sequence of loss vectors chosen by the adversary.
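The following minimal sketch (not from the book) contrasts how the loss $Y_t$ is generated under (29.1) and (29.2). The Gaussian noise is one example of a 1-subgaussian distribution, and the action set, horizon, and parameter vectors are illustrative assumptions; a real learner would of course choose $A_t$ adaptively rather than uniformly at random.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5
actions = np.eye(d)  # a toy action set: the standard basis vectors in R^d

# Stochastic setting (29.1): a single fixed theta, losses perturbed by noise.
theta = rng.standard_normal(d)
for t in range(n):
    A_t = actions[rng.integers(d)]   # learner's choice (here: uniform at random)
    eta_t = rng.standard_normal()    # i.i.d. noise; N(0, 1) is 1-subgaussian
    Y_t = A_t @ theta + eta_t        # observed loss <A_t, theta> + eta_t

# Adversarial setting (29.2): a sequence (theta_t) chosen by the adversary,
# and the observed loss is exact, with no additive noise term.
thetas = rng.standard_normal((n, d))  # stands in for the adversary's choices
for t in range(n):
    A_t = actions[rng.integers(d)]
    Y_t = A_t @ thetas[t]            # observed loss <A_t, theta_t>
```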
As noted earlier, the assumptions on the noise can be relaxed significantly. For example, if $\mathcal{F}_t = \sigma(A_1, Y_1, \ldots, A_t, Y_t, A_{t+1})$, then the results of the previous chapters hold as soon as $\eta_t$ is 1-subgaussian conditioned on $\mathcal{F}_{t-1}$. The expected