In the next few chapters we will consider adversarial linear bandits, which
superficially can be thought of as the adversarial version of the stochastic linear
bandit. Indeed, the techniques in this part combine the ideas of optimal design
presented in Chapter 21 with the exponential weighting algorithm of Chapter 11.
The intuitions gained by studying stochastic bandits should not be taken too
seriously here, however. There are subtle differences between the model of
adversarial bandits introduced here and the stochastic linear bandits examined
in previous chapters. These differences will be discussed at length in Chapter 29.
The adversarial version of the linear bandit turns out to be remarkably rich,
both because of the complex information structure and because of the challenging
computational issues.
The part is split into four chapters, the first of which is an introduction to the
necessary tools from convex analysis and optimization. In the first chapter on
bandits we show how to combine the core ideas of the Exp3 policy of Chapter 11
with the optimal experimental design for least squares estimators in Chapter 21.
When the number of actions is large (or infinite), the approach based on Exp3 is
hard to make efficient. This shortcoming is addressed in the next chapter, where
we introduce the mirror descent and follow-the-regularized-leader algorithms for
bandits and show how they can be used to design efficient algorithms. We conclude
the part with a discussion on the relationship between adversarial and stochastic
linear bandits, which is more subtle than the situation with finite-armed bandits.
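As a rough preview of the first of these combinations, the sketch below runs exponential weights over a finite action set with an importance-weighted least squares loss estimator. Everything concrete here is an illustrative assumption rather than the book's algorithm: the action set, the adversary's loss vectors, the uniform exploration mixture (where the book uses an optimal design), and the parameter choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumptions for this demo): k unit-norm actions in R^d
# and an oblivious adversary choosing a loss vector theta_t each round.
d, k, n, eta, gamma = 2, 5, 2000, 0.05, 0.1
actions = rng.standard_normal((k, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)

weights = np.ones(k)
total_loss = 0.0
for t in range(n):
    theta = np.array([np.sin(t / 50), np.cos(t / 50)])  # adversary's loss vector
    # Exponential weights mixed with uniform exploration (a stand-in for the
    # optimal-design exploration distribution used in the book).
    p = (1 - gamma) * weights / weights.sum() + gamma / k
    i = rng.choice(k, p=p)
    y = actions[i] @ theta  # observed loss of the played action only
    # Least squares loss estimator: theta_hat = Q^{-1} A_t Y_t with
    # Q = sum_j p_j a_j a_j^T, which is unbiased given the sampling distribution.
    Q = (p[:, None, None] * np.einsum('ij,ik->ijk', actions, actions)).sum(axis=0)
    theta_hat = np.linalg.solve(Q, actions[i] * y)
    est_losses = actions @ theta_hat  # estimated loss of every action
    weights *= np.exp(-eta * est_losses)
    weights /= weights.max()  # rescale for numerical stability
    total_loss += y
```

The key point the sketch illustrates is the information structure of linear bandits: a single scalar observation, reweighted through the sampling distribution, yields an unbiased estimate of the loss of every action, which is what lets exponential weights be applied at all.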