Bandit Algorithms


bounded by C√(nk log(n))? The lazy way is to push part of the proof into the
assumptions. For UCB this might mean replacing a subgaussian assumption
with a condition that the data generating process for each arm satisfies the
conclusion of the core concentration result (Corollary 5.5). A more ambitious
goal is to define the subset of rewards for which the regret is bounded by some
value and try to characterize this set. To our knowledge these ideas have not
been explored in bandits and barely at all in machine learning more broadly.
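To make the role of the concentration assumption concrete, here is a minimal sketch of UCB on Bernoulli arms. The exploration bonus √(c log(t)/Tᵢ) is exactly the quantity whose validity rests on a subgaussian concentration bound of the kind referenced above (Corollary 5.5); the arm means, horizon, and constant c below are illustrative choices, not values from the text.

```python
import math
import random

def ucb(means, n, c=2.0, seed=0):
    """Run UCB for n rounds on Bernoulli arms with the given means.

    The index of an arm is its empirical mean plus an exploration bonus
    sqrt(c * log(t) / T_i), which is justified by a subgaussian
    concentration inequality. Returns (total reward, pull counts).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k       # T_i: number of times arm i was pulled
    sums = [0.0] * k       # cumulative reward of arm i
    total = 0.0
    for t in range(1, n + 1):
        if t <= k:
            a = t - 1      # pull each arm once to initialise the estimates
        else:
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(c * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < means[a] else 0.0  # Bernoulli reward
        counts[a] += 1
        sums[a] += r
        total += r
    return total, counts
```

Replacing the subgaussian assumption as suggested above would amount to requiring only that the confidence intervals implicit in this bonus are valid for the actual reward process, rather than deriving them from a distributional condition.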

Bibliographic remarks


The quote by George Box was used several times with different phrasings [Box,
1976, 1979]. The adversarial framework has its roots in game theory with familiar
names like Hannan [1957] and Blackwell [1954] producing some of the early
work. The nonstatistical approach has enjoyed enormous popularity since the
1990s and has been adopted wholeheartedly by the theoretical computer science
community [Vovk, 1990, Littlestone and Warmuth, 1994, and many others].
For bandits the earliest work that we know of is by Auer et al. [1995]. There is now
a big literature on adversarial bandits, which we will cover in more depth in the
chapters that follow. There has been a lot of effort to move away from stochastic
assumptions. An important aspect of this is to define a sense of regularity for
individual sequences. We refer the reader to some of the classic papers by
Martin-Löf [1966], Levin [1973] and the more recent paper by Ivanenko and
Labkovsky [2013].
