Bandit Algorithms


bounded by C√(nk log(n))? The lazy way is to push part of the proof into the
assumptions. For UCB this might mean replacing a subgaussian assumption
with a condition that the data generating process for each arm satisfies the
conclusion of the core concentration result (Corollary 5.5). A more ambitious
goal is to define the subset of rewards for which the regret is bounded by some
value and try to characterize this set. To our knowledge these ideas have not
been explored in bandits and barely at all in machine learning more broadly.
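To make the role of the concentration assumption concrete, here is a minimal sketch of UCB on Bernoulli arms. The exploration bonus √(c log(t)/Tᵢ) is exactly the quantity whose validity rests on a subgaussian concentration bound of the kind referenced above (Corollary 5.5); the arm means, horizon, and constant c below are illustrative choices, not values from the text.

```python
import math
import random

def ucb(means, n, c=2.0, seed=0):
    """Run UCB for n rounds on Bernoulli arms with the given means.

    The index of an arm is its empirical mean plus an exploration bonus
    sqrt(c * log(t) / T_i), which is justified by a subgaussian
    concentration inequality. Returns (total reward, pull counts).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k       # T_i: number of times arm i was pulled
    sums = [0.0] * k       # cumulative reward of arm i
    total = 0.0
    for t in range(1, n + 1):
        if t <= k:
            a = t - 1      # pull each arm once to initialise the estimates
        else:
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(c * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < means[a] else 0.0  # Bernoulli reward
        counts[a] += 1
        sums[a] += r
        total += r
    return total, counts
```

Replacing the subgaussian assumption as suggested above would amount to requiring only that the confidence intervals implicit in this bonus are valid for the actual reward process, rather than deriving them from a distributional condition.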

Bibliographic remarks


The quote by George Box was used several times with different phrasings [Box,
1976, 1979]. The adversarial framework has its roots in game theory with familiar
names like Hannan [1957] and Blackwell [1954] producing some of the early
work. The nonstatistical approach has enjoyed enormous popularity since the
1990s and has been adopted wholeheartedly by the theoretical computer science
community [Vovk, 1990, Littlestone and Warmuth, 1994, and many others].
For bandits the earliest work that we know of is by Auer et al. [1995]. There is now
a big literature on adversarial bandits, which we will cover in more depth in the
chapters that follow. There has been a lot of effort to move away from stochastic
assumptions. An important aspect of this is to define a sense of regularity for
individual sequences. We refer the reader to some of the classic papers by
Martin-Löf [1966], Levin [1973] and the more recent paper by Ivanenko and
Labkovsky [2013].
