Bandit Algorithms

Over the next few chapters we introduce the fundamental algorithms and
tools of analysis for unstructured stochastic bandits with finitely many actions.
The keywords here are finite, unstructured and stochastic. The first of these
just means that the number of actions available is finite. The second is more
ambiguous, but roughly means that choosing one action yields no information
about the mean payoff of the other arms. A bandit is stochastic if the sequence
of rewards associated with each action is independent and identically distributed
according to some distribution. This latter assumption will be relaxed in Part III.
There are several reasons to study this class of bandit problems. First, their
simplicity makes them relatively easy to analyze and permits a deep understanding
of the tradeoff between exploration and exploitation. Second, many of the
algorithms designed for finite-armed bandits, and the principle underlying them,
can be generalized to other settings. Finally, finite-armed bandits already have
applications, notably as a replacement for A/B testing, as discussed in the
introduction.
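The model described above can be made concrete with a small sketch. The class below is a hypothetical illustration, not code from this book: it represents a finite-armed stochastic bandit whose arms pay out i.i.d. Bernoulli rewards, so each pull of an arm is an independent draw from that arm's fixed but unknown distribution.

```python
import random

class BernoulliBandit:
    """A finite-armed stochastic bandit (illustrative sketch).

    Each arm's rewards are i.i.d. Bernoulli draws with a fixed mean;
    pulling one arm reveals nothing about the means of the others,
    which is what makes the problem 'unstructured'."""

    def __init__(self, means, seed=0):
        self.means = list(means)          # one unknown mean per arm
        self._rng = random.Random(seed)   # seeded for reproducibility

    @property
    def num_arms(self):
        return len(self.means)

    def pull(self, arm):
        # Reward is 1 with probability means[arm], else 0, drawn
        # independently of all previous pulls.
        return 1 if self._rng.random() < self.means[arm] else 0
```

A learner interacting with such an environment observes only the rewards of the arms it pulls, which is the source of the exploration-exploitation trade-off analyzed in the coming chapters.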
