Bandit Algorithms

This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This pre-publication version is free to view and download for personal use only. Not for re-distribution, sale, or use in derivative works.©Tor Lattimore and Csaba Szepesvari 2017. The latest version is available athttp://banditalgs.com. Feedback on any aspect is very welcome: [email protected]

5 Concentration of Measure

Before we can start designing and analyzing algorithms we need one more tool from probability theory calledconcentration of measure. Recall that the optimal action is the one with the largest mean. Since the mean payoffs are initially unknown, they must be learned from data. How long does it take to learn about the mean reward of an action? In this section, after introducing the notion of tail probabilities, we look at ways of obtaining upper bounds on them. The main point is to introduce subgaussian random variables and the Cramer-Chernoff (exponential) tail inequalities, which will play a central role in the design and analysis of the various bandit algorithms.

5.1 Tail probabilities

Suppose thatX,X 1 ,X 2 ,...,Xnis a sequence of independent and identically distributed random variables and assume that the meanμ=E[X] and variance σ^2 =V[X] exist. Having observedX 1 ,X 2 ,...,Xnwe would like to estimate the common meanμ. The most natural estimator is

μˆ=

1

n

∑n

i=1

Xi,

which is called thesample meanorempirical mean. Linearity of expectation (Proposition 2.6) shows thatE[μˆ] =μ, which means thatμˆis anunbiased estimator ofμ. How far fromμdo we expectμˆto be? A simple measure of the spread of the distribution of a random variable Z is its variance, V[Z] =E

[

(Z−E[Z])^2

]

. A quick calculation using independence shows that

V[ ˆμ] =E

[

(ˆμ−μ)^2

]

=

σ^2 n

, (5.1)

which means that we expect the squared distance betweenμandμˆto shrink as ngrows large at a rate of 1/nand scale linearly with the variance ofX. While the expected squared error is important, it does not tell us very much about the distribution of the error. To do this we usually analyze the probability thatˆμ overestimates or underestimatesμby more than some valueε >0. Precisely, how

Bandit Algorithms

5 Concentration of Measure

5.1 Tail probabilities

1

[

(Z−E[Z])^2

]

[

]

=

, (5.1)

Get our desktop app

Company

Features

Documentation

Resources