Bandit Algorithms

(Jeff_L) #1
10.1 Concentration for sums of Bernoulli random variables 128

More generally, therelative entropyorKullback-Leibler divergence
is a measure of similarity between distributions. See Chapter 14 for a generic
definition, interpretation and discussion.

Lemma10.2.Letp,q,ε∈[0,1]. The fol lowing hold:

(a)The functionsd(·,q)andd(p,·)are convex and have unique minimizers atq
andprespectively.
(b)d(p,q)≥2(p−q)^2 (Pinsker’s inequality).
(c) Ifp≤q−ε≤q, thend(p,q−ε)≤d(p,q)−d(q−ε,q)≤d(p,q)− 2 ε^2.


Proof We assume thatp,q ∈ (0,1). The corner cases are easily checked
separately. Part (a):d(·,q) is the sum of the negative binary entropy function
h(p) =plogp+ (1−p)log(1−p) and a linear function. The second derivative
ofhish′′(p) = 1/p+ 1/(1−p), which is positive and hencehis convex. For
fixedpthe functiond(p,·) is the sum ofh(p) and convex functionsplog(1/q) and
(1−p)log(1/(1−q)). Henced(p,·) is convex. The minimizer property follows
becaused(p,q)>0 unlessp=qin which cased(p,p) =d(q,q) = 0. A more
general version of (b) is given in Chapter 15. A proof of the simple version here
follows by considering the functiong(x) =d(p,p+x)− 2 x^2 , which obviously
satisfiesg(0) = 0. The proof is finished by showing that this is the unique
minimizer ofgover the interval [−p, 1 −p]. The details are left to Exercise 10.1.
For (c) notice that


h(p) =d(p,q−ε)−d(p,q) =plog

q
q−ε

+ (1−p) log

1 −q
1 −q+ε

.


It is easy to see then thathis linear and increasing in its argument. Therefore,
sincep≤q−ε,

h(p)≤h(q−ε) =−d(q−ε,q)

as required for the first inequality of (c). The second inequality follows by using
the result in (b).

The next lemma controls the concentration of the sample mean of a sequence
of independent and identically distributed Bernoulli random variables.

Lemma10.3 (Chernoff’s bound).LetX 1 ,X 2 ,...,Xnbe a sequence of independent
random variables that are Bernoulli distributed with meanμ and let μˆ =
1
n


∑n
t=1Xtbe the sample mean. Then forε∈[0,^1 −μ]it holds that
P(ˆμ≥μ+ε)≤exp (−nd(μ+ε,μ)) (10.1)

and forε∈[0,μ]it holds that


P(ˆμ≤μ−ε)≤exp (−nd(μ−ε,μ)). (10.2)
Free download pdf