Bandit Algorithms

(Jeff_L) #1
5.6 Exercises 81

(b) Prove Hoeffding’s inequality (5.8).


Hint For part (a) it suffices to prove thatψX(λ)≤λ^2 (b−a)^2 /4. By Taylor’s
theorem, for someλ′between 0 andλ,ψX(λ) =ψX(0) +ψ′X(0)λ+ψ′′X(λ′)λ^2 /2.
To bound the last term, introduce the distributionPλforλ∈Rarbitrary:
Pλ(dz) =e−ψX(λ)eλzP(dz). Show that Ψ′′X(λ) =V[Z] whereZ∼Pλ. Now,
sinceZ ∈[a,b] with probability one, argue (without relying onE[Z]) that
V[Z]≤(b−a)^2 /4.


5.12(Subgaussianity of Bernoulli distribution) LetXpbe a Bernoulli
distribution with meanp, which means thatP(X= 1)=pandP(X= 0)= 1−p.

(a) Show thatXpis 1/2-subgaussian for allp.


(b)LetQ: [0,1]→[0, 1 /2] be the function given byQ(p) =


√ 1 − 2 p
2 ln((1−p)/p)
where undefined points are defined in terms of their limits. Show thatXpis
Q(p)-subgaussian.
(c)PlotQ(p) as a function ofp. How does it compare to


V[Xp]=


p(1−p)?

Readers looking for a hint to part (b) in the previous exercise might like to
look at the short paper by Ostrovsky and Sirota [2014].

5.13(Central limit theorem for sums of Bernoulli random variables)
In this question we try to understand the concentration of the empirical mean
for Bernoulli random variables. LetX 1 ,X 2 ,...,Xnbe independent Bernoulli
random variables with meanp∈[0,1] andpˆn=

∑n
t=1Xt/n. LetZnbe normally
distributed random variable with meanpand variancep(1−p)/n.

(a) Write down expressions forE[ ˆpn] andV[ ˆpn].


(b)What does the central limit theorem say about the relationship betweenpˆn
andZnasngets large?
(c)For eachp∈{ 1 / 10 , 1 / 2 }andδ= 1/100 and ∆ = 1/10 find the minimumn
such thatP( ˆpn≥p+ ∆)≤δ.
(d) Letp= 1/10 and ∆ = 1/10 and


n 1 (δ,p,∆) = min{n:P( ˆpn≥p+ ∆)≤δ}
n 2 (δ,p,∆) = min{n:P(Zn≥p+ ∆)≤δ}.

(i) Evaluate empirically or analytically the value of

δlim→ 0

n 1 (δ, 1 / 10 , 1 /10)
n 2 (δ, 1 / 10 , 1 /10)

(ii)In light of the central limit theorem, explain why the answer you got in (i)
was not 1.
Free download pdf