Bandit Algorithms

(Jeff_L) #1
20.4 Exercises 248

20.8(Extension of Hoeffding–Azuma) The following simple extension
of Hoeffding–Azuma is often useful. Letn∈N+and (at) and (bt) be fixed
sequences withat≤btfor allt∈[n]. LetX 1 ,...,Xnbe a sequence of random
variables adapted to a filtrationF= (Ft)tandAbe an event. Assume that
P(existst∈[n] :AandXt∈/[at,bt]) = 0 andε >0 and show that

(a)P


(


A∩


∑n

t=1

(Xt−E[Xt|Ft− 1 ])≥ε

)


≤exp

(


−^2 n

(^2) ε 2
∑n
t=1(bt−at)^2


)


.


(b)P


(n

t=1

(Xt−E[Xt|Ft− 1 ])≥ε

)


≤P(Ac) + exp

(



∑^2 n^2 ε^2
n
t=1(bt−at)^2

)


.


The utility of this result comes from the fact that very often the range of
some adapted sequence is itself random and could be arbitrarily large with
low probability (whenAdoes not hold). A reference for the above result is
the survey by McDiarmid [1998].

20.9 Letδ∈(0,1) andF= (Ft)∞t=1be a filtration and (Xt)∞t=1beF-adapted
such that

for allλ∈R, E[exp(λX)|Ft− 1 ]≤exp(λ^2 σ^2 /2)a.s..

LetSn=

∑n
t=1Xt. Show that

P



existst:|St|≥

√√


√√


2 σ^2 (t+ 1) log

(√


tσ^2 + 1
δ

)


≤δ.

20.10(Law of the iterated logarithm and method of mixtures) This
exercise uses the same notation as Exercise 20.9. Letfbe a probability density
function supported on [0,∞) and

Mn=

∫∞


0

f(λ) exp

(


λSn−λ

(^2) n
2


)


dλ.

(a) Show that argmaxλ∈RλSn−λ^2 n/2 =Sn/n.
(b)Suppose thatf(λ) is decreasing forλ >0. Show that for anyε >0 and
Λn=Sn/nthat,


Mn≥εΛnf(Λn(1 +ε)) exp

(


(1−ε^2 )S^2
2 n

)


(c) Use the previous result to show that for anyδ∈(0,1)

P


(


existsn:Sn≥ε>inf 0


2 n
(1−ε^2 )

(


log

(


1


δ

)


+ log

(


1


εΛnf(Λn(1 +ε))

)))


≤δ.
Free download pdf