Bandit Algorithms

(Jeff_L) #1
20.4 Exercises 249

(d) Find anfsuch that


∫∞


0 f(λ)dλ= 1 andf(λ)≥0 for allλ∈Rand

log

(


1


λf(λ)

)


= (1 +o(1)) log log

(


1


λ

)


asλ→0.
(e) Use the previous results to show that

P


(


lim sup
n→∞

√ Sn
2 nlog log(n)

≤ 1


)


= 1.


The last part of the previous exercises is one half of the statement of the
law of iterated logarithm, which states thatlim supn→∞√ 2 nlog log(Sn n)= 1
happens to be true with probability one. In words, the magnitudes of the
largest fluctuations of the partial sum (√ Sn)nis almost surely of the order
2 nlog lognasn→∞.

20.11 LetF= (Ft)nt=0be a filtration andX 1 ,X 2 ,...,Xnbe a sequence of
F-adapted random variables withXt∈{− 1 , 0 , 1 }andμt=E[Xt|Ft− 1 ,Xt 6 = 0],
which we define to be zero whenever∑ P(Xt 6 = 0|Ft− 1 )= 0. Then withSt=
t
s=1(Xs−μs|Xs|) andNt=


∑t
s=1|Xs|,

P



existst≤n:|St|≥


2 Ntlog

(


c


Nt
δ

)


andNt> 0


≤δ ,

wherec >0 is a universal constant.


This result appeared in a paper by the authors and others with the constant
c= 4


2 /π/erf(


2)≈ 3 .43 [Lattimore et al., 2018].

20.12(Sequential likelihood ratios and confidence sets) Let Θ be
a set and for eachθ∈Θ,pθbe a density over the reals with respect to the
common reference measureμ. Let (Xt)tbe an independent sequence of random
variables so thatXt∼pθ∗dμand letθˆt∈Θ beσ(X 1 ,...,Xt)-measurable (think
of∑θˆtas an ‘estimate’ ofθ∗based onX 1 ,...,Xt). Forθ∈Θ, defineLt(θ) =
t
s=1log(pθˆs− 1 (Xs)/pθ(Xs)). Show thatP(Lt(θ∗)≥log(1/δ) for somet≥1)≤
δfor anyδ∈(0,1). Hence, Θt ={θ : Lt(θ)<log(1/δ)}is a sequence of
confidence sets such thatP(existst∈Nsuch thatθ∗6∈Θt)≤δ.

Hint Use Cramer-Chernoff method (λplays no role here) and observe that
(exp(Lt(θ∗)))t∈Nis a martingale.

Free download pdf