Bandit Algorithms

(Jeff_L) #1
9.5 Exercises 125

(a)Show that for all 1-subgaussian bandits this new policy suffers regret at most


Rn≤C




i:∆i> 0

∆i+

1


∆i

log+(n∆^2 i)


,


whereC >0 is a universal constant.
(b)Under the same conditions as the previous part show there exists a universal
constantC >0 such that


Rn≤C


knlog(k) +

∑k

i=1

∆i.

(c) Repeat parts (a) and (b) using the index

μˆi(t−1) +


4


Ti(t−1)

log+

(


t
Ti(t−1)

)


.


9.4(Gaussian noise and the tangent approximation) Letg(t) =at+b
withb >0 and


u(x,t) =

1



2 πt

exp

(



x^2
2 t

)



1



2 πt

exp

(


− 2 ab−

(x− 2 b)^2
2 t

)


(a) Show thatu(x,t)>0 forx∈(−∞,g(t)) andu(x,t) = 0 forx=g(t).
(b) Show thatu(x,t) satisfies the heat equation:


∂tu(x,t) =

1


2 ∂


2
xu(x,t).
(c)LetBtbe a standard Brownian motion, which for any fixedthas density
with respect to the Lebesgue measure.

p(x,t) =

√^1


2 πt

exp

(



x^2
2 t

)


.


Defineτ=min{t:Bt=g(t)}be the first time the Brownian motion hits the
boundary. Put on your physicists hat (or work hard) to argue that

P(τ≥t) =

∫g(t)

−∞

u(x,t)dx.

(d)Letv(t) be the density of timeτwith respect to the Lebesgue measure so
thatP(τg≤t) =


∫t
0 v(t)dt. Show that

v(t) =√b
2 πt^3

exp

(


−g(t)

2
2 t

)


(e)In the last part you established the exact density of the hitting time of a
Brownian motion approaching a linear boundary. We now generalize this
to nonlinear boundaries, but at the cost that now we only have a bound.
Suppose thatf : [0,∞)→[0,∞) is concave and differentiable and let
λ:R→Rbe the intersection of the tangent tofattwith they-axis given
Free download pdf