Bandit Algorithms

3.4 Notes 50

live a very long life! One application of Doob’s optional stopping theorem is a useful and apriori surprising generalization of Markov’s inequality to nonnegative supermartingales.

Theorem3.9 (Maximal inequality). Let(Xt)∞t=0be a supermartingale with
Xt≥ 0 almost surely for al lt. Then for anyε > 0 ,

P

(

sup t∈N

Xt≥ε

)

≤

E[X 0 ]

ε Proof LetAnbe the event thatsupt≤nXt≥εandτ = (n+ 1)∧min{t≤ n:Xt≥ε}, where the minimum of an empty set is assumed to be infinite so thatτ =n+ 1 ifXt< εfor all 0≤t≤n. Clearlyτ is a stopping time and P(τ≤n+ 1) = 1. Then by Theorem 3.8 and elementary calculation,

E[X 0 ]≥E[Xτ]≥E[XτI{τ≤n}]≥E[εI{τ≤n}] =εP(τ≤n) =εP(An),

where the second inequality uses the definition of the stopping time and the
nonnegativity of the supermartingale. Rearranging shows thatP(An)≤E[X 0 ]/ε
for all n ∈ N. Since A 1 ⊆ A 2 ⊆ ...it follows that P(supt∈NXt≥ε) =
P(∪n∈NAn)≤E[X 0 ]/ε.

Markov’s inequality (which we will cover in the next chapter) combined with the definition of a supermartingale shows thatP(Xn≥ε)≤E[X 0 ]/ε. In fact, in the above we have effectively applied Markov’s inequality to the random variableXτ. The maximal inequality is a strict improvement by replacingXnwith supt∈NXtat no cost whatsoever.

A similar theorem holds for submartingales. You will provide a proof in Exercise 3.8.

Theorem3.10.Let(Xt)nt=0be a submartingale withXt≥ 0 almost surely for
al lt. Then for anyε > 0 ,

P

(

max t∈{ 0 , 1 ,...,n}

Xt≥ε

)

≤

E[Xn] ε

3.4 Notes

1 Some authors include in the definition of a stopping timeτthatP(τ <∞)= 1 and call random times without this propertyMarkov times. We donotadopt this convention and allow stopping times to be infinite with nonzero probability. Stopping times are also calledoptional times. 2 There are several notations for probability kernels depending on the application. The following are commonly seen and equivalent:K(x,A) =K(A|x) =Kx(A). For example, in statistics a parametric family is often given by{Pθ:θ∈Θ}

Bandit Algorithms

(

)

≤

E[X 0 ]

(

)

≤

3.4 Notes

Get our desktop app

Company

Features

Documentation

Resources