Bandit Algorithms

(Jeff_L) #1
3.3 Martingales and stopping times 49

the gambler at the end of roundtcan decide to stop (δt= 1) or continue (δt= 0)
based on the information available to them. Denoting byτ=min{t:δt= 1}
the time when the gambler stops, the question is whether by a clever choice
of (δt)t∈N,E[Sτ]can be made positive. Here, (δt)t∈N, a sequence of binary,F-
adapted random variables, is called astopping rule, whileτis a stopping time
with respectF. Note that the stopping rule is not allowed to inject additional
randomness beyond what is already there inF.

Definition3.6.LetF= (Ft)t∈Nbe a filtration. A random variableτwith values
inN∪{∞}is astopping timewith respect toFifI{τ≤t}isFt-measurable
for allt∈N. Theσ-algebra at stopping timeτis

Fτ={A∈F∞:A∩{τ≤t}∈Ftfor allt}.

The filtration is usually indicated by writing ‘τis aF-stopping time’. When
the underlying filtration is obvious from context it may be omitted. This is
also true for martingales.

Using the interpretation ofσ-algebras encoding information, if (Ft)tis thought
of as the knowledge available at timet,Fτis the information available at the
random timeτ. Exercise 3.7 asks you to explore properties of stoppedσ-algebras;
amongst other things, it asks you to show thatFτis in fact aσ-algebra.

Example 3.7. In the gambler example, the first time when the gambler’s
winnings hits 100 is a stopping time:τ =min{t:St= 100}. On the other
hand,τ=min{t:St+1=− 1 }is not a stopping time becauseI{τ=t}is not
Ft-measurable.

To answer to the previous question whetherE[Sτ]can be made positive by a
clever choice of a stopping timeτis answered in the negative by a fundamental
theorem of Doob:

Theorem3.8 (Doob’s optional stopping). LetF= (Ft)t∈Nbe a filtration and
(Xt)t∈Nbe anF-adapted martingale andτanF-stopping time such that at least
one of the fol lowing holds:
(a) There exists ann∈Nsuch thatP(τ > n) = 0.
(b)E[τ]< ∞and there exists a constantc ∈Rsuch that for all t∈ N,
E[|Xt+1−Xt||Ft]≤calmost surely on the event thatτ > t.
(c) There exists a constantcsuch that|Xt∧τ|≤calmost surely for al lt∈N.

ThenXτis almost surely well defined andE[Xτ] =E[X 0 ]. Furthermore, when
(Xt)is a super/sub-martingale rather than a martingale, then equality is replaced
with less/greater-than, respectively.


The theorem implies that ifSτ is almost-surely well defined then either
E[τ]=∞orE[Sτ]= 0. Gamblers trying to outsmart the casino would need to
Free download pdf