35.2 Optimal stopping ( ) 412
Theorem35.1.For any priorQ,
lim sup
n→∞
BR∗n(Q)
n^1 /^2
= 0.
Furthermore, there exists a priorQsuch that for al lε > 0 ,
lim infn→∞
BR∗n(Q)
n^1 /^2 −ε
=∞.
The lower bound has a worst-case flavor in the sense it holds for a specific prior.
The prior that yields the lower bound is a little unnatural because it assigns the
overwhelming majority of its mass to bandits with small suboptimality gaps. In
particular,Q({ν∈E: ∆min(ν)≤∆})≥c/log(1/∆) for some constantc >0. For
more regular priors the Bayesian optimal regret satisfiesBR∗n(Q) = Θ(log^2 (n)).
See the bibliographic remarks for pointers to the literature.
35.2 Optimal stopping ( )
Let (Ut)nt=1be a sequence of random variables adapted to filtrationF= (Ft)nt=1.
Optimal stopping is concerned with finding solutions to optimization problems of
the following form:
sup
τ∈Rn 1
E[Uτ], (35.1)
whereRn 1 is the set ofF-stopping timesτ with 1≤τ≤n. Whennis finite
the situation is conceptually straightforward. The idea is to usebackwards
inductionto define the expected optimal utility conditioned on the information
inFtstarting fromt=nand working backwards tot= 1. TheSnell envelope
is a sequence of random variables (Et)nt=1defined by
Et=
{
Un, ift=n;
max{Ut,E[Et+1|Ft]}, otherwise.
Theorem35.2. Assume thatnis finite andUtis integrable for allt∈[n].
Then the stopping timeτ=min{t∈[n] :Ut=Et}achieves the supremum in
Eq.(35.1).
Backwards induction is not directly applicable when the horizon is infinite.
There are several standard ways around this problem. The most convenient for
our purposes is to introduce a Markov structure. Let{Px:x∈S}be a Markov
kernel from a Borel space (S,G) to itself andu:S →Ris aG/B(R)-measurable
function. Let (St,Ut)∞t=1be a sequence of random elements taking values in
(S×R,G⊗B(R)) such that the following hold:
(a)P(St+1∈·|S 1 ,...,St) =PSt(·) almost surely.
(b)Ut=u(St).