9.1 The MOSS algorithm

Before the proof we state and prove a strengthened version of Corollary 5.5.

Theorem 9.2. Let $X_1, X_2, \ldots, X_n$ be a sequence of independent $\sigma$-subgaussian random variables and $S_t = \sum_{s=1}^t X_s$. Then for any $\varepsilon > 0$,
\[
P(\exists\, t \le n : S_t \ge \varepsilon) \le \exp\left(-\frac{\varepsilon^2}{2n\sigma^2}\right). \tag{9.1}
\]


The bound in Eq. (9.1) is the same as the bound on $P(S_n \ge \varepsilon)$ that appears in a simple reformulation of Corollary 5.5, so this new result is strictly stronger.

Proof  From the definition of subgaussian random variables and Lemma 5.4,
\[
E[\exp(\lambda S_n)] \le \exp\left(\frac{n\sigma^2\lambda^2}{2}\right).
\]
Then choosing $\lambda = \varepsilon/(n\sigma^2)$ leads to


\[
P(\exists\, t \le n : S_t \ge \varepsilon)
= P\left(\max_{t \le n} \exp(\lambda S_t) \ge \exp(\lambda\varepsilon)\right)
\le \frac{E[\exp(\lambda S_n)]}{\exp(\lambda\varepsilon)}
\le \exp\left(\frac{n\sigma^2\lambda^2}{2} - \lambda\varepsilon\right)
= \exp\left(-\frac{\varepsilon^2}{2n\sigma^2}\right).
\]


The novel step is the first inequality, which follows from Doob's submartingale inequality (Theorem 3.10) and the fact that $\exp(\lambda S_t)$ is a submartingale with respect to the filtration generated by $X_1, X_2, \ldots, X_n$ (Exercise 9.1).
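
As a quick numerical sanity check of Eq. (9.1), the following minimal Python sketch estimates the left-hand side by simulation and compares it with the right-hand side. It is not part of the book; the horizon $n$, the threshold $\varepsilon$, the number of repetitions and the use of standard Gaussian increments (which are 1-subgaussian) are all illustrative choices.

```python
import numpy as np

# Minimal Monte Carlo check of the maximal inequality in Eq. (9.1).
# All parameter values below are illustrative assumptions.
rng = np.random.default_rng(0)

n = 100          # horizon
sigma = 1.0      # standard Gaussians are sigma-subgaussian with sigma = 1
eps = 15.0       # threshold epsilon
reps = 50_000    # Monte Carlo repetitions

X = rng.normal(0.0, sigma, size=(reps, n))
S = np.cumsum(X, axis=1)                     # partial sums S_1, ..., S_n
hit = (S.max(axis=1) >= eps).mean()          # estimate of P(exists t <= n : S_t >= eps)

bound = np.exp(-eps**2 / (2 * n * sigma**2))
print(f"empirical probability ~ {hit:.4f}")
print(f"bound from Eq. (9.1)  = {bound:.4f}")
```

With these values the empirical frequency lands well below the bound, as it should: Eq. (9.1) is only an upper bound on the probability that the maximum ever crosses $\varepsilon$.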


Before the proof of Theorem 9.1 we need one more lemma to bound the
probability that the index of the optimal arm ever drops too far below the actual
mean of the optimal arm. The proof of this lemma relies on a tool called the
peeling device, which is an important technique in probability theory and has
many applications beyond bandits. For example, it can be used to prove the law
of the iterated logarithm.
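
Very roughly, and only as an illustration of the shape of a peeling argument (this is not the proof of Lemma 9.3): suppose one wants to bound $P(\exists\, s \ge 1 : S_s \ge f(s))$, where $S_s$ is a sum of independent $\sigma$-subgaussian variables and $f$ is assumed non-decreasing purely for this example. Cutting time into geometrically growing blocks and applying Theorem 9.2 on each block with its own horizon gives
\begin{align*}
P(\exists\, s \ge 1 : S_s \ge f(s))
  &\le \sum_{k=0}^{\infty} P\bigl(\exists\, s \in [2^k, 2^{k+1}) : S_s \ge f(2^k)\bigr) \\
  &\le \sum_{k=0}^{\infty} P\bigl(\exists\, s \le 2^{k+1} : S_s \ge f(2^k)\bigr)
   \le \sum_{k=0}^{\infty} \exp\left(-\frac{f(2^k)^2}{2^{k+2}\sigma^2}\right).
\end{align*}
Each block only pays for its own horizon $2^{k+1}$ rather than for all of time, and choosing the threshold so that the resulting series sums to the desired quantity is where the work lies; the proof of Lemma 9.3 applies this device to the specific threshold appearing in its statement.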

Lemma 9.3. Let $\delta \in (0,1)$ and $X_1, X_2, \ldots$ be independent and 1-subgaussian and $\hat\mu_t = \frac{1}{t}\sum_{s=1}^t X_s$. Then for any $\Delta > 0$,
\[
P\left(\exists\, s \ge 1 : \hat\mu_s + \sqrt{\frac{4}{s}\log^+\!\left(\frac{1}{s\delta\Delta^2}\right)} + \Delta \le 0\right) \le \frac{15\delta}{\Delta^2}.
\]
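
As with Theorem 9.2, the lemma can be probed numerically. The sketch below is only an approximation and not part of the book: it truncates the event over $s \ge 1$ at a finite horizon, uses standard Gaussian samples as the 1-subgaussian variables, and fixes $\delta$, $\Delta$ and the horizon to arbitrary illustrative values.

```python
import numpy as np

# Approximate Monte Carlo check of Lemma 9.3.
# Horizon truncation and all parameter values are illustrative assumptions.
rng = np.random.default_rng(1)

delta = 0.01      # delta in (0, 1)
Delta = 1.0       # Delta > 0
horizon = 1_000   # truncation of "exists s >= 1"
reps = 10_000

def log_plus(x):
    """log^+(x) = max(log(x), 0)."""
    return np.maximum(np.log(x), 0.0)

s = np.arange(1, horizon + 1)
width = np.sqrt(4.0 / s * log_plus(1.0 / (s * delta * Delta**2)))

X = rng.normal(0.0, 1.0, size=(reps, horizon))   # Gaussians are 1-subgaussian
mu_hat = np.cumsum(X, axis=1) / s                 # running empirical means

# Event: some s <= horizon with mu_hat_s + width_s + Delta <= 0.
freq = ((mu_hat + width + Delta) <= 0.0).any(axis=1).mean()

print(f"empirical probability (truncated) ~ {freq:.4f}")
print(f"bound 15 * delta / Delta^2         = {15 * delta / Delta**2:.4f}")
```

For Gaussian samples and these parameters the empirical frequency comes out far below $15\delta/\Delta^2$: the bound is not tight in this regime, and the truncation can only make the empirical estimate smaller still.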