Bandit Algorithms

12.2 Regret analysis

statement of which requires us to introduce the notion of adapted and predictable
sequences of random variables.

Lemma 12.2. Let $\mathcal{F} = (\mathcal{F}_t)_{t=0}^n$ be a filtration and for $i \in [k]$ let $(\tilde{Y}_{ti})_t$ be $\mathcal{F}$-adapted such that:

1. For any $S \subseteq [k]$ with $|S| > 1$, $\mathbb{E}\bigl[\prod_{i \in S} \tilde{Y}_{ti} \,\big|\, \mathcal{F}_{t-1}\bigr] \le 0$.
2. $\mathbb{E}\bigl[\tilde{Y}_{ti} \,\big|\, \mathcal{F}_{t-1}\bigr] = y_{ti}$ for all $t \in [n]$ and $i \in [k]$.

Furthermore, let $(\alpha_{ti})_{ti}$ and $(\lambda_{ti})_{ti}$ be real-valued $\mathcal{F}$-predictable random sequences such that for all $t, i$ it holds that $0 \le \alpha_{ti} \tilde{Y}_{ti} \le 2\lambda_{ti}$. Then for all $\delta \in (0,1)$,

$$\mathbb{P}\left( \sum_{t=1}^n \sum_{i=1}^k \alpha_{ti} \left( \frac{\tilde{Y}_{ti}}{1 + \lambda_{ti}} - y_{ti} \right) \ge \log\left(\frac{1}{\delta}\right) \right) \le \delta.$$

The proof relies on the Cramér-Chernoff method and is deferred until the end
of the chapter. Equipped with this result we can easily bound the terms $\hat{L}_{ni} - L_{ni}$.

Lemma 12.3. Let $\delta \in (0,1)$. With probability at least $1 - \delta$ the following inequalities
hold simultaneously:

$$\max_{i \in [k]} \left( \hat{L}_{ni} - L_{ni} \right) \le \frac{\log\left(\frac{k+1}{\delta}\right)}{2\gamma} \quad \text{and} \quad \sum_{i=1}^k \left( \hat{L}_{ni} - L_{ni} \right) \le \frac{\log\left(\frac{k+1}{\delta}\right)}{2\gamma}. \tag{12.7}$$


Proof. Fix $\delta' \in (0,1)$ to be chosen later. Then

$$\sum_{i=1}^k (\hat{L}_{ni} - L_{ni}) = \sum_{t=1}^n \sum_{i=1}^k \left( \frac{A_{ti} y_{ti}}{P_{ti} + \gamma} - y_{ti} \right) = \frac{1}{2\gamma} \sum_{t=1}^n \sum_{i=1}^k 2\gamma \left( \frac{1}{1 + \frac{\gamma}{P_{ti}}} \cdot \frac{A_{ti} y_{ti}}{P_{ti}} - y_{ti} \right).$$

Introduce $\lambda_{ti} = \frac{\gamma}{P_{ti}}$, $\tilde{Y}_{ti} = \frac{A_{ti} y_{ti}}{P_{ti}}$ and $\alpha_{ti} = 2\gamma$. Notice that the conditions of
Lemma 12.2 are now satisfied. In particular, for any $S \subseteq [k]$ with $|S| > 1$ it holds
that $\prod_{i \in S} A_{ti} = 0$ and hence $\prod_{i \in S} \tilde{Y}_{ti} = 0$. Therefore

$$\mathbb{P}\left( \sum_{i=1}^k (\hat{L}_{ni} - L_{ni}) \ge \frac{\log(1/\delta')}{2\gamma} \right) \le \delta'. \tag{12.8}$$

Similarly, for any fixed $i$,

$$\mathbb{P}\left( \hat{L}_{ni} - L_{ni} \ge \frac{\log(1/\delta')}{2\gamma} \right) \le \delta'. \tag{12.9}$$

To see this use the previous argument with $\alpha_{tj} = \mathbb{I}\{j = i\}\, 2\gamma$. The result follows
by choosing $\delta' = \delta/(k+1)$ and the union bound.
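As a numerical sanity check (not part of the original text), the following Python sketch simulates a toy instance of Lemma 12.3: fixed losses $y_{ti} \in [0,1]$, arms drawn from a uniform distribution (so $P_{ti} = 1/k$), and the biased estimator $\hat{L}_{ni} = \sum_t A_{ti} y_{ti} / (P_{ti} + \gamma)$. It estimates how often the event in (12.7) fails; by the lemma, the empirical failure rate should be at most $\delta$. All parameter values here are arbitrary choices for illustration.

```python
import math
import random

# Toy check of Lemma 12.3: estimate how often the high-probability
# bound (12.7) fails. Setup and parameters are illustrative, not
# from the book: fixed losses, uniform sampling distribution.
random.seed(0)

n, k = 200, 5
gamma = 0.05
delta = 0.1
y = [[random.random() for _ in range(k)] for _ in range(n)]  # losses y_ti
P = [1.0 / k] * k  # uniform sampling distribution, so P_ti = 1/k

threshold = math.log((k + 1) / delta) / (2 * gamma)

failures = 0
trials = 2000
for _ in range(trials):
    Lhat = [0.0] * k  # biased importance-weighted estimates
    L = [0.0] * k     # true cumulative losses
    for t in range(n):
        A = random.randrange(k)  # A_t ~ uniform, matching P
        for i in range(k):
            L[i] += y[t][i]
        Lhat[A] += y[t][A] / (P[A] + gamma)
    diff = [Lhat[i] - L[i] for i in range(k)]
    if max(diff) > threshold or sum(diff) > threshold:
        failures += 1

print(failures / trials)  # empirical failure rate; Lemma 12.3 says <= delta
```

The bias of the estimator (its conditional mean is $P_{ti} y_{ti} / (P_{ti} + \gamma) < y_{ti}$) pulls $\hat{L}_{ni} - L_{ni}$ downward, which is why the failure rate in practice is typically far below $\delta$.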


Lemma 12.4. $\tilde{L}_n - \hat{L}_n = \gamma \sum_{j=1}^k \hat{L}_{nj}$.
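The identity in Lemma 12.4 can be verified numerically. The definitions of $\tilde{L}_n$ and $\hat{L}_n$ precede this excerpt, so the sketch below assumes, as in the Exp3-IX analysis, that $\hat{y}_{ti} = A_{ti} y_{ti} / (P_{ti} + \gamma)$, that $\tilde{L}_n = \sum_t y_{t,A_t}$ is the loss actually incurred, and that $\hat{L}_n = \sum_t \langle P_t, \hat{y}_t \rangle$. Under these assumptions the identity holds pathwise, step by step, not merely in expectation.

```python
import random

# Numerical check of the identity in Lemma 12.4, under the assumed
# definitions: y_hat_ti = A_ti * y_ti / (P_ti + gamma),
# L_tilde = sum_t y_{t,A_t}, L_hat = sum_t <P_t, y_hat_t>.
random.seed(1)

n, k = 100, 4
gamma = 0.1

L_tilde = 0.0
L_hat = 0.0
L_hat_arm = [0.0] * k  # L_hat_{nj} for each arm j

for t in range(n):
    P = [random.random() for _ in range(k)]
    s = sum(P)
    P = [p / s for p in P]                      # probability vector P_t
    y = [random.random() for _ in range(k)]     # losses y_ti in [0, 1]
    A = random.choices(range(k), weights=P)[0]  # A_t ~ P_t
    y_hat = [0.0] * k
    y_hat[A] = y[A] / (P[A] + gamma)            # biased importance weighting
    L_tilde += y[A]
    L_hat += sum(P[i] * y_hat[i] for i in range(k))
    for j in range(k):
        L_hat_arm[j] += y_hat[j]

lhs = L_tilde - L_hat
rhs = gamma * sum(L_hat_arm)
print(abs(lhs - rhs) < 1e-9)  # identity holds exactly, up to float rounding
```

The per-step algebra behind the check: $y_{t,A_t} - P_{t,A_t} y_{t,A_t}/(P_{t,A_t} + \gamma) = \gamma\, y_{t,A_t}/(P_{t,A_t} + \gamma) = \gamma \sum_i \hat{y}_{ti}$, since only the chosen arm has a nonzero estimate.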