12.2 Regret analysis

statement of which requires us to introduce the notion of adapted and predictable sequences of random variables.

Lemma 12.2. Let $\mathbb{F} = (\mathcal{F}_t)_{t=0}^n$ be a filtration and for $i \in [k]$ let $(\tilde{Y}_{ti})_t$ be $\mathbb{F}$-adapted such that:

1. For any $S \subset [k]$ with $|S| > 1$, $\mathbb{E}\left[\prod_{i \in S} \tilde{Y}_{ti} \,\middle|\, \mathcal{F}_{t-1}\right] \leq 0$.
2. $\mathbb{E}\left[\tilde{Y}_{ti} \,\middle|\, \mathcal{F}_{t-1}\right] = y_{ti}$ for all $t \in [n]$ and $i \in [k]$.

Furthermore, let $(\alpha_{ti})_{ti}$ and $(\lambda_{ti})_{ti}$ be real-valued $\mathbb{F}$-predictable random sequences such that for all $t, i$ it holds that $0 \leq \alpha_{ti} \tilde{Y}_{ti} \leq 2\lambda_{ti}$. Then for all $\delta \in (0, 1)$,
$$
\mathbb{P}\left( \sum_{t=1}^n \sum_{i=1}^k \alpha_{ti} \left( \frac{\tilde{Y}_{ti}}{1 + \lambda_{ti}} - y_{ti} \right) \geq \log\left(\frac{1}{\delta}\right) \right) \leq \delta\,.
$$

The proof relies on the Cramér-Chernoff method and is deferred until the end
of the chapter. Equipped with this result we can easily bound the terms $\hat{L}_{ni} - L_{ni}$.

Lemma 12.3. Let $\delta \in (0, 1)$. With probability at least $1 - \delta$ the following inequalities hold simultaneously:
$$
\max_{i \in [k]} \left( \hat{L}_{ni} - L_{ni} \right) \leq \frac{\log\left(\frac{k+1}{\delta}\right)}{2\gamma}
\quad \text{and} \quad
\sum_{i=1}^k \left( \hat{L}_{ni} - L_{ni} \right) \leq \frac{\log\left(\frac{k+1}{\delta}\right)}{2\gamma}\,.
\tag{12.7}
$$
Proof. Fix $\delta' \in (0, 1)$ to be chosen later. Then
$$
\sum_{i=1}^k \left( \hat{L}_{ni} - L_{ni} \right)
= \sum_{t=1}^n \sum_{i=1}^k \left( \frac{A_{ti} y_{ti}}{P_{ti} + \gamma} - y_{ti} \right)
= \frac{1}{2\gamma} \sum_{t=1}^n \sum_{i=1}^k 2\gamma \left( \frac{1}{1 + \frac{\gamma}{P_{ti}}} \cdot \frac{A_{ti} y_{ti}}{P_{ti}} - y_{ti} \right).
$$
Introduce $\lambda_{ti} = \frac{\gamma}{P_{ti}}$, $\tilde{Y}_{ti} = \frac{A_{ti} y_{ti}}{P_{ti}}$ and $\alpha_{ti} = 2\gamma$. Notice that the conditions of Lemma 12.2 are now satisfied. In particular, for any $S \subset [k]$ with $|S| > 1$ it holds that $\prod_{i \in S} A_{ti} = 0$ and hence $\prod_{i \in S} \tilde{Y}_{ti} = 0$. Therefore
$$
\mathbb{P}\left( \sum_{i=1}^k \left( \hat{L}_{ni} - L_{ni} \right) \geq \frac{\log(1/\delta')}{2\gamma} \right) \leq \delta'\,.
\tag{12.8}
$$
Similarly, for any fixed $i$,
$$
\mathbb{P}\left( \hat{L}_{ni} - L_{ni} \geq \frac{\log(1/\delta')}{2\gamma} \right) \leq \delta'\,.
\tag{12.9}
$$
To see this, use the previous argument with $\alpha_{tj} = \mathbb{I}\{j = i\}\, 2\gamma$. The result follows by choosing $\delta' = \delta/(k+1)$ and the union bound.
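The bound in (12.7) can be illustrated numerically. The following sketch uses a toy setup of our own choosing (not from the text): a fixed uniform sampling distribution $P_{ti} = 1/k$, losses drawn once from $[0, 1]$, and the estimates $\hat{Y}_{ti} = A_{ti} y_{ti} / (P_{ti} + \gamma)$ as in the proof above. It estimates how often either inequality fails over repeated sample paths.

```python
import numpy as np

# Empirical check of the two inequalities in (12.7), in a toy setup (our
# choice): fixed uniform sampling distribution P, losses y in [0, 1], and
# the estimates hatY_ti = A_ti * y_ti / (P_ti + gamma).
rng = np.random.default_rng(1)
n, k, gamma, delta = 200, 5, 0.05, 0.1
P = np.full(k, 1.0 / k)                        # P_ti = 1/k for all t, i
y = rng.uniform(0.0, 1.0, size=(n, k))         # bounded losses in [0, 1]
bound = np.log((k + 1) / delta) / (2 * gamma)  # right-hand side of (12.7)

both_hold = 0
runs = 2000
for _ in range(runs):
    arms = rng.choice(k, size=n, p=P)          # A_t for t = 1..n
    A = np.zeros((n, k))
    A[np.arange(n), arms] = 1.0
    Lhat = (A * y / (P + gamma)).sum(axis=0)   # hat L_{n i} for i in [k]
    L = y.sum(axis=0)                          # L_{n i}
    diff = Lhat - L
    both_hold += (diff.max() <= bound) and (diff.sum() <= bound)
print(both_hold / runs)  # should be at least 1 - delta
```

In practice the empirical frequency is far above $1 - \delta$: the bound is loose because the estimates have negative drift ($\mathbb{E}[\hat{L}_{ni}] \leq L_{ni}$), which the inequality does not exploit.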
Lemma 12.4. $\tilde{L}_n - \hat{L}_n = \gamma \sum_{j=1}^k \hat{L}_{nj}$.
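This identity is algebraic, so it holds on every sample path. Under our reading of the notation (an assumption, since the definitions sit outside this excerpt) $\tilde{L}_n = \sum_t y_{t A_t}$ is the learner's realised loss and $\hat{L}_n = \sum_t \langle P_t, \hat{y}_t \rangle$, with $\hat{y}_{ti} = A_{ti} y_{ti} / (P_{ti} + \gamma)$; then $\sum_i (P_{ti} + \gamma) \hat{y}_{ti} = y_{t A_t}$, and summing over $t$ gives the lemma. A quick numerical confirmation:

```python
import numpy as np

# Check of the identity in Lemma 12.4, under our reading of the notation
# (an assumption): Ltilde_n = sum_t y_{t,A_t} is the realised loss and
# Lhat_n = sum_t <P_t, haty_t>, with haty_ti = A_ti * y_ti / (P_ti + gamma).
rng = np.random.default_rng(2)
n, k, gamma = 100, 4, 0.1
y = rng.uniform(0.0, 1.0, size=(n, k))
P = rng.dirichlet(np.ones(k), size=n)          # any predictable P_t works
arms = np.array([rng.choice(k, p=P[t]) for t in range(n)])
A = np.zeros((n, k))
A[np.arange(n), arms] = 1.0

haty = A * y / (P + gamma)                     # loss estimates haty_ti
Ltilde = y[np.arange(n), arms].sum()           # realised loss sum_t y_{t,A_t}
Lhat = (P * haty).sum()                        # sum_t <P_t, haty_t>
Lhat_j = haty.sum(axis=0)                      # hat L_{n j} for each arm j
print(Ltilde - Lhat, gamma * Lhat_j.sum())     # the two sides agree
```

Because the identity holds pathwise, no averaging over runs is needed; a single sample path suffices.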