12.2 Regret analysis
statement of which requires us to introduce the notion of adapted and predictable
sequences of random variables.
Lemma 12.2. Let $\mathcal{F} = (\mathcal{F}_t)_{t=0}^n$ be a filtration and for $i \in [k]$ let $(\tilde Y_{ti})_t$ be $\mathcal{F}$-adapted such that:

1. For any $S \subset [k]$ with $|S| \geq 2$, $\mathbb{E}\left[\prod_{i \in S} \tilde Y_{ti} \,\middle|\, \mathcal{F}_{t-1}\right] \leq 0$.
2. $\mathbb{E}\left[\tilde Y_{ti} \,\middle|\, \mathcal{F}_{t-1}\right] = y_{ti}$ for all $t \in [n]$ and $i \in [k]$.

Furthermore, let $(\alpha_{ti})_{t,i}$ and $(\lambda_{ti})_{t,i}$ be real-valued $\mathcal{F}$-predictable random sequences such that for all $t, i$ it holds that $0 \leq \alpha_{ti} \tilde Y_{ti} \leq 2\lambda_{ti}$. Then for all $\delta \in (0,1)$,
\[
\mathbb{P}\left(\sum_{t=1}^n \sum_{i=1}^k \alpha_{ti}\left(\frac{\tilde Y_{ti}}{1 + \lambda_{ti}} - y_{ti}\right) \geq \log\left(\frac{1}{\delta}\right)\right) \leq \delta\,.
\]
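The two conditions of Lemma 12.2 are easy to check numerically for the kind of importance-weighted sequence the lemma is applied to below. The following sketch (all names and constants are illustrative, not from the text) draws a one-hot action vector $A_t$ from a fixed distribution $P$ and sets $\tilde Y_{ti} = A_{ti} y_{ti}/P_i$: condition 2 is the usual unbiasedness of importance weighting, and condition 1 holds with equality because at most one coordinate of $A_t$ is nonzero.

```python
import numpy as np

# Illustrative check of Lemma 12.2's conditions for the
# importance-weighted construction Y_ti = A_ti * y_ti / P_ti,
# where A_t is a one-hot sample from the distribution P.
rng = np.random.default_rng(0)
k = 3
P = np.array([0.5, 0.3, 0.2])   # fixed conditional distribution P_t
y = np.array([0.9, 0.1, 0.4])   # losses y_ti in [0, 1]

m = 200_000                      # Monte Carlo samples of A_t
arms = rng.choice(k, size=m, p=P)
A = np.eye(k)[arms]              # one-hot action indicators A_ti
Y = A * y / P                    # importance-weighted losses

# Condition 2: E[Y_ti | F_{t-1}] = y_ti (unbiasedness).
assert np.allclose(Y.mean(axis=0), y, atol=0.02)

# Condition 1: for |S| >= 2 the product of distinct Y_ti's is
# identically zero, since at most one coordinate of A_t is nonzero.
assert np.all(Y[:, 0] * Y[:, 1] == 0.0)
assert np.all(Y[:, 0] * Y[:, 1] * Y[:, 2] == 0.0)
print("conditions verified")
```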
The proof relies on the Cramér-Chernoff method and is deferred until the end of the chapter. Equipped with this result we can easily bound the terms $\hat L_{ni} - L_{ni}$.
Lemma 12.3. Let $\delta \in (0,1)$. With probability at least $1 - \delta$ the following inequalities hold simultaneously:
\[
\max_{i \in [k]} \left(\hat L_{ni} - L_{ni}\right) \leq \frac{\log((k+1)/\delta)}{2\gamma}
\quad \text{and} \quad
\sum_{i=1}^k \left(\hat L_{ni} - L_{ni}\right) \leq \frac{\log((k+1)/\delta)}{2\gamma}\,. \tag{12.7}
\]
Proof Fix $\delta' \in (0,1)$ to be chosen later. Then
\[
\sum_{i=1}^k (\hat L_{ni} - L_{ni})
= \sum_{t=1}^n \sum_{i=1}^k \left(\frac{A_{ti} y_{ti}}{P_{ti} + \gamma} - y_{ti}\right)
= \frac{1}{2\gamma} \sum_{t=1}^n \sum_{i=1}^k 2\gamma \left(\frac{1}{1 + \gamma/P_{ti}} \cdot \frac{A_{ti} y_{ti}}{P_{ti}} - y_{ti}\right).
\]
Introduce $\lambda_{ti} = \gamma/P_{ti}$, $\tilde Y_{ti} = A_{ti} y_{ti}/P_{ti}$ and $\alpha_{ti} = 2\gamma$. Notice that the conditions of Lemma 12.2 are now satisfied. In particular, for any $S \subset [k]$ with $|S| > 1$ it holds that $\prod_{i \in S} A_{ti} = 0$ and hence $\prod_{i \in S} \tilde Y_{ti} = 0$. Therefore
\[
\mathbb{P}\left(\sum_{i=1}^k (\hat L_{ni} - L_{ni}) \geq \frac{\log(1/\delta')}{2\gamma}\right) \leq \delta'\,. \tag{12.8}
\]
Similarly, for any fixed $i$,
\[
\mathbb{P}\left(\hat L_{ni} - L_{ni} \geq \frac{\log(1/\delta')}{2\gamma}\right) \leq \delta'\,. \tag{12.9}
\]
To see this, use the previous argument with $\alpha_{tj} = \mathbb{I}\{j = i\}\, 2\gamma$. The result follows by choosing $\delta' = \delta/(k+1)$ and the union bound.
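As a quick empirical sanity check of (12.7), the following Monte Carlo sketch (a toy instance with a fixed sampling distribution; the constants and variable names are illustrative, not from the text) computes $\hat L_{ni} = \sum_t A_{ti} y_{ti}/(P_{ti} + \gamma)$ and counts how often either inequality in (12.7) fails. By Lemma 12.3 this frequency should be at most $\delta$.

```python
import numpy as np

# Toy Monte Carlo check of the deviation bound (12.7) for the
# estimator L_hat_ni = sum_t A_ti * y_ti / (P_ti + gamma).
# A fixed P and fixed losses keep the sketch simple; the lemma
# allows both to be adapted/predictable sequences.
rng = np.random.default_rng(1)
n, k, gamma, delta = 1000, 3, 0.05, 0.1
P = np.array([0.5, 0.3, 0.2])
y = rng.uniform(size=(n, k))            # losses y_ti in [0, 1]
threshold = np.log((k + 1) / delta) / (2 * gamma)

violations = 0
runs = 2000
for _ in range(runs):
    arms = rng.choice(k, size=n, p=P)
    A = np.eye(k)[arms]                 # one-hot indicators A_ti
    L_hat = (A * y / (P + gamma)).sum(axis=0)
    L = y.sum(axis=0)
    dev = L_hat - L
    # Either event in (12.7) failing counts as a violation.
    if dev.max() > threshold or dev.sum() > threshold:
        violations += 1

print(f"violation frequency: {violations / runs:.3f} (bound: {delta})")
```

In this toy instance the estimator's downward bias of $-\gamma y_{ti}/(P_{ti}+\gamma)$ per round makes violations far rarer than the worst-case bound $\delta$; the bound is not tight here, only valid.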
Lemma 12.4. $\tilde L_n - \hat L_n = \gamma \sum_{j=1}^k \hat L_{nj}$.
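Although the proof is not given in this excerpt, the identity follows from a short per-round computation, assuming definitions reconstructed from context (these are assumptions, and may be stated slightly differently elsewhere in the chapter): $\hat Y_{ti} = A_{ti} y_{ti}/(P_{ti}+\gamma)$, $\hat L_{nj} = \sum_{t=1}^n \hat Y_{tj}$, $\tilde L_n = \sum_{t=1}^n y_{t A_t}$ and $\hat L_n = \sum_{t=1}^n \sum_{i=1}^k P_{ti} \hat Y_{ti}$.

```latex
% Per-round computation (definitions above are assumed from context).
% Since \hat Y_{tj} = 0 for all j \neq A_t,
y_{t A_t} - \sum_{i=1}^k P_{ti} \hat Y_{ti}
  = y_{t A_t} - \frac{P_{t A_t}\, y_{t A_t}}{P_{t A_t} + \gamma}
  = \frac{\gamma\, y_{t A_t}}{P_{t A_t} + \gamma}
  = \gamma\, \hat Y_{t A_t}
  = \gamma \sum_{j=1}^k \hat Y_{tj}\,.
% Summing over t = 1, \dots, n yields
% \tilde L_n - \hat L_n = \gamma \sum_{j=1}^k \hat L_{nj}.
```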