18.4 Regret analysis

Theorem 18.1. Let $\gamma = 0$ and $\eta = \sqrt{2\log(M)/(nk)}$, and let $R_n$ be the expected regret of Exp4 defined in Algorithm 11 after $n$ rounds. Then,
$$R_n \le \sqrt{2nk\log(M)}\,. \qquad (18.7)$$
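For concreteness, here is a minimal sketch of the Exp4 update that the theorem analyzes, specialized to $\gamma = 0$. The interface (`exp4`, `advice`, `reward`) is a hypothetical stand-in rather than notation from the text; the estimator line implements $\hat X_{ti} = 1 - \mathbb I\{A_t = i\}(1 - X_t)/P_{ti}$.

```python
import numpy as np

def exp4(n, k, M, advice, reward, eta, seed=0):
    """Sketch of the Exp4 update with gamma = 0 (hypothetical interface).

    advice(t)    -> (M, k) array E^(t); row m is expert m's distribution over arms.
    reward(t, a) -> reward x_{t,a} in [0, 1] of playing arm a in round t.
    """
    rng = np.random.default_rng(seed)
    Q = np.full(M, 1.0 / M)           # Q_1 is uniform over the M experts
    for t in range(n):
        E = advice(t)                 # expert advice matrix E^(t)
        P = Q @ E                     # P_ti = sum_m Q_tm E^(t)_mi
        A = rng.choice(k, p=P)        # sample action A_t ~ P_t
        x = reward(t, A)              # observe X_t = x_{t,A_t}
        Xhat = np.ones(k)             # Xhat_ti = 1 - I{A_t = i}(1 - X_t)/P_ti
        Xhat[A] -= (1.0 - x) / P[A]
        Xtilde = E @ Xhat             # Xtilde_tm = <E^(t)_m, Xhat_t>
        Q = Q * np.exp(eta * Xtilde)  # exponential-weights update
        Q /= Q.sum()
    return Q
```

Running this with $\eta = \sqrt{2\log(M)/(nk)}$ matches the parameter choice covered by the theorem.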
After translating the notation, the proof of the following lemma can be extracted from the analysis of Exp3 in the proof of Theorem 11.2 (Exercise 18.2).

Lemma 18.2. For any $m^* \in [M]$, it holds that
$$\sum_{t=1}^n \tilde X_{tm^*} - \sum_{t=1}^n \sum_{m=1}^M Q_{tm}\tilde X_{tm} \le \frac{\log(M)}{\eta} + \frac{\eta}{2}\sum_{t=1}^n \sum_{m=1}^M Q_{tm}\left(1 - \tilde X_{tm}\right)^2\,.$$

Proof of Theorem 18.1. Let $\mathcal F_t = \sigma(E^{(1)}, A_1, E^{(2)}, A_2, \ldots, A_{t-1}, E^{(t)})$ and abbreviate $\mathbb E_t[\cdot] = \mathbb E[\cdot \mid \mathcal F_t]$. Let $m^*$ be the index of the best-performing expert in hindsight:
$$m^* = \operatorname*{argmax}_{m \in [M]} \sum_{t=1}^n E^{(t)}_m x_t\,, \qquad (18.8)$$
which is not random by the assumption that the experts are oblivious. Applying Lemma 18.2 shows that
$$\sum_{t=1}^n \tilde X_{tm^*} - \sum_{t=1}^n \sum_{m=1}^M Q_{tm}\tilde X_{tm} \le \frac{\log(M)}{\eta} + \frac{\eta}{2}\sum_{t=1}^n \sum_{m=1}^M Q_{tm}\left(1 - \tilde X_{tm}\right)^2\,. \qquad (18.9)$$
When $\gamma = 0$ the estimator $\hat X_{ti}$ is unbiased, so that $\mathbb E_t[\hat X_t] = x_t$ and
$$\mathbb E_t[\tilde X_t] = \mathbb E_t[E^{(t)}\hat X_t] = E^{(t)}\mathbb E_t[\hat X_t] = E^{(t)} x_t\,. \qquad (18.10)$$
Taking the expectation of both sides of Eq. (18.9) and using the tower rule for conditional expectation and the fact that $Q_t$ is $\mathcal F_t$-measurable leads to
$$R_n \le \frac{\log(M)}{\eta} + \frac{\eta}{2}\sum_{t=1}^n \sum_{m=1}^M \mathbb E\left[Q_{tm}\left(1 - \tilde X_{tm}\right)^2\right]. \qquad (18.11)$$
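The unbiasedness step in Eq. (18.10) is easy to check by simulation. The sketch below draws $A_t \sim P_t$ many times for a fixed advice matrix and reward vector (the random values are purely illustrative) and compares the empirical mean of $\tilde X_t = E^{(t)}\hat X_t$ with $E^{(t)} x_t$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, M, trials = 5, 4, 200_000
E = rng.dirichlet(np.ones(k), size=M)  # illustrative advice matrix E^(t)
Q = rng.dirichlet(np.ones(M))          # illustrative expert distribution Q_t
P = Q @ E                              # arm distribution P_t
x = rng.uniform(size=k)                # illustrative reward vector x_t

A = rng.choice(k, p=P, size=trials)    # draws of A_t ~ P_t
Xhat = np.ones((trials, k))            # Xhat_ti = 1 - I{A_t = i}(1 - x_ti)/P_ti
Xhat[np.arange(trials), A] -= (1.0 - x[A]) / P[A]
Xtilde = Xhat @ E.T                    # rows are Xtilde_t = E^(t) Xhat_t
print(np.abs(Xtilde.mean(axis=0) - E @ x).max())  # ~0: E_t[Xtilde_t] = E^(t) x_t
```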
Like in Chapter 11, it is more convenient to work with losses. Let $\hat Y_{ti} = 1 - \hat X_{ti}$, $y_{ti} = 1 - x_{ti}$ and $\tilde Y_{tm} = 1 - \tilde X_{tm}$. Note that $\tilde Y_t = E^{(t)}\hat Y_t$ and recall the notation $A_{ti} = \mathbb I\{A_t = i\}$, which means that $\hat Y_{ti} = \frac{A_{ti}}{P_{ti}} y_{ti}$ and
$$\mathbb E_t\left[\tilde Y_{tm}^2\right] = \mathbb E_t\left[\left(\frac{E^{(t)}_{mA_t} y_{tA_t}}{P_{tA_t}}\right)^2\right] = \sum_{i=1}^k \frac{\left(E^{(t)}_{mi} y_{ti}\right)^2}{P_{ti}} \le \sum_{i=1}^k \frac{E^{(t)}_{mi}}{P_{ti}}\,, \qquad (18.12)$$
where the inequality uses that $y_{ti}$ and $E^{(t)}_{mi}$ both lie in $[0,1]$, so $(E^{(t)}_{mi} y_{ti})^2 \le E^{(t)}_{mi}$.
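The middle equality in Eq. (18.12) evaluates the expectation over $A_t \sim P_t$ in closed form, which can be sanity-checked numerically with illustrative random values:

```python
import numpy as np

rng = np.random.default_rng(2)
k, M = 5, 4
E = rng.dirichlet(np.ones(k), size=M)  # illustrative advice matrix E^(t)
Q = rng.dirichlet(np.ones(M))
P = Q @ E                              # P_t with gamma = 0
y = rng.uniform(size=k)                # illustrative loss vector y_t = 1 - x_t

m = 0                                  # any fixed expert index
second_moment = np.sum((E[m] * y) ** 2 / P)  # closed form of E_t[Ytilde_tm^2]
upper_bound = np.sum(E[m] / P)               # right-hand side of (18.12)
print(second_moment <= upper_bound)          # True
```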
Therefore, using Eq. (18.12) and the definition of $P_{ti}$,
$$\mathbb E\left[\sum_{m=1}^M Q_{tm}\left(1 - \tilde X_{tm}\right)^2\right] \le \mathbb E\left[\sum_{m=1}^M Q_{tm}\sum_{i=1}^k \frac{E^{(t)}_{mi}}{P_{ti}}\right] = \mathbb E\left[\sum_{i=1}^k \frac{\sum_{m=1}^M Q_{tm} E^{(t)}_{mi}}{P_{ti}}\right] = k\,,$$
where the last equality used $P_{ti} = \sum_{m=1}^M Q_{tm} E^{(t)}_{mi}$. Substituting this bound into Eq. (18.11) shows that $R_n \le \log(M)/\eta + \eta nk/2$, and the choice $\eta = \sqrt{2\log(M)/(nk)}$ gives Eq. (18.7).
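The final telescoping to $k$ rests only on the identity $P_{ti} = \sum_{m=1}^M Q_{tm} E^{(t)}_{mi}$, as the following short check with arbitrary (hypothetical) $Q_t$ and $E^{(t)}$ confirms:

```python
import numpy as np

rng = np.random.default_rng(3)
k, M = 5, 4
E = rng.dirichlet(np.ones(k), size=M)  # illustrative advice matrix E^(t)
Q = rng.dirichlet(np.ones(M))          # illustrative expert distribution Q_t
P = Q @ E                              # definition of P_t
print(np.sum((Q @ E) / P))             # sum_i P_ti / P_ti = k (prints 5.0)
```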