18.4 Regret analysis
Theorem 18.1. Let $R_n$ be the expected regret of Exp4 defined in Algorithm 11 after $n$ rounds. Then,
\[
R_n \le \sqrt{2nk\log(M)}\,. \tag{18.7}
\]
After translating the notation, the proof of the following lemma can be extracted
from the analysis of Exp3 in the proof of Theorem 11.2 (Exercise 18.2).
Lemma 18.2. For any $m^* \in [M]$, it holds that
\[
\sum_{t=1}^n \tilde X_{tm^*} - \sum_{t=1}^n \sum_{m=1}^M Q_{tm} \tilde X_{tm}
\le \frac{\log(M)}{\eta} + \frac{\eta}{2} \sum_{t=1}^n \sum_{m=1}^M Q_{tm}\left(1 - \tilde X_{tm}\right)^2.
\]
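For intuition, here is a compressed sketch of that exponential-weights argument (a sketch only, under the assumptions that $Q_{tm} \propto \exp(\eta \sum_{s<t} \tilde X_{sm})$ and that $\tilde X_{tm} \le 1$, both of which hold for Exp4 with $\gamma = 0$). Writing $W_t = \sum_{m=1}^M \exp(\eta \sum_{s \le t} \tilde X_{sm})$,
\[
\frac{W_t}{W_{t-1}} = \sum_{m=1}^M Q_{tm}\, e^{\eta \tilde X_{tm}}
\le \exp\left(\eta \sum_{m=1}^M Q_{tm} \tilde X_{tm} + \frac{\eta^2}{2} \sum_{m=1}^M Q_{tm}\left(1 - \tilde X_{tm}\right)^2\right),
\]
by $e^x \le 1 + x + x^2/2$ for $x \le 0$ applied to $x = \eta(\tilde X_{tm} - 1)$, followed by $1 + x \le e^x$. Chaining this over $t$, using $\exp(\eta \sum_{t=1}^n \tilde X_{tm^*}) \le W_n$ and $W_0 = M$, then taking logarithms and dividing by $\eta$ gives the lemma.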
Proof of Theorem 18.1. Let $\mathcal F_t = \sigma(E^{(1)}, A_1, E^{(2)}, A_2, \ldots, A_{t-1}, E^{(t)})$ and abbreviate $\mathbb E_t[\cdot] = \mathbb E[\cdot \mid \mathcal F_t]$. Let $m^*$ be the index of the best performing expert in hindsight:
\[
m^* = \operatorname*{argmax}_{m \in [M]} \sum_{t=1}^n E^{(t)}_m x_t\,, \tag{18.8}
\]
which is not random by the assumption that the experts are oblivious. Applying
Lemma 18.2 shows that
\[
\sum_{t=1}^n \tilde X_{tm^*} - \sum_{t=1}^n \sum_{m=1}^M Q_{tm} \tilde X_{tm}
\le \frac{\log(M)}{\eta} + \frac{\eta}{2} \sum_{t=1}^n \sum_{m=1}^M Q_{tm}\left(1 - \tilde X_{tm}\right)^2. \tag{18.9}
\]
When $\gamma = 0$ the estimator $\hat X_{ti}$ is unbiased, so that $\mathbb E_t[\hat X_t] = x_t$ and
\[
\mathbb E_t[\tilde X_t] = \mathbb E_t[E^{(t)} \hat X_t] = E^{(t)}\, \mathbb E_t[\hat X_t] = E^{(t)} x_t\,. \tag{18.10}
\]
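To verify the unbiasedness claim (a one-line check, phrased in the loss notation introduced below, where $\hat Y_{ti} = A_{ti} y_{ti}/P_{ti}$ and $A_{ti} = \mathbb I\{A_t = i\}$): since $\mathbb E_t[A_{ti}] = \mathbb P(A_t = i \mid \mathcal F_t) = P_{ti}$,
\[
\mathbb E_t[\hat Y_{ti}] = \frac{\mathbb E_t[A_{ti}]}{P_{ti}}\, y_{ti} = y_{ti}\,,
\qquad \text{and hence} \qquad
\mathbb E_t[\hat X_{ti}] = 1 - y_{ti} = x_{ti}\,.
\]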
Taking the expectation of both sides of Eq. (18.9) and using the tower rule for conditional expectation and the fact that $Q_t$ is $\mathcal F_t$-measurable leads to
\[
R_n \le \frac{\log(M)}{\eta} + \frac{\eta}{2} \sum_{t=1}^n \sum_{m=1}^M \mathbb E\left[Q_{tm}\left(1 - \tilde X_{tm}\right)^2\right]. \tag{18.11}
\]
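To spell out how the left-hand side of Eq. (18.9) turns into $R_n$ (a bookkeeping step, using Eq. (18.10) and the fact that $P_{ti} = \sum_{m=1}^M Q_{tm} E^{(t)}_{mi}$ when $\gamma = 0$):
\[
\mathbb E\left[\sum_{t=1}^n \tilde X_{tm^*} - \sum_{t=1}^n \sum_{m=1}^M Q_{tm} \tilde X_{tm}\right]
= \sum_{t=1}^n E^{(t)}_{m^*} x_t - \sum_{t=1}^n \mathbb E\left[\sum_{i=1}^k P_{ti} x_{ti}\right] = R_n\,,
\]
since the first term is the (deterministic) total reward of the best expert and $\sum_{i=1}^k P_{ti} x_{ti} = \mathbb E_t[x_{tA_t}]$ is the conditional expected reward collected by the policy.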
Like in Chapter 11, it is more convenient to work with losses. Let $\hat Y_{ti} = 1 - \hat X_{ti}$, $y_{ti} = 1 - x_{ti}$ and $\tilde Y_{tm} = 1 - \tilde X_{tm}$. Note that $\tilde Y_t = E^{(t)} \hat Y_t$ and recall the notation $A_{ti} = \mathbb I\{A_t = i\}$, which means that $\hat Y_{ti} = \frac{A_{ti}\, y_{ti}}{P_{ti}}$ and
\[
\mathbb E_t\left[\tilde Y_{tm}^2\right]
= \mathbb E_t\left[\left(\frac{E^{(t)}_{mA_t}\, y_{tA_t}}{P_{tA_t}}\right)^{\!2}\,\right]
= \sum_{i=1}^k \frac{\left(E^{(t)}_{mi}\, y_{ti}\right)^2}{P_{ti}}
\le \sum_{i=1}^k \frac{E^{(t)}_{mi}}{P_{ti}}\,. \tag{18.12}
\]
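Both steps deserve a word (a short justification, using only that $A_t \sim P_t$ given $\mathcal F_t$ and that $E^{(t)}_{mi}, y_{ti} \in [0,1]$): the middle equality is
\[
\sum_{i=1}^k P_{ti} \left(\frac{E^{(t)}_{mi}\, y_{ti}}{P_{ti}}\right)^{\!2}
= \sum_{i=1}^k \frac{\left(E^{(t)}_{mi}\, y_{ti}\right)^2}{P_{ti}}\,,
\]
and the final inequality follows from $\left(E^{(t)}_{mi}\, y_{ti}\right)^2 \le \left(E^{(t)}_{mi}\right)^2 \le E^{(t)}_{mi}$.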
Therefore, using the definition of $P_{ti}$,
\[
\mathbb E\left[\sum_{m=1}^M Q_{tm}\left(1 - \tilde X_{tm}\right)^2\right]
\le \mathbb E\left[\sum_{m=1}^M Q_{tm} \sum_{i=1}^k \frac{E^{(t)}_{mi}}{P_{ti}}\right]
= \mathbb E\left[\sum_{i=1}^k \frac{\sum_{m=1}^M Q_{tm} E^{(t)}_{mi}}{P_{ti}}\right] = k\,.
\]
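Substituting this bound into Eq. (18.11) and tuning the learning rate completes the proof (the last display assumes the choice $\eta = \sqrt{2\log(M)/(nk)}$, which balances the two terms):
\[
R_n \le \frac{\log(M)}{\eta} + \frac{\eta n k}{2} = \sqrt{2nk\log(M)}\,,
\]
which is Eq. (18.7).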