28.2 Regret analysis 314
fora 1 = argminb∈AF(b). To see the second part note that
〈at−at+1,yt〉=
1
η〈at−at+1,∇F(at)−∇F( ̃at+1)〉
=^1
η
(D(at+1,at) +D(at, ̃at+1)−D(at+1, ̃at+1))
≤
1
η
(D(at+1,at) +D(at, ̃at+1)).
The result follows by substituting this into Eq. (28.10) and completing as for the
first part.
The assumption thata 1 minimizes the potential was only used to bound
D(a,a 1 )≤F(a)−F(a 1 ). For a different initialization the following bound
still holds:
Rn(a)≤
1
η
(
D(a,a 1 ) +
∑n
t=1
D(at,a ̃t+1)
)
. (28.11)
As we shall see in Chapter 31, this is useful when using mirror descent to
analyze nonstationary bandits.
The first part of Theorem 28.4 also holds for follow the regularized leader as
stated in the next result, the proof of which is left for Exercise 28.3.
Theorem28.5. Letη > 0 andF be Legendre with domainDandA ⊆cl(D).
Then for anya∈Athe regret of fol low the regularized leader is bounded by
Rn(a)≤
F(a)−F(a 1 )
η
+
∑n
t=1
〈at−at+1,yt〉−
1
η
∑n
t=1
D(at+1,at).
We now give two applications of Theorem 28.4. Let diamF(A) =
maxa,b∈AF(a)−F(b) be the diameter ofAwith respect toF.
Proposition28.6.LetA=Bd 2 ={a∈Rd:‖a‖ 2 ≤ 1 }be the standard unit bal l
and assumeyt∈Bd 2 for allt. Then mirror descent with potentialF(a) =^12 ‖a‖^22
andη=
√
1 /nsatisfiesRn≤
√
n.
Proof By Eq. (28.9) we have ̃at+1=at−ηytso
D(at, ̃at+1) =
1
2
‖ ̃at+1−at‖^22 =
η^2
2
‖yt‖^22.
Therefore since diamF(A) = 1/2 and‖yt‖ 2 ≤1 for allt,
Rn≤
diamF(A)
η +
η
2
∑n
t=1
‖yt‖^22 ≤
1
2 η+
ηn
2 =
√
n.
Proposition28.7.LetA=Pd− 1 be the probability simplex andyt∈L= [0,1]d