28.5 Notes 319
Proof Leta∗= argmina∈A
∑n
t=1〈a,yt〉be the optimal action. Then
Rn=E
[n
∑
t=1
〈At−ra∗,yt〉
]
+
∑n
t=1
〈ra∗−a∗,yt〉≤Rn(ra∗) + (1−r)n,
where the inequality follows from the definition ofAand Cauchy-Schwarz. By
Theorem 28.5 and Theorem 26.12,
Rn(a∗)≤
diamF(A)
η
+
η
2
E
[n
∑
t=1
‖Yˆt‖^2 ∇F(Zt)− 1
]
,
whereZt∈[A ̄t,A ̄t+1] lies on the chord connectingA ̄tandA ̄t+1. The algorithm is
stable in the sense that no matter how the losses are chosen,A ̄t+1 cannot
be too far fromA ̄t. This also means thatZt is close toA ̄t. By definition,
η‖Yˆt‖≤ηd/(1−r) = 1/2. Combining this with Eq. (28.13) shows that
1 −‖Zt‖
1 −‖A ̄t‖
≤ sup
α∈[0,1]
1 −‖αA ̄t+ (1−α)A ̄t+1‖
1 −‖A ̄t‖
= max
{
1 ,
1 −‖A ̄t+1‖
1 −‖A ̄t‖
}
≤max
{
1 ,
1 +η‖Lˆt− 1 ‖
1 +η‖Lˆt‖
}
≤max
{
1 ,
1 +η‖Lˆt− 1 ‖
1 /2 +η‖Lˆt− 1 ‖
}
≤ 2.
The next step is to find the Hessian ofF, which is
∇^2 F(a) =
I
1 −‖a‖
+
aa>
‖a‖(1−‖a‖)^2
≥
I
1 −‖a‖
Therefore∇^2 F(a)−^1 ≤(1−‖a‖)Iand so
E
[
‖Yˆt‖^2 ∇F(Zt)− 1
]
≤E
[
(1−‖Zt‖)‖Yˆt‖^2
]
=d^2 E
[
(1−‖Zt‖)Et〈At,yt〉^2
(1−‖A ̄t‖)^2
]
≤ 2 d.
The diameter is bounded by diamF(A ̄)≤log(1/(1−r)) and hence
Rn≤(1−r)n+^1
η
log
(
1
1 −r
)
+ηnd
≤
1
η
log
(
1
2 ηd
)
+ 3ηnd
≤ 3
√
2 ndlog(n).
28.5 Notes
1 Findingat+1for both mirror descent and follow the regularized leader requires
solving a convex optimization problem. Provided the dimension is not too
large and the action set and potential are reasonably nice, there exist practical
approximation algorithms for this problem. The two-step process described in
Eqs. (28.7) and (28.8) is sometimes an easier way to go. Usually(28.7)can
be solved analytically while(28.8)can be quite expensive. In some important