30.4 Follow the perturbed leader 344
where the inequality follows from the fact thatexp(−x)≤ 1 −x+x^2 /2 for all
x≥0. Taking the expectation leads to
E
[d
∑
i=1
A ̄tiYˆti^2
]
=E
[d
∑
i=1
y^2 tiAti
A ̄ti
]
≤d.
The diameter is easily bounded by noting thatFis negative inco(A) and using
the Cauchy-Schwarz inequality:
diamF(co(A)) = sup
a∈co(A)
∑d
i=1
(
ailog(ai)−ai+A ̄ 1 i+A ̄ 1 ilog
(
1
A ̄ 1 i
))
≤m+
∑d
i=1
A ̄ 1 ilog
(
1
A ̄ 1 i
)
≤m(1 + log(d/m)).
Putting together the pieces shows that
Rn≤
m(1 + log(d/m))
η
+
ηnd
2
=
√
2 nmd(1 + log (d/m)).
Algorithm 17 plays mirror descent on the convex hull of the actions, which has
dimensiond−1. In principle it would be possible to do the same thing on the
set of distributions over actions, which has dimension|A|−1. Repeating the
analysis leads to a suboptimal regret ofO(m
√
dnlog(d/m)). We encourage
the reader to go through this calculation to see where things go wrong.
Like in Section 30.2, the main problem is computation. In each round the algorithm
needs to find a distributionPtoverAsuch that
∑
a∈APt(a) =A ̄t. Feasibility
follows from the definition ofco(A) while Carath ́eodory’s theorem proves the
support ofPtnever needs to be larger thand+ 1. SinceAis finite we can
write this problem in terms of linear constraints, but naively the computation
complexity is polynomial ink, which is exponential inm. The algorithm also
needs to computeA ̄t+1fromA ̄tandYˆt. This is a convex optimization problem,
but the computation complexity depends on the representation ofAand may be
intractable.
30.4 Follow the perturbed leader
In this section we describe an efficient algorithm for online combinatorial
optimization under the assumption that for ally∈[0,∞)dthe optimization
problem of finding
a∗= argmina∈A〈a,y〉 (30.2)