Bandit Algorithms


11 There is a nice application of online linear optimization to minimax theorems.
Let X and Y be arbitrary sets. For any function f : X × Y → R,

\[
\inf_{x \in X} \sup_{y \in Y} f(x, y) \;\ge\; \sup_{y \in Y} \inf_{x \in X} f(x, y).
\]

Under certain conditions the inequality becomes an equality. Theorems
guaranteeing this are called minimax theorems. The following result by Sion
[1958] is one of the more generic variants. The statement uses notions of
quasiconvexity and semicontinuity, which are defined in the next note.

Theorem 28.12 (Sion's minimax theorem). Suppose that X and Y are convex
subsets of linear topological spaces with at least one of X or Y compact. Let
f : X × Y → R be a function such that f(·, y) is lower semicontinuous and
quasiconvex for all y ∈ Y and f(x, ·) is upper semicontinuous and quasiconcave
for all x ∈ X. Then

\[
\inf_{x \in X} \sup_{y \in Y} f(x, y) \;=\; \sup_{y \in Y} \inf_{x \in X} f(x, y).
\]

There is a short topological proof of this theorem [Komiya, 1988]. You will
use the tools of online linear optimization to analyze two special cases in
Exercise 28.14. When X and Y are probability simplexes and f is linear, the
resulting theorem is von Neumann's minimax theorem [von Neumann, 1928].
The minimax theorems form a bridge between minimax adversarial regret and
Bayesian regret, which we discuss in Chapters 34 and 36. A numerical sketch
of the simplex/linear special case appears after these notes.
12 Let X be a subset of a linear topological space and f : X → R. The function
f is quasiconvex if f^{-1}((−∞, a)) is convex for all a ∈ R and quasiconcave
if −f is quasiconvex. f is upper semicontinuous if for all x ∈ X and ε > 0
there exists a neighborhood U of x such that f(y) ≤ f(x) + ε for all y ∈ U. It is
lower semicontinuous if for all x ∈ X and ε > 0 there exists a neighborhood
U of x such that f(y) ≥ f(x) − ε for all y ∈ U. A concrete example is given
just below.
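
As a concrete illustration of these definitions (an example added here, not taken from the text), consider f(x) = √|x| on R. It is continuous, hence both upper and lower semicontinuous, and it is quasiconvex but not convex:

\[
f^{-1}\bigl((-\infty, a)\bigr) =
\begin{cases}
  \emptyset & \text{if } a \le 0,\\[2pt]
  (-a^{2}, a^{2}) & \text{if } a > 0,
\end{cases}
\qquad\text{while}\qquad
f\!\left(\tfrac{0+1}{2}\right) = \tfrac{1}{\sqrt{2}} > \tfrac{1}{2} = \tfrac{f(0)+f(1)}{2}.
\]

Every strict sublevel set is an interval and therefore convex, even though f itself fails the convexity inequality at the midpoint of [0, 1].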
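
Returning to note 11, the following is a minimal sketch, not from the text, of the simplex/linear special case: both players of a zero-sum matrix game run exponential weights against each other, and the best responses to their averaged strategies sandwich the game value with a gap shrinking at roughly the rate of the players' regrets. The function name, the step size and the rock-paper-scissors payoff matrix are illustrative choices.

import numpy as np


def exp_weights_game_value(A, n_rounds=10_000, eta=None):
    """Approximate the value of the zero-sum game with payoff matrix A, where
    the row player picks i to minimise x^T A y and the column player picks j
    to maximise it, by running exponential weights for both players."""
    n, m = A.shape
    if eta is None:
        # a standard tuning; the duality gap is then of order sqrt(log(k) / n_rounds)
        eta = np.sqrt(np.log(max(n, m)) / n_rounds)

    loss_row = np.zeros(n)   # cumulative loss vectors A @ y_t seen by the row player
    gain_col = np.zeros(m)   # cumulative gain vectors A.T @ x_t seen by the column player
    x_bar = np.zeros(n)      # averaged row strategy
    y_bar = np.zeros(m)      # averaged column strategy

    for _ in range(n_rounds):
        x = np.exp(-eta * (loss_row - loss_row.min()))   # shift for numerical stability
        x /= x.sum()
        y = np.exp(eta * (gain_col - gain_col.max()))
        y /= y.sum()
        x_bar += x / n_rounds
        y_bar += y / n_rounds
        loss_row += A @ y
        gain_col += A.T @ x

    # best responses to the averaged strategies sandwich the value of the game
    upper = (x_bar @ A).max()   # sup_y  x_bar^T A y
    lower = (A @ y_bar).min()   # inf_x  x^T A y_bar
    return lower, upper


if __name__ == "__main__":
    # rock-paper-scissors, a symmetric game whose value is 0
    A = np.array([[0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, -1.0, 0.0]])
    lower, upper = exp_weights_game_value(A)
    print(f"game value lies in [{lower:.3f}, {upper:.3f}]")

As n_rounds grows the printed interval collapses to the game value, which is the content of von Neumann's theorem in this special case.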

28.6 Bibliographic remarks


The results in this chapter come from a wide variety of sources. The online convex
optimization framework is due to Zinkevich [2003], where online gradient descent
is introduced and analyzed. Mirror descent was first developed by Nemirovski
[1979] and Nemirovsky and Yudin [1983] for classical optimization while ‘follow
the regularized leader’ appears first in the work by Shalev-Shwartz [2007], Shalev-
Shwartz and Singer [2007]. An implicit form of regularization is to add a
perturbation of the losses, leading to the ‘follow the perturbed leader’ algorithm
[Hannan, 1957, Kalai and Vempala, 2005a], which is further explored in the context
of combinatorial bandit problems in Chapter 30 (and see also Exercise 11.6).
Readers interested in an overview of online learning will like the short book by
Hazan [2016] while the book by Cesa-Bianchi and Lugosi [2006] has a little more