Bandit Algorithms


11 There is a nice application of online linear optimization to minimax theorems.
Let X and Y be arbitrary sets. For any function f : X × Y → R,

\[
\inf_{x \in X} \sup_{y \in Y} f(x, y) \;\ge\; \sup_{y \in Y} \inf_{x \in X} f(x, y).
\]

Under certain conditions the inequality becomes an equality. Theorems
guaranteeing this are called minimax theorems. The following result by Sion
[1958] is one of the more generic variants. The statement uses notions of
quasiconvexity and semicontinuity, which are defined in the next note.

Theorem 28.12 (Sion's minimax theorem). Suppose that X and Y are convex
subsets of linear topological spaces with at least one of X or Y compact. Let
f : X × Y → R be a function such that f(·, y) is lower semicontinuous and
quasiconvex for all y ∈ Y and f(x, ·) is upper semicontinuous and quasiconcave
for all x ∈ X. Then

\[
\inf_{x \in X} \sup_{y \in Y} f(x, y) \;=\; \sup_{y \in Y} \inf_{x \in X} f(x, y).
\]

There is a short topological proof of this theorem [Komiya, 1988]. You will
use the tools of online linear optimization to analyze two special cases in
Exercise 28.14. When X and Y are probability simplexes and f is linear, the
resulting theorem is von Neumann's minimax theorem [von Neumann, 1928].
The minimax theorems form a bridge between minimax adversarial regret and
Bayesian regret, which we discuss in Chapters 34 and 36. A numerical sketch
of the simplex/linear special case appears after these notes.
12 Let X be a subset of a linear topological space and f : X → R. The function
f is quasiconvex if f^{-1}((−∞, a)) is convex for all a ∈ R and quasiconcave
if −f is quasiconvex. f is upper semicontinuous if for all x ∈ X and ε > 0
there exists a neighborhood U of x such that f(y) ≤ f(x) + ε for all y ∈ U. It is
lower semicontinuous if for all x ∈ X and ε > 0 there exists a neighborhood
U of x such that f(y) ≥ f(x) − ε for all y ∈ U. A concrete example is given
just below.
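
As a concrete illustration of these definitions (an example added here, not taken from the text), consider f(x) = √|x| on R. It is continuous, hence both upper and lower semicontinuous, and it is quasiconvex but not convex:

\[
f^{-1}\bigl((-\infty, a)\bigr) =
\begin{cases}
  \emptyset & \text{if } a \le 0,\\[2pt]
  (-a^{2}, a^{2}) & \text{if } a > 0,
\end{cases}
\qquad\text{while}\qquad
f\!\left(\tfrac{0+1}{2}\right) = \tfrac{1}{\sqrt{2}} > \tfrac{1}{2} = \tfrac{f(0)+f(1)}{2}.
\]

Every strict sublevel set is an interval and therefore convex, even though f itself fails the convexity inequality at the midpoint of [0, 1].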
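
Returning to note 11, the following is a minimal sketch, not from the text, of the simplex/linear special case: both players of a zero-sum matrix game run exponential weights against each other, and the best responses to their averaged strategies sandwich the game value with a gap shrinking at roughly the rate of the players' regrets. The function name, the step size and the rock-paper-scissors payoff matrix are illustrative choices.

import numpy as np


def exp_weights_game_value(A, n_rounds=10_000, eta=None):
    """Approximate the value of the zero-sum game with payoff matrix A, where
    the row player picks i to minimise x^T A y and the column player picks j
    to maximise it, by running exponential weights for both players."""
    n, m = A.shape
    if eta is None:
        # a standard tuning; the duality gap is then of order sqrt(log(k) / n_rounds)
        eta = np.sqrt(np.log(max(n, m)) / n_rounds)

    loss_row = np.zeros(n)   # cumulative loss vectors A @ y_t seen by the row player
    gain_col = np.zeros(m)   # cumulative gain vectors A.T @ x_t seen by the column player
    x_bar = np.zeros(n)      # averaged row strategy
    y_bar = np.zeros(m)      # averaged column strategy

    for _ in range(n_rounds):
        x = np.exp(-eta * (loss_row - loss_row.min()))   # shift for numerical stability
        x /= x.sum()
        y = np.exp(eta * (gain_col - gain_col.max()))
        y /= y.sum()
        x_bar += x / n_rounds
        y_bar += y / n_rounds
        loss_row += A @ y
        gain_col += A.T @ x

    # best responses to the averaged strategies sandwich the value of the game
    upper = (x_bar @ A).max()   # sup_y  x_bar^T A y
    lower = (A @ y_bar).min()   # inf_x  x^T A y_bar
    return lower, upper


if __name__ == "__main__":
    # rock-paper-scissors, a symmetric game whose value is 0
    A = np.array([[0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, -1.0, 0.0]])
    lower, upper = exp_weights_game_value(A)
    print(f"game value lies in [{lower:.3f}, {upper:.3f}]")

As n_rounds grows the printed interval collapses to the game value, which is the content of von Neumann's theorem in this special case.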

28.6 Bibliographic remarks


The results in this chapter come from a wide variety of sources. The online convex
optimization framework is due to Zinkevich [2003], where online gradient descent
is introduced and analyzed. Mirror descent was first developed by Nemirovski
[1979] and Nemirovsky and Yudin [1983] for classical optimization while ‘follow
the regularized leader’ appears first in the work by Shalev-Shwartz [2007], Shalev-
Shwartz and Singer [2007]. An implicit form of regularization is to add a
perturbation of the losses, leading to the ‘follow the perturbed leader’ algorithm
[Hannan, 1957, Kalai and Vempala, 2005a], which is further explored in the context
of combinatorial bandit problems in Chapter 30 (and see also Exercise 11.6).
Readers interested in an overview of online learning will like the short book by
Hazan [2016] while the book by Cesa-Bianchi and Lugosi [2006] has a little more