Bandit Algorithms


Bibliography


A. György and Cs. Szepesvári. Shifting regret, mirror descent, and matrices. In M. F. Balcan and K. Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2943–2951, New York, New York, USA, 20–22 Jun 2016. PMLR. [360]
A. György, T. Linder, G. Lugosi, and G. Ottucsák. The on-line shortest path problem under partial monitoring. Journal of Machine Learning Research, 8(Oct):2369–2403, 2007. [350]
P. R. Halmos. Measure Theory. New York, 1950. [52]
M. Hanawal, V. Saligrama, M. Valko, and R. Munos. Cheap bandits. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 2133–2142, Lille, France, 07–09 Jul 2015. PMLR. [338]
J. Hannan. Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3:97–139, 1957. [142, 322, 350]
G. H. Hardy. Divergent Series. Oxford University Press, 1973. [498]
A. Hatcher. Algebraic Topology. Cambridge University Press, 2002. [471]
E. Hazan. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016. [321, 322]
E. Hazan and S. Kale. A simple multi-armed bandit algorithm with optimal variation-bounded regret. In S. M. Kakade and U. von Luxburg, editors, Proceedings of the 24th Annual Conference on Learning Theory, volume 19 of Proceedings of Machine Learning Research, pages 817–820. PMLR, 2011. [164]
E. Hazan, Z. Karnin, and R. Meka. Volumetric spanners: an efficient exploration basis for learning. Journal of Machine Learning Research, 17(119):1–34, 2016. [255, 307]
M. Herbster and M. K. Warmuth. Tracking the best expert. Machine Learning, 32(2):151–178, 1998. [360]
M. Herbster and M. K. Warmuth. Tracking the best linear predictor. Journal of Machine Learning Research, 1(Sep):281–309, 2001. [360]
Y.-C. Ho, R. S. Sreenivas, and P. Vakili. Ordinal optimization of DEDS. Discrete Event Dynamic Systems, 1992. [389]
J. Honda and A. Takemura. An asymptotically optimal bandit algorithm for bounded support models. In Proceedings of Conference on Learning Theory (COLT), pages 67–79, 2010. [116, 124, 134, 135, 200, 201]
J. Honda and A. Takemura. An asymptotically optimal policy for finite support models in the multiarmed bandit problem. Machine Learning, 85(3):361–391, 2011. [116]
J. Honda and A. Takemura. Optimality of Thompson sampling for Gaussian bandits depends on priors. In S. Kaski and J. Corander, editors, Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, volume 33 of Proceedings of Machine Learning Research, pages 375–383, Reykjavik, Iceland, 22–25 Apr 2014. PMLR. [444]
