Bandit Algorithms


BIBLIOGRAPHY 517


Learning Research, pages 1109–1134, Barcelona, Spain, 13–15 Jun 2014. PMLR.
[338]
J.-Y. Audibert and S. Bubeck. Regret bounds and minimax policies under partial
monitoring. Journal of Machine Learning Research, 11(Oct):2785–2836, 2010a.
[152]
J.-Y. Audibert and S. Bubeck. Minimax policies for adversarial and stochastic
bandits. In Proceedings of Conference on Learning Theory (COLT), pages
217–226, 2009. [124, 152, 326]
J.-Y. Audibert and S. Bubeck. Best arm identification in multi-armed bandits.
In Proceedings of Conference on Learning Theory (COLT), 2010b. [360, 388]
J.-Y. Audibert, R. Munos, and Cs. Szepesvári. Tuning bandit algorithms in
stochastic environments. In M. Hutter, R. A. Servedio, and E. Takimoto,
editors, Algorithmic Learning Theory, pages 150–165, Berlin, Heidelberg, 2007.
Springer Berlin Heidelberg. [67, 82, 106, 110]
J.-Y. Audibert, R. Munos, and Cs. Szepesvári. Exploration-exploitation tradeoff
using variance estimates in multi-armed bandits. Theoretical Computer Science,
410(19):1876–1902, 2009. [67]
J.-Y. Audibert, S. Bubeck, and G. Lugosi. Regret in online combinatorial
optimization. Mathematics of Operations Research, 39(1):31–45, 2013. [323]
P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal
of Machine Learning Research, 3(Nov):397–422, 2002. [235, 259]
P. Auer and C. Chiang. An algorithm with nearly optimal pseudo-regret for both
stochastic and adversarial bandits. In V. Feldman, A. Rakhlin, and O. Shamir,
editors, 29th Annual Conference on Learning Theory, volume 49 of Proceedings
of Machine Learning Research, pages 116–120, Columbia University, New York,
New York, USA, 23–26 Jun 2016. PMLR. [153, 154]
P. Auer and R. Ortner. Logarithmic online regret bounds for undiscounted
reinforcement learning. In B. Schölkopf, J. C. Platt, and T. Hoffman, editors,
Advances in Neural Information Processing Systems 19, pages 49–56. MIT
Press, 2007. [501]
P. Auer and R. Ortner. UCB revisited: Improved regret bounds for the stochastic
multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1-2):55–65,
2010. [95, 123]
P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a
rigged casino: The adversarial multi-armed bandit problem. In Foundations
of Computer Science, 1995. Proceedings., 36th Annual Symposium on, pages
322–331. IEEE, 1995. [142, 154, 193]
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed
bandit problem. Machine Learning, 47:235–256, 2002a. [91, 106]
P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic
multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002b.
[164, 194, 222, 360]
P. Auer, R. Ortner, and Cs. Szepesvári. Improved rates for the stochastic
