Bandit Algorithms


Bibliography


R. Fruit, M. Pirotta, and A. Lazaric. Near optimal exploration-exploitation in non-communicating Markov decision processes. In Advances in Neural Information Processing Systems, pages 2997–3007, 2018. [501]
Y. Gai, B. Krishnamachari, and R. Jain. Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Transactions on Networking, 20(5):1466–1478, 2012. [350]
P. Gajane, R. Ortner, and P. Auer. A sliding-window algorithm for Markov decision processes with arbitrarily changing rewards and transitions. 2018. [361]
A. Garivier. Informational confidence bounds for self-normalized averages and applications. arXiv preprint arXiv:1309.3376, 2013. [107, 123]
A. Garivier and O. Cappé. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Proceedings of Conference on Learning Theory (COLT), 2011. [134, 135]
A. Garivier and E. Kaufmann. Optimal best arm identification with fixed confidence. In V. Feldman, A. Rakhlin, and O. Shamir, editors, 29th Annual Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pages 998–1027, Columbia University, New York, New York, USA, 23–26 Jun 2016. PMLR. [388, 392]
A. Garivier and E. Moulines. On upper-confidence bound policies for switching bandit problems. In J. Kivinen, Cs. Szepesvári, E. Ukkonen, and T. Zeugmann, editors, Algorithmic Learning Theory, pages 174–188, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. [360, 361]
A. Garivier, E. Kaufmann, and W. M. Koolen. Maximin action identification: A new bandit framework for games. In V. Feldman, A. Rakhlin, and O. Shamir, editors, 29th Annual Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pages 1028–1050, Columbia University, New York, New York, USA, 23–26 Jun 2016a. PMLR. [388]
A. Garivier, T. Lattimore, and E. Kaufmann. On explore-then-commit strategies. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, NIPS, pages 784–792. Curran Associates, Inc., 2016b. [91, 115]
A. Garivier, P. Ménard, and G. Stoltz. Explore first, exploit next: The true shape of regret in bandit problems. arXiv preprint arXiv:1602.07182, 2016c. [201]
A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. Bayesian data analysis, volume 2. CRC Press, Boca Raton, FL, 2014. [407]
C. Gentile and F. Orabona. On multilabel classification and ranking with partial feedback. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, NIPS, pages 1151–1159. Curran Associates, Inc., 2012. [269]
C. Gentile and F. Orabona. On multilabel classification and ranking with bandit feedback. Journal of Machine Learning Research, 15(1):2451–2487, 2014. [269]
