Bandit Algorithms

Bibliography


Information Processing Systems 24, NIPS, pages 1035–1043. Curran Associates,
Inc., 2011. [338]
A. Agarwal, D. P. Foster, D. Hsu, S. M. Kakade, and A. Rakhlin. Stochastic
convex optimization with bandit feedback. SIAM Journal on Optimization,
23(1):213–240, 2013. [389]
A. Agarwal, D. Hsu, S. Kale, J. Langford, L. Li, and R. Schapire. Taming the
monster: A fast and simple algorithm for contextual bandits. In E. P. Xing
and T. Jebara, editors, Proceedings of the 31st International Conference on
Machine Learning, volume 32 of Proceedings of Machine Learning Research,
pages 1638–1646, Beijing, China, 22–24 Jun 2014. PMLR. [221, 222, 223]
A. Agarwal, S. Bird, M. Cozowicz, L. Hoang, J. Langford, S. Lee, J. Li,
D. Melamed, G. Oshri, and O. Ribas. Making contextual decisions with
low technical debt. arXiv preprint arXiv:1606.03966, 2016. [16]
R. Agrawal. Sample mean based index policies with O(log n) regret for the
multi-armed bandit problem. Advances in Applied Probability, pages 1054–1078,
1995. [106, 116]
S. Agrawal and N. R. Devanur. Bandits with concave rewards and convex
knapsacks. In Proceedings of the 15th ACM Conference on Economics and
Computation, pages 989–1006. ACM, 2014. [338]
S. Agrawal and N. R. Devanur. Linear contextual bandits with knapsacks. In
Advances in Neural Information Processing Systems 29, NIPS, pages 3458–3467.
Curran Associates, Inc., 2016. [338]
S. Agrawal and N. Goyal. Analysis of Thompson sampling for the multi-armed
bandit problem. In Proceedings of Conference on Learning Theory (COLT),
2012. [444]
S. Agrawal and N. Goyal. Further optimal regret bounds for Thompson
sampling. In C. M. Carvalho and P. Ravikumar, editors, Proceedings of the 16th
International Conference on Artificial Intelligence and Statistics, volume 31 of
Proceedings of Machine Learning Research, pages 99–107, Scottsdale, Arizona,
USA, 29 Apr–01 May 2013a. PMLR. [442, 444]
S. Agrawal and N. Goyal. Thompson sampling for contextual bandits with linear
payoffs. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th
International Conference on Machine Learning, volume 28 of Proceedings of
Machine Learning Research, pages 127–135, Atlanta, Georgia, USA, 17–19 Jun
2013b. PMLR. [444]
S. Agrawal and R. Jia. Optimistic posterior sampling for reinforcement learning:
worst-case regret bounds. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach,
R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural
Information Processing Systems 30, NIPS, pages 1184–1194. Curran Associates,
Inc., 2017. [501]
S. Agrawal, V. Avadhanula, V. Goyal, and A. Zeevi. Thompson sampling for
the MNL-bandit. In S. Kale and O. Shamir, editors, Proceedings of the 2017
Conference on Learning Theory, volume 65 of Proceedings of Machine Learning
Research, pages 76–78, Amsterdam, Netherlands, 07–10 Jul 2017. PMLR. [444]