Bandit Algorithms

Bibliography

W. Chen, W. Hu, F. Li, J. Li, Y. Liu, and P. Lu. Combinatorial multi-armed bandit with general reward functions. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 1659–1667. Curran Associates, Inc., 2016a. [350]
W. Chen, Y. Wang, Y. Yuan, and Q. Wang. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. Journal of Machine Learning Research, 17(50):1–33, 2016b. URL http://jmlr.org/papers/v17/14-298.html. [350]
Y. R. Chen and M. N. Katehakis. Linear programming for finite state multi-armed bandit problems. Mathematics of Operations Research, 11(1):180–183, 1986. [427]
H. Chernoff. Sequential design of experiments. The Annals of Mathematical
Statistics, 30(3):755–770, 1959. [15, 388]
H. Chernoff. A career in statistics. Past, Present, and Future of Statistical
Science, page 29, 2014. [135]
W. Cheung, D. Simchi-Levi, and R. Zhu. Learning to optimize under non-stationarity. 2018. [361]
W. Chu, L. Li, L. Reyzin, and R. Schapire. Contextual bandits with linear payoff functions. In G. Gordon, D. Dunson, and M. Dudík, editors, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pages 208–214, Fort Lauderdale, FL, USA, 11–13 Apr 2011. PMLR. [259]
A. Chuklin, I. Markov, and M. de Rijke. Click Models for Web Search. Morgan & Claypool Publishers, 2015. [374]
A. Cohen and T. Hazan. Following the perturbed leader for online structured learning. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1034–1042, Lille, France, 07–09 Jul 2015. PMLR. [350, 351]
A. Cohen, T. Hazan, and T. Koren. Tight bounds for bandit combinatorial optimization. In S. Kale and O. Shamir, editors, Proceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 629–642, Amsterdam, Netherlands, 07–10 Jul 2017. PMLR. [349]
R. Combes, S. Magureanu, A. Proutière, and C. Laroche. Learning to rank: Regret lower bounds and efficient algorithms. In Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 231–244. ACM, 2015a. ISBN 978-1-4503-3486-0. [374]
R. Combes, M. Shahi, A. Proutière, and M. Lelarge. Combinatorial bandits revisited. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2116–2124. Curran Associates, Inc., 2015b. [350]
R. Combes, S. Magureanu, and A. Proutière. Minimal exploration in structured
