Bandit Algorithms

Bibliography

A. Slivkins and E. Upfal. Adapting to a changing environment: the Brownian restless bandits. In COLT, pages 343–354, 2008. [361]
M. Soare, A. Lazaric, and R. Munos. Best-arm identification in linear bandits. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, NIPS, pages 828–836. Curran Associates, Inc., 2014. [286, 388]
I. M. Sonin. A generalized Gittins index for a Markov chain and its recursive calculation. Statistics & Probability Letters, 78(12):1526–1533, 2008. [427]
N. Srebro, K. Sridharan, and A. Tewari. On the universality of online mirror descent. In Advances in Neural Information Processing Systems, pages 2645–2653, 2011. [320]
K. Sridharan and A. Tewari. Convex games in Banach spaces. In Proceedings of the 23rd Conference on Learning Theory, pages 1–13. Omnipress, 2010. [323]
G. Stoltz. Incomplete information and internal regret in prediction of individual sequences. PhD thesis, Université Paris Sud-Paris XI, 2005. [154]
H. Strasser. Mathematical theory of statistics: statistical experiments and asymptotic decision theory, volume 7. Walter de Gruyter, 2011. [407]
R. E. Strauch. Negative dynamic programming. The Annals of Mathematical Statistics, 37(4):871–890, 1966. [503]
A. Strehl and M. Littman. A theoretical analysis of model-based interval estimation. In Proceedings of the 22nd International Conference on Machine Learning, ICML, pages 856–863, New York, NY, USA, 2005. ACM. [502]
A. Strehl and M. Littman. An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74(8):1309–1331, 2008. [502, 509]
A. Strehl, L. Li, E. Wiewiora, J. Langford, and M. Littman. PAC model-free reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 881–888, New York, NY, USA, 2006. ACM. [502]
M. J. A. Strens. A Bayesian framework for reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning, ICML ’00, pages 943–950, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc. ISBN 1-55860-707-2. [501]
Y. Sui, A. Gotovos, J. Burdick, and A. Krause. Safe exploration for optimization with Gaussian processes. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 997–1005, Lille, France, 2015. PMLR. [338]
Q. Sun, W. Zhou, and J. Fan. Adaptive Huber regression: Optimality and phase transition. arXiv preprint arXiv:1706.06991, 2017. [111]
R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998. [91, 425]
R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, second edition, 2018. [500]