Bandit Algorithms

Bibliography

A. Slivkins and E. Upfal. Adapting to a changing environment: the Brownian restless bandits. In COLT, pages 343–354, 2008. [361]
M. Soare, A. Lazaric, and R. Munos. Best-arm identification in linear bandits. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, NIPS, pages 828–836. Curran Associates, Inc., 2014. [286, 388]
I. M. Sonin. A generalized Gittins index for a Markov chain and its recursive calculation. Statistics & Probability Letters, 78(12):1526–1533, 2008. [427]
N. Srebro, K. Sridharan, and A. Tewari. On the universality of online mirror descent. In Advances in Neural Information Processing Systems, pages 2645–2653, 2011. [320]
K. Sridharan and A. Tewari. Convex games in Banach spaces. In Proceedings of the 23rd Conference on Learning Theory, pages 1–13. Omnipress, 2010. [323]
G. Stoltz. Incomplete information and internal regret in prediction of individual sequences. PhD thesis, Université Paris Sud-Paris XI, 2005. [154]
H. Strasser. Mathematical theory of statistics: statistical experiments and asymptotic decision theory, volume 7. Walter de Gruyter, 2011. [407]
R. E. Strauch. Negative dynamic programming. The Annals of Mathematical Statistics, 37(4):871–890, 1966. [503]
A. Strehl and M. Littman. A theoretical analysis of model-based interval estimation. In Proceedings of the 22nd International Conference on Machine Learning, ICML, pages 856–863, New York, NY, USA, 2005. ACM. [502]
A. Strehl and M. Littman. An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74(8):1309–1331, 2008. [502, 509]
A. Strehl, L. Li, E. Wiewiora, J. Langford, and M. Littman. PAC model-free reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 881–888, New York, NY, USA, 2006. ACM. [502]
M. J. A. Strens. A Bayesian framework for reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning, ICML ’00, pages 943–950, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc. ISBN 1-55860-707-2. [501]
Y. Sui, A. Gotovos, J. Burdick, and A. Krause. Safe exploration for optimization with Gaussian processes. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 997–1005, Lille, France, 2015. PMLR. [338]
Q. Sun, W. Zhou, and J. Fan. Adaptive Huber regression: Optimality and phase transition. arXiv preprint arXiv:1706.06991, 2017. [111]
R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998. [91, 425]
R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, second edition, 2018. [500]