Bandit Algorithms

Bibliography


V. Syrgkanis, A. Krishnamurthy, and R. Schapire. Efficient algorithms for adversarial contextual learning. In Proceedings of the 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2159–2168, New York, New York, USA, 2016. PMLR. [223]
Cs. Szepesvári. Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2010. [501]
I. Szita and A. Lőrincz. Optimistic initialization and greediness lead to polynomial time learning in factored MDPs. In Proceedings of the 26th International Conference on Machine Learning, ICML ’09, pages 1001–1008, New York, NY, USA, 2009. ACM. [502]
I. Szita and Cs. Szepesvári. Model-based reinforcement learning with nearly tight exploration complexity bounds. In Proceedings of the 27th International Conference on Machine Learning, ICML ’10, pages 1031–1038, USA, 2010. Omnipress. ISBN 978-1-60558-907-7. [502]
M. Talagrand. The missing factor in Hoeffding’s inequalities. Annales de l’IHP Probabilités et statistiques, 31(4):689–702, 1995. [77]
G. Taraldsen. Optimal learning from the Doob-Dynkin lemma. arXiv preprint arXiv:1801.00974, 2018. [37]
J. Teevan, S. T. Dumais, and E. Horvitz. Characterizing the value of personalizing search. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 757–758, New York, NY, USA, 2007. ACM. [375]
A. Tewari and P. L. Bartlett. Optimistic linear programming gives logarithmic regret for irreducible MDPs. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1505–1512. Curran Associates, Inc., 2008. [501]
A. Tewari and S. A. Murphy. From ads to interventions: Contextual bandits in mobile health. In Mobile Health, pages 495–517. Springer, 2017. [222]
G. Theocharous, Z. Wen, Y. Abbasi-Yadkori, and N. Vlassis. Posterior sampling for large scale reinforcement learning. arXiv preprint arXiv:1711.07979, 2017. [502]
W. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285–294, 1933. [7, 15, 66, 91, 429, 441, 444]
W. R. Thompson. On the theory of apportionment. American Journal of Mathematics, 57(2):450–456, 1935. [500]
M. J. Todd. Minimum-volume ellipsoids: Theory and algorithms. SIAM, 2016. [255, 256]
J. R. R. Tolkien. The Hobbit. Ballantine Books, 1937. [429]
L. Tran-Thanh, A. Chapman, E. Munoz de Cote, A. Rogers, and N. R. Jennings. Epsilon-first policies for budget-limited multi-armed bandits. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, AAAI, pages 1211–1216, 2010. [338]
