Bandit Algorithms


BIBLIOGRAPHY 540


A. Sani, A. Lazaric, and R. Munos. Risk-aversion in multi-armed bandits.
In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors,
Advances in Neural Information Processing Systems 25, pages 3275–3283.
Curran Associates, Inc., 2012. [66]
Y. Seldin and G. Lugosi. An improved parametrization and analysis of the
EXP3++ algorithm for stochastic and adversarial bandits. In S. Kale and
O. Shamir, editors, Proceedings of the 2017 Conference on Learning Theory,
volume 65 of Proceedings of Machine Learning Research, pages 1743–1759,
Amsterdam, Netherlands, 07–10 Jul 2017. PMLR. [153]
Y. Seldin and A. Slivkins. One practical algorithm for both stochastic and
adversarial bandits. In E. P. Xing and T. Jebara, editors, Proceedings of the
31st International Conference on Machine Learning, volume 32 of Proceedings
of Machine Learning Research, pages 1287–1295, Beijing, China, 22–24 Jun 2014.
PMLR. [153]
S. Shalev-Shwartz. Online learning: Theory, algorithms, and applications. PhD
thesis, The Hebrew University of Jerusalem, 2007. [322]
S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From
Theory to Algorithms. Cambridge University Press, 2009. [222, 223, 226]
S. Shalev-Shwartz and Y. Singer. A primal-dual perspective of online learning
algorithms. Machine Learning, 69(2-3):115–142, 2007. [322]
O. Shamir. On the complexity of bandit and derivative-free stochastic convex
optimization. In S. Shalev-Shwartz and I. Steinwart, editors, COLT, volume 30
of JMLR Workshop and Conference Proceedings, pages 3–24. JMLR.org, 2013.
[338, 388]
O. Shamir. On the complexity of bandit linear optimization. In P. Grünwald,
E. Hazan, and S. Kale, editors, Proceedings of The 28th Conference on Learning
Theory, volume 40 of Proceedings of Machine Learning Research,
pages 1523–1551, Paris, France, 03–06 Jul 2015. PMLR. [278, 334]
T. Sharot. The optimism bias. Current Biology, 21(23):R941–R945, 2011a. [105,
106]
T. Sharot. The optimism bias: A tour of the irrationally positive brain.
Pantheon/Random House, 2011b. [106]
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche,
J. Schrittwieser, I. Antonoglou, V. Panneershelvam, and M. Lanctot. Mastering
the game of Go with deep neural networks and tree search. Nature, 529(7587):
484–489, 2016. [8]
S. D. Silvey and B. Sibson. Discussion of Dr. Wynn’s and of Dr. Laycock’s papers.
Journal of Royal Statistical Society (B), 34:174–175, 1972. [255]
M. Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):
171–176, 1958. [322]
A. Slivkins. Contextual bandits with similarity information. Journal of Machine
Learning Research, 15(1):2533–2568, 2014. [337]
A. Slivkins. Introduction to Multi-Armed Bandits. TBD, 2018. [15, 337]
