Bandit Algorithms


Bibliography


E. Kaufmann. On Bayesian index policies for sequential resource allocation. The Annals of Statistics, 46(2):842–865, 2018. [116, 124, 442, 444]
E. Kaufmann, O. Cappé, and A. Garivier. On Bayesian upper confidence bounds for bandit problems. In N. D. Lawrence and M. Girolami, editors, Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, volume 22 of Proceedings of Machine Learning Research, pages 592–600, La Palma, Canary Islands, 2012a. PMLR. [442, 444]
E. Kaufmann, N. Korda, and R. Munos. Thompson sampling: An asymptotically optimal finite-time analysis. In N. H. Bshouty, G. Stoltz, N. Vayatis, and T. Zeugmann, editors, Algorithmic Learning Theory, volume 7568 of Lecture Notes in Computer Science, pages 199–213. Springer Berlin Heidelberg, 2012b. ISBN 978-3-642-34105-2. [116, 442, 444]
J. Kawale, H. H. Bui, B. Kveton, L. Tran-Thanh, and S. Chawla. Efficient Thompson sampling for online matrix-factorization recommendation. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, NIPS, pages 1297–1305. Curran Associates, Inc., 2015. [444]
A. Kazerouni, M. Ghavamzadeh, Y. Abbasi, and B. Van Roy. Conservative contextual linear bandits. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 3910–3919. Curran Associates, Inc., 2017. [338]
M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209–232, 2002. [502]
M. J. Kearns and U. V. Vazirani. An introduction to computational learning theory. MIT Press, 1994. [223]
J. Kiefer and J. Wolfowitz. The equivalence of two extremum problems. Canadian Journal of Mathematics, 12(5):363–365, 1960. [255]
M. J. Kim. Thompson sampling for stochastic control: The finite parameter case. IEEE Transactions on Automatic Control, 62(12):6415–6422, 2017. [444]
J. Kirschner and A. Krause. Information directed sampling and bandits with heteroscedastic noise. arXiv preprint arXiv:1801.09667, 2018. [84]
R. Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, NIPS, pages 697–704. MIT Press, 2005. [337]
R. Kleinberg, A. Slivkins, and E. Upfal. Multi-armed bandits in metric spaces. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 681–690. ACM, 2008. [337]
T. Kocák, G. Neu, M. Valko, and R. Munos. Efficient learning by implicit exploration in bandit problems with side observations. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 613–621. Curran Associates, Inc., 2014. [163, 164, 165, 339]
