Bandit Algorithms

BIBLIOGRAPHY 516


N. Ailon, Z. Karnin, and T. Joachims. Reducing dueling bandits to cardinal
bandits. In Proceedings of the 31st International Conference on Machine
Learning, ICML'14, pages II-856–II-864. JMLR.org, 2014. [337]
J. Aldrich. "But you have to remember P. J. Daniell of Sheffield". Electronic
Journal for History of Probability and Statistics, 3(2), 2007. [52]
C. Allenberg, P. Auer, L. Györfi, and G. Ottucsák. Hannan consistency in on-line
learning in case of unbounded losses under partial monitoring. In Proceedings
of the 17th International Conference on Algorithmic Learning Theory, ALT,
pages 229–243, Berlin, Heidelberg, 2006. Springer-Verlag. [152, 164, 320]


N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the
frequency moments. In Proceedings of the 28th Annual ACM Symposium on
Theory of Computing, pages 20–29. ACM, 1996. [111]
N. Alon, N. Cesa-Bianchi, C. Gentile, and Y. Mansour. From bandits to experts: A
tale of domination and independence. In C. J. C. Burges, L. Bottou, M. Welling,
Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information
Processing Systems 26, NIPS, pages 1610–1618. Curran Associates, Inc., 2013.
[339, 473]
N. Alon, N. Cesa-Bianchi, O. Dekel, and T. Koren. Online learning with feedback
graphs: Beyond bandits. In Peter Grünwald, Elad Hazan, and Satyen Kale,
editors, Proceedings of The 28th Conference on Learning Theory, volume 40 of
Proceedings of Machine Learning Research, pages 23–35, Paris, France, 03–06
Jul 2015. PMLR. [339]
V. Anantharam, P. Varaiya, and J. Walrand. Asymptotically efficient allocation
rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards.
IEEE Transactions on Automatic Control, 32(11):968–976, 1987. [235]
J. R. Anderson, J. L. Dillon, and J. E. Hardaker. Agricultural decision analysis. [2]
F. J. Anscombe. Sequential medical trials. Journal of the American Statistical
Association, 58(302):365–383, 1963. [91]
A. Antos, G. Bartók, D. Pál, and Cs. Szepesvári. Toward a classification of
finite partial-monitoring games. Theoretical Computer Science, 473:77–99, 2013.
[472]
A. Arapostathis, V. S. Borkar, E. Fernandez-Gaucherand, M. K. Ghosh, and S. I.
Marcus. Discrete-time controlled Markov processes with average cost criterion:
a survey. SIAM Journal on Control and Optimization, 31(2):282–344, 1993.
[500]
R. Arora, O. Dekel, and A. Tewari. Online bandit learning against an adaptive
adversary: from regret to policy regret. arXiv preprint arXiv:1206.6400, 2012.
[153]
B. Ashwinkumar, J. Langford, and A. Slivkins. Resourceful contextual bandits.
In M. F. Balcan, V. Feldman, and Cs. Szepesvári, editors, Proceedings of The
27th Conference on Learning Theory, volume 35 of Proceedings of Machine
