Bandit Algorithms


BIBLIOGRAPHY


P. Varaiya, J. Walrand, and C. Buyukkoc. Extensions of the multiarmed bandit
problem: The discounted case. IEEE Transactions on Automatic Control, 30
(5):426–439, 1985. [427]
C. Vernade, O. Cappé, and V. Perchet. Stochastic bandit models for delayed
conversions. arXiv preprint arXiv:1706.09186, 2017. [339]
C. Vernade, A. Carpentier, G. Zappella, B. Ermis, and M. Brueckner. Contextual
bandits under delayed feedback. arXiv preprint arXiv:1807.02089, 2018. [339]
S. Villar, J. Bowden, and J. Wason. Multi-armed bandit models for the optimal
design of clinical trials: benefits and challenges. Statistical science: a review
journal of the Institute of Mathematical Statistics, 30(2):199–215, 2015. [16]
W. Vogel. An asymptotic minimax theorem for the two armed bandit problem.
The Annals of Mathematical Statistics, 31(2):444–451, 1960. [193]
J. von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100
(1):295–320, 1928. [322]
V. G. Vovk. Aggregating strategies. Proceedings of Computational Learning
Theory, 1990. [142, 154]
S. Wang and W. Chen. Thompson sampling for combinatorial semi-bandits. In
Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International
Conference on Machine Learning, volume 80 of Proceedings of Machine Learning
Research, pages 5114–5122, Stockholmsmässan, Stockholm Sweden, 10–15 Jul
2018. PMLR. [350, 445]
P. L Wawrzynski and A. Pacut. Truncated importance sampling for reinforcement
learning with experience replay. In Proceedings of the International
Multiconference on Computer Science and Information Technology, pages 305–315,
2007. [165]
R. Weber. On the Gittins index for multiarmed bandits. The Annals of Applied
Probability, 2(4):1024–1033, 1992. [427]
R. Weber and G. Weiss. On an index policy for restless bandits. Journal of
Applied Probability, 27(3):637–648, 1990. [427]
C.-Y. Wei and H. Luo. More adaptive algorithms for adversarial bandits. In
Sébastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Proceedings
of the 31st Conference On Learning Theory, volume 75 of Proceedings of
Machine Learning Research, pages 1263–1291. PMLR, 06–09 Jul 2018. [320,
323, 326]
M. J. Weinberger and E. Ordentlich. On delayed prediction of individual sequences.
In Information Theory, 2002. Proceedings. 2002 IEEE International Symposium
on, page 148. IEEE, 2002. [338]
T. Weissman, E. Ordentlich, G. Seroussi, and S. Verdú. Inequalities for the ℓ1
deviation of the empirical distribution. Technical report, Hewlett-Packard
Labs, 2003. [83]
Z. Wen, B. Kveton, and A. Ashkan. Efficient learning in large-scale combinatorial
semi-bandits. In F. Bach and D. Blei, editors, Proceedings of the 32nd
International Conference on Machine Learning, volume 37 of Proceedings
