BIBLIOGRAPHY 544
P. Varaiya, J. Walrand, and C. Buyukkoc. Extensions of the multiarmed bandit
problem: The discounted case. IEEE Transactions on Automatic Control, 30
(5):426–439, 1985. [427]
C. Vernade, O. Cappé, and V. Perchet. Stochastic bandit models for delayed
conversions. arXiv preprint arXiv:1706.09186, 2017. [339]
C. Vernade, A. Carpentier, G. Zappella, B. Ermis, and M. Brueckner. Contextual
bandits under delayed feedback. arXiv preprint arXiv:1807.02089, 2018. [339]
S. Villar, J. Bowden, and J. Wason. Multi-armed bandit models for the optimal
design of clinical trials: benefits and challenges. Statistical science: a review
journal of the Institute of Mathematical Statistics, 30(2):199–215, 2015. [16]
W. Vogel. An asymptotic minimax theorem for the two armed bandit problem.
The Annals of Mathematical Statistics, 31(2):444–451, 1960. [193]
J. von Neumann. Zur theorie der gesellschaftsspiele. Mathematische annalen, 100
(1):295–320, 1928. [322]
V. G. Vovk. Aggregating strategies. Proceedings of Computational Learning
Theory, 1990. [142, 154]
S. Wang and W. Chen. Thompson sampling for combinatorial semi-bandits. In
Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International
Conference on Machine Learning, volume 80 of Proceedings of Machine Learning
Research, pages 5114–5122, Stockholmsmässan, Stockholm Sweden, 10–15 Jul
2018. PMLR. [350, 445]
P. L Wawrzynski and A. Pacut. Truncated importance sampling for reinforcement
learning with experience replay. In Proceedings of the International
Multiconference on Computer Science and Information Technology, pages 305–
315, 2007. [165]
R. Weber. On the Gittins index for multiarmed bandits. The Annals of Applied
Probability, 2(4):1024–1033, 1992. [427]
R. Weber and G. Weiss. On an index policy for restless bandits. Journal of
Applied Probability, 27(3):637–648, 1990. [427]
C-Y. Wei and H. Luo. More adaptive algorithms for adversarial bandits. In
Sébastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Proceedings
of the 31st Conference On Learning Theory, volume 75 of Proceedings of
Machine Learning Research, pages 1263–1291. PMLR, 06–09 Jul 2018. [320,
323, 326]
M. J. Weinberger and E. Ordentlich. On delayed prediction of individual sequences.
In Information Theory, 2002. Proceedings. 2002 IEEE International Symposium
on, page 148. IEEE, 2002. [338]
T. Weissman, E. Ordentlich, G. Seroussi, and S. Verdú. Inequalities for the ℓ1
deviation of the empirical distribution. Technical report, Hewlett-Packard
Labs, 2003. [83]
Z. Wen, B. Kveton, and A. Ashkan. Efficient learning in large-scale combinatorial
semi-bandits. In F. Bach and D. Blei, editors, Proceedings of the 32nd
International Conference on Machine Learning, volume 37 of Proceedings