Bandit Algorithms


Bibliography


of Machine Learning Research, pages 1113–1122, Lille, France, 07–09 Jul 2015.
PMLR. [350]
P. Whittle. Multi-armed bandits and the Gittins index. Journal of the Royal
Statistical Society (B), pages 143–149, 1980. [427]
P. Whittle. Restless bandits: Activity allocation in a changing world. Journal of
Applied Probability, 25(A):287–298, 1988. [360, 427]
H. Wu and X. Liu. Double Thompson sampling for dueling bandits. In D. D. Lee,
M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances
in Neural Information Processing Systems 29, NIPS, pages 649–657. Curran
Associates, Inc., 2016. [337]
Y. Wu, A. György, and Cs. Szepesvári. Online learning with Gaussian payoffs
and side observations. In Advances in Neural Information Processing Systems
28, NIPS, pages 1360–1368. Curran Associates Inc., 2015. [473]
Y. Wu, R. Shariff, T. Lattimore, and Cs. Szepesvári. Conservative bandits. In
M. Balcan and K. Weinberger, editors, Proceedings of The 33rd International
Conference on Machine Learning, volume 48 of Proceedings of Machine Learning
Research, pages 1254–1262, New York, New York, USA, 20–22 Jun 2016. PMLR.
[338]
H. P. Wynn. The sequential generation of D-optimum experimental designs. The
Annals of Mathematical Statistics, pages 1655–1664, 1970. [255]
Y. Xia, H. Li, T. Qin, N. Yu, and T.-Y. Liu. Thompson sampling for budgeted
multi-armed bandits. In Proceedings of the 24th International Conference
on Artificial Intelligence, IJCAI, pages 3960–3966. AAAI Press, 2015. ISBN
978-1-57735-738-4. [338]
Y. Yao. Some results on the Gittins index for a normal reward process. In Time
Series and Related Topics, pages 284–294. Institute of Mathematical Statistics,
2006. [427]
B. Yu. Assouad, Fano, and Le Cam. In D. Pollard, E. Torgersen, and G. L. Yang,
editors, Festschrift for Lucien Le Cam: Research Papers in Probability and
Statistics, pages 423–435. Springer, 1997. [193, 194]
Y. Yue and T. Joachims. Interactively optimizing information retrieval systems as
a dueling bandits problem. In Proceedings of the 26th International Conference
on Machine Learning, pages 1201–1208. ACM, 2009. [337]
Y. Yue and T. Joachims. Beat the mean bandit. In L. Getoor and T. Scheffer,
editors, Proceedings of the 28th International Conference on Machine Learning,
ICML, pages 241–248, New York, NY, USA, June 2011. ACM. [337]
Y. Yue, J. Broder, R. Kleinberg, and T. Joachims. The k-armed dueling bandits
problem. In Conference on Learning Theory, 2009. [337]
J. Zimmert and Y. Seldin. An optimal algorithm for stochastic and adversarial
bandits. arXiv preprint arXiv:1807.07623, 2018. [152, 154, 337]
M. Zinkevich. Online convex programming and generalized infinitesimal gradient
ascent. In Proceedings of the 20th International Conference on Machine
Learning, ICML, pages 928–935. AAAI Press, 2003. [322]
