Bandit Algorithms


BIBLIOGRAPHY


S. Bubeck, O. Dekel, T. Koren, and Y. Peres. Bandit convex optimization: √T
regret in one dimension. In P. Grünwald, E. Hazan, and S. Kale,
editors, Proceedings of The 28th Conference on Learning Theory, volume 40
of Proceedings of Machine Learning Research, pages 266–278, Paris, France,
03–06 Jul 2015a. PMLR. [338, 443]
S. Bubeck, R. Eldan, and J. Lehec. Finite-time analysis of projected Langevin
Monte Carlo. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and
R. Garnett, editors, Advances in Neural Information Processing Systems 28,
NIPS, pages 1243–1251. Curran Associates, Inc., 2015b. [307, 320]
S. Bubeck, Y. T. Lee, and R. Eldan. Kernel-based methods for bandit convex
optimization. In Proceedings of the 49th Annual ACM SIGACT Symposium on
Theory of Computing, STOC 2017, pages 72–85, New York, NY, USA, 2017.
ACM. ISBN 978-1-4503-4528-6. [338]
S. Bubeck, M. Cohen, and Y. Li. Sparsity, variance and curvature in multi-armed
bandits. In F. Janoos, M. Mohri, and K. Sridharan, editors, Proceedings of
Algorithmic Learning Theory, volume 83 of Proceedings of Machine Learning
Research, pages 111–127. PMLR, 07–09 Apr 2018. [164, 320, 323]
A. N. Burnetas and M. N. Katehakis. Optimal adaptive policies for sequential
allocation problems. Advances in Applied Mathematics, 17(2):122–142, 1996.
[116, 135, 200, 201]
A. N. Burnetas and M. N. Katehakis. Optimal adaptive policies for Markov
decision processes. Mathematics of Operations Research, 22(1):222–255, 1997a.
[501]
A. N. Burnetas and M. N. Katehakis. On the finite horizon one-armed bandit
problem. Stochastic Analysis and Applications, 16(1):845–859, 1997b. [425]
A. N. Burnetas and M. N. Katehakis. Asymptotic Bayes analysis for the
finite-horizon one-armed-bandit problem. Probability in the Engineering and
Informational Sciences, 17(1):53–82, 2003. [427]
R. R. Bush and F. Mosteller. A stochastic model with applications to learning.
The Annals of Mathematical Statistics, pages 559–585, 1953. [7]
O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz. Kullback–Leibler
upper confidence bounds for optimal sequential allocation. The Annals of
Statistics, 41(3):1516–1541, 2013. [116, 135, 137, 201]
A. Carpentier and A. Locatelli. Tight (lower) bounds for the fixed budget best
arm identification bandit problem. In V. Feldman, A. Rakhlin, and O. Shamir,
editors, 29th Annual Conference on Learning Theory, volume 49 of Proceedings
of Machine Learning Research, pages 590–604, Columbia University, New York,
New York, USA, 23–26 Jun 2016. PMLR. [388]
A. Carpentier and R. Munos. Bandit theory meets compressed sensing
for high dimensional stochastic linear bandit. In N. D. Lawrence and
M. Girolami, editors, Proceedings of the 15th International Conference on
Artificial Intelligence and Statistics, volume 22 of Proceedings of Machine
Learning Research, pages 190–198, La Palma, Canary Islands, 21–23 Apr 2012.
PMLR. [269, 334]
