Bandit Algorithms

BIBLIOGRAPHY 529

J. Honda and A. Takemura. Non-asymptotic analysis of a new bandit algorithm for
semi-bounded rewards. Journal of Machine Learning Research, 16:3721–3756,
2015. [135, 200, 201]
X. Hu, Prashanth L.A., A. György, and Cs. Szepesvári. (Bandit) convex
optimization with biased noisy gradient oracles. In AISTATS, pages 819–828,
2016. [338, 388]
R. Huang, M. M. Ajallooeian, Cs. Szepesvári, and M. Müller. Structured best
arm identification with fixed confidence. In S. Hanneke and L. Reyzin, editors,
Proceedings of the 28th International Conference on Algorithmic Learning
Theory, volume 76 of Proceedings of Machine Learning Research, pages 593–616,
Kyoto University, Kyoto, Japan, 2017a. PMLR. [388]
R. Huang, T. Lattimore, A. György, and Cs. Szepesvári. Following the leader
and fast rates in online linear prediction: Curved constraint sets and other
regularities. Journal of Machine Learning Research, 18:1–31, 2017b. [321]
W. Huang, J. Ok, L. Li, and W. Chen. Combinatorial pure exploration with
continuous and separable reward functions and its applications. In IJCAI,
pages 2291–2297, 2018. [388]
M. Hutter and J. Poland. Adaptive online prediction by following the perturbed
leader. Journal of Machine Learning Research, 6:639–660, 2005. [350]
E. L. Ionides. Truncated importance sampling. Journal of Computational and
Graphical Statistics, 17(2):295–311, 2008. [165]
V. I. Ivanenko and V. A. Labkovsky. On regularities of mass random phenomena.
[142]
T. Jaksch, P. Auer, and R. Ortner. Near-optimal regret bounds for reinforcement
learning. Journal of Machine Learning Research, 11:1563–1600, August 2010.
ISSN 1532-4435. [501, 504]
K. Jamieson and R. Nowak. Best-arm identification algorithms for multi-armed
bandits in the fixed confidence setting. In Information Sciences and Systems
(CISS), 2014 48th Annual Conference on, pages 1–6. IEEE, 2014. [388]
K. Jamieson and A. Talwalkar. Non-stochastic best arm identification and
hyperparameter optimization. In Artificial Intelligence and Statistics, pages
240–248, 2016. [390]
K. Jamieson, S. Katariya, A. Deshpande, and R. Nowak. Sparse dueling bandits.
In G. Lebanon and S. V. N. Vishwanathan, editors, Proceedings of the 18th
International Conference on Artificial Intelligence and Statistics, volume 38
of Proceedings of Machine Learning Research, pages 416–424, San Diego,
California, USA, 09–12 May 2015. PMLR. [337]
E. T. Jaynes. Probability theory: the logic of science. Cambridge University Press,
2003. [407]
A. Jefferson, L. Bortolotti, and B. Kuzmanovic. What is unrealistic optimism?
Consciousness and Cognition, 50:3–11, 2017. [106]
P. Joulani, A. György, and Cs. Szepesvári. Online learning under delayed feedback.
In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th
International Conference on Machine Learning, volume 28 of Proceedings of
