Bandit Algorithms


BIBLIOGRAPHY


J. Honda and A. Takemura. Non-asymptotic analysis of a new bandit algorithm for
    semi-bounded rewards. Journal of Machine Learning Research, 16:3721–3756,
    2015. [135, 200, 201]
X. Hu, Prashanth L.A., A. György, and Cs. Szepesvári. (Bandit) convex
    optimization with biased noisy gradient oracles. In AISTATS, pages 819–828,
    2016. [338, 388]
R. Huang, M. M. Ajallooeian, Cs. Szepesvári, and M. Müller. Structured best
    arm identification with fixed confidence. In S. Hanneke and L. Reyzin, editors,
    Proceedings of the 28th International Conference on Algorithmic Learning
    Theory, volume 76 of Proceedings of Machine Learning Research, pages 593–616,
    Kyoto University, Kyoto, Japan, 2017a. PMLR. [388]
R. Huang, T. Lattimore, A. György, and Cs. Szepesvári. Following the leader
    and fast rates in online linear prediction: Curved constraint sets and other
    regularities. Journal of Machine Learning Research, 18:1–31, 2017b. [321]
W. Huang, J. Ok, L. Li, and W. Chen. Combinatorial pure exploration with
    continuous and separable reward functions and its applications. In IJCAI,
    pages 2291–2297, 2018. [388]
M. Hutter and J. Poland. Adaptive online prediction by following the perturbed
    leader. Journal of Machine Learning Research, 6:639–660, 2005. [350]
E. L. Ionides. Truncated importance sampling. Journal of Computational and
    Graphical Statistics, 17(2):295–311, 2008. [165]
V. I. Ivanenko and V. A. Labkovsky. On regularities of mass random phenomena.
    [142]
T. Jaksch, P. Auer, and R. Ortner. Near-optimal regret bounds for reinforcement
    learning. Journal of Machine Learning Research, 11:1563–1600, August 2010.
    ISSN 1532-4435. [501, 504]
K. Jamieson and R. Nowak. Best-arm identification algorithms for multi-armed
    bandits in the fixed confidence setting. In Information Sciences and Systems
    (CISS), 2014 48th Annual Conference on, pages 1–6. IEEE, 2014. [388]
K. Jamieson and A. Talwalkar. Non-stochastic best arm identification and
    hyperparameter optimization. In Artificial Intelligence and Statistics, pages
    240–248, 2016. [390]
K. Jamieson, S. Katariya, A. Deshpande, and R. Nowak. Sparse dueling bandits.
    In G. Lebanon and S. V. N. Vishwanathan, editors, Proceedings of the 18th
    International Conference on Artificial Intelligence and Statistics, volume 38
    of Proceedings of Machine Learning Research, pages 416–424, San Diego,
    California, USA, 09–12 May 2015. PMLR. [337]
E. T. Jaynes. Probability theory: the logic of science. Cambridge University Press,
    2003. [407]
A. Jefferson, L. Bortolotti, and B. Kuzmanovic. What is unrealistic optimism?
    Consciousness and Cognition, 50:3–11, 2017. [106]
P. Joulani, A. György, and Cs. Szepesvári. Online learning under delayed feedback.
    In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th
    International Conference on Machine Learning, volume 28 of Proceedings of