Bandit Algorithms


Bibliography


T. L. Lai. Adaptive treatment allocation and the multi-armed bandit problem.
The Annals of Statistics, pages 1091–1114, 1987. [106, 116, 124, 135, 426]
T. L. Lai. Martingales in sequential analysis and time series, 1945–1985. Electronic
Journal for History of Probability and Statistics, 5(1), 2009. [245]
T. L. Lai and T. Graves. Asymptotically efficient adaptive choice of control laws
in controlled Markov chains. SIAM Journal on Control and Optimization, 35
(3):715–743, 1997. [501]
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules.
Advances in Applied Mathematics, 6(1):4–22, 1985. [66, 106, 115, 116, 135, 201,
250]
J. Langford and T. Zhang. The epoch-greedy algorithm for multi-armed bandits
with side information. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis,
editors, Advances in Neural Information Processing Systems 20, NIPS, pages
817–824. Curran Associates, Inc., 2008. [91, 225]
P. Laplace. Pierre-Simon Laplace Philosophical Essay on Probabilities: Translated
from the fifth French edition of 1825 with notes by the translator, volume 13.
Springer Science & Business Media, 2012. [41]
T. Lattimore. The Pareto regret frontier for bandits. In C. Cortes, N. D.
Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in
Neural Information Processing Systems 28, NIPS, pages 208–216. Curran
Associates, Inc., 2015a. [152, 279]
T. Lattimore. Optimally confident UCB: Improved regret for finite-armed bandits.
arXiv preprint arXiv:1507.07880, 2015b. [124]
T. Lattimore. Regret analysis of the finite-horizon Gittins index strategy for
multi-armed bandits. In V. Feldman, A. Rakhlin, and O. Shamir, editors, 29th
Annual Conference on Learning Theory, volume 49 of Proceedings of Machine
Learning Research, pages 1214–1245, Columbia University, New York, New
York, USA, 23–26 Jun 2016a. PMLR. [115, 427]
T. Lattimore. Regret analysis of the anytime optimally confident UCB algorithm.
Technical report, University of Alberta, 2016b. [124]
T. Lattimore. Regret analysis of the finite-horizon Gittins index strategy for
multi-armed bandits. In Conference on Learning Theory, pages 1214–1245,
2016c. [425]
T. Lattimore. A scale free algorithm for stochastic bandits with bounded kurtosis.
In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan,
and R. Garnett, editors, Advances in Neural Information Processing Systems
30, pages 1584–1593. Curran Associates, Inc., 2017. [111, 200, 201]
T. Lattimore. Refining the confidence level for optimistic bandit strategies.
Journal of Machine Learning Research, 2018. [124, 126, 201]
T. Lattimore and M. Hutter. PAC bounds for discounted MDPs. In Nader H.
Bshouty, Gilles Stoltz, Nicolas Vayatis, and Thomas Zeugmann, editors, Proceedings
of the 23rd International Conference on Algorithmic Learning Theory, volume
7568 of Lecture Notes in Computer Science, pages 320–334. Springer Berlin /
Heidelberg, 2012. [502]
