Bandit Algorithms

Bibliography

Y. Abbasi-Yadkori. Forced-exploration based algorithms for playing in bandits
with large action sets. PhD thesis, University of Alberta, 2009a. [235]
Y. Abbasi-Yadkori. Forced-exploration based algorithms for playing in bandits
with large action sets. Master’s thesis, University of Alberta, Department of
Computing Science, 2009b. [91]
Y. Abbasi-Yadkori. Online Learning for Linearly Parametrized Control Problems.
PhD thesis, University of Alberta, 2012. [501, 502]
Y. Abbasi-Yadkori and Cs. Szepesvári. Regret bounds for the adaptive control
of linear quadratic systems. In S. M. Kakade and U. von Luxburg, editors,
Proceedings of the 24th Annual Conference on Learning Theory, volume 19 of
Proceedings of Machine Learning Research, pages 1–26, Budapest, Hungary,
09–11 Jun 2011. PMLR. [501]
Y. Abbasi-Yadkori and Cs. Szepesvári. Bayesian optimal control of smoothly
parameterized systems. In Proceedings of the 31st Conference on Uncertainty
in Artificial Intelligence, UAI, pages 2–11, Arlington, Virginia, United States,

AUAI Press. ISBN 978-0-9966431-0-8. [502]
Y. Abbasi-Yadkori, A. Antos, and Cs. Szepesvári. Forced-exploration based
algorithms for playing in stochastic linear bandits. In COLT Workshop on
On-line Learning with Limited Feedback, 2009. [91, 235]
Y. Abbasi-Yadkori, D. Pál, and Cs. Szepesvári. Improved algorithms for linear
stochastic bandits. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira,
and K. Q. Weinberger, editors, Advances in Neural Information Processing
Systems 24, NIPS, pages 2312–2320. Curran Associates, Inc., 2011. [235]
Y. Abbasi-Yadkori, D. Pál, and Cs. Szepesvári. Online-to-confidence-set
conversions and application to sparse stochastic bandits. In N. D. Lawrence
and M. Girolami, editors, Proceedings of the 15th International Conference
on Artificial Intelligence and Statistics, volume 22 of Proceedings of Machine
Learning Research, pages 1–9, La Palma, Canary Islands, 21–23 Apr 2012.
PMLR. [269]
Y. Abbasi-Yadkori, P. L. Bartlett, V. Kanade, Y. Seldin, and Cs. Szepesvári.
Online learning in Markov decision processes with adversarially chosen
transition probability distributions. In Advances in Neural Information
Processing Systems 26, NIPS, pages 2508–2516, USA, 2013. Curran Associates
Inc. [502]
