Bandit Algorithms

Bibliography


V. Syrgkanis, A. Krishnamurthy, and R. Schapire. Efficient algorithms for adversarial contextual learning. In Proceedings of the 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2159–2168, New York, New York, USA, 2016. PMLR. [223]
Cs. Szepesvári. Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2010. [501]
I. Szita and A. Lőrincz. Optimistic initialization and greediness lead to polynomial time learning in factored MDPs. In Proceedings of the 26th International Conference on Machine Learning, ICML ’09, pages 1001–1008, New York, NY, USA, 2009. ACM. [502]
I. Szita and Cs. Szepesvári. Model-based reinforcement learning with nearly tight exploration complexity bounds. In Proceedings of the 27th International Conference on Machine Learning, ICML ’10, pages 1031–1038, USA, 2010. Omnipress. ISBN 978-1-60558-907-7. [502]
M. Talagrand. The missing factor in Hoeffding’s inequalities. Annales de l’IHP Probabilités et statistiques, 31(4):689–702, 1995. [77]
G. Taraldsen. Optimal learning from the Doob-Dynkin lemma. arXiv preprint arXiv:1801.00974, 2018. [37]
J. Teevan, S. T. Dumais, and E. Horvitz. Characterizing the value of personalizing search. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 757–758, New York, NY, USA, 2007. ACM. [375]
A. Tewari and P. L. Bartlett. Optimistic linear programming gives logarithmic regret for irreducible MDPs. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1505–1512. Curran Associates, Inc., 2008. [501]
A. Tewari and S. A. Murphy. From ads to interventions: Contextual bandits in mobile health. In Mobile Health, pages 495–517. Springer, 2017. [222]
G. Theocharous, Z. Wen, Y. Abbasi-Yadkori, and N. Vlassis. Posterior sampling for large scale reinforcement learning. arXiv preprint arXiv:1711.07979, 2017. [502]
W. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285–294, 1933. [7, 15, 66, 91, 429, 441, 444]
W. R. Thompson. On the theory of apportionment. American Journal of Mathematics, 57(2):450–456, 1935. [500]
M. J. Todd. Minimum-volume ellipsoids: Theory and algorithms. SIAM, 2016. [255, 256]
J. R. R. Tolkien. The Hobbit. Ballantine Books, 1937. [429]
L. Tran-Thanh, A. Chapman, E. Munoz de Cote, A. Rogers, and N. R. Jennings. Epsilon-first policies for budget-limited multi-armed bandits. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, AAAI, pages 1211–1216, 2010. [338]
