continuum-armed bandit problem. In International Conference on
Computational Learning Theory, pages 454–468. Springer, 2007. [337]
P. Auer, T. Jaksch, and R. Ortner. Near-optimal regret bounds for reinforcement
learning. In Advances in Neural Information Processing Systems 21, NIPS,
pages 89–96, 2009. [501]
B. Awerbuch and R. Kleinberg. Adaptive routing with end-to-end feedback:
Distributed learning and geometric approaches. In Proceedings of the 36th
Annual ACM Symposium on Theory of Computing, pages 45–53. ACM, 2004.
[350]
S. J. Axler. Linear algebra done right, volume 2. Springer, 1997. [470]
M. G. Azar, I. Osband, and R. Munos. Minimax regret bounds for reinforcement
learning. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th
International Conference on Machine Learning, volume 70 of Proceedings of
Machine Learning Research, pages 263–272, International Convention Centre,
Sydney, Australia, 06–11 Aug 2017. PMLR. [501]
A. Badanidiyuru, R. Kleinberg, and A. Slivkins. Bandits with knapsacks. In
Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium
on, pages 207–216. IEEE, 2013. [338]
P. L. Bartlett and A. Tewari. REGAL: A regularization based algorithm for
reinforcement learning in weakly communicating MDPs. In Proceedings of the
25th Conference on Uncertainty in Artificial Intelligence, UAI, pages 35–42,
Arlington, Virginia, United States, 2009. AUAI Press. [506]
G. Bartók. A near-optimal algorithm for finite partial-monitoring games
against adversarial opponents. In S. Shalev-Shwartz and I. Steinwart, editors,
Proceedings of the 26th Annual Conference on Learning Theory, volume 30,
pages 696–710. PMLR, 2013. [471, 473]
G. Bartók, D. Pál, and Cs. Szepesvári. Toward a classification of finite
partial-monitoring games. In International Conference on Algorithmic Learning Theory,
pages 224–238. Springer, 2010. [472]
G. Bartók, N. Zolghadr, and Cs. Szepesvári. An adaptive algorithm for finite
stochastic partial monitoring. In Proceedings of the 29th International Conference
on Machine Learning, ICML, pages 1779–1786,
USA, 2012. Omnipress. [473]
G. Bartók, D. P. Foster, D. Pál, A. Rakhlin, and Cs. Szepesvári. Partial
monitoring—classification, regret bounds, and algorithms. Mathematics of
Operations Research, 39(4):967–997, 2014. [472]
J. A. Bather and H. Chernoff. Sequential decisions in the control of a spaceship. In
Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 3,
pages 181–207, 1967. [15]
R. Bellman. The theory of dynamic programming. Technical report, RAND
Corporation, Santa Monica, CA, 1954. [500]
R. E. Bellman. Eye of the Hurricane. World Scientific, 1984. [500]
J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer Science
& Business Media, 1985. [406]