BIBLIOGRAPHY 543
L. Tran-Thanh, A. Chapman, A. Rogers, and N. R. Jennings. Knapsack based
optimal policies for budget-limited multi-armed bandits. In Proceedings of the
26th AAAI Conference on Artificial Intelligence, AAAI’12, pages 1134–1140.
AAAI Press, 2012. [338]
J. A. Tropp. An introduction to matrix concentration inequalities. Foundations
and Trends® in Machine Learning, 8(1-2):1–230, 2015. [78]
J. N. Tsitsiklis. A short proof of the Gittins index theorem. The Annals of
Applied Probability, pages 194–199, 1994. [427]
A. B. Tsybakov. Introduction to nonparametric estimation. Springer Science &
Business Media, 2008. [184, 193]
C. Ionescu Tulcea. Mesures dans les espaces produits. Atti Accad. Naz. Lincei
Rend, 7:208–211, 1949–50. [52]
E. Uchibe and K. Doya. Competitive-cooperative-concurrent reinforcement
learning with importance sampling. In Proceedings of the International
Conference on Simulation of Adaptive Behavior: From Animals and Animats,
pages 287–296, 2004. [165]
A. W. Van Der Vaart and J. A. Wellner. Weak convergence. In Weak Convergence
and Empirical Processes, pages 16–28. Springer, 1996. [83, 321]
M. Valko. Bandits on graphs and structures, 2016. [339]
M. Valko, A. Carpentier, and R. Munos. Stochastic simultaneous optimistic
optimization. In Sanjoy Dasgupta and David McAllester, editors, Proceedings
of the 30th International Conference on Machine Learning, volume 28 of
Proceedings of Machine Learning Research, pages 19–27, Atlanta, Georgia,
USA, 17–19 Jun 2013a. PMLR. [388]
M. Valko, N. Korda, R. Munos, I. Flaounas, and N. Cristianini. Finite-time
analysis of kernelised contextual bandits. In Proceedings of the 29th Conference
on Uncertainty in Artificial Intelligence, UAI, pages 654–663, Arlington,
Virginia, United States, 2013b. AUAI Press. [235]
M. Valko, R. Munos, B. Kveton, and T. Kocák. Spectral bandits for smooth
graph functions. In E. P. Xing and T. Jebara, editors, Proceedings of the
31st International Conference on Machine Learning, volume 32 of Proceedings
of Machine Learning Research, pages 46–54, Beijing, China, 22–24 Jun 2014.
PMLR. [235, 259]
S. van de Geer. Empirical Processes in M-estimation, volume 6. Cambridge
University Press, 2000. [78, 83, 245, 321]
D. van der Hoeven, T. van Erven, and W. Kotłowski. The many faces of
exponential weights in online learning. arXiv preprint arXiv:1802.07543, 2018.
[307]
H. P. Vanchinathan, G. Bartók, and A. Krause. Efficient partial monitoring with
prior information. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence,
and K. Q. Weinberger, editors, Advances in Neural Information Processing
Systems 27, NIPS, pages 1691–1699. Curran Associates, Inc., 2014. [473]
V. Vapnik. Statistical learning theory, volume 3. Wiley, New York, 1998.
[226]