Bibliography
N. Ailon, Z. Karnin, and T. Joachims. Reducing dueling bandits to cardinal
bandits. In Proceedings of the 31st International Conference on Machine
Learning, ICML’14, pages II–856–II–864. JMLR.org, 2014. [337]
J. Aldrich. “but you have to remember P. J. Daniell of Sheffield”. Electronic
Journal for History of Probability and Statistics, 3(2), 2007. [52]
C. Allenberg, P. Auer, L. Györfi, and G. Ottucsák. Hannan consistency in on-line
learning in case of unbounded losses under partial monitoring. In Proceedings
of the 17th International Conference on Algorithmic Learning Theory, ALT,
pages 229–243, Berlin, Heidelberg, 2006. Springer-Verlag. [152, 164, 320]
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the
frequency moments. In Proceedings of the 28th Annual ACM Symposium on
Theory of Computing, pages 20–29. ACM, 1996. [111]
N. Alon, N. Cesa-Bianchi, C. Gentile, and Y. Mansour. From bandits to experts: A
tale of domination and independence. In C. J. C. Burges, L. Bottou, M. Welling,
Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information
Processing Systems 26, NIPS, pages 1610–1618. Curran Associates, Inc., 2013.
[339, 473]
N. Alon, N. Cesa-Bianchi, O. Dekel, and T. Koren. Online learning with feedback
graphs: Beyond bandits. In P. Grünwald, E. Hazan, and S. Kale, editors,
Proceedings of The 28th Conference on Learning Theory, volume 40 of
Proceedings of Machine Learning Research, pages 23–35, Paris, France, 03–06
Jul 2015. PMLR. [339]
V. Anantharam, P. Varaiya, and J. Walrand. Asymptotically efficient allocation
rules for the multiarmed bandit problem with multiple plays-Part I: i.i.d. rewards.
IEEE Transactions on Automatic Control, 32(11):968–976, 1987. [235]
J. R. Anderson, J. L. Dillon, and J. E. Hardaker. Agricultural decision analysis.
[2]
F. J. Anscombe. Sequential medical trials. Journal of the American Statistical
Association, 58(302):365–383, 1963. [91]
A. Antos, G. Bartók, D. Pál, and Cs. Szepesvári. Toward a classification of
finite partial-monitoring games. Theoretical Computer Science, 473:77–99, 2013.
[472]
A. Arapostathis, V. S. Borkar, E. Fernandez-Gaucherand, M. K. Ghosh, and S. I.
Marcus. Discrete-time controlled Markov processes with average cost criterion:
a survey. SIAM Journal on Control and Optimization, 31(2):282–344, 1993.
[500]
R. Arora, O. Dekel, and A. Tewari. Online bandit learning against an adaptive
adversary: from regret to policy regret. arXiv preprint arXiv:1206.6400, 2012.
[153]
B. Ashwinkumar, J. Langford, and A. Slivkins. Resourceful contextual bandits.
In M. F. Balcan, V. Feldman, and Cs. Szepesvári, editors, Proceedings of The
27th Conference on Learning Theory, volume 35 of Proceedings of Machine