Bandit Algorithms

BIBLIOGRAPHY


Machine Learning Research, pages 1453–1461, Atlanta, Georgia, USA, 17–19
Jun 2013. PMLR. [339]
K. Jun, A. Bhargava, R. Nowak, and R. Willett. Scalable generalized linear
bandits: Online computation and hashing. In I. Guyon, U. V. Luxburg,
S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,
Advances in Neural Information Processing Systems 30, pages 99–109. Curran
Associates, Inc., 2017. [235, 269]
L. P. Kaelbling. Learning in embedded systems. MIT Press, 1993. [106]
D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under
risk. Econometrica, 47(2):263–91, 1979. [65]
S. Kakade. On The Sample Complexity Of Reinforcement Learning. PhD thesis,
University College London, 2003. [502]
S. M. Kakade, S. Shalev-Shwartz, and A. Tewari. Efficient bandit algorithms for
online multiclass prediction. In Proceedings of the 25th International Conference
on Machine Learning, pages 440–447, 2008. [223]
A. Kalai and S. Vempala. Efficient algorithms for online decision problems.
Journal of Computer and System Sciences, 71(3):291–307, 2005a. [322, 350]
A. Kalai and S. Vempala. Efficient algorithms for online decision problems.
Journal of Computer and System Sciences, 71(3):291–307, 2005b. [350]
L. Kallenberg. A note on M.N. Katehakis’ and Y.-R. Chen’s computation of the
Gittins index. Mathematics of Operations Research, 11(1):184–186, 1986. [427]
L. Kallenberg. Markov decision processes: Lecture notes. 2016. [500]
O. Kallenberg. Foundations of modern probability. Springer-Verlag, 2002. [40, 41,
52, 186, 407, 408]
Z. Karnin, T. Koren, and O. Somekh. Almost optimal exploration in multi-armed
bandits. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th
International Conference on Machine Learning, volume 28 of Proceedings of
Machine Learning Research, pages 1238–1246, Atlanta, Georgia, USA, 17–19
Jun 2013. PMLR. [388]
S. Katariya, B. Kveton, Cs. Szepesvári, and Z. Wen. DCM bandits: Learning to
rank with multiple clicks. In Proceedings of the 33rd International Conference
on Machine Learning, pages 1215–1224, 2016. [374]
S. Katariya, B. Kveton, Cs. Szepesvári, C. Vernade, and Z. Wen. Bernoulli
rank-1 bandits for click feedback. In Proceedings of the 26th International Joint
Conference on Artificial Intelligence, 2017a. [374]
S. Katariya, B. Kveton, Cs. Szepesvári, C. Vernade, and Z. Wen. Stochastic
rank-1 bandits. In Proceedings of the 20th International Conference on Artificial
Intelligence and Statistics, 2017b. [374]
M. N. Katehakis and H. Robbins. Sequential choice from several populations.
Proceedings of the National Academy of Sciences of the United States of America,
92(19):8584, 1995. [106, 115, 116]
V. Ya Katkovnik and Yu Kulchitsky. Convergence of a class of random search
algorithms. Automation Remote Control, 8:1321–1326, 1972. [389]
