Bandit Algorithms


BIBLIOGRAPHY


T. Kocák, M. Valko, R. Munos, and S. Agrawal. Spectral Thompson sampling.
In AAAI, pages 1911–1917, 2014. [445]
L. Kocsis and Cs. Szepesvári. Discounted UCB. In 2nd PASCAL Challenges
Workshop, pages 784–791, 2006. [8, 16, 361]
H. Komiya. Elementary proof for Sion’s minimax theorem. Kodai Mathematical
Journal, 11(1):5–7, 1988. [322]
J. Komiyama, J. Honda, H. Kashima, and H. Nakagawa. Regret lower bound
and optimal algorithm in dueling bandit problem. In P. Grünwald, E. Hazan,
and S. Kale, editors, Proceedings of The 28th Conference on Learning Theory,
volume 40 of Proceedings of Machine Learning Research, pages 1141–1154,
Paris, France, 03–06 Jul 2015a. PMLR. [337]
J. Komiyama, J. Honda, and H. Nakagawa. Regret lower bound and optimal
algorithm in finite stochastic partial monitoring. In C. Cortes, N. D. Lawrence,
D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural
Information Processing Systems 28, NIPS, pages 1792–1800. Curran Associates,
Inc., 2015b. [473]
W. M. Koolen, M. K. Warmuth, and J. Kivinen. Hedging structured concepts.
In COLT, pages 93–105. Omnipress, 2010. [350]
N. Korda, E. Kaufmann, and R. Munos. Thompson sampling for 1-dimensional
exponential family bandits. In C. J. C. Burges, L. Bottou, M. Welling,
Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information
Processing Systems 26, pages 1448–1456. Curran Associates, Inc., 2013. [116,
137, 442, 444]
S. R. Kulkarni and G. Lugosi. Finite-time lower bounds for the two-armed bandit
problem. IEEE Transactions on Automatic Control, 45(4):711–714, 2000. [201]
B. Kveton, Cs. Szepesvári, Z. Wen, and A. Ashkan. Cascading bandits: Learning to
rank in the cascade model. In Proceedings of the 32nd International Conference
on International Conference on Machine Learning - Volume 37, pages 767–776.
JMLR.org, 2015a. [374]
B. Kveton, Z. Wen, A. Ashkan, and Cs. Szepesvári. Tight regret bounds
for stochastic combinatorial semi-bandits. In G. Lebanon and S. V. N.
Vishwanathan, editors, Proceedings of the 18th International Conference on
Artificial Intelligence and Statistics, volume 38 of Proceedings of Machine
Learning Research, pages 535–543, San Diego, California, USA, 09–12 May
2015b. PMLR. [350]
B. Kveton, Z. Wen, A. Ashkan, and Cs. Szepesvári. Combinatorial cascading
bandits. In Advances in Neural Information Processing Systems 28, NIPS,
pages 1450–1458. Curran Associates Inc., 2015c. [374]
B. Kveton, Cs. Szepesvári, Z. Wen, M. Ghavamzadeh, and T. Lattimore. Garbage
in, reward out: Bootstrapping exploration in multi-armed bandits. 2018. [445]
P. Lagree, C. Vernade, and O. Cappé. Multiple-play bandits in the position-based
model. In Advances in Neural Information Processing Systems 29, NIPS, pages
1597–1605. Curran Associates Inc., 2016. [374]
