Bandit Algorithms

BIBLIOGRAPHY


stochastic bandits. In Advances in Neural Information Processing Systems 30 (NIPS), pages 1761–1769, 2017. [237, 337]
A. R. Conn, K. Scheinberg, and L. N. Vicente. Introduction to Derivative-Free Optimization. SIAM, 2009. [388]
T. M. Cover. Universal portfolios. Mathematical Finance, 1(1):1–29, 1991. [307]
T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012. [185]
W. Cowan and M. N. Katehakis. An asymptotically optimal policy for uniform bandits of unknown support. arXiv preprint arXiv:1505.01918, 2015. [201]
W. Cowan, J. Honda, and M. N. Katehakis. Normal bandits of unknown means and variances: Asymptotic optimality, finite horizon regret bounds, and a solution to an open problem. arXiv preprint arXiv:1504.05823, 2015. [201]
K. Crammer and C. Gentile. Multiclass classification with bandit feedback using adaptive regularization. Machine Learning, 90(3):347–383, 2013. [269]
N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. In Proceedings of the 2008 International Conference on Web Search and Data Mining, pages 87–94. ACM, 2008. [374]
V. Dani, T. P. Hayes, and S. M. Kakade. Stochastic linear optimization under bandit feedback. In Proceedings of Conference on Learning Theory, COLT, pages 355–366, 2008. [235, 278]
V. H. de la Peña, T. L. Lai, and Q. Shao. Self-Normalized Processes: Limit Theory and Statistical Applications. Springer Science & Business Media, 2008. [78, 245]
R. Degenne and V. Perchet. Anytime optimal algorithms in stochastic multi-armed bandits. In M. F. Balcan and K. Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1587–1595, New York, New York, USA, 20–22 Jun 2016. PMLR. [124]
O. Dekel, C. Gentile, and K. Sridharan. Robust selective sampling from single and multiple teachers. In COLT, pages 346–358, 2010. [269]
O. Dekel, C. Gentile, and K. Sridharan. Selective sampling and active learning from single and multiple teachers. Journal of Machine Learning Research, 13(Sep):2655–2697, 2012. [269]
A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications, volume 38. Springer Science & Business Media, 2009. [80]
E. V. Denardo, H. Park, and U. G. Rothblum. Risk-sensitive and risk-neutral multiarmed bandits. Mathematics of Operations Research, 32(2):374–394, 2007. [66]
T. Desautels, A. Krause, and J. W. Burdick. Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization. Journal of Machine Learning Research, 15:4053–4103, 2014. [339]
R. L. Dobrushin. Eine allgemeine Formulierung des Fundamentalsatzes von Shannon in der Informationstheorie. Usp. Mat. Nauk, 14(6(90)):3–104, 1959. [185]