Understanding Machine Learning: From Theory to Algorithms

References

Breiman, L. (1996), Bias, variance, and arcing classifiers, Technical Report 460,
Statistics Department, University of California at Berkeley.
Breiman, L. (2001), ‘Random forests’, Machine Learning 45(1), 5–32.
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. (1984), Classification and
Regression Trees, Wadsworth & Brooks.
Candès, E. (2008), ‘The restricted isometry property and its implications for
compressed sensing’, Comptes Rendus Mathematique 346(9), 589–592.
Candes, E. J. (2006), Compressive sampling, in ‘Proc. of the Int. Congress of Math.,
Madrid, Spain’.
Candes, E. & Tao, T. (2005), ‘Decoding by linear programming’, IEEE Trans. on
Information Theory 51, 4203–4215.
Cesa-Bianchi, N. & Lugosi, G. (2006), Prediction, learning, and games, Cambridge
University Press.
Chang, H. S., Weiss, Y. & Freeman, W. T. (2009), ‘Informative sensing’, arXiv preprint
arXiv:0901.4275.
Chapelle, O., Le, Q. & Smola, A. (2007), Large margin optimization of ranking
measures, in ‘NIPS Workshop: Machine Learning for Web Search’.
Collins, M. (2000), Discriminative reranking for natural language parsing, in ‘Machine
Learning’.
Collins, M. (2002), Discriminative training methods for hidden Markov models: Theory
and experiments with perceptron algorithms, in ‘Conference on Empirical Methods
in Natural Language Processing’.
Collobert, R. & Weston, J. (2008), A unified architecture for natural language
processing: deep neural networks with multitask learning, in ‘International Conference on
Machine Learning (ICML)’.
Cortes, C. & Vapnik, V. (1995), ‘Support-vector networks’, Machine Learning
20(3), 273–297.
Cover, T. (1965), ‘Behavior of sequential predictors of binary sequences’, Trans. 4th
Prague Conf. Information Theory Statistical Decision Functions, Random Processes
pp. 263–272.
Cover, T. & Hart, P. (1967), ‘Nearest neighbor pattern classification’, Information
Theory, IEEE Transactions on 13(1), 21–27.
Crammer, K. & Singer, Y. (2001), ‘On the algorithmic implementation of multiclass
kernel-based vector machines’, Journal of Machine Learning Research 2, 265–292.
Cristianini, N. & Shawe-Taylor, J. (2000), An Introduction to Support Vector Machines,
Cambridge University Press.
Daniely, A., Sabato, S., Ben-David, S. & Shalev-Shwartz, S. (2011), Multiclass
learnability and the ERM principle, in ‘Conference on Learning Theory (COLT)’.
Daniely, A., Sabato, S. & Shwartz, S. S. (2012), Multiclass learning approaches: A
theoretical comparison with implications, in ‘NIPS’.
Davis, G., Mallat, S. & Avellaneda, M. (1997), ‘Greedy adaptive approximation’,
Journal of Constructive Approximation 13, 57–98.
Devroye, L. & Györfi, L. (1985), Nonparametric Density Estimation: The L1 View,
Wiley.
Devroye, L., Györfi, L. & Lugosi, G. (1996), A Probabilistic Theory of Pattern
Recognition, Springer.
