References 441
Hinton, G. E., Osindero, S. & Teh, Y.-W. (2006), ‘A fast learning algorithm for deep
belief nets’,Neural Computation 18 (7), 1527–1554.
Hiriart-Urruty, J.-B. & Lemar ́echal, C. (1996),Convex Analysis and Minimization Al-
gorithms: Part 1: Fundamentals, Vol. 1, Springer.
Hsu, C.-W., Chang, C.-C. & Lin, C.-J. (2003), ‘A practical guide to support vector
classification’.
Hyafil, L. & Rivest, R. L. (1976), ‘Constructing optimal binary decision trees is NP-
complete’,Information Processing Letters 5 (1), 15–17.
Joachims, T. (2005), A support vector method for multivariate performance measures,
in‘Proceedings of the International Conference on Machine Learning (ICML)’.
Kakade, S., Sridharan, K. & Tewari, A. (2008), On the complexity of linear prediction:
Risk bounds, margin bounds, and regularization,in‘NIPS’.
Karp, R. M. (1972),Reducibility among combinatorial problems, Springer.
Kearns, M. J., Schapire, R. E. & Sellie, L. M. (1994), ‘Toward efficient agnostic learn-
ing’,Machine Learning 17 , 115–141.
Kearns, M. & Mansour, Y. (1996), On the boosting ability of top-down decision tree
learning algorithms,in‘ACM Symposium on the Theory of Computing (STOC)’.
Kearns, M. & Ron, D. (1999), ‘Algorithmic stability and sanity-check bounds for leave-
one-out cross-validation’,Neural Computation 11 (6), 1427–1453.
Kearns, M. & Valiant, L. G. (1988), Learning Boolean formulae or finite automata is
as hard as factoring, Technical Report TR-14-88, Harvard University Aiken Compu-
tation Laboratory.
Kearns, M. & Vazirani, U. (1994),An Introduction to Computational Learning Theory,
MIT Press.
Kleinberg, J. (2003), ‘An impossibility theorem for clustering’,Advances in Neural
Information Processing Systemspp. 463–470.
Klivans, A. R. & Sherstov, A. A. (2006), Cryptographic hardness for learning intersec-
tions of halfspaces,in‘FOCS’.
Koller, D. & Friedman, N. (2009),Probabilistic Graphical Models: Principles and Tech-
niques, MIT Press.
Koltchinskii, V. & Panchenko, D. (2000), Rademacher processes and bounding the risk
of function learning,in‘High Dimensional Probability II’, Springer, pp. 443–457.
Kuhn, H. W. (1955), ‘The hungarian method for the assignment problem’,Naval re-
search logistics quarterly 2 (1-2), 83–97.
Kutin, S. & Niyogi, P. (2002), Almost-everywhere algorithmic stability and general-
ization error,in‘Proceedings of the 18th Conference in Uncertainty in Artificial
Intelligence’, pp. 275–282.
Lafferty, J., McCallum, A. & Pereira, F. (2001), Conditional random fields: Probabilistic
models for segmenting and labeling sequence data,in‘International Conference on
Machine Learning’, pp. 282–289.
Langford, J. (2006), ‘Tutorial on practical prediction theory for classification’,Journal
of machine learning research 6 (1), 273.
Langford, J. & Shawe-Taylor, J. (2003), PAC-Bayes & margins,in‘NIPS’, pp. 423–430.
Le Cun, L. (2004), Large scale online learning.,in‘Advances in Neural Information
Processing Systems 16: Proceedings of the 2003 Conference’, Vol. 16, MIT Press,
p. 217.