Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1

19 Nearest Neighbor


Nearest Neighbor algorithms are among the simplest of all machine learning
algorithms. The idea is to memorize the training set and then to predict the
label of any new instance on the basis of the labels of its closest neighbors in
the training set. The rationale behind such a method is based on the assumption
that the features that are used to describe the domain points are relevant to
their labelings in a way that makes close-by points likely to have the same label.
Furthermore, in some situations, even when the training set is immense, finding
a nearest neighbor can be done extremely fast (for example, when the training
set is the entire Web and distances are based on links).
Note that, in contrast with the algorithmic paradigms that we have discussed
so far, like ERM, SRM, MDL, or RLM, that are determined by some hypothesis
class,H, the Nearest Neighbor method figures out a label on any test point
without searching for a predictor within some predefined class of functions.
In this chapter we describe Nearest Neighbor methods for classification and
regression problems. We analyze their performance for the simple case of binary
classification and discuss the efficiency of implementing these methods.

19.1 kNearest Neighbors


Throughout the entire chapter we assume that our instance domain,X, is en-
dowed with a metric functionρ. That is,ρ:X×X →Ris a function that returns
the distance between any two elements ofX. For example, ifX=Rdthenρcan
be the Euclidean distance,ρ(x,x′) =‖x−x′‖=

√∑

d
i=1(xi−x′i)^2.
LetS= (x 1 ,y 1 ),...,(xm,ym) be a sequence of training examples. For each
x∈ X, letπ 1 (x),...,πm(x) be a reordering of{ 1 ,...,m}according to their
distance tox,ρ(x,xi). That is, for alli < m,

ρ(x,xπi(x))≤ρ(x,xπi+1(x)).

For a numberk, thek-NN rule for binary classification is defined as follows:

Understanding Machine Learning,©c2014 by Shai Shalev-Shwartz and Shai Ben-David
Published 2014 by Cambridge University Press.
Personal use only. Not for distribution. Do not post.
Please link tohttp://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning
Free download pdf