Understanding Machine Learning: From Theory to Algorithms

19 Nearest Neighbor

Nearest Neighbor algorithms are among the simplest of all machine learning algorithms. The idea is to memorize the training set and then to predict the label of any new instance on the basis of the labels of its closest neighbors in the training set. The rationale behind such a method is based on the assumption that the features that are used to describe the domain points are relevant to their labelings in a way that makes close-by points likely to have the same label. Furthermore, in some situations, even when the training set is immense, finding a nearest neighbor can be done extremely fast (for example, when the training set is the entire Web and distances are based on links). Note that, in contrast with the algorithmic paradigms that we have discussed so far, like ERM, SRM, MDL, or RLM, that are determined by some hypothesis class,H, the Nearest Neighbor method figures out a label on any test point without searching for a predictor within some predefined class of functions. In this chapter we describe Nearest Neighbor methods for classification and regression problems. We analyze their performance for the simple case of binary classification and discuss the efficiency of implementing these methods.

19.1 kNearest Neighbors

Throughout the entire chapter we assume that our instance domain,X, is en- dowed with a metric functionρ. That is,ρ:X×X →Ris a function that returns the distance between any two elements ofX. For example, ifX=Rdthenρcan be the Euclidean distance,ρ(x,x′) =‖x−x′‖=

√∑

d i=1(xi−x′i)^2. LetS= (x 1 ,y 1 ),...,(xm,ym) be a sequence of training examples. For each x∈ X, letπ 1 (x),...,πm(x) be a reordering of{ 1 ,...,m}according to their distance tox,ρ(x,xi). That is, for alli < m,

ρ(x,xπi(x))≤ρ(x,xπi+1(x)).

For a numberk, thek-NN rule for binary classification is defined as follows:

Understanding Machine Learning,©c2014 by Shai Shalev-Shwartz and Shai Ben-David Published 2014 by Cambridge University Press. Personal use only. Not for distribution. Do not post. Please link tohttp://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning

Understanding Machine Learning: From Theory to Algorithms

19 Nearest Neighbor

19.1 kNearest Neighbors

√∑

Get our desktop app

Company

Features

Documentation

Resources