Pattern Recognition and Machine Learning

2. PROBABILITY DISTRIBUTIONS

Figure 2.27 (a) In the K-nearest-neighbour classifier, a new point, shown by the black diamond, is classified according to the majority class membership of the K closest training data points, in this case K = 3. (b) In the nearest-neighbour (K = 1) approach to classification, the resulting decision boundary is composed of hyperplanes that form perpendicular bisectors of pairs of points from different classes.


[Figure 2.27: two panels, (a) and (b), each plotted in the (x1, x2) plane.]

If we wish to minimize the probability of misclassification, this is done by assigning the test point x to the class having the largest posterior probability, corresponding to the largest value of K_k/K. Thus to classify a new point, we identify the K nearest points from the training data set and then assign the new point to the class having the largest number of representatives amongst this set. Ties can be broken at random. The particular case of K = 1 is called the nearest-neighbour rule, because a test point is simply assigned to the same class as the nearest point from the training set. These concepts are illustrated in Figure 2.27.
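As a rough illustration of this decision rule (not part of the original text), the following Python sketch finds the K nearest training points under a Euclidean distance metric and returns the majority class, breaking ties at random. The function name knn_classify, the use of NumPy, and the choice of distance metric are our own assumptions.

```python
import numpy as np

def knn_classify(X_train, y_train, x_test, K=3, rng=None):
    """Assign x_test to the majority class among its K nearest training points.

    X_train : (N, D) array of training inputs
    y_train : (N,) array of integer class labels
    x_test  : (D,) array, the new point to classify
    """
    rng = np.random.default_rng() if rng is None else rng

    # Euclidean distances from the test point to every training point.
    dists = np.linalg.norm(X_train - x_test, axis=1)

    # Indices of the K closest training points.
    nearest = np.argsort(dists)[:K]

    # Count the class memberships K_k among the K neighbours; the posterior
    # for class k is K_k / K, so we pick the class with the largest count.
    labels, counts = np.unique(y_train[nearest], return_counts=True)

    # Break ties at random among the classes achieving the maximum count.
    winners = labels[counts == counts.max()]
    return rng.choice(winners)
```

Setting K = 1 in this sketch reproduces the nearest-neighbour rule described above, since the single closest training point then determines the class.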
In Figure 2.28, we show the results of applying the K-nearest-neighbour algorithm to the oil flow data, introduced in Chapter 1, for various values of K. As expected, we see that K controls the degree of smoothing, so that small K produces many small regions of each class, whereas large K leads to fewer larger regions.

[Figure 2.28: three panels for K = 1, K = 3, and K = 31, each plotted over x6 and x7 on the range 0 to 2.]

Figure 2.28 Plot of 200 data points from the oil data set showing values of x6 plotted against x7, where the red, green, and blue points correspond to the ‘laminar’, ‘annular’, and ‘homogeneous’ classes, respectively. Also shown are the classifications of the input space given by the K-nearest-neighbour algorithm for various values of K.
