and select the smallest. This procedure is linear in the number of training
instances: in other words, the time it takes to make a single prediction is
proportional to the number of training instances. Processing an entire test set takes
time proportional to the product of the number of instances in the training and
test sets.
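
To make this cost concrete, the following Java sketch (illustrative only; the class and method names are our own, not part of any particular implementation) performs exactly the linear scan just described: each prediction compares the query against every training instance in turn.

import java.util.List;

public class BruteForceNN {

    // Squared Euclidean distance between two attribute vectors.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the training instance closest to the query: a single linear scan,
    // so the cost of one prediction grows with the size of the training set.
    static int nearestIndex(List<double[]> training, double[] query) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < training.size(); i++) {
            double d = squaredDistance(training.get(i), query);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }
}

Classifying a whole test set calls nearestIndex once per test instance, which is why the total time is proportional to the product of the two set sizes.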
Nearest neighbors can be found more efficiently by representing the training
set as a tree, although it is not quite obvious how. One suitable structure is a
kD-tree. This is a binary tree that divides the input space with a hyperplane
and then splits each partition again, recursively. All splits are made parallel to
one of the axes, either vertically or horizontally, in the two-dimensional case.
The data structure is called a kD-tree because it stores a set of points in
k-dimensional space, k being the number of attributes.
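
A minimal sketch of such a structure is given below. It assumes one simple convention for choosing the splits (cycling through the k attributes and splitting at the median point along the current one); this is only one of several possible conventions, and the class and method names are our own.

import java.util.Arrays;
import java.util.Comparator;

class KDNode {
    double[] point;     // the training instance stored at this node
    int splitDim;       // the attribute along which the space is split here
    KDNode left, right; // the two partitions, each split again recursively

    KDNode(double[] point, int splitDim) {
        this.point = point;
        this.splitDim = splitDim;
    }
}

class KDTreeBuilder {

    // Builds a kD-tree over k-dimensional points: sort along the current
    // dimension, store the median point at this node, and recurse on the
    // two partitions, moving to the next attribute at each level.
    static KDNode build(double[][] points, int depth, int k) {
        if (points.length == 0) {
            return null;
        }
        int dim = depth % k;                              // axis of this level's split
        double[][] sorted = points.clone();
        Arrays.sort(sorted, Comparator.comparingDouble(p -> p[dim]));
        int median = sorted.length / 2;

        KDNode node = new KDNode(sorted[median], dim);
        node.left = build(Arrays.copyOfRange(sorted, 0, median), depth + 1, k);
        node.right = build(Arrays.copyOfRange(sorted, median + 1, sorted.length),
                           depth + 1, k);
        return node;
    }
}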
Figure 4.12(a) gives a small example with k=2, and Figure 4.12(b) shows the
four training instances it represents, along with the hyperplanes that constitute
the tree. Note that these hyperplanes are not decision boundaries: decisions are
made on a nearest-neighbor basis as explained later. The first split is
horizontal (h), through the point (7,4)—this is the tree’s root. The left branch is not
split further: it contains the single point (2,2), which is a leaf of the tree. The
right branch is split vertically (v) at the point (6,7). Its left child is empty, and
its right child contains the point (3,8). As this example illustrates, each region
contains just one point—or, perhaps, no points. Sibling branches of the tree—
for example, the two daughters of the root in Figure 4.12(a)—are not
necessarily developed to the same depth. Every point in the training set corresponds
to a single node, and up to half are leaf nodes.
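
For concreteness, the tree just described can be written out by hand with the KDNode class from the sketch above. The dimension indices are our own convention: index 1 is the second attribute, so a split on it is the horizontal one (h), and index 0 gives the vertical split (v).

public class Figure412Tree {
    public static void main(String[] args) {
        KDNode root  = new KDNode(new double[] {7, 4}, 1); // root: horizontal split (h)
        KDNode inner = new KDNode(new double[] {6, 7}, 0); // vertical split (v)

        root.left  = new KDNode(new double[] {2, 2}, 1);   // leaf (2,2): not split further
        root.right = inner;
        inner.left  = null;                                // empty region
        inner.right = new KDNode(new double[] {3, 8}, 0);  // leaf (3,8)
    }
}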
[Figure 4.12: (a) the tree, with root (7,4) split horizontally (h), right child (6,7) split vertically (v), and leaves (2,2) and (3,8); (b) the four instances and the splitting lines in the plane of attributes a1 and a2.]
Figure 4.12 A kD-tree for four training instances: (a) the tree and (b) instances and
splits.