Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

4.7 INSTANCE-BASED LEARNING 131


How do you build a kD-tree from a dataset? Can it be updated efficiently as
new training examples are added? And how does it speed up nearest-neighbor
calculations? We tackle the last question first.
To locate the nearest neighbor of a given target point, follow the tree down
from its root to locate the region containing the target. Figure 4.13 shows a space
like that of Figure 4.12(b) but with a few more instances and an extra boundary.
The target, which is not one of the instances in the tree, is marked by a star.
The leaf node of the region containing the target is colored black. This is not
necessarily the target’s closest neighbor, as this example illustrates, but it is a
good first approximation. In particular, any nearer neighbor must lie closer to
the target, within the dashed circle in Figure 4.13. To determine whether one exists, first
check whether it is possible for a closer neighbor to lie within the node’s sibling.
The black node’s sibling is shaded in Figure 4.13, and the circle does not inter-
sect it, so the sibling cannot contain a closer neighbor. Then back up to the
parent node and check its sibling, which here covers everything above the horizontal
line. In this case it must be explored, because the area it covers intersects
with the best circle so far. To explore it, find its daughters (the original point’s
two aunts), check whether they intersect the circle (the left one does not, but
the right one does), and descend to see whether it contains a closer point (it
does).
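The descend-then-back-up procedure described above can be sketched in Python. This is a minimal illustration, not the book's own code: the `Node` class, the median-split construction, and the recursive `nearest` function are assumptions chosen to make the example self-contained. The key step is the test at each backtrack: the far subtree (the sibling) is explored only if the splitting boundary lies within the current best circle.

```python
import math

class Node:
    """A kD-tree node: a point, the axis it splits on, and two subtrees."""
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build(points, depth=0):
    """Build a kD-tree by splitting on the median, cycling the axis with depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest(node, target, best=None):
    """Descend to the region containing the target, then back up,
    exploring a sibling only when the best circle so far crosses
    the splitting boundary -- the check described in the text."""
    if node is None:
        return best
    if best is None or math.dist(node.point, target) < math.dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)        # region containing the target
    if abs(diff) < math.dist(best, target):   # circle intersects the boundary?
        best = nearest(far, target, best)     # only then explore the sibling
    return best
```

For example, `nearest(build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]), (9, 2))` returns `(8, 1)`, the instance closest to the target. Because whole subtrees are pruned whenever the boundary test fails, a balanced tree answers queries in roughly logarithmic rather than linear time.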


Figure 4.13 Using a kD-tree to find the nearest neighbor of the star.
