Algorithms in a Nutshell

Nearest Neighbor Queries | 289


From this random data, the number of double recursions appears to be on the
order of 0.3*log(n) for two dimensions, but this jumps to 342*log(n) for 10
dimensions (a 1,000-fold increase). The important observation is that both of
these estimation functions conform to O(log n). But what happens when d
increases to be "sufficiently close" to n in some way? The data graphed in
Figure 9-22 shows that as d increases, the number of double recursions actually
approaches n/2. In fact, as d increases, the number of single recursions
conforms to a normal distribution whose mean is very close to log(n), which
tells us that eventually all recursive invocations are of the double variety.
The impact of this fact on the performance of nearest neighbor queries is that
as d approaches log(n), the investment in using kd-trees begins to diminish
until the resulting performance is no better than O(n), since the number of
double recursions plateaus at n/2.
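The single/double distinction can be made concrete in code. The following is a minimal Python sketch (not the book's implementation) of a kd-tree nearest neighbor search that counts how often a query must descend into both subtrees of a node (a "double recursion") rather than only the nearer one:

```python
import random

def build(points, depth=0):
    """Build a simple kd-tree, cycling through coordinates as split axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def dist2(a, b):
    """Squared Euclidean distance (avoids a needless sqrt)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, target, best, counts):
    """Return the closest point found; tally single vs. double descents."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or dist2(point, target) < dist2(best, target):
        best = point
    near, far = (left, right) if target[axis] < point[axis] else (right, left)
    best = nearest(near, target, best, counts)
    # If the splitting plane is closer than the best match found so far,
    # the far subtree may hide a closer point: a "double recursion".
    if (target[axis] - point[axis]) ** 2 < dist2(best, target):
        counts["double"] += 1
        best = nearest(far, target, best, counts)
    else:
        counts["single"] += 1
    return best

if __name__ == "__main__":
    random.seed(17)
    for d in (2, 10):
        pts = [tuple(random.random() for _ in range(d)) for _ in range(1024)]
        tree = build(pts)
        counts = {"single": 0, "double": 0}
        for _ in range(128):
            q = tuple(random.random() for _ in range(d))
            nearest(tree, q, None, counts)
        print(d, counts)  # double recursions typically grow sharply with d
```

Running this on uniformly random points reproduces the qualitative trend described above: the double-recursion count per query rises steeply as the dimensionality grows.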
Certain input data sets force NEAREST NEIGHBOR to work hard even in two
dimensions. For example, let's change the input for Table 9-6 such that the n
unique two-dimensional points are found on a circle of radius r > 1, but the
query points still lie within the unit square. When n = 131,072 points, the
number of single recursions jumps 10-fold to 235.8, while the number of double
recursions explodes to 932.78 (a 200-fold increase!). Thus the nearest
neighbor query will degenerate in the worst case to O(n) given queries
specifically tailored to a given input set.
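Such a pathological input can be sketched as follows; the specific radius and counts here are illustrative assumptions, not the book's exact experimental setup:

```python
import math
import random

random.seed(42)
n, r = 1024, 3.0            # radius r > 1 is an illustrative choice
# n unique points evenly spaced on a circle of radius r around the origin
points = [(r * math.cos(2 * math.pi * i / n),
           r * math.sin(2 * math.pi * i / n)) for i in range(n)]
# query points drawn from inside the unit square
queries = [(random.random(), random.random()) for _ in range(128)]
```

Because every tree point lies at roughly the same distance from any query in the unit square, the pruning test at each kd-tree node rarely rules out the far subtree, so most recursive calls become double recursions.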
We can also evaluate the performance of the NEAREST NEIGHBOR algorithm by
comparing it against a straight brute-force O(n) comparison. Given a data set
of n = 4,096 points on which 128 random searches are to be executed, how large
must the dimensionality d of the input set be before the brute-force nearest
neighbor implementation outperforms the kd-tree implementation? We ran 100
trials and discarded the best and worst, computing the average of the
remaining 98 trials. The results are graphed in Figure 9-23 and show that for
d = 10 dimensions and higher, the brute-force implementation outperforms the
NEAREST NEIGHBOR kd-tree algorithm. If we increase the number of points to
n = 131,072, the crossover occurs at d = 12, so the specific crossover point
depends upon the machine hardware on which the code executes, the specific
values of n and d, and the distribution of points in the input set. We do not
include the cost of constructing the kd-tree in this crossover analysis, since
that cost can be amortized across all searches; when that cost is included,
the results shown in Figure 9-23 still hold.
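The brute-force baseline and the trial protocol described above (discard the single best and worst runs, average the rest) might be sketched as follows; the absolute timings, and therefore the crossover dimension, will vary with hardware:

```python
import random
import time

def brute_nearest(points, q):
    """O(n*d) scan: examine the squared distance to every point."""
    return min(points, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))

def trimmed_average(times):
    """Average after discarding the single best and worst measurements."""
    times = sorted(times)
    return sum(times[1:-1]) / (len(times) - 2)

def run_trials(n, d, searches=128, trials=10):
    """Time `searches` brute-force queries over n random d-dim points."""
    random.seed(7)
    pts = [tuple(random.random() for _ in range(d)) for _ in range(n)]
    qs = [tuple(random.random() for _ in range(d)) for _ in range(searches)]
    times = []
    for _ in range(trials):
        t0 = time.perf_counter()
        for q in qs:
            brute_nearest(pts, q)
        times.append(time.perf_counter() - t0)
    return trimmed_average(times)
```

Timing the same trial harness against a kd-tree query routine, over a range of d, would locate the crossover dimension on a particular machine.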

Table 9-6. Ratio of double recursion invocations to single (continued)

n        d=2 #Recursions   d=2 #Double recursions   d=10 #Recursions   d=10 #Double recursions
4,096    15.96             2.72                     466.1              1183.1
8,192    17.18             3.3                      925.22             1876.66
16,384   19.9              3.38                     1552.98            2939.08
32,768   18.78             3.14                     2769.72            5118.76
65,536   20.88             3.16                     3272.24            4788.3
131,072  23.32             3.98                     5376.06            7703.72
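Reading each pair of columns as single-recursion and double-recursion counts (an assumption about the table's layout), the ratio the table's title refers to can be computed directly; the values below are copied from Table 9-6:

```python
# (n, d=2 recursions, d=2 doubles, d=10 recursions, d=10 doubles), per Table 9-6
rows = [
    (4096,   15.96, 2.72, 466.1,   1183.1),
    (8192,   17.18, 3.3,  925.22,  1876.66),
    (16384,  19.9,  3.38, 1552.98, 2939.08),
    (32768,  18.78, 3.14, 2769.72, 5118.76),
    (65536,  20.88, 3.16, 3272.24, 4788.3),
    (131072, 23.32, 3.98, 5376.06, 7703.72),
]
for n, s2, d2, s10, d10 in rows:
    print(f"n={n:>7}  d=2 ratio={d2 / s2:.3f}  d=10 ratio={d10 / s10:.3f}")
```

For d = 10 the ratio exceeds 1 in every row, consistent with the observation that in higher dimensions most invocations become double recursions.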

Algorithms in a Nutshell, by Gary Pollice, George T. Heineman, and Stanley Selkow. ISBN: 9780596516246. Publisher: O'Reilly Media, Inc.


Print Publication Date: 2008/10/21