Algorithms in a Nutshell

Nearest Neighbor Queries | 289


From this random data, the number of double recursions appears to be on the
order of 0.3*log(n) for two dimensions, but this jumps to 342*log(n) for 10
dimensions (a 1,000-fold increase). The important observation is that both of
these estimation functions conform to O(log n). But what happens when d
increases to be "sufficiently close" to n in some way? The data graphed in
Figure 9-22 shows that as d increases, the number of double recursions actually
approaches n/2. In fact, as d increases, the number of single recursions
conforms to a normal distribution whose mean is very close to log(n), which
tells us that eventually all recursive invocations are of the double variety.
The impact of this fact on the performance of nearest neighbor queries is that
as d approaches log(n), the investment in using kd-trees begins to diminish
until the resulting performance is no better than O(n), since the number of
double recursions plateaus at n/2.
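The single/double distinction can be made concrete in code. The following is a minimal Python sketch (not the book's implementation) of a kd-tree nearest neighbor search that counts how often a query must descend into both subtrees of a node (a "double recursion") rather than only the nearer one:

```python
import random

def build(points, depth=0):
    """Build a simple kd-tree, cycling through coordinates as split axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def dist2(a, b):
    """Squared Euclidean distance (avoids a needless sqrt)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, target, best, counts):
    """Return the closest point found; tally single vs. double descents."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or dist2(point, target) < dist2(best, target):
        best = point
    near, far = (left, right) if target[axis] < point[axis] else (right, left)
    best = nearest(near, target, best, counts)
    # If the splitting plane is closer than the best match found so far,
    # the far subtree may hide a closer point: a "double recursion".
    if (target[axis] - point[axis]) ** 2 < dist2(best, target):
        counts["double"] += 1
        best = nearest(far, target, best, counts)
    else:
        counts["single"] += 1
    return best

if __name__ == "__main__":
    random.seed(17)
    for d in (2, 10):
        pts = [tuple(random.random() for _ in range(d)) for _ in range(1024)]
        tree = build(pts)
        counts = {"single": 0, "double": 0}
        for _ in range(128):
            q = tuple(random.random() for _ in range(d))
            nearest(tree, q, None, counts)
        print(d, counts)  # double recursions typically grow sharply with d
```

Running this on uniformly random points reproduces the qualitative trend described above: the double-recursion count per query rises steeply as the dimensionality grows.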
Certain input data sets force NEAREST NEIGHBOR to work hard even in two
dimensions. For example, let's change the input for Table 9-6 such that the n
unique two-dimensional points are found on a circle of radius r > 1, but the
query points still lie within the unit square. When n = 131,072 points, the
number of single recursions jumps 10-fold to 235.8, while the number of double
recursions explodes to 932.78 (a 200-fold increase!). Thus the nearest
neighbor query will degenerate in the worst case to O(n) given queries
specifically tailored to a given input set.
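Such a pathological input can be sketched as follows; the specific radius and counts here are illustrative assumptions, not the book's exact experimental setup:

```python
import math
import random

random.seed(42)
n, r = 1024, 3.0            # radius r > 1 is an illustrative choice
# n unique points evenly spaced on a circle of radius r around the origin
points = [(r * math.cos(2 * math.pi * i / n),
           r * math.sin(2 * math.pi * i / n)) for i in range(n)]
# query points drawn from inside the unit square
queries = [(random.random(), random.random()) for _ in range(128)]
```

Because every tree point lies at roughly the same distance from any query in the unit square, the pruning test at each kd-tree node rarely rules out the far subtree, so most recursive calls become double recursions.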
We can also evaluate the performance of the NEAREST NEIGHBOR algorithm by
comparing it against a straight brute-force O(n) comparison. Given a data set
of n = 4,096 points on which 128 random searches are to be executed, how large
must the dimensionality d of the input set be before the brute-force nearest
neighbor implementation outperforms the kd-tree implementation? We ran 100
trials and discarded the best and worst, computing the average of the
remaining 98 trials. The results are graphed in Figure 9-23 and show that for
d = 10 dimensions and higher, the brute-force implementation outperforms the
NEAREST NEIGHBOR kd-tree algorithm. If we increase the number of points to
n = 131,072, the crossover occurs at d = 12, so the specific crossover point
depends upon the machine hardware on which the code executes, the specific
values of n and d, and the distribution of points in the input set. We do not
include the cost of constructing the kd-tree in this crossover analysis, since
that cost can be amortized across all searches; when that cost is included,
the results shown in Figure 9-23 still hold.
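The brute-force baseline and the trial protocol described above (discard the single best and worst runs, average the rest) might be sketched as follows; the absolute timings, and therefore the crossover dimension, will vary with hardware:

```python
import random
import time

def brute_nearest(points, q):
    """O(n*d) scan: examine the squared distance to every point."""
    return min(points, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))

def trimmed_average(times):
    """Average after discarding the single best and worst measurements."""
    times = sorted(times)
    return sum(times[1:-1]) / (len(times) - 2)

def run_trials(n, d, searches=128, trials=10):
    """Time `searches` brute-force queries over n random d-dim points."""
    random.seed(7)
    pts = [tuple(random.random() for _ in range(d)) for _ in range(n)]
    qs = [tuple(random.random() for _ in range(d)) for _ in range(searches)]
    times = []
    for _ in range(trials):
        t0 = time.perf_counter()
        for q in qs:
            brute_nearest(pts, q)
        times.append(time.perf_counter() - t0)
    return trimmed_average(times)
```

Timing the same trial harness against a kd-tree query routine, over a range of d, would locate the crossover dimension on a particular machine.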

Table 9-6. Ratio of double recursion invocations to single (continued)

n        d=2 #Recursions   d=2 #Double recursions   d=10 #Recursions   d=10 #Double recursions
4,096    15.96             2.72                     466.1              1183.1
8,192    17.18             3.3                      925.22             1876.66
16,384   19.9              3.38                     1552.98            2939.08
32,768   18.78             3.14                     2769.72            5118.76
65,536   20.88             3.16                     3272.24            4788.3
131,072  23.32             3.98                     5376.06            7703.72
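Reading each pair of columns as single-recursion and double-recursion counts (an assumption about the table's layout), the ratio the table's title refers to can be computed directly; the values below are copied from Table 9-6:

```python
# (n, d=2 recursions, d=2 doubles, d=10 recursions, d=10 doubles), per Table 9-6
rows = [
    (4096,   15.96, 2.72, 466.1,   1183.1),
    (8192,   17.18, 3.3,  925.22,  1876.66),
    (16384,  19.9,  3.38, 1552.98, 2939.08),
    (32768,  18.78, 3.14, 2769.72, 5118.76),
    (65536,  20.88, 3.16, 3272.24, 4788.3),
    (131072, 23.32, 3.98, 5376.06, 7703.72),
]
for n, s2, d2, s10, d10 in rows:
    print(f"n={n:>7}  d=2 ratio={d2 / s2:.3f}  d=10 ratio={d10 / s10:.3f}")
```

For d = 10 the ratio exceeds 1 in every row, consistent with the observation that in higher dimensions most invocations become double recursions.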

Algorithms in a Nutshell, by Gary Pollice, George T. Heineman, and Stanley Selkow. ISBN: 9780596516246. Publisher: O'Reilly Media, Inc.


Print Publication Date: 2008/10/21