
Nearest-Neighbor and Computational Geometry
Many of our fast algorithms rely on data structures, both existing ones and new ones we have introduced.
Nearest neighbor search and other fundamental proximity problems based on metrics are ubiquitous across science and
engineering, and also serve as illustrative computational prototypes for solving more complex problems.



Rank-Approximate Nearest Neighbor Search:
Retaining Meaning and Speed in High Dimensions
Parikshit Ram, Dongryeol Lee, Hua Ouyang, and Alexander Gray
Neural Information Processing Systems (NIPS) 2009, appeared 2010
A new formulation of approximate nearest-neighbor search which, for the first time, allows direct control of the error in terms of ranks; ranks retain their meaning in high dimensions, whereas the traditionally-used numerical distances are known to become meaningless as dimension grows. Empirical results show favorable accuracy for the same runtime compared to existing methods such as LSH.
[pdf]
Abstract:
The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1 + epsilon) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments with high-dimensional datasets show that it often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior.
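To make the rank-approximation guarantee concrete, here is a minimal Python sketch of the sampling bound underlying the algorithm. The function name and the naive uniform sampling are our own; the paper combines this bound with stratified sampling over a space-partitioning tree to obtain its speedups.

import math
import numpy as np

def rank_approximate_nn(data, query, tau, delta, rng=None):
    # Return a point whose true rank among the N points in `data`
    # (ordered by distance to `query`) is at most tau, with
    # probability at least 1 - delta (assumes 1 <= tau < N).
    #
    # One uniform draw lands in the top-tau set with probability
    # tau/N, so m independent draws all miss it with probability
    # (1 - tau/N)^m; we take the smallest m making that <= delta.
    rng = rng or np.random.default_rng()
    n = len(data)
    m = math.ceil(math.log(delta) / math.log(1.0 - tau / n))
    idx = rng.integers(0, n, size=m)              # sample with replacement
    d = np.linalg.norm(data[idx] - query, axis=1)
    return int(idx[np.argmin(d)])

For example, with N = 10^6 points, requesting a neighbor within the top tau = N/100 ranks at delta = 0.05 needs only about 300 samples, independent of N; this is the sense in which rank error stays controllable where distance error does not.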
@inproceedings{ram2010ranknn,
author = "Parikshit Ram and Dongryeol Lee and Hua Ouyang and Alexander G. Gray",
title = "{RankApproximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions}",
booktitle = "Advances in Neural Information Processing Systems (NIPS) 22 (Dec 2009)",
year = "2010",
publisher = {MIT Press}
}

See also
All-Nearest-Neighbors
The all-nearest-neighbors problem (as well as the standard single-query nearest-neighbor problem) is a special case of the generalized N-body problem; a minimal dual-tree sketch appears after this list.
[see webpage here]
Euclidean Minimum Spanning Tree
We demonstrate the most efficient overall algorithm for the classic EMST problem,
in the context of hierarchical clustering. [see webpage here]
Cover Trees
We showed new analysis methods for cover trees, in the context of bichromatic, dual-tree algorithms. [see full entry here]
Cosine Trees
We introduced cosine trees, an analog of ball trees for dot products rather than distances, in the context of accelerating linear algebraic
computations. [see full entry here]
Subspace Trees
We introduced subspace trees, which create splits along principal component directions, in the context of accelerating high-dimensional kernel summations. [see full entry here]
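To make the generalized N-body connection in the All-Nearest-Neighbors entry above concrete, the following is a minimal dual-tree sketch in Python (our own simplified ball-tree structure and names, not the implementation behind the entries above). The single center-distance pruning test is the part that generalizes to the other N-body problems.

import numpy as np

class Ball:
    # Minimal ball-tree node: a center, a radius covering its points,
    # the indices it owns, and (for internal nodes) two children
    # split on the widest coordinate.
    def __init__(self, pts, idx, leaf_size=16):
        self.idx, self.children = idx, None
        self.center = pts[idx].mean(axis=0)
        self.radius = np.linalg.norm(pts[idx] - self.center, axis=1).max()
        if len(idx) > leaf_size:
            dim = int(np.argmax(np.ptp(pts[idx], axis=0)))
            order = idx[np.argsort(pts[idx][:, dim])]
            mid = len(order) // 2
            self.children = (Ball(pts, order[:mid], leaf_size),
                             Ball(pts, order[mid:], leaf_size))

def all_nn(qn, rn, qpts, rpts, best_d, best_i):
    # Prune the (query node, reference node) pair when no query in
    # qn can possibly beat its current best distance against any
    # reference in rn -- the defining move of dual-tree algorithms.
    gap = (np.linalg.norm(qn.center - rn.center)
           - qn.radius - rn.radius)
    if gap > best_d[qn.idx].max():
        return
    if qn.children is None and rn.children is None:
        d = np.linalg.norm(qpts[qn.idx, None] - rpts[rn.idx], axis=2)
        j = d.argmin(axis=1)
        dmin = d[np.arange(len(j)), j]
        win = dmin < best_d[qn.idx]
        best_d[qn.idx[win]] = dmin[win]
        best_i[qn.idx[win]] = rn.idx[j][win]
        return
    for qc in (qn.children or (qn,)):
        for rc in (rn.children or (rn,)):
            all_nn(qc, rc, qpts, rpts, best_d, best_i)

Running it amounts to building one tree per point set and recursing from the two roots with best_d initialized to infinity; swapping the leaf computation and the pruning rule yields other generalized N-body problems such as range search and kernel summation.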
In preparation
Comparison/Survey of Data Structures
We have implemented and compared 45 of the most widely-known or promising algorithms and data structures for proximity problems (5 variants of nearest-neighbor search and 3 variants of range search). We believe this is likely the most comprehensive experimental study of proximity problems that exists.
Adaptive and Parameterized Analysis of Proximity Data Structures
We are pursuing the development of theoretical techniques for more realistic runtime analysis of practical algorithms, toward a theory for the design of optimal data structures for proximity problems.


New Algorithms for Efficient High-Dimensional Nonparametric Classification
Ting Liu, Andrew W. Moore, Alexander G. Gray
Journal of Machine Learning Research (JMLR), 2006
Fast exact algorithms for nearest-neighbor classification (and related problems), exploiting the fact that classification does not strictly require finding the nearest neighbors themselves, and demonstrating speedups in higher dimensionalities than is typical for nearest-neighbor search.
[pdf]
Abstract:
This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k-nearest-neighbor classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query, but merely need to answer questions about the properties of that set of data points. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. This is applicable to many algorithms in nonparametric statistics, memory-based learning and kernel-based learning. But for clarity, this paper concentrates on pure k-NN classification. We introduce new ball-tree algorithms that on real-world data sets give accelerations from 2-fold to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include data sets with up to 10^6 dimensions and 10^5 records, and demonstrate non-trivial speedups while giving exact answers.
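The "leeway" described above can be made concrete with a toy sketch (ours, in Python; the paper's ball-tree algorithms get their speedups by pruning whole nodes with distance bounds rather than walking points one at a time as this does):

def knn_vote_early_exit(labels_in_distance_order, k):
    # Decide a binary k-NN vote without identifying all k nearest
    # neighbors: once one class holds floor(k/2) + 1 of the
    # neighbors seen so far, the remaining ones cannot change the
    # answer, so we stop. Labels are +1/-1, sorted by distance.
    need = k // 2 + 1
    pos = neg = 0
    for label in labels_in_distance_order[:k]:
        pos += label == +1
        neg += label == -1
        if pos >= need:
            return +1                  # majority already certain
        if neg >= need:
            return -1
    return +1 if pos > neg else -1

For instance, knn_vote_early_exit([+1, -1, +1, +1, -1], k=5) returns +1 after examining only four of the five nearest neighbors; with ball-tree distance bounds the same logic can settle the vote without materializing most of them at all.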
@article{liu2006nnclsf,
author = "Ting Liu and Andrew W. Moore and Alexander G. Gray",
title = "{New Algorithms for Efficient High-Dimensional Nonparametric Classification}",
journal = "Journal of Machine Learning Research (JMLR)",
volume = "7",
pages = "1135--1158",
year = "2006"
}


An Investigation of Practical Approximate Nearest Neighbor Algorithms
Ting Liu, Andrew W. Moore, Alexander Gray, and Ke Yang
Neural Information Processing Systems (NIPS) 2004, appeared 2005
A way to do approximate nearest-neighbor search using a data structure called a spill tree, which demonstrates significant speedups over other methods such as LSH.
[pdf]
Abstract:
This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high-dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries, and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate k-NN search algorithms on this structure. We show why these structures should be able to exploit the same random-projection-based approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high-dimensional datasets which show up to 31-fold accelerations over LSH. This result holds true throughout the spectrum of approximation levels.
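As a concrete illustration of the overlap ("spill") idea and the no-backtracking ("defeatist") search it enables, here is a minimal Python sketch. The names and the pivot-based split rule are our own simplifications; the paper's hybrid spill trees additionally use random projections and fall back to standard backtracking search in non-overlapping nodes.

import numpy as np

class SpillNode:
    # Spill-tree node: points within `overlap` of the splitting
    # plane are copied into BOTH children, so a single root-to-leaf
    # descent rarely loses the true nearest neighbor.
    def __init__(self, pts, idx, overlap, leaf_size=16):
        self.idx, self.children = idx, None
        if len(idx) <= leaf_size:
            return
        # Split along the direction between two far-apart points.
        a = pts[idx[0]]
        b = pts[idx[int(np.argmax(np.linalg.norm(pts[idx] - a, axis=1)))]]
        self.dir = (b - a) / np.linalg.norm(b - a)
        proj = pts[idx] @ self.dir
        self.mid = float(np.median(proj))
        left = idx[proj <= self.mid + overlap]      # the buffers overlap
        right = idx[proj >= self.mid - overlap]
        if max(len(left), len(right)) == len(idx):  # overlap ate the node:
            return                                  # keep it as a leaf
        self.children = (SpillNode(pts, left, overlap, leaf_size),
                         SpillNode(pts, right, overlap, leaf_size))

def defeatist_nn(node, pts, query):
    # Descend to exactly one child per level -- no backtracking --
    # then scan the leaf by brute force. Approximate but fast: one
    # root-to-leaf path plus a small linear scan.
    while node.children is not None:
        side = 0 if query @ node.dir <= node.mid else 1
        node = node.children[side]
    d = np.linalg.norm(pts[node.idx] - query, axis=1)
    return int(node.idx[np.argmin(d)])

The overlap parameter trades memory and tree depth for accuracy: overlap = 0 recovers a plain metric tree, while larger buffers make the defeatist descent correct for more queries at the cost of duplicated points.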
@inproceedings{liu2005approxnn,
author = "Ting Liu and Andrew W. Moore and Alexander G. Gray and Ke Yang",
title = "{An Investigation of Practical Approximate Nearest Neighbor Algorithms}",
booktitle = "Advances in Neural Information Processing Systems (NIPS) 17 (Dec 2004)",
year = "2005",
publisher = {MIT Press}
}

