Nearest-Neighbor and Computational Geometry
Many of our fast algorithms rely on data structures, both existing ones and new ones we have introduced. Nearest neighbor search and other fundamental proximity problems based on metrics are ubiquitous across science and engineering, and also serve as illustrative computational prototypes for solving more complex problems.
Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions
Parikshit Ram, Dongryeol Lee, Hua Ouyang, and Alexander Gray
Neural Information Processing Systems (NIPS) 2009, appeared 2010

A new formulation of approximate nearest-neighbor search that, for the first time, allows direct control of the error in terms of ranks, which retain their meaning in high dimensions, whereas the traditionally used numerical distances are known to become meaningless as dimension grows. Empirical results show favorable accuracy at the same runtime compared to existing methods such as LSH. [pdf]

Abstract: The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1 + epsilon) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving the large speedups over exact NN. Experiments with high-dimensional datasets show that it often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior.
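The rank-control idea above can be illustrated with a toy sampling sketch: to return a neighbor whose true rank is at most tau with probability at least 1 - delta, it suffices to take the closest point in a random sample whose size depends only on tau, delta, and N. (The paper's actual algorithm combines this kind of guarantee with tree traversal; the function and parameter names below are illustrative, not from the paper.)

```python
import math
import random

def rank_approx_nn(query, data, dist, tau, delta):
    """Return a point whose rank among `data` (ordered by distance to
    `query`) is at most `tau` with probability at least 1 - delta.

    Toy sketch of the sampling idea behind rank-approximate search:
    a random sample of size m misses all tau nearest points with
    probability at most (1 - tau/n)^m, so choosing m with
    (1 - tau/n)^m <= delta gives the rank guarantee.
    """
    n = len(data)
    if tau >= n:
        # rank constraint is vacuous; fall back to exact search
        return min(data, key=lambda p: dist(query, p))
    m = math.ceil(math.log(delta) / math.log(1.0 - tau / n))
    sample = random.sample(data, min(m, n))
    return min(sample, key=lambda p: dist(query, p))
```

Note that m grows with log(1/delta) but is independent of the dimension, which is why the rank formulation stays meaningful where distance-approximate guarantees degrade.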

@inproceedings{ram2010ranknn,
  author    = {Parikshit Ram and Dongryeol Lee and Hua Ouyang and Alexander G. Gray},
  title     = {{Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions}},
  booktitle = {Advances in Neural Information Processing Systems (NIPS) 22 (Dec 2009)},
  year      = {2010},
  publisher = {MIT Press}
}
See also

The all-nearest-neighbors problem (as well as the standard single-query nearest-neighbor problem) is a special case of generalized N-body problem. [see webpage here]

Euclidean Minimum Spanning Tree
We demonstrate the most efficient overall algorithm for the classic EMST problem, in the context of hierarchical clustering. [see webpage here]

Cover Trees
We showed new analysis methods for cover trees, in the context of bichromatic, dual-tree algorithms. [see full entry here]

Cosine Trees
We introduced cosine trees, an analog of ball trees for dot products rather than distances, in the context of accelerating linear algebraic computations. [see full entry here]

Subspace Trees
We introduced subspace trees, which create splits along principal component directions, in the context of accelerating high-dimensional kernel summations. [see full entry here]
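A minimal sketch of a principal-direction split, assuming the split is made at the median projection onto the top principal component (the actual subspace-tree construction in the paper differs in its details):

```python
import numpy as np

def pca_split(points):
    """Split points at the median of their projections onto the top
    principal component of the node's data.

    Sketch of the principal-direction splits used by subspace trees:
    splitting along the direction of greatest variance adapts the tree
    to the data's intrinsic, possibly low-dimensional, structure.
    """
    X = np.asarray(points, dtype=float)
    Xc = X - X.mean(axis=0)
    # top principal direction via SVD of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[0]
    thresh = np.median(proj)
    return X[proj <= thresh], X[proj > thresh]
```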

In preparation

Comparison/Survey of Data Structures
We have implemented and compared 45 of the most widely known or promising algorithms and data structures for proximity problems (5 variants of nearest-neighbor search and 3 variants of range search). We believe this to be the most comprehensive experimental study of proximity problems to date.

Adaptive and Parameterized Analysis of Proximity Data Structures
We are pursuing the development of theoretical techniques for more realistic runtime analysis of practical algorithms, toward a theory for the design of optimal data structures for proximity problems.
New Algorithms for Efficient High-Dimensional Nonparametric Classification
Ting Liu, Andrew W. Moore, Alexander G. Gray
Journal of Machine Learning Research (JMLR), 2006

Fast exact algorithms for nearest-neighbor classification (and related problems) that exploit the fact that classification does not strictly require finding the nearest neighbors themselves, demonstrating speedups at higher dimensionalities than is typical for nearest-neighbor search. [pdf]

Abstract: This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k nearest neighbor classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query, but merely need to answer questions about the properties of that set of data points. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. This is applicable to many algorithms in nonparametric statistics, memory-based learning and kernel-based learning. But for clarity, this paper concentrates on pure k-NN classification. We introduce new ball-tree algorithms that on real-world data sets give accelerations from 2-fold to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include data sets with up to 10^6 dimensions and 10^5 records, and demonstrate non-trivial speed-ups while giving exact answers.
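The "leeway" the abstract describes can be illustrated with a toy example: for a k-NN vote, the class can be decided as soon as one label reaches a strict majority among the neighbors seen so far, before all k neighbors are identified. (The paper's ball-tree algorithms prune whole nodes using distance bounds; this sketch just scans points in distance order, and the function name is illustrative.)

```python
def knn_positive_vote(query, data, labels, dist, k):
    """Decide whether a majority of the k nearest neighbors of `query`
    carry a positive label, stopping as soon as the vote is determined.

    Toy illustration of deciding a k-NN classification without
    materializing all k neighbors: once one class holds a strict
    majority among the neighbors seen so far, the rest are irrelevant.
    """
    order = sorted(range(len(data)), key=lambda i: dist(query, data[i]))
    need = k // 2 + 1          # votes for a strict majority of the k
    pos = neg = 0
    for i in order[:k]:        # the k nearest, in increasing distance
        if labels[i]:
            pos += 1
            if pos >= need:    # majority already decided
                return True
        else:
            neg += 1
            if neg >= need:
                return False
    return pos > neg           # an even-k tie falls through to here
```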

@article{liu2006nnclsf,
  author  = {Ting Liu and Andrew W. Moore and Alexander G. Gray},
  title   = {{New Algorithms for Efficient High Dimensional Nonparametric Classification}},
  journal = {Journal of Machine Learning Research (JMLR)},
  volume  = {7},
  pages   = {1135--1158},
  year    = {2006}
}
An Investigation of Practical Approximate Nearest Neighbor Algorithms
Ting Liu, Andrew W. Moore, Alexander Gray, and Ke Yang
Neural Information Processing Systems (NIPS) 2004, appeared 2005

A way to do approximate nearest-neighbor search using a data structure called a spill tree, which demonstrates significant speedups over other methods such as LSH. [pdf]

Abstract: This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate k-NN search algorithms on this structure. We show why these structures should be able to exploit the same random projection-based approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high dimensional datasets which show up to 31-fold accelerations over LSH. This result holds true throughout the spectrum of approximation levels.
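The overlapping split described in the abstract can be sketched in a few lines: points that fall within a buffer of the splitting threshold are assigned to both children, so a no-backtrack search that descends one side still sees near-boundary points. (A minimal sketch: the paper's trees split along the direction between two pivot points, while this version splits on the widest coordinate axis for brevity.)

```python
import numpy as np

def spill_split(points, overlap):
    """Split a point set at the median along its widest coordinate,
    letting points within `overlap` of the threshold go to BOTH
    children -- the defining feature of a spill ('overlapping') split.
    """
    pts = np.asarray(points, dtype=float)
    axis = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))  # widest axis
    thresh = float(np.median(pts[:, axis]))
    left = pts[pts[:, axis] <= thresh + overlap]
    right = pts[pts[:, axis] >= thresh - overlap]
    return left, right, axis, thresh
```

The overlap buffer trades memory (duplicated points) for the ability to answer queries with a single root-to-leaf descent, which is what makes the defeatist search in the paper fast.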

@inproceedings{liu2005approxnn,
  author    = {Ting Liu and Andrew W. Moore and Alexander G. Gray and Ke Yang},
  title     = {{An Investigation of Practical Approximate Nearest Neighbor Algorithms}},
  booktitle = {Advances in Neural Information Processing Systems (NIPS) 17 (Dec 2004)},
  year      = {2005},
  publisher = {MIT Press}
}