Generalized N-Body Problems
Computations that involve pairwise similarities or distances, and generalizations thereof, are responsible for much of the expense encountered in statistical methods and scientific simulations.
FASTlab logo GT

FASTlab Home Papers/Code Team
Linear-time Algorithms for Pairwise Statistical Problems
Parikshit Ram, Dongryeol Lee, William March, and Alexander G. Gray
Neural Information Processing Systems (NIPS) 2009, appeared 2010

We show that dual-tree algorithms run in linear time for problems involving all pairwise distances which are naively quadratic in cost. This includes the first rigorous analysis of general (bichromatic) batch nearest-nearest neighbors (aka all-nearest-neighbors) on cover trees and the first rigorous analysis of the famous Fast Multipole Method and its relatives. [pdf]

Abstract: Several key computational bottlenecks in machine learning involve pairwise distance computations, including all-nearest-neighbors (finding the nearest neighbor(s) for each point, e.g. in manifold learning) and kernel summations (e.g. in kernel density estimation or kernel machines). We consider the general, bichromatic case for these problems, in addition to the scientific problem of N-body simulation. In this paper we show for the first time O(N) worst case runtimes for practical algorithms for these problems based on the cover tree data structure.

@inproceedings{ram2010ltaps, Author = "Parikshit Ram and Dongryeol Lee and William March and Alexander G. Gray", Title = "{Linear-time Algorithms for Pairwise Statistical Problems}", booktitle = "Advances in Neural Information Processing Systems (NIPS) 22 (Dec 2009)", Year = "2010", publisher = {MIT Press} }
See also

Proximity Problems
Proximity problems such as nearest-neighbor search, range search, and their variants represent special cases of generalized N-body problems. [see webpage here]

N-Body Problems with Continuous Functions
Kernel summations represent a special case of generalized N-body problem containing continuous functions. [see webpage here]

Data Structures for N-Body Problems
The generalized N-body framework can utilize a large variety of existing hierarchical data structures. [see webpage here]

In preparation

Formalization of Generalized N-Body Problems and Methods
We have formalized a general class of problems which can be efficiently solved using fast N-Body-like algorithms, as well as a single generic N-Body algorithm which can be specialized to each such problem. We have also developed a MapReduce-like parallel programming model for such problems based on this formalization.

Efficient Molecular Dynamics with the Axilrod-Teller Potential
A fast algorithm for force computation using the 3-body Axilrod-Teller potential function, with a demonstration of its use in molecular dynamics.

Fast Monte Carlo and Multi-Body Potential Evaluation
An ultra-fast approach to evaluating potentials combining Monte Carlo and multipole techniques, which generalizes to multi-body (more than pairwise) potentials.

Fast Hartree-Fock Calculation
A new approach to computing Hartree-Fock exchange and Coulomb matrices, based on a four-tree algorithm.
Fast Algorithms and Efficient Statistics: n-Point Correlation Functions
Andrew W. Moore, Andy J. Connolly, Chris Genovese, Alexander G. Gray, Larry Grone, Nick Kanidoris II, Robert C. Nichol, Jeff Schneider, Alex S. Szalay, Istvan Szapudi, and Larry Wasserman
Mining the Sky, 2001

The paper which introduced our fast algorithm for n-point correlation functions to astronomy, and explained our algorithm in more detail. [pdf]

Abstract: We present here a new algorithm for the fast computation of N-Point correlation functions in large astronomical data sets. The algorithm is based on kdtrees which are decorated with cached sufficient statistics thus allowing for orders of magnitude speed-ups over the naive non-tree-based implementation of correlation functions. We further discuss the use of controlled approximations within the computation which allows for further acceleration. In summary, our algorithm now makes it possible to compute exact, all-pairs, measurements of the 2, 3 and 4-Point correlation functions for cosmological data sets like the Sloan Digital Sky Survey (SDSS; York et al. 2000) and the next generation of Cosmic Microwave Background experiments (see Szapudi et al. 2000).

@inproceedings{moore2000npt, author = "A. Moore and A. Connolly and C. Genovese and Alexander G. Gray and L. Grone and N. Kanidoris and R. Nichol and J. Schneider and A. Szalay and I. Szapudi and L. Wasserman", Title = "{Fast Algorithms and Efficient Statistics: $n$-point Correlation Functions}", Booktitle = "{Proceedings of MPA/MPE/ESO Conference Mining the Sky}", Year = "2000" }
dual tree 'N-Body' Problems in Statistical Learning
Alexander G. Gray and Andrew W. Moore
Neural Information Processing Systems (NIPS) 2000, appeared 2001

We identify a type of common computational problem, which we call (informally) generalized N-body problems, and a generic algorithmic approach based on using two or more space-partitioning trees, which can be encapsulated as dual-tree or multi-tree methods. This introduced the ideas of N-body methods of computational physics to the machine learning community. Results include the first efficient algorithm for exact all-nearest-neighbors (beyond repeated runs of a single-query algorithm) and the first efficient algorithm for exact n-point correlation functions, which was in turn the first efficient algorithm for any problem involving more than pairwise comparisons. [pdf]

Abstract: We present efficient algorithms for all-point-pairs problems, or 'N-body'-like problems, which are ubiquitous in statistical learning. We focus on six examples, including nearest-neighbor classification, kernel density estimation, outlier detection, and the two-point correlation. These include any problem which abstractly requires a comparison of each of the N points in a dataset with each other point and would naively be solved using N^2 distance computations. In practice N is often large enough to make this infeasible. We present a suite of new geometric techniques which are applicable in principle to any 'N-body' computation including large-scale mixtures of Gaussians, RBF neural networks, and HMM's. Our algorithms exhibit favorable asymptotic scaling and are empirically several orders of magnitude faster than the naive computation, even for small datasets. We are aware of no exact algorithms for these problems which are more efficient either empirically or theoretically. In addition, our framework yields simple and elegant algorithms. It also permits two important generalizations beyond the standard all-point-pairs problems, which are more difficult. These are represented by our final examples, the multiple two-point correlation and the notorious n-point correlation.

@Inproceedings{gray2001nbody, author = "Alexander G. Gray and Andrew W. Moore", Title = "{$N$-Body Problems in Statistical Learning}", booktitle = "Advances in Neural Information Processing Systems (NIPS) 13 (Dec 2000)", Year = "2001", editor = {Leen, Todd K. and Dietterich, Thomas G. and Tresp, Volker}, publisher = {MIT Press} }