
Generalized NBody Problems
Computations that involve pairwise similarities or distances, and generalizations thereof, are responsible for
much of the expense encountered in statistical methods and scientific simulations.



Lineartime Algorithms for Pairwise Statistical Problems
Parikshit Ram, Dongryeol Lee, William March, and Alexander G. Gray
Neural Information Processing Systems (NIPS) 2009, appeared 2010
We show that dualtree algorithms run in linear time for problems involving all pairwise distances which are naively quadratic in cost. This includes the first rigorous analysis of general (bichromatic) batch nearestnearest neighbors (aka allnearestneighbors) on cover trees and the first rigorous analysis of the famous Fast Multipole Method and its relatives.
[pdf]
Abstract:
Several key computational bottlenecks in machine learning involve pairwise distance computations, including allnearestneighbors (finding the nearest neighbor(s) for each point, e.g. in manifold learning) and kernel summations (e.g. in
kernel density estimation or kernel machines). We consider the general, bichromatic case for these problems, in addition to the scientific problem of Nbody simulation. In this paper we show for the first time O(N) worst case runtimes for
practical algorithms for these problems based on the cover tree data structure.
@inproceedings{ram2010ltaps,
Author = "Parikshit Ram and Dongryeol Lee and William March and Alexander G. Gray",
Title = "{Lineartime Algorithms for Pairwise Statistical Problems}",
booktitle = "Advances in Neural Information Processing Systems (NIPS) 22 (Dec 2009)",
Year = "2010",
publisher = {MIT Press}
}

See also
Proximity Problems
Proximity problems such as nearestneighbor search, range search, and their variants represent special cases of generalized Nbody problems.
[see webpage here]
NBody Problems with Continuous Functions
Kernel summations represent a special case of generalized Nbody problem containing continuous functions.
[see webpage here]
Data Structures for NBody Problems
The generalized Nbody framework can utilize a large variety of existing hierarchical data structures.
[see webpage here]
In preparation
Formalization of Generalized NBody Problems and Methods
We have formalized a general class of problems which can be efficiently solved using fast NBodylike algorithms, as well as a single
generic NBody algorithm which can be specialized to each such problem. We have also developed a MapReducelike parallel programming model for
such problems based on this formalization.
Efficient Molecular Dynamics with the AxilrodTeller Potential
A fast algorithm for force computation using the 3body AxilrodTeller potential function,
with a demonstration of its use in molecular dynamics.
Fast Monte Carlo and MultiBody Potential Evaluation
An ultrafast approach to evaluating potentials combining Monte Carlo and multipole techniques, which generalizes to
multibody (more than pairwise) potentials.
Fast HartreeFock Calculation
A new approach to computing HartreeFock exchange and Coulomb matrices, based on a fourtree algorithm.


Fast Algorithms and Efficient Statistics: nPoint Correlation Functions
Andrew W. Moore, Andy J. Connolly, Chris Genovese, Alexander G. Gray, Larry Grone, Nick Kanidoris II, Robert C. Nichol, Jeff Schneider, Alex S. Szalay, Istvan Szapudi, and Larry Wasserman
Mining the Sky, 2001
The paper which introduced our fast algorithm for npoint correlation functions to astronomy, and explained our algorithm in more detail.
[pdf]
Abstract:
We present here a new algorithm for the fast computation of NPoint correlation functions in large astronomical data sets. The algorithm is based on kdtrees which are decorated with cached sufficient statistics thus allowing for orders of magnitude speedups over the naive nontreebased implementation of correlation functions. We further discuss the use of controlled approximations within the computation which allows for further acceleration. In summary, our algorithm now makes it possible to compute exact, allpairs, measurements of the 2, 3 and 4Point correlation functions for cosmological data sets like the Sloan Digital Sky Survey (SDSS; York et al. 2000) and the next generation of Cosmic Microwave Background experiments (see Szapudi et al. 2000).
@inproceedings{moore2000npt,
author = "A. Moore and A. Connolly and C. Genovese and Alexander G. Gray and
L. Grone and N. Kanidoris and R. Nichol and J. Schneider and
A. Szalay and I. Szapudi and L. Wasserman",
Title = "{Fast Algorithms and Efficient Statistics:
$n$point Correlation Functions}",
Booktitle = "{Proceedings of MPA/MPE/ESO Conference Mining the Sky}",
Year = "2000"
}


'NBody' Problems in Statistical Learning
Alexander G. Gray and Andrew W. Moore
Neural Information Processing Systems (NIPS) 2000, appeared 2001
We identify a type of common computational problem, which we call (informally) generalized Nbody problems, and a generic
algorithmic approach based on using two or more spacepartitioning trees, which can be encapsulated as dualtree or multitree methods. This introduced the ideas of Nbody methods of computational physics to the machine learning community. Results include the first efficient algorithm for exact allnearestneighbors (beyond repeated runs of a singlequery algorithm) and the first efficient algorithm for exact npoint correlation
functions, which was in turn the first efficient algorithm for any problem involving more than pairwise comparisons.
[pdf]
Abstract:
We present efficient algorithms for allpointpairs problems, or 'Nbody'like problems, which are ubiquitous in statistical learning. We focus on six examples, including nearestneighbor classification, kernel density estimation, outlier detection, and the twopoint correlation. These include any problem which abstractly requires a comparison of each of the N points in a dataset with each other point and would naively be solved using N^2 distance computations. In practice N is often large enough to make this infeasible. We present a suite of new geometric techniques which are applicable in principle to any 'Nbody' computation including largescale mixtures of Gaussians, RBF neural networks, and HMM's. Our algorithms exhibit favorable asymptotic scaling and are empirically several orders of magnitude faster than the naive computation, even for small datasets. We are aware of no exact algorithms for these problems which are more efficient either empirically or theoretically. In addition, our framework yields simple and elegant algorithms. It also permits two important generalizations beyond the standard allpointpairs problems, which are more difficult. These are represented by our final examples, the multiple twopoint correlation and the notorious npoint correlation.
@Inproceedings{gray2001nbody,
author = "Alexander G. Gray and Andrew W. Moore",
Title = "{$N$Body Problems in Statistical Learning}",
booktitle = "Advances in Neural Information Processing Systems (NIPS) 13 (Dec 2000)",
Year = "2001",
editor = {Leen, Todd K. and Dietterich, Thomas G. and Tresp, Volker},
publisher = {MIT Press}
}

