Kernel Summations and Multipole Methods
Summations over a large number of kernel functions (such as Gaussians) appear throughout statistics and scientific computing.
Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
Dongryeol Lee and Alexander Gray
Neural Information Processing Systems (NIPS) 2008, appeared 2009

A multipole-like method that can work effectively in high dimensions, by incorporating Monte Carlo ideas and a new kind of data structure, subspace trees. [pdf]

Abstract: We propose a new fast Gaussian summation algorithm for high-dimensional datasets with high accuracy. First, we extend the original fast multipole-type methods to use approximation schemes with both hard and probabilistic error. Second, we utilize a new data structure called the subspace tree, which maps each data point in a node to its lower-dimensional image under any linear dimension reduction method, such as PCA. This new data structure is well suited to reducing the cost of each pairwise distance computation, the dominant cost in many kernel methods. Our algorithm guarantees probabilistic relative error on each kernel sum, and can be applied to high-dimensional Gaussian summations, which are ubiquitous inside many kernel methods as the key computational bottleneck. We provide empirical speedup results on low- to high-dimensional datasets of up to 89 dimensions.
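The geometric fact the subspace tree exploits can be sketched in a few lines of numpy (an illustration, not the authors' implementation): distances measured in an orthonormal PCA subspace never exceed the true distances, so a cheap low-dimensional computation can stand in for the full D-dimensional one when bounding or pruning.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 89))            # high-dimensional reference set
mu = X.mean(axis=0)

# An orthonormal basis for a 5-dimensional PCA subspace
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:5]                                # rows: top-5 principal directions

q = rng.normal(size=89)                   # a query point

# Projected distances cost O(5) per pair instead of O(89), and they never
# exceed the true distances (orthogonal projection shrinks norms), so they
# serve as cheap lower bounds.
d_true = np.linalg.norm(X - q, axis=1)
d_proj = np.linalg.norm((X - mu) @ P.T - P @ (q - mu), axis=1)
```

The paper additionally uses the subspace mapping to approximate (not just bound) distances, with probabilistic error control; the sketch above only shows the deterministic bound.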

@inproceedings{lee2009mcmm,
  author    = "Dongryeol Lee and Alexander Gray",
  title     = "{Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method}",
  booktitle = "Advances in Neural Information Processing Systems (NIPS) 21 (Dec 2008)",
  year      = "2009",
  publisher = {MIT Press}
}
See also

Analysis and Parallelization
Kernel summations in both statistics and computational physics are special cases of generalized N-body problems, for which general analysis and parallelization approaches are being developed. [see webpage here]

Data Structures for Kernel Summations
A large variety of hierarchical data structures can be used for kernel summation problems. [see webpage here]

Uses in Linear Algebra
Certain kinds of matrix-vector multiplications are really kernel summations. We pointed this out for the statistical community in a technical report. [see full entry here]
Fast Mean Shift with Accurate and Stable Convergence
Ping Wang, Dongryeol Lee, Alexander Gray, and James Rehg
Artificial Intelligence and Statistics (AISTATS) 2007

A fast and stable algorithm for mean shift clustering and tracking. [pdf]

Abstract: Mean shift is a powerful but computationally expensive method for nonparametric clustering and optimization. It iteratively moves each data point to its local mean until convergence. We introduce a fast algorithm for computing mean shift based on the dual-tree. Unlike previous speed-up attempts, our algorithm maintains a relative error bound at each iteration, resulting in significantly more stable and accurate convergence. We demonstrate the benefit of our method in clustering experiments with real and synthetic data.
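For reference, the exact mean shift iteration being accelerated looks like this (a naive O(n)-per-step numpy sketch; the paper's dual-tree algorithm approximates the same weighted means while maintaining a relative error bound):

```python
import numpy as np

def mean_shift_point(x, data, h, tol=1e-6, max_iter=500):
    """Iterate x toward its Gaussian-weighted local mean until it stops moving."""
    for _ in range(max_iter):
        w = np.exp(-((data - x) ** 2).sum(axis=1) / (2 * h * h))
        x_new = (w[:, None] * data).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x_new

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.3, (50, 2)),    # cluster at the origin
                  rng.normal(5, 0.3, (50, 2))])   # cluster at (5, 5)
mode = mean_shift_point(np.array([0.5, 0.5]), data, h=1.0)  # converges near (0, 0)
```

Each iteration is a kernel summation over the whole dataset, which is why tree-based approximation of the numerator and denominator pays off.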

@inproceedings{wang2007ms,
  author    = "Ping Wang and Dongryeol Lee and Alexander G. Gray and James Rehg",
  title     = "{Fast Mean Shift with Accurate and Stable Convergence}",
  booktitle = "Proceedings of the Eleventh Workshop on Artificial Intelligence and Statistics",
  year      = "2007"
}
Ultrafast Monte Carlo for Kernel Estimators and Generalized Statistical Summations
Michael Holmes, Alexander Gray, and Charles Isbell
Neural Information Processing Systems (NIPS) 2007, appeared 2008

An ultrafast Monte Carlo-based approach for a general class of nested summation forms, including bandwidth learning for all kernel estimators. [pdf]

Abstract: Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Kernel estimators and other methods are burdened by these expensive computations. Exact evaluation is typically O(n^2) or higher, which severely limits application to large datasets. We present a multi-stage stratified Monte Carlo method for approximating such summations with probabilistic relative error control. The essential idea is fast approximation by sampling in trees. This method differs from many previous scalability techniques (such as standard multi-tree methods) in that its error is stochastic, but we derive conditions for error control and demonstrate that they work. Further, we give a theoretical sample complexity for the method that is independent of dataset size, and show that this appears to hold in experiments, where speedups reach as high as 10^14, many orders of magnitude beyond the previous state of the art.
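The core idea, sampling kernel values until a CLT-based confidence interval meets a relative error target, can be sketched as follows (a single-stratum illustration with assumed parameters, not the paper's multi-stage stratified method):

```python
import numpy as np

def mc_kernel_sum(q, X, h, eps=0.05, batch=64, seed=2):
    """Estimate S = sum_i exp(-||q - x_i||^2 / (2 h^2)) by uniform sampling,
    stopping when a 95% CLT interval is within relative width eps."""
    n = len(X)
    rng = np.random.default_rng(seed)
    vals = []
    while len(vals) < n:
        idx = rng.integers(0, n, size=batch)
        vals.extend(np.exp(-((X[idx] - q) ** 2).sum(axis=1) / (2 * h * h)))
        m = np.mean(vals)
        half = 1.96 * np.std(vals, ddof=1) / np.sqrt(len(vals))
        if m > 0 and half <= eps * m:     # probabilistic relative error control
            break
    return n * np.mean(vals), len(vals)

rng = np.random.default_rng(3)
X = rng.normal(size=(20000, 5))
q = np.zeros(5)
est, used = mc_kernel_sum(q, X, h=2.0)
exact = np.exp(-((X - q) ** 2).sum(axis=1) / (2 * 2.0 ** 2)).sum()
```

The number of samples needed depends on the variance of the kernel values, not on n, which is the intuition behind the dataset-size-independent sample complexity; stratifying by tree nodes reduces that variance further.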

@inproceedings{holmes2008mckde,
  author    = "Michael Holmes and Alexander Gray and Charles Isbell",
  title     = "{Ultrafast Monte Carlo for Kernel Estimators and Generalized Statistical Summations}",
  booktitle = "Advances in Neural Information Processing Systems (NIPS) 20 (Dec 2007)",
  year      = "2008",
  publisher = {MIT Press}
}
Faster Gaussian Summation: Theory and Experiment
Dongryeol Lee and Alexander Gray
Uncertainty in Artificial Intelligence 2006

An improved multipole-like method that is the fastest algorithm for Gaussian summation problems in moderate dimensionality. [pdf]

Abstract: We provide faster algorithms for the problem of Gaussian summation, which occurs in many machine learning methods. We develop two new extensions - an O(D^p) Taylor expansion for the Gaussian kernel with rigorous error bounds and a new error control scheme integrating any arbitrary approximation method - within the best discrete-algorithmic framework using adaptive hierarchical data structures. We rigorously evaluate these techniques empirically in the context of optimal bandwidth selection in kernel density estimation, revealing the strengths and weaknesses of current state-of-the-art approaches for the first time. Our results demonstrate that the new error control scheme yields improved performance, whereas the series expansion approach is only effective in low dimensions (five or less).
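The flavor of a truncated series approximation built from centroid moments can be shown in one dimension (a second-order sketch, not the paper's O(D^p) multivariate expansion): the kernel sum over a tight cluster collapses to a handful of precomputed moments.

```python
import numpy as np

h = 1.0
rng = np.random.default_rng(4)
x = rng.normal(0.0, 0.1, size=1000)   # a tight cluster of reference points
q = 2.0                               # query, two bandwidths from the cluster
c = x.mean()                          # expansion center: the centroid

# Derivatives of f(x) = exp(-(q - x)^2 / (2 h^2)) at x = c
u = q - c
k = np.exp(-u * u / (2 * h * h))
f2 = (u * u / h**4 - 1 / h**2) * k    # f''(c); f'(c) multiplies a vanishing moment

# Second-order Taylor approximation of the summed kernel: because c is the
# centroid, the first moment is zero and only N and the second moment survive.
m2 = ((x - c) ** 2).sum()
approx = len(x) * k + 0.5 * f2 * m2
exact = np.exp(-((q - x) ** 2) / (2 * h * h)).sum()
```

The number of such terms grows like D^p in D dimensions at order p, which is why the abstract reports the expansion paying off only in low dimensionality.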

@inproceedings{lee2006fgt2,
  author    = "Dongryeol Lee and Alexander Gray",
  title     = "{Faster Gaussian Summation: Theory and Experiment}",
  booktitle = "Proceedings of the Twenty-second Conference on Uncertainty in Artificial Intelligence",
  year      = "2006"
}
Dual-tree Fast Gauss Transforms
Dongryeol Lee, Alexander Gray, and Andrew Moore
Neural Information Processing Systems (NIPS) 2005, appeared 2006

The first fast Gauss transforms using hierarchical data structures, i.e. the closest analog to the breakthrough Fast Multipole Method (FMM) for statistical problems, as well as the fastest method in low dimensions, at the time of introduction. This paper shows for the first time the three translation operators needed for a true FMM-like method using Hermite expansions, along with the three error bounds needed. [pdf]

Abstract: In previous work we presented an efficient approach to computing kernel summations which arise in many machine learning methods such as kernel density estimation. This approach, dual-tree recursion with finite-difference approximation, generalized existing methods for similar problems arising in computational physics in two ways appropriate for statistical problems: toward distribution sensitivity and general dimension, partly by avoiding series expansions. While this proved to be the fastest practical method for multivariate kernel density estimation at the optimal bandwidth, it is much less efficient at larger-than-optimal bandwidths. In this work, we explore the extent to which the dual-tree approach can be integrated with multipole-like Hermite expansions in order to achieve reasonable efficiency across all bandwidth scales, though only for low dimensionalities. In the process, we derive and demonstrate the first truly hierarchical fast Gauss transforms, effectively combining the best tools from discrete algorithms and continuous approximation theory.
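A minimal one-dimensional illustration of the far-field Hermite expansion (using the classic FGT kernel convention e^{-(y-x)^2/h^2}; the paper works with multivariate expansions, three translation operators, and trees): all N sources are condensed into p moments, after which each target costs O(p).

```python
import numpy as np
from math import factorial

def hermite_functions(t, p):
    """h_n(t) = H_n(t) exp(-t^2) for n = 0..p-1, via the recurrence
    H_{n+1} = 2 t H_n - 2 n H_{n-1}."""
    e = np.exp(-t * t)
    hs = [e, 2 * t * e]
    for n in range(1, p - 1):
        hs.append(2 * t * hs[-1] - 2 * n * hs[-2])
    return hs[:p]

rng = np.random.default_rng(5)
x = rng.normal(0.0, 0.1, size=200)        # sources clustered around c = 0
y = np.linspace(1.0, 2.0, 7)              # well-separated targets
h, c, p = 1.0, 0.0, 10                    # bandwidth, center, truncation order

# Exact Gauss transform  G(y_j) = sum_i exp(-(y_j - x_i)^2 / h^2)
exact = np.exp(-((y[:, None] - x[None, :]) / h) ** 2).sum(axis=1)

# Far-field expansion: accumulate p source moments once (O(N p)),
# then evaluate at each target independently of N.
s = (x - c) / h
moments = [(s ** n).sum() / factorial(n) for n in range(p)]
approx = sum(m * hn for m, hn in zip(moments, hermite_functions((y - c) / h, p)))
```

The expansion follows from the Hermite generating function e^{-(t-s)^2} = sum_n (s^n / n!) h_n(t); an FMM-like method additionally needs operators to translate such expansions between tree nodes, which is what the paper derives.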

@inproceedings{lee2006fgt1,
  author    = "Dongryeol Lee and Alexander Gray and Andrew W. Moore",
  title     = "{Dual-Tree Fast Gauss Transforms}",
  booktitle = "Advances in Neural Information Processing Systems 18 (Dec 2005)",
  editor    = {Y. Weiss and B. Scholkopf and J. Platt},
  publisher = {MIT Press},
  year      = {2006}
}
Rapid Evaluation of Multiple Density Models
Alexander Gray and Andrew Moore
Artificial Intelligence and Statistics 2003

A way to do fast kernel density estimation for multiple bandwidths simultaneously, as needed in the learning phase, where the optimal bandwidth is selected by cross-validation. Many schemes purporting to be useful for kernel density estimation ignore realistic bandwidth settings and learning procedures. [pdf]

Abstract: When highly-accurate and/or assumption-free density estimation is needed, nonparametric methods are often called upon - most notably the popular kernel density estimation (KDE) method. However, the practitioner is instantly faced with the formidable computational cost of KDE for appreciable dataset sizes, which becomes even more prohibitive when many models with different kernel scales (bandwidths) must be evaluated - this is necessary for finding the optimal model, among other reasons. In previous work we presented an algorithm for fast KDE which addresses large dataset sizes and large dimensionalities, but assumes only a single bandwidth. In this paper we present a generalization of that algorithm allowing multiple models with different bandwidths to be computed simultaneously, in substantially less time than either running the single-bandwidth algorithm for each model independently, or running the standard exhaustive method. We show examples of computing the likelihood curve for 100,000 data and 100 models ranging across 3 orders of magnitude in scale, in minutes or seconds.
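The economics of evaluating many bandwidths at once can be seen even without trees (a brute-force numpy sketch, not the paper's algorithm): the expensive distance computations are shared across all models, and only the cheap per-bandwidth work is repeated.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 2))        # data
Q = rng.normal(size=(50, 2))          # evaluation points
hs = np.logspace(-1.5, 1.5, 100)      # 100 bandwidths spanning 3 orders of magnitude

# The expensive part -- pairwise squared distances -- is computed once and
# shared by every model; each bandwidth only re-exponentiates and rescales.
d2 = ((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # shape (50, 2000)

n, d = X.shape
dens = np.stack([
    np.exp(-d2 / (2 * h * h)).sum(axis=1) / (n * (2 * np.pi * h * h) ** (d / 2))
    for h in hs
])                                    # shape (100, 50): one Gaussian KDE per bandwidth
```

The paper pushes this sharing into a tree algorithm, so the pruning decisions (not just the distances) are amortized across bandwidths, avoiding both the per-model rerun and the exhaustive method.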

@inproceedings{gray2003kde3,
  author    = "Alexander Gray and Andrew W. Moore",
  title     = "{Rapid Evaluation of Multiple Density Models}",
  booktitle = "Proceedings of the Ninth Conference on Artificial Intelligence and Statistics (AISTATS)",
  year      = "2003"
}
Nonparametric Density Estimation: Toward Computational Tractability
Alexander Gray and Andrew Moore
SIAM Data Mining 2003

The first efficient algorithm for kernel summation with automatic error control, i.e. no arbitrary tuning parameters to be chosen by the user. This paper explored a tree traversal scheme based on priority queues, which is general and interesting, but in our later work we settled on more memory-efficient schemes. [pdf]

Abstract: Density estimation is a core operation of virtually all probabilistic learning methods (as opposed to discriminative methods). Approaches to density estimation can be divided into two principal classes, parametric methods, such as Bayesian networks, and non-parametric methods such as kernel density estimation and smoothing splines. While neither choice should be universally preferred for all situations, a well-known benefit of nonparametric methods is their ability to achieve estimation optimality for ANY input distribution as more data are observed, a property that no model with a parametric assumption can have, and one of great importance in exploratory data analysis and mining where the underlying distribution is decidedly unknown. To date, however, despite a wealth of advanced underlying statistical theory, the use of nonparametric methods has been limited by their computational intractability for all but the smallest datasets. In this paper, we present an algorithm for kernel density estimation, the chief nonparametric approach, which is dramatically faster than previous algorithmic approaches in terms of both dataset size and dimensionality. Furthermore, the algorithm provides arbitrarily tight accuracy guarantees, provides anytime convergence, works for all common kernel choices, and requires no parameter tuning. The algorithm is an instance of a new principle of algorithm design: multi-recursion, or higher-order divide-and-conquer.
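The automatic error control can be illustrated with a simple single-tree version (a sketch under assumed parameters, not the paper's multi-recursive, priority-queue algorithm): each node's bounding box gives hard bounds on the kernel value, and a node is pruned only when the midpoint approximation keeps the per-point error below a tolerance tau, yielding a provable bound with no tuning.

```python
import numpy as np

class Node:
    """A simple kd-tree node storing its points and bounding box."""
    def __init__(self, pts, leaf_size=32):
        self.pts = pts
        self.lo, self.hi = pts.min(axis=0), pts.max(axis=0)
        self.left = self.right = None
        if len(pts) > leaf_size:
            d = int(np.argmax(self.hi - self.lo))     # split the widest dimension
            order = np.argsort(pts[:, d])
            mid = len(pts) // 2
            self.left = Node(pts[order[:mid]], leaf_size)
            self.right = Node(pts[order[mid:]], leaf_size)

def kernel_bounds(q, node, h):
    """Hard lower/upper bounds on exp(-||q - x||^2 / (2 h^2)) over the box."""
    nearest = np.clip(q, node.lo, node.hi)
    farthest = np.where(np.abs(q - node.lo) > np.abs(q - node.hi), node.lo, node.hi)
    kmin = np.exp(-((q - farthest) ** 2).sum() / (2 * h * h))
    kmax = np.exp(-((q - nearest) ** 2).sum() / (2 * h * h))
    return kmin, kmax

def kernel_sum(q, node, h, tau):
    kmin, kmax = kernel_bounds(q, node, h)
    if kmax - kmin <= 2 * tau:            # prune: per-point error at most tau
        return len(node.pts) * (kmin + kmax) / 2
    if node.left is None:                 # leaf: compute exactly
        return np.exp(-((node.pts - q) ** 2).sum(axis=1) / (2 * h * h)).sum()
    return kernel_sum(q, node.left, h, tau) + kernel_sum(q, node.right, h, tau)

rng = np.random.default_rng(7)
X = rng.normal(size=(5000, 3))
q, h, tau = np.zeros(3), 1.0, 1e-3
approx = kernel_sum(q, Node(X), h, tau)
exact = np.exp(-((X - q) ** 2).sum(axis=1) / (2 * h * h)).sum()
```

Because every pruned node contributes at most tau error per point, the total absolute error is guaranteed to be at most N * tau; the dual-tree versions in the later papers extend this to all queries at once.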

@inproceedings{gray2003kde1,
  author    = "Alexander Gray and Andrew W. Moore",
  title     = "{Nonparametric Density Estimation: Toward Computational Tractability}",
  booktitle = "SIAM International Conference on Data Mining (SDM)",
  year      = "2003"
}