
Kernel Summations and Multipole Methods
Summations over a large number of kernel functions (such as Gaussians) appear throughout statistics and scientific computing.



Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
Dongryeol Lee and Alexander Gray
Neural Information Processing Systems (NIPS) 2008, appeared 2009
A multipole-like method that works effectively in high dimensions by incorporating Monte Carlo ideas and a new kind of data structure, the subspace tree.
[pdf]
Abstract:
We propose a new fast Gaussian summation algorithm for high-dimensional
datasets with high accuracy. First, we extend the original fast multipole-type
methods to use approximation schemes with both hard and probabilistic error. Second,
we utilize a new data structure called the subspace tree, which maps each data point in
the node to its lower-dimensional mapping as determined by any linear dimension
reduction method such as PCA. This new data structure is suitable for reducing
the cost of each pairwise distance computation, the most dominant cost in many
kernel methods. Our algorithm guarantees probabilistic relative error on each
kernel sum, and can be applied to high-dimensional Gaussian summations, which are
ubiquitous inside many kernel methods as the key computational bottleneck. We
provide empirical speedup results on low- to high-dimensional datasets of up to 89
dimensions.
@Inproceedings{lee2009mcmm,
author = "Dongryeol Lee and Alexander Gray",
Title = "{Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method}",
booktitle = "Advances in Neural Information Processing Systems (NIPS) 21 (Dec 2008)",
Year = "2009",
publisher = {MIT Press}
}
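The projection step at the heart of the subspace tree, mapping each point to a lower-dimensional representation via PCA so that pairwise distances become cheaper, can be sketched in a few lines of NumPy. This is a minimal illustration of the projection idea only, not the paper's data structure; the useful property is that distances under an orthonormal projection never exceed the true distances, so they serve as cheap lower bounds for pruning.

```python
import numpy as np

rng = np.random.default_rng(0)

# High-dimensional points that actually lie near a low-dimensional subspace.
D, d, n = 50, 3, 200
basis = np.linalg.qr(rng.standard_normal((D, d)))[0]   # orthonormal D x d basis
X = rng.standard_normal((n, d)) @ basis.T              # n points in R^D
X += 0.01 * rng.standard_normal((n, D))                # small off-subspace noise

# PCA: project onto the top-d principal directions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:d]                                             # d x D projection rows
Z = Xc @ P.T                                           # n x d low-dim mapping

# Orthonormal projection can only shrink distances, so the projected
# distance is a valid (and cheap, O(d) vs O(D)) lower bound.
i, j = 0, 1
true_d = np.linalg.norm(X[i] - X[j])
low_d = np.linalg.norm(Z[i] - Z[j])
assert low_d <= true_d + 1e-9
```

Each pairwise comparison in the projected space costs O(d) rather than O(D), which is where the savings in distance-dominated kernel methods come from.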

See also
Analysis and Parallelization
Kernel summations in both statistics and computational physics are special cases of generalized N-body problems, for which
general analysis and parallelization approaches are being developed. [see webpage here]
Data Structures for Kernel Summations
A large variety of hierarchical data structures can be used for kernel summation problems. [see webpage here]
Uses in Linear Algebra
Certain kinds of matrix-vector multiplications are really kernel summations. We pointed this out for the statistical community in
a technical report. [see full entry here]


Fast Mean Shift with Accurate and Stable Convergence
Ping Wang, Dongryeol Lee, Alexander Gray, and James Rehg
Artificial Intelligence and Statistics (AISTATS) 2007
A fast and stable algorithm for mean shift clustering and tracking.
[pdf]
Abstract:
Mean shift is a powerful but computationally
expensive method for nonparametric clustering
and optimization. It iteratively moves
each data point to its local mean until convergence.
We introduce a fast algorithm for
computing mean shift based on the dual-tree.
Unlike previous speedup attempts, our algorithm
maintains a relative error bound at
each iteration, resulting in significantly more
stable and accurate convergence. We demonstrate
the benefit of our method in clustering
experiments with real and synthetic data.
@Inproceedings{wang2007ms,
Author = "Ping Wang and Dongryeol Lee and Alexander G. Gray and James Rehg",
Title = "{Fast Mean Shift with Accurate and Stable Convergence}",
Booktitle = "Proceedings of the Eleventh Workshop on Artificial Intelligence and Statistics",
Year = "2007"
}
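The mean shift iteration itself, moving every point to the kernel-weighted mean of its neighborhood until convergence, is simple to state; the paper's contribution is computing it fast with a per-iteration relative error bound. A naive O(n^2)-per-iteration reference version (our own sketch with a Gaussian kernel and toy data, not the dual-tree algorithm) looks like this:

```python
import numpy as np

def mean_shift(X, h, iters=50, tol=1e-6):
    """Naive O(n^2)-per-iteration Gaussian mean shift (no tree speedup)."""
    Y = X.copy()
    for _ in range(iters):
        # Kernel weights between every shifted point and every data point.
        d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2 * h * h))
        # Move each point to the weighted local mean.
        Y_new = (W @ X) / W.sum(axis=1, keepdims=True)
        if np.abs(Y_new - Y).max() < tol:
            return Y_new
        Y = Y_new
    return Y

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (30, 2)),
               rng.normal(3, 0.1, (30, 2))])
Y = mean_shift(X, h=0.5)
# Points collapse onto the two cluster modes, one per mixture component.
```

The kernel sum in the weight matrix `W` is exactly the bottleneck that dual-tree approximation with a relative error bound accelerates without destabilizing the iteration.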


Ultrafast Monte Carlo for Kernel Estimators and Generalized Statistical Summations
Michael Holmes, Alexander Gray, and Charles Isbell
Neural Information Processing Systems (NIPS) 2007, appeared 2008
An ultrafast Monte Carlo-based approach for a general class of nested forms that includes bandwidth learning for all kernel estimators.
[pdf]
Abstract:
Machine learning contains many computational bottlenecks in the form of nested
summations over datasets. Kernel estimators and other methods are burdened
by these expensive computations. Exact evaluation is typically O(n^2) or higher,
which severely limits application to large datasets. We present a multistage
stratified Monte Carlo method for approximating such summations with probabilistic
relative error control. The essential idea is fast approximation by sampling in trees.
This method differs from many previous scalability techniques (such as standard
multi-tree methods) in that its error is stochastic, but we derive conditions for
error control and demonstrate that they work. Further, we give a theoretical
sample complexity for the method that is independent of dataset size, and show that
this appears to hold in experiments, where speedups reach as high as 10^14, many
orders of magnitude beyond the previous state of the art.
@Inproceedings{holmes2008mckde,
author = "Michael Holmes and Alexander Gray and Charles Isbell",
Title = "{Ultrafast Monte Carlo for Kernel Estimators
and Generalized Statistical Summations}",
booktitle = "Advances in Neural Information Processing Systems (NIPS) 20 (Dec 2007)",
Year = "2008",
publisher = {MIT Press}
}
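The stochastic-error idea can be illustrated with a single-stage sketch: sample kernel values uniformly and grow the sample until a CLT-style confidence bound suggests the requested relative error is met. This is our own simplification for illustration (no stratification, no trees, hypothetical parameter names), not the paper's multistage stratified scheme:

```python
import numpy as np

def mc_kernel_sum(q, X, h, eps=0.05, z=3.0, m0=64, seed=0):
    """Monte Carlo estimate of sum_j exp(-||q - x_j||^2 / (2 h^2)).

    The sample is doubled until a CLT half-width suggests relative
    error below eps at confidence given by the z-score."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = m0
    while True:
        idx = rng.integers(0, n, size=m)            # uniform sample
        k = np.exp(-((q - X[idx]) ** 2).sum(-1) / (2 * h * h))
        est = n * k.mean()                          # unbiased estimate
        half = z * n * k.std(ddof=1) / np.sqrt(m)   # CLT half-width
        if half <= eps * max(est, 1e-12) or m >= n:
            return est
        m *= 2                                      # not tight enough: grow

rng = np.random.default_rng(2)
X = rng.standard_normal((20000, 3))
q = np.zeros(3)
exact = np.exp(-((q - X) ** 2).sum(-1) / 2.0).sum()
approx = mc_kernel_sum(q, X, h=1.0)
```

The key observation matches the abstract: the sample size needed for a fixed relative error depends on the variance of the kernel values, not on the dataset size.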


Faster Gaussian Summation: Theory and Experiment
Dongryeol Lee and Alexander Gray
Uncertainty in Artificial Intelligence 2006
An improved multipole-like method that is the fastest algorithm for Gaussian summation problems in moderate dimensionality.
[pdf]
Abstract:
We provide faster algorithms for the problem of Gaussian summation, which occurs
in many machine learning methods. We develop two new extensions, an O(D^p)
Taylor expansion for the Gaussian kernel with rigorous error bounds and a new error
control scheme integrating any arbitrary approximation method, within the best
discrete-algorithmic framework using adaptive hierarchical data structures. We
rigorously evaluate these techniques empirically in the context of optimal bandwidth
selection in kernel density estimation, revealing the strengths and weaknesses of
current state-of-the-art approaches for the first time. Our results demonstrate that
the new error control scheme yields improved performance, whereas the series expansion
approach is only effective in low dimensions (five or less).
@inproceedings{lee2006fgt2,
author = "Dongryeol Lee and Alexander Gray",
title = "{Faster Gaussian Summation: Theory and Experiment}",
booktitle = "Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence",
year={2006}
}


Dual-tree Fast Gauss Transforms
Dongryeol Lee, Alexander Gray, and Andrew Moore
Neural Information Processing Systems (NIPS) 2005, appeared 2006
The first fast Gauss transforms using hierarchical data structures, i.e. the closest analog
to the breakthrough Fast Multipole Method (FMM) for statistical problems, as well as the fastest method in low dimensions,
at the time of introduction. This paper shows for the first time the three translation operators needed for a true FMM-like method
using Hermite expansions, along with the three error bounds needed.
[pdf]
Abstract: In previous work we presented an efficient approach to computing kernel
summations which arise in many machine learning methods such as
kernel density estimation. This approach, dual-tree recursion with finite-difference
approximation, generalized existing methods for similar problems arising in computational
physics in two ways appropriate for statistical problems:
toward distribution sensitivity and general dimension,
partly by avoiding series expansions. While this proved to be the fastest
practical method for multivariate kernel density estimation at the optimal
bandwidth, it is much less efficient at larger-than-optimal bandwidths.
In this work, we explore the extent to which the dualtree approach can
be integrated with multipole-like Hermite expansions in order to achieve
reasonable efficiency across all bandwidth scales, though only for low
dimensionalities. In the process, we derive and demonstrate the first truly
hierarchical fast Gauss transforms, effectively combining the best tools
from discrete algorithms and continuous approximation theory.
@inproceedings{lee2006fgt1,
author = "Dongryeol Lee and Alexander Gray and Andrew W. Moore",
title = "{Dual-Tree Fast Gauss Transforms}",
booktitle = "Advances in Neural Information Processing Systems 18 (Dec 2005)",
editor = {Y. Weiss and B. Scholkopf and J. Platt},
publisher = {MIT Press},
year = {2006}
}
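In one dimension, the Hermite (far-field) expansion behind fast Gauss transforms approximates G(y) = sum_j w_j exp(-(y - x_j)^2 / delta) about a cluster center by p moment-like coefficients, evaluated with the Hermite functions h_n(t) = (-1)^n d^n/dt^n exp(-t^2). The following is our own small sketch of this classical Greengard-Strain building block, not the paper's dual-tree algorithm:

```python
import numpy as np
from math import factorial

def hermite_functions(t, p):
    """h_n(t) for n = 0..p-1 via the recursion h_{n+1} = 2t h_n - 2n h_{n-1}."""
    h = np.empty((p, len(t)))
    h[0] = np.exp(-t * t)
    if p > 1:
        h[1] = 2 * t * h[0]
    for n in range(1, p - 1):
        h[n + 1] = 2 * t * h[n] - 2 * n * h[n - 1]
    return h

def farfield_gauss(sources, weights, targets, delta, center, p=12):
    """p-term Hermite expansion of the 1-D Gauss transform about `center`."""
    s = (sources - center) / np.sqrt(delta)
    # Moment-like coefficients: one pass over the sources, independent of targets.
    A = np.array([(weights * s ** n).sum() / factorial(n) for n in range(p)])
    h = hermite_functions((targets - center) / np.sqrt(delta), p)
    return A @ h

rng = np.random.default_rng(3)
x = rng.uniform(-0.2, 0.2, 100)      # sources clustered about the center
w = rng.uniform(0.0, 1.0, 100)
y = rng.uniform(2.0, 3.0, 50)        # well-separated targets
delta = 1.0
exact = (w[None, :] * np.exp(-(y[:, None] - x[None, :]) ** 2 / delta)).sum(1)
approx = farfield_gauss(x, w, y, delta, center=0.0)
```

Once the p coefficients are formed from the sources, every target is evaluated in O(p) instead of O(n); the translation operators mentioned above move such expansions between tree nodes.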


Rapid Evaluation of Multiple Density Models
Alexander Gray and Andrew Moore
Artificial Intelligence and Statistics 2003
A way to do fast kernel density estimation for multiple bandwidths simultaneously, as needed in the learning phase, in which the optimal bandwidth is selected by cross-validation. Many schemes purporting to be useful for kernel density estimation ignore realistic bandwidth
settings and learning procedures.
[pdf]
Abstract:
When highly accurate and/or assumption-free density estimation is needed, nonparametric methods are often called upon, most notably the popular kernel density estimation (KDE) method. However, the practitioner is instantly faced with the formidable computational cost of KDE for appreciable dataset sizes, which becomes even more prohibitive when many models with different kernel scales (bandwidths) must be evaluated; this is necessary for finding the optimal model, among other reasons. In previous work we presented an algorithm for fast KDE which addresses large dataset sizes and large dimensionalities, but assumes only a single bandwidth. In this paper we present a generalization of that algorithm allowing multiple models with different bandwidths to be computed simultaneously, in substantially less time than either running the single-bandwidth algorithm for each model independently, or running the standard exhaustive method. We show examples of computing the likelihood curve for 100,000 data points and 100 models ranging across 3 orders of magnitude in scale, in minutes or seconds.
@Inproceedings{gray2003kde3,
Author = "Alexander Gray and Andrew W. Moore",
Title = "{Rapid Evaluation of Multiple Density Models}",
Booktitle = "Proceedings of the Ninth Conference on Artificial Intelligence and Statistics (AISTATS)",
Year = "2003"
}
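The workload being accelerated is, in its naive form, a leave-one-out likelihood evaluated at many bandwidths. A brute-force O(n^2)-per-bandwidth sketch in one dimension (our own illustration, not the paper's simultaneous algorithm) makes it concrete:

```python
import numpy as np

def loo_loglik(X, h):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE at bandwidth h
    (the naive O(n^2) evaluation that the paper accelerates)."""
    n = len(X)
    d2 = (X[:, None] - X[None, :]) ** 2
    K = np.exp(-d2 / (2 * h * h)) / (np.sqrt(2 * np.pi) * h)
    np.fill_diagonal(K, 0.0)               # exclude each point from its own sum
    dens = K.sum(axis=1) / (n - 1)
    return np.log(dens + 1e-300).sum()

rng = np.random.default_rng(4)
X = rng.standard_normal(500)
bandwidths = np.logspace(-2, 1, 20)        # 3 orders of magnitude in scale
curve = [loo_loglik(X, h) for h in bandwidths]
best_h = bandwidths[int(np.argmax(curve))]
```

Run independently, each bandwidth repeats the same pairwise-distance work; the paper's point is that a single multi-model pass shares that work across all bandwidths.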


Nonparametric Density Estimation: Toward Computational Tractability
Alexander Gray and Andrew Moore
SIAM Data Mining 2003
The first efficient algorithm for kernel summation with automatic error control, i.e. no arbitrary tuning parameters to be chosen by the user. This paper explored a tree traversal scheme based on priority queues, which is general and interesting, but in our later work we settled on more memory-efficient schemes.
[pdf]
Abstract:
Density estimation is a core operation of virtually all probabilistic
learning methods (as opposed to discriminative methods).
Approaches to density estimation can be divided into two principal
classes, parametric methods, such as Bayesian networks, and nonparametric
methods such as kernel density estimation and smoothing splines.
While neither choice should be universally preferred
for all situations, a well-known benefit of nonparametric methods
is their ability to achieve estimation optimality for ANY input
distribution as more data are observed, a property that no model with
a parametric assumption can have, and one of great importance in
exploratory data analysis and mining where the underlying
distribution is decidedly unknown. To date, however, despite a wealth
of advanced underlying statistical theory, the use of nonparametric
methods has been limited by their computational intractability for
all but the smallest datasets. In this paper, we present an algorithm
for kernel density estimation, the chief nonparametric approach,
which is dramatically faster than previous algorithmic approaches
in terms of both dataset size and dimensionality. Furthermore, the
algorithm provides arbitrarily tight accuracy guarantees, provides
anytime convergence, works for all common kernel choices, and
requires no parameter tuning. The algorithm is an instance of a
new principle of algorithm design: multi-recursion, or higher-order
divide-and-conquer.
@Inproceedings{gray2003kde1,
Author = "Alexander Gray and Andrew W. Moore",
Title = "{Nonparametric Density Estimation: Toward Computational
Tractability}",
Booktitle = "SIAM International Conference on Data Mining (SDM)",
Year = "2003"
}
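The flavor of hierarchical kernel summation with automatic error control can be shown with a one-dimensional sketch: recursively split a sorted array, bound each interval's kernel contribution using its nearest and farthest points, and prune when those bounds are tight enough to meet a per-point error budget. This is a simplified single-tree construction of our own with an absolute error guarantee, not the paper's multi-recursive algorithm:

```python
import numpy as np

def kernel_sum_tree(q, xs, h, eps_abs):
    """Sum of exp(-(q - x)^2 / (2 h^2)) over sorted xs, with guaranteed
    absolute error <= eps_abs and no user-chosen tuning parameters."""
    n = len(xs)
    tau = eps_abs / n                  # per-point error budget

    def kern(d):
        return np.exp(-d * d / (2 * h * h))

    def recurse(lo, hi):
        m = hi - lo
        # Min/max distance from q to the interval [xs[lo], xs[hi-1]].
        dmin = max(xs[lo] - q, q - xs[hi - 1], 0.0)
        dmax = max(abs(q - xs[lo]), abs(q - xs[hi - 1]))
        kmax, kmin = kern(dmin), kern(dmax)
        if m <= 2:                     # base case: sum exactly
            return kern(np.abs(q - xs[lo:hi])).sum()
        if (kmax - kmin) / 2 <= tau:   # prune: midpoint approximation,
            return m * (kmax + kmin) / 2   # error <= m * tau per node
        mid = (lo + hi) // 2
        return recurse(lo, mid) + recurse(mid, hi)

    return recurse(0, n)

rng = np.random.default_rng(5)
xs = np.sort(rng.standard_normal(5000))
q = 0.3
exact = np.exp(-(q - xs) ** 2 / 0.02).sum()        # h = 0.1
approx = kernel_sum_tree(q, xs, h=0.1, eps_abs=1e-3)
```

Each pruned node of size m incurs error at most m * tau, so the total error is bounded by n * tau = eps_abs regardless of how the recursion unfolds; that is the sense in which the error control is automatic.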

