Curses of Dimensionality and Geometry
There are both statistical and computational curses of dimensionality.
Submanifold Density Estimation
Arkadas Ozakin and Alexander Gray
Neural Information Processing Systems (NIPS) 2009, appeared 2010


Worst-case theoretical results have led to the common wisdom that kernel density estimation is ineffective in more than a few dimensions. We show that for realistic high-dimensional data, which due to correlations between variables can often be modeled as lying on a much lower-dimensional submanifold embedded in the original space, the error of a corrected kernel density estimate is governed by the submanifold's intrinsic dimension rather than the ambient dimension. Thus, kernel estimation can be much more effective in high-dimensional settings than previously thought. [pdf]

Abstract: Kernel density estimation is the most widely-used practical method for accurate nonparametric density estimation. However, long-standing worst-case theoretical results showing that its performance worsens exponentially with the dimension of the data have quashed its application to modern high-dimensional datasets for decades. In practice, it has been recognized that often such data have a much lower-dimensional intrinsic structure. We propose a small modification to kernel density estimation for estimating probability density functions on Riemannian submanifolds of Euclidean space. Using ideas from Riemannian geometry, we prove the consistency of this modified estimator and show that the convergence rate is determined by the intrinsic dimension of the submanifold. We conclude with empirical results demonstrating the behavior predicted by our theory.
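
Concretely (notation ours, inferred from the abstract's description rather than quoted from the paper): given N samples x_i from a density on a d-dimensional Riemannian submanifold of R^D, the modified estimator at a point p on the submanifold is

\hat{f}_N(p) = \frac{1}{N h_N^d} \sum_{i=1}^{N} K\!\left( \frac{\| p - x_i \|}{h_N} \right),

where \| \cdot \| is the ambient Euclidean norm, K is the kernel, and h_N is the bandwidth. The only change from standard kernel density estimation is the normalization h_N^d, which uses the intrinsic dimension d in place of the ambient dimension D.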

@inproceedings{ozakin2010subkde,
  author    = {Arkadas Ozakin and Alexander G. Gray},
  title     = {{Submanifold Density Estimation}},
  booktitle = {Advances in Neural Information Processing Systems (NIPS) 22 (Dec 2009)},
  year      = {2010},
  publisher = {MIT Press}
}
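
To make the estimator concrete, here is a minimal sketch in Python (our own illustrative code, not the authors' implementation), assuming the intrinsic dimension d is known or estimated separately and a Gaussian kernel is used:

import numpy as np

def submanifold_kde(query, data, h, d):
    # Density estimate at `query` for samples `data` (N x D) assumed to lie
    # on a d-dimensional submanifold of R^D. Distances are ambient Euclidean,
    # but the Gaussian kernel is normalized by h^d (intrinsic dimension)
    # rather than h^D, as in the corrected estimator above.
    N = data.shape[0]
    sq_dists = np.sum((data - query) ** 2, axis=1)
    norm = N * (2.0 * np.pi) ** (d / 2.0) * h ** d
    return np.sum(np.exp(-sq_dists / (2.0 * h ** 2))) / norm

# Usage: 1000 points uniform on the unit circle, a 1-d submanifold of R^2.
# The estimate at a point on the circle should approach 1/(2*pi) ~ 0.159,
# the uniform density with respect to arc length.
theta = np.random.uniform(0.0, 2.0 * np.pi, 1000)
data = np.column_stack([np.cos(theta), np.sin(theta)])
print(submanifold_kde(np.array([1.0, 0.0]), data, h=0.1, d=1))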
See also


Curses of Dimensionality in Computational Geometry
The expansion constant is a notion of the intrinsic dimension of a dataset; we have extended it to related data-dependent properties for the analysis of all-nearest-neighbors and hierarchical clustering algorithms, as sketched below. [see webpage here]
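
For illustration, here is a brute-force sketch (our own code, not from the referenced work) of the Karger-Ruhl expansion constant: the smallest c such that |B(p, 2r)| <= c |B(p, r)| over all points p and radii r, where B(p, r) is the set of data points within distance r of p. The refined properties used in our analyses differ in detail.

import numpy as np

def expansion_constant(data, radii):
    # Empirical expansion constant: the largest ratio |B(p, 2r)| / |B(p, r)|
    # over the sample points p and the supplied radii r, using Euclidean
    # distance. Brute force: O(N^2) pairwise distances, so intended only
    # for small datasets.
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    c = 1.0
    for r in radii:
        inner = (dists <= r).sum(axis=1)        # |B(p, r)|; >= 1 (p itself)
        outer = (dists <= 2.0 * r).sum(axis=1)  # |B(p, 2r)|
        c = max(c, float((outer / inner).max()))
    return c

A low expansion constant indicates low intrinsic dimension: for points spread roughly uniformly in d dimensions, doubling a ball's radius multiplies its point count by about 2^d, so c is on the order of 2^d.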


In preparation


Adaptive and Parameterized Analysis of Proximity Data Structures
We are developing theoretical techniques for more realistic runtime analysis of practical algorithms, based in part on more detailed notions of intrinsic dimensionality.


Volumetric Manifold Learning
We have developed a new approach to manifold learning based on ideas from Riemannian geometry.