Matrix Computations and Optimization
Matrix computations are among the core sources of computational expense in machine learning.

Fast Stochastic Frank-Wolfe Algorithms for Nonlinear SVMs
Hua Ouyang and Alexander G. Gray
SIAM International Conference on Data Mining (SDM) 2010


While fast online algorithms exist for linear support vector machines, they do not perform as well for nonlinear SVMs. We present the fastest online algorithm for nonlinear SVMs (using squared loss), based on the classic Frank-Wolfe method. [pdf]

Abstract: The high computational cost of nonlinear support vector machines has limited their usability for large-scale problems. We propose two novel stochastic algorithms to tackle this problem. These algorithms are based on a simple and classic optimization method: the Frank-Wolfe method, which is known to be fast for problems with a large number of linear constraints. Formulating the nonlinear SVM problem to take advantage of this method, we achieve a provable time complexity of O(dQ^2 / epsilon^2). The proposed algorithms achieve accuracies that are comparable to or even better than the state-of-the-art methods, and are significantly faster.
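
For intuition, here is a minimal sketch of the classic Frank-Wolfe iteration on a simplex-constrained quadratic, the kind of linearly constrained problem for which the method is known to be fast. This is not the paper's stochastic algorithm; the toy objective, the random data, and the standard 2/(k+2) step size are illustrative assumptions.

import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=200):
    """Classic Frank-Wolfe on the probability simplex.

    grad: function returning the gradient of the objective at x.
    Each step solves the linear subproblem over the simplex, whose
    solution is simply the vertex with the smallest gradient
    coordinate, then takes a convex-combination step toward it.
    """
    x = x0.copy()
    for k in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # linear minimizer over the simplex
        gamma = 2.0 / (k + 2.0)        # standard Frank-Wolfe step size
        x = (1 - gamma) * x + gamma * s
    return x

# Illustrative use: a toy quadratic 0.5 * x'Qx - b'x over the simplex
# (a stand-in for an SVM-style dual; Q and b here are random assumptions).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 20))
Q = A @ A.T
b = rng.normal(size=20)
x_star = frank_wolfe_simplex(lambda x: Q @ x - b, np.ones(20) / 20)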

@inproceedings{ouyang2010sfw, Author = "Hua Ouyang and Alexander G. Gray", Title = "{Fast Stochastic Frank-Wolfe Algorithms for Nonlinear SVMs}", booktitle = "SIAM International Conference on Data Mining (SDM)", Year = "2010" }
See also


Methods for Kernel Matrix-Vector Multiplications
Kernel matrix-vector multiplications can be done quickly using efficient kernel summation methods. [see webpage here]


Type-theoretic Formalization of Optimization
We have developed a type system for mathematical programs, toward automatic reformulation of optimization problems. [see full entry here]


Scalable Semidefinite Programming
We showed an approach to scalable learning via semidefinite programming, based on convex relaxations, in the context of manifold learning. [see full entry here]


In preparation


Efficient Optimization for Fractional-Norm SVMs
We have developed faster optimization methods for L_q SVMs, where 0 < q < 1, based on recent ideas in optimization theory.
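
Since the paper is in preparation, the exact formulation is not shown here; as an assumed sketch, a common way to write an L_q-regularized SVM objective is

\min_{w,\,b} \; \|w\|_q^q + C \sum_{i=1}^{N} \max\bigl(0,\, 1 - y_i (w^\top x_i + b)\bigr), \qquad 0 < q < 1,

where \|w\|_q^q = \sum_j |w_j|^q is a nonconvex quasi-norm penalty that tends to zero out more coefficients than the L_1 norm.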


Kernel PCA and Gaussian Process Regression
While our fast PCA/SVD method can be used for kernel PCA and Gaussian process regression, we are developing a methodology which does not have to explicitly form the N x N kernel matrix, circumventing significant computational and storage costs.

Mixed Integer Support Vector Machines
Wei Guan and Alexander Gray
Neural Information Processing Systems (NIPS) 2009 Workshop on Optimization in Machine Learning


An approach to L0 SVMs exploiting the recent perspective cuts method from mixed-integer programming, toward more aggressive and accurate feature selection. [pdf]

Abstract: In this paper, we propose a formulation of a feature selecting support vector machine based on the L0-norm. We explore two convex relaxations of the optimization problem and solve them using mixed-integer nonlinear programming (MINLP) techniques. Given a training set of labeled data instances, we construct a max-margin classifier that minimizes the hinge loss as well as the cardinality of the weight vector of the separating hyperplane ||w||_0, effectively performing feature selection and classification simultaneously in one optimization. We compare these two relaxations with the standard L2-norm SVM, L1-norm SVM, and recursive feature elimination (RFE) method, and show promising results on real-world datasets in terms of accuracy and sparsity.
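
As a rough sketch of the objective described in the abstract (the precise mixed-integer reformulation and perspective cuts are in the paper), the feature-selecting L0 SVM can be written as

\min_{w,\,b} \; \|w\|_0 + C \sum_{i=1}^{N} \max\bigl(0,\, 1 - y_i (w^\top x_i + b)\bigr),

where \|w\|_0 counts the nonzero entries of w; modeling each entry's on/off status with a binary variable is what makes mixed-integer nonlinear programming a natural tool.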

@inproceedings{guan2010misvm, Author = "Wei Guan and Alexander G. Gray and Sven Leyffer", Title = "{Mixed Integer Support Vector Machines}", booktitle = "Neural Information Processing Systems (NIPS) Workshop on Optimization in Machine Learning", Year = "2009" }
QUIC-SVD: Fast SVD Using Cosine Trees
Michael Holmes, Alexander Gray, and Charles Isbell
Neural Information Processing Systems (NIPS) 2008, appeared 2009


Fast singular value decomposition (SVD) using multi-resolution Monte Carlo based on cosine trees, a new data structure. Unlike previous Monte Carlo linear algebra approaches, it is not limited to a fixed-rank approximation and is thus automatically as accurate as the user requires, without the need to try many different ranks or other parameters. It is also more efficient than previous sampling schemes for the same number of samples. [pdf]

Abstract: The Singular Value Decomposition is a key operation in many machine learning methods. Its computational cost, however, makes it unscalable and impractical for the massive-sized datasets becoming common in applications. We present a new method, QUIC-SVD, for fast approximation of the full SVD with automatic sample size minimization and empirical relative error control. Previous Monte Carlo approaches have not addressed the full SVD nor benefited from the efficiency of automatic, empirically-driven sample sizing. Our empirical tests show speedups of several orders of magnitude over exact SVD. Such scalability should enable QUIC-SVD to meet the needs of a wide array of methods and applications.
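
To illustrate the general pattern only (QUIC-SVD itself uses cosine trees and a cheaper, empirically driven error monitor described in the paper), here is a sketch of a Monte Carlo SVD that grows a sampled row subspace until a relative-error target is met; the uniform sampling and the exact error computation below are simplifying assumptions.

import numpy as np

def sampled_svd(A, epsilon=0.05, step=10, seed=0):
    """Illustrative Monte Carlo SVD by uniform row sampling.

    Not the QUIC-SVD algorithm; this sketch only shows the pattern of
    growing a sampled subspace until a relative-error target epsilon is met.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    norm_A = np.linalg.norm(A)
    idx = np.array([], dtype=int)
    while True:
        new = rng.choice(n, size=min(step, n - idx.size), replace=False)
        idx = np.union1d(idx, new)
        _, _, Vt = np.linalg.svd(A[idx], full_matrices=False)  # basis of sampled rows
        A_hat = (A @ Vt.T) @ Vt        # projection of A onto that row space
        if np.linalg.norm(A - A_hat) <= epsilon * norm_A or idx.size == n:
            break
    # SVD of the projected matrix yields the approximate SVD of A
    U, S, W = np.linalg.svd(A @ Vt.T, full_matrices=False)
    return U, S, W @ Vt, idx.size

# Usage on a toy low-rank-plus-noise matrix (illustrative data)
rng = np.random.default_rng(1)
A = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 60)) + 0.01 * rng.normal(size=(500, 60))
U, S, Vt, n_samples = sampled_svd(A, epsilon=0.05)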

@Inproceedings{holmes2009quicsvd, author = "Michael Holmes and Alexander Gray and Charles Isbell", Title = "{QUIC-SVD: Fast SVD Using Cosine Trees}", booktitle = "Advances in Neural Information Processing Systems 21 (Dec 2008)", Year = "2009", publisher = {MIT Press} }
Fast Kernel Matrix-Vector Multiplication with Application to Gaussian Process Learning
Alexander G. Gray
Carnegie Mellon University Technical Report, 2004


An example of using our dual-tree, automatically error-controlled kernel summation methods for fast matrix-vector multiplication with kernel matrices, as arises in Gaussian process learning. The idea of using kernel summation methods inside linear-algebraic methods has been used in some computational physics problems, but is introduced here for statistical problems for the first time. [pdf]

Abstract: A number of core computational problems in machine learning, both old and new, can be cast as a matrix-vector multiplication between a kernel matrix or class-probability matrix and a vector of weights. This arises prominently, for example, in the kernel estimation methods of nonparametric statistics, many common probabilistic graphical models, and the more recent kernel machines. After highlighting the existence of this computational problem in several well-known machine learning methods, we focus on a solution for one specific example for clarity, Gaussian process (GP) prediction - one whose applicability has been particularly hindered by this computational barrier. We demonstrate the application of a recent N-body approach developed specifically for statistical problems, employing adaptive computational geometry and finite-difference approximation. This core algorithm reduces the O(N^2) matrix-vector multiplications within GP learning to O(N), making the resulting overall learning algorithm O(N). GP learning for N = 1 million points is demonstrated.
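
A sketch of where such a fast matvec plugs in: iterative solvers like conjugate gradients touch the kernel matrix only through matrix-vector products, so the naive O(N^2) summation below is exactly the step a dual-tree kernel summation method would replace. The Gaussian kernel, bandwidth, and noise level are illustrative assumptions, not the paper's setup.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def gp_weights(X, y, noise=0.1, bandwidth=1.0):
    """Solve (K + noise^2 I) alpha = y with conjugate gradients,
    never storing the N x N kernel matrix K."""
    N = X.shape[0]

    def kernel_matvec(v):
        # (K v)_i = sum_j exp(-||x_i - x_j||^2 / (2 h^2)) v_j, computed row by
        # row; this O(N^2) loop is the part a fast summation method speeds up.
        out = np.empty(N)
        for i in range(N):
            sq = ((X - X[i]) ** 2).sum(axis=1)
            out[i] = np.exp(-sq / (2 * bandwidth ** 2)) @ v
        return out + noise ** 2 * v

    A = LinearOperator((N, N), matvec=kernel_matvec, dtype=float)
    alpha, _ = cg(A, y)                # CG only ever calls kernel_matvec
    return alpha

# Usage on toy data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
alpha = gp_weights(X, y)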

@techreport{gray2004kermatvec, title = "{Fast Kernel Matrix-Vector Multiplication with Application to Gaussian Process Regression}", author = "Alexander G. Gray", institution = "{Carnegie Mellon University}", series = "{Computer Science Department Technical Report, CMU-CS-04-110}", year = "2004" }