Source author record

Haim Avron

Haim Avron appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning math.NA Numerical Analysis Data Structures and Algorithms math.OC Distributed, Parallel, and Cluster Computing Computation Computational Geometry eess.SP math.DG

Catalog footprint

What is connected

16works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Manifold Free Riemannian Optimization

Riemannian optimization is a principled framework for solving optimization problems where the desired optimum is constrained to a smooth manifold $\mathcal{M}$. Algorithms designed in this framework usually require some geometrical description of the manifold, which typically includes tangent spaces, retractions, and gradients of the cost function. However, in many cases, only a subset (or none at all) of these elements can be accessed due to lack of information or intractability. In this paper, we propose a novel approach that can perform approximate Riemannian optimization in such cases, where the constraining manifold is a submanifold of $\R^{D}$. At the bare minimum, our method requires only a noiseless sample set of the cost function $(\x_{i}, y_{i})\in {\mathcal{M}} \times \mathbb{R}$ and the intrinsic dimension of the manifold $\mathcal{M}$. Using the samples, and utilizing the Manifold-MLS framework (Sober and Levin 2020), we construct approximations of the missing components entertaining provable guarantees and analyze their computational costs. In case some of the components are given analytically (e.g., if the cost function and its gradient are given explicitly, or if the tangent spaces can be computed), the algorithm can be easily adapted to use the accurate expressions instead of the approximations. We analyze the global convergence of Riemannian gradient-based methods using our approach, and we demonstrate empirically the strength of this method, together with a conjugate-gradients type method based upon similar principles.

preprint2022arXiv

Near Optimal Reconstruction of Spherical Harmonic Expansions

We propose an algorithm for robust recovery of the spherical harmonic expansion of functions defined on the d-dimensional unit sphere $\mathbb{S}^{d-1}$ using a near-optimal number of function evaluations. We show that for any $f \in L^2(\mathbb{S}^{d-1})$, the number of evaluations of $f$ needed to recover its degree-$q$ spherical harmonic expansion equals the dimension of the space of spherical harmonics of degree at most $q$ up to a logarithmic factor. Moreover, we develop a simple yet efficient algorithm to recover degree-$q$ expansion of $f$ by only evaluating the function on uniformly sampled points on $\mathbb{S}^{d-1}$. Our algorithm is based on the connections between spherical harmonics and Gegenbauer polynomials and leverage score sampling methods. Unlike the prior results on fast spherical harmonic transform, our proposed algorithm works efficiently using a nearly optimal number of samples in any dimension d. We further illustrate the empirical performance of our algorithm on numerical examples.

preprint2022arXiv

On the Convergence of Inexact Predictor-Corrector Methods for Linear Programming

Interior point methods (IPMs) are a common approach for solving linear programs (LPs) with strong theoretical guarantees and solid empirical performance. The time complexity of these methods is dominated by the cost of solving a linear system of equations at each iteration. In common applications of linear programming, particularly in machine learning and scientific computing, the size of this linear system can become prohibitively large, requiring the use of iterative solvers, which provide an approximate solution to the linear system. However, approximately solving the linear system at each iteration of an IPM invalidates the theoretical guarantees of common IPM analyses. To remedy this, we theoretically and empirically analyze (slightly modified) predictor-corrector IPMs when using approximate linear solvers: our approach guarantees that, when certain conditions are satisfied, the number of IPM iterations does not increase and that the final solution remains feasible. We also provide practical instantiations of approximate linear solvers that satisfy these conditions for special classes of constraint matrices using randomized linear algebra.

preprint2022arXiv

Random Gegenbauer Features for Scalable Kernel Methods

We propose efficient random features for approximating a new and rich class of kernel functions that we refer to as Generalized Zonal Kernels (GZK). Our proposed GZK family, generalizes the zonal kernels (i.e., dot-product kernels on the unit sphere) by introducing radial factors in their Gegenbauer series expansion, and includes a wide range of ubiquitous kernel functions such as the entirety of dot-product kernels as well as the Gaussian and the recently introduced Neural Tangent kernels. Interestingly, by exploiting the reproducing property of the Gegenbauer polynomials, we can construct efficient random features for the GZK family based on randomly oriented Gegenbauer kernels. We prove subspace embedding guarantees for our Gegenbauer features which ensures that our features can be used for approximately solving learning problems such as kernel k-means clustering, kernel ridge regression, etc. Empirical results show that our proposed features outperform recent kernel approximation methods.

preprint2022arXiv

Solving Trust Region Subproblems Using Riemannian Optimization

The Trust Region Subproblem is a fundamental optimization problem that takes a pivotal role in Trust Region Methods. However, the problem, and variants of it, also arise in quite a few other applications. In this article, we present a family of iterative Riemannian optimization algorithms for a variant of the Trust Region Subproblem that replaces the inequality constraint with an equality constraint, and converge to a global optimum. Our approach uses either a trivial or a non-trivial Riemannian geometry of the search-space, and requires only minimal spectral information about the quadratic component of the objective function. We further show how the theory of Riemannian optimization promotes a deeper understanding of the Trust Region Subproblem and its difficulties, e.g., a deep connection between the Trust Region Subproblem and the problem of finding affine eigenvectors, and a new examination of the so-called hard case in light of the condition number of the Riemannian Hessian operator at a global optimum. Finally, we propose to incorporate preconditioning via a careful selection of a variable Riemannian metric, and establish bounds on the asymptotic convergence rate in terms of how well the preconditioner approximates the input matrix.

preprint2021arXiv

Gauss-Legendre Features for Gaussian Process Regression

Gaussian processes provide a powerful probabilistic kernel learning framework, which allows learning high quality nonparametric regression models via methods such as Gaussian process regression. Nevertheless, the learning phase of Gaussian process regression requires massive computations which are not realistic for large datasets. In this paper, we present a Gauss-Legendre quadrature based approach for scaling up Gaussian process regression via a low rank approximation of the kernel matrix. We utilize the structure of the low rank approximation to achieve effective hyperparameter learning, training and prediction. Our method is very much inspired by the well-known random Fourier features approach, which also builds low-rank approximations via numerical integration. However, our method is capable of generating high quality approximation to the kernel using an amount of features which is poly-logarithmic in the number of training points, while similar guarantees will require an amount that is at the very least linear in the number of training points when random Fourier features. Furthermore, the structure of the low-rank approximation that our method builds is subtly different from the one generated by random Fourier features, and this enables much more efficient hyperparameter learning. The utility of our method for learning with low-dimensional datasets is demonstrated using numerical experiments.

preprint2020arXiv

Polynomial Tensor Sketch for Element-wise Function of Low-Rank Matrix

This paper studies how to sketch element-wise functions of low-rank matrices. Formally, given low-rank matrix A = [Aij] and scalar non-linear function f, we aim for finding an approximated low-rank representation of the (possibly high-rank) matrix [f(Aij)]. To this end, we propose an efficient sketching-based algorithm whose complexity is significantly lower than the number of entries of A, i.e., it runs without accessing all entries of [f(Aij)] explicitly. The main idea underlying our method is to combine a polynomial approximation of f with the existing tensor sketch scheme for approximating monomials of entries of A. To balance the errors of the two approximation components in an optimal manner, we propose a novel regression formula to find polynomial coefficients given A and f. In particular, we utilize a coreset-based regression with a rigorous approximation guarantee. Finally, we demonstrate the applicability and superiority of the proposed scheme under various machine learning tasks.

preprint2020arXiv

Speeding up Linear Programming using Randomized Linear Algebra

Linear programming (LP) is an extremely useful tool and has been successfully applied to solve various problems in a wide range of areas, including operations research, engineering, economics, or even more abstract mathematical areas such as combinatorics. It is also used in many machine learning applications, such as $\ell_1$-regularized SVMs, basis pursuit, nonnegative matrix factorization, etc. Interior Point Methods (IPMs) are one of the most popular methods to solve LPs both in theory and in practice. Their underlying complexity is dominated by the cost of solving a system of linear equations at each iteration. In this paper, we consider \emph{infeasible} IPMs for the special case where the number of variables is much larger than the number of constraints. Using tools from Randomized Linear Algebra, we present a preconditioning technique that, when combined with the Conjugate Gradient iterative solver, provably guarantees that infeasible IPM algorithms (suitably modified to account for the error incurred by the approximate solver), converge to a feasible, approximately optimal solution, without increasing their iteration complexity. Our empirical evaluations verify our theoretical results on both real-world and synthetic data.

preprint2019arXiv

Tensor-Tensor Products for Optimal Representation and Compression

In this era of big data, data analytics and machine learning, it is imperative to find ways to compress large data sets such that intrinsic features necessary for subsequent analysis are not lost. The traditional workhorse for data dimensionality reduction and feature extraction has been the matrix SVD, which presupposes that the data has been arranged in matrix format. Our main goal in this study is to show that high-dimensional data sets are more compressible when treated as tensors (aka multiway arrays) and compressed via tensor-SVDs under the tensor-tensor product structures in (Kilmer and Martin, 2011; Kernfeld et al., 2015). We begin by proving Eckart Young optimality results for families of tensor-SVDs under two different truncation strategies. As such optimality properties can be proven in both matrix and tensor-based algebras, a fundamental question arises: does the tensor construct subsume the matrix construct in terms of representation efficiency? The answer is yes, as shown when we prove that a tensor-tensor representation of an equal dimensional spanning space can be superior to its matrix counterpart. We then investigate how the compressed representation provided by the truncated tensor-SVD is related both theoretically and in compression performance to its closest tensor-based analogue, truncated HOSVD (De Lathauwer et al., 2000; De Lathauwer and Vandewalle, 2004), thereby showing the potential advantages of our tensor-based algorithms. Finally, we propose new tensor truncated SVD variants, namely multi-way tensor SVDs, provide further approximated representation efficiency and discuss under which conditions they are considered optimal. We conclude with a numerical study demonstrating the utility of the theory.

preprint2015arXiv

High-performance Kernel Machines with Implicit Distributed Optimization and Randomization

In order to fully utilize "big data", it is often required to use "big models". Such models tend to grow with the complexity and size of the training data, and do not make strong parametric assumptions upfront on the nature of the underlying statistical dependencies. Kernel methods fit this need well, as they constitute a versatile and principled statistical methodology for solving a wide range of non-parametric modelling problems. However, their high computational costs (in storage and time) pose a significant barrier to their widespread adoption in big data applications. We propose an algorithmic framework and high-performance implementation for massive-scale training of kernel-based statistical models, based on combining two key technical ingredients: (i) distributed general purpose convex optimization, and (ii) the use of randomization to improve the scalability of kernel methods. Our approach is based on a block-splitting variant of the Alternating Directions Method of Multipliers, carefully reconfigured to handle very large random feature matrices, while exploiting hybrid parallelism typically found in modern clusters of multicore machines. Our implementation supports a variety of statistical learning tasks by enabling several loss functions, regularization schemes, kernels, and layers of randomized approximations for both dense and sparse datasets, in a highly extensible framework. We evaluate the ability of our framework to learn models on data from applications, and provide a comparison against existing sequential and parallel libraries.

preprint2015arXiv

Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

We consider the problem of improving the efficiency of randomized Fourier feature maps to accelerate training and testing speed of kernel methods on large datasets. These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e.g., Gaussian kernel). In this paper, we propose to use Quasi-Monte Carlo (QMC) approximations instead, where the relevant integrands are evaluated on a low-discrepancy sequence of points as opposed to random point sets as in the Monte Carlo approach. We derive a new discrepancy measure called box discrepancy based on theoretical characterizations of the integration error with respect to a given sequence. We then propose to learn QMC sequences adapted to our setting based on explicit box discrepancy minimization. Our theoretical analyses are complemented with empirical results that demonstrate the effectiveness of classical and adaptive QMC techniques for this problem.

preprint2015arXiv

Revisiting Asynchronous Linear Solvers: Provable Convergence Rate Through Randomization

Asynchronous methods for solving systems of linear equations have been researched since Chazan and Miranker's pioneering 1969 paper on chaotic relaxation. The underlying idea of asynchronous methods is to avoid processor idle time by allowing the processors to continue to make progress even if not all progress made by other processors has been communicated to them. Historically, the applicability of asynchronous methods for solving linear equations was limited to certain restricted classes of matrices, such as diagonally dominant matrices. Furthermore, analysis of these methods focused on proving convergence in the limit. Comparison of the asynchronous convergence rate with its synchronous counterpart and its scaling with the number of processors were seldom studied, and are still not well understood. In this paper, we propose a randomized shared-memory asynchronous method for general symmetric positive definite matrices. We rigorously analyze the convergence rate and prove that it is linear, and is close to that of the method's synchronous counterpart if the processor count is not excessive relative to the size and sparsity of the matrix. We also present an algorithm for unsymmetric systems and overdetermined least-squares. Our work presents a significant improvement in the applicability of asynchronous linear solvers as well as in their convergence analysis, and suggests randomization as a key paradigm to serve as a foundation for asynchronous methods.

preprint2014arXiv

Effective Stiffness: Generalizing Effective Resistance Sampling to Finite Element Matrices

We define the notion of effective stiffness and show that it can used to build sparsifiers, algorithms that sparsify linear systems arising from finite-element discretizations of PDEs. In particular, we show that sampling $O(n\log n)$ elements according to probabilities derived from effective stiffnesses yields a high quality preconditioner that can be used to solve the linear system in a small number of iterations. Effective stiffness generalizes the notion of effective resistance, a key ingredient of recent progress in developing nearly linear symmetric diagonally dominant (SDD) linear solvers. Solving finite elements problems is of considerably more interest than the solution of SDD linear systems, since the finite element method is frequently used to numerically solve PDEs arising in scientific and engineering applications. Unlike SDD systems, which are relatively easy to solve, there has been limited success in designing fast solvers for finite element systems, and previous algorithms usually target discretization of limited class of PDEs like scalar elliptic or 2D trusses. Our sparsifier is general; it applies to a wide range of finite-element discretizations. A sparsifier does not constitute a complete linear solver. To construct a solver, one needs additional components (e.g., an efficient elimination or multilevel scheme for the sparsified system). Still, sparsifiers have been a critical tools in efficient SDD solvers, and we believe that our sparsifier will become a key ingredient in future fast finite-element solvers.

preprint2013arXiv

Efficient Dimensionality Reduction for Canonical Correlation Analysis

We present a fast algorithm for approximate Canonical Correlation Analysis (CCA). Given a pair of tall-and-thin matrices, the proposed algorithm first employs a randomized dimensionality reduction transform to reduce the size of the input matrices, and then applies any CCA algorithm to the new pair of matrices. The algorithm computes an approximate CCA to the original pair of matrices with provable guarantees, while requiring asymptotically less operations than the state-of-the-art exact algorithms.

preprint2013arXiv

Faster Subset Selection for Matrices and Applications

We study subset selection for matrices defined as follows: given a matrix $\matX \in \R^{n \times m}$ ($m > n$) and an oversampling parameter $k$ ($n \le k \le m$), select a subset of $k$ columns from $\matX$ such that the pseudo-inverse of the subsampled matrix has as smallest norm as possible. In this work, we focus on the Frobenius and the spectral matrix norms. We describe several novel (deterministic and randomized) approximation algorithms for this problem with approximation bounds that are optimal up to constant factors. Additionally, we show that the combinatorial problem of finding a low-stretch spanning tree in an undirected graph corresponds to subset selection, and discuss various implications of this reduction.

preprint2012arXiv

Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization

We describe novel subgradient methods for a broad class of matrix optimization problems involving nuclear norm regularization. Unlike existing approaches, our method executes very cheap iterations by combining low-rank stochastic subgradients with efficient incremental SVD updates, made possible by highly optimized and parallelizable dense linear algebra operations on small matrices. Our practical algorithms always maintain a low-rank factorization of iterates that can be conveniently held in memory and efficiently multiplied to generate predictions in matrix completion settings. Empirical comparisons confirm that our approach is highly competitive with several recently proposed state-of-the-art solvers for such problems.

Haim Avron

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

Manifold Free Riemannian Optimization

Near Optimal Reconstruction of Spherical Harmonic Expansions

On the Convergence of Inexact Predictor-Corrector Methods for Linear Programming

Random Gegenbauer Features for Scalable Kernel Methods

Solving Trust Region Subproblems Using Riemannian Optimization

Gauss-Legendre Features for Gaussian Process Regression

Polynomial Tensor Sketch for Element-wise Function of Low-Rank Matrix

Speeding up Linear Programming using Randomized Linear Algebra

Tensor-Tensor Products for Optimal Representation and Compression

High-performance Kernel Machines with Implicit Distributed Optimization and Randomization

Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

Revisiting Asynchronous Linear Solvers: Provable Convergence Rate Through Randomization

Effective Stiffness: Generalizing Effective Resistance Sampling to Finite Element Matrices

Efficient Dimensionality Reduction for Canonical Correlation Analysis

Faster Subset Selection for Matrices and Applications

Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization