Per-Gunnar Martinsson

Per-Gunnar Martinsson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: math.NA; Numerical Analysis; Mathematical Software; Artificial Intelligence; Computation; Computation and Language; Computational Engineering, Finance, and Science; Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; math.PR

Catalog footprint: 21 works, 10 topics, 4 close collaborators

preprint 2025 arXiv

Randomized Algorithms for Low-Rank Matrix and Tensor Decompositions

This paper surveys randomized algorithms in numerical linear algebra for low-rank decompositions of matrices and tensors. The survey begins with a review of classical matrix algorithms that can be accelerated by randomized dimensionality reduction, such as the singular value decomposition (SVD) or interpolative (ID) and CUR decompositions. Recent advances in randomized dimensionality reduction are discussed, including new methods of fast matrix sketching and sampling techniques, which are incorporated into classical matrix algorithms for fast low-rank matrix approximations. The extension of randomized matrix algorithms to tensors is then explored for several low-rank tensor decompositions in the CP and Tucker formats, including the higher-order SVD, ID, and CUR decomposition.

preprint 2022 arXiv

Randomized Algorithms for Scientific Computing (RASC)

Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of the workshop "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.

preprint 2022 arXiv

Simpler is better: A comparative study of randomized algorithms for computing the CUR decomposition

The CUR decomposition is a technique for low-rank approximation that selects small subsets of the columns and rows of a given matrix to use as bases for its column and row spaces. It has recently attracted much interest, as it has several advantages over traditional low-rank decompositions based on orthonormal bases. These include the preservation of properties such as sparsity or non-negativity, the ability to interpret data, and reduced storage requirements. The problem of finding the skeleton sets that minimize the norm of the residual error is known to be NP-hard, but classical pivoting schemes such as column pivoted QR tend to work well in practice. When combined with randomized dimension reduction techniques, classical pivoting based methods become particularly effective, and have proven capable of very rapidly computing approximate CUR decompositions of large, potentially sparse, matrices. Another class of popular algorithms for computing CUR decompositions is based on drawing the columns and rows randomly from the full index sets, using specialized probability distributions based on leverage scores. Such sampling based techniques are particularly appealing for very large scale problems, and are well supported by theoretical performance guarantees. This manuscript provides a comparative study of the various randomized algorithms for computing CUR decompositions that have recently been proposed. Additionally, it proposes some modifications and simplifications to the existing algorithms that lead to faster execution times.

preprint 2022 arXiv

Solving Linear Systems on a GPU with Hierarchically Off-Diagonal Low-Rank Approximations

We are interested in solving linear systems arising from three applications: (1) kernel methods in machine learning, (2) discretization of boundary integral equations from mathematical physics, and (3) Schur complements formed in the factorization of many large sparse matrices. The coefficient matrices are often data-sparse in the sense that their off-diagonal blocks have low numerical ranks; specifically, we focus on "hierarchically off-diagonal low-rank (HODLR)" matrices. We introduce algorithms for factorizing HODLR matrices and for applying the factorizations on a GPU. The algorithms leverage the efficiency of batched dense linear algebra, and they scale nearly linearly with the matrix size when the numerical ranks are fixed. The accuracy of the HODLR-matrix approximation is a tunable parameter, so we can construct high-accuracy fast direct solvers or low-accuracy robust preconditioners. Numerical results show that we can solve problems with several millions of unknowns in a couple of seconds on a single GPU.

preprint 2020 arXiv

An accelerated, high-order accurate direct solver for the Lippmann-Schwinger equation for acoustic scattering in the plane

An efficient direct solver for solving the Lippmann-Schwinger integral equation modeling acoustic scattering in the plane is presented. For a problem with $N$ degrees of freedom, the solver constructs an approximate inverse in $\mathcal{O}(N^{3/2})$ operations and then, given an incident field, can compute the scattered field in $\mathcal{O}(N \log N)$ operations. The solver is based on a previously published direct solver for integral equations that relies on rank-deficiencies in the off-diagonal blocks; specifically, the so-called Hierarchically Block Separable format is used. The particular solver described here has been reformulated in a way that improves numerical stability and robustness, and exploits the particular structure of the kernel in the Lippmann-Schwinger equation to accelerate the computation of an approximate inverse. The solver is coupled with a Nyström discretization on a regular square grid, using a quadrature method developed by Ran Duan and Vladimir Rokhlin that attains high-order accuracy despite the singularity in the kernel of the integral equation. A particularly efficient solver is obtained when the direct solver is run at four digits of accuracy, and is used as a preconditioner to GMRES, with each forwards application of the integral operators accelerated by the FFT. Extensive numerical experiments are presented that illustrate the high performance of the method in challenging environments. Using the $10^{\rm th}$-order accurate version of the Duan-Rokhlin quadrature rule, the scheme is capable of solving problems on domains that are over 500 wavelengths wide to residual error below $10^{-10}$ in a couple of hours on a workstation, using 26M degrees of freedom.

preprint 2020 arXiv

Computing rank-revealing factorizations of matrices stored out-of-core

This paper describes efficient algorithms for computing rank-revealing factorizations of matrices that are too large to fit in RAM, and must instead be stored on slow external memory devices such as solid-state or spinning disk hard drives (out-of-core or out-of-memory). Traditional algorithms for computing rank revealing factorizations, such as the column pivoted QR factorization, or techniques for computing a full singular value decomposition of a matrix, are very communication intensive. They are naturally expressed as a sequence of matrix-vector operations, which become prohibitively expensive when data is not available in main memory. Randomization allows these methods to be reformulated so that large contiguous blocks of the matrix can be processed in bulk. The paper describes two distinct methods. The first is a blocked version of column pivoted Householder QR, organized as a "left-looking" method to minimize the number of write operations (which are more expensive than read operations on a spinning disk drive). The second method results in a so called UTV factorization which expresses a matrix $A$ as $A = U T V^*$ where $U$ and $V$ are unitary, and $T$ is triangular. This method is organized as an algorithm-by-blocks, in which floating point operations overlap read and write operations. The second method incorporates power iterations, and is exceptionally good at revealing the numerical rank; it can often be used as a substitute for a full singular value decomposition. Numerical experiments demonstrate that the new algorithms are almost as fast when processing data stored on a hard drive as traditional algorithms are for data stored in main memory. To be precise, the computational time for fully factorizing an $n\times n$ matrix scales as $cn^{3}$, with a scaling constant $c$ that is only marginally larger when the matrix is stored out of core.

preprint 2020 arXiv

Corrected Trapezoidal Rules for Boundary Integral Equations in Three Dimensions

The manuscript describes a quadrature rule that is designed for the high order discretization of boundary integral equations (BIEs) using the Nyström method. The technique is designed for surfaces that can naturally be parameterized using a uniform grid on a rectangle, such as deformed tori, or channels with periodic boundary conditions. When a BIE on such a geometry is discretized using the Nyström method based on the Trapezoidal quadrature rule, the resulting scheme tends to converge only slowly, due to the singularity in the kernel function. The key finding of the manuscript is that the convergence order can be greatly improved by modifying only a very small number of elements in the coefficient matrix. Specifically, it is demonstrated that by correcting only the diagonal entries in the coefficient matrix, $O(h^{3})$ convergence can be attained for the single and double layer potentials associated with both the Laplace and the Helmholtz kernels. A nine-point correction stencil leads to an $O(h^5)$ scheme. The method proposed can be viewed as a generalization of the quadrature rule of Duan and Rokhlin, which was designed for the 2D Lippmann-Schwinger equation in the plane. The techniques proposed are supported by a rigorous error analysis that relies on Wigner-type limits involving the Epstein zeta function and its parametric derivatives.

preprint 2016 arXiv

Efficient Algorithms for CUR and Interpolative Matrix Decompositions

The manuscript describes efficient algorithms for the computation of the CUR and ID decompositions. The methods used are based on simple modifications to the classical truncated pivoted QR decomposition, which means that highly optimized library codes can be utilized for implementation. For certain applications, further acceleration can be attained by incorporating techniques based on randomized projections. Numerical experiments demonstrate advantageous performance compared to existing techniques for computing CUR factorizations.
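The link between a truncated pivoted QR factorization and the interpolative decomposition can be illustrated with a short sketch. This is not the manuscript's code: the function names `cpqr_truncated` and `column_id` are hypothetical, the pivoted QR is a plain modified Gram-Schmidt loop written with NumPy for self-containment (a practical code would presumably call an optimized library routine instead), and the sketch assumes `k` does not exceed the rank of `A`. Given `A[:, perm] ≈ Q [R11 R12]`, the selected columns are `C = A[:, perm[:k]]` and the interpolation matrix follows from `T = R11^{-1} R12`.

```python
import numpy as np

def cpqr_truncated(A, k):
    """k steps of column-pivoted modified Gram-Schmidt: A[:, perm] ~= Q @ R."""
    W = np.array(A, dtype=float)          # working copy, overwritten by residuals
    m, n = W.shape
    perm = np.arange(n)
    Q = np.zeros((m, k))
    R = np.zeros((k, n))
    for j in range(k):
        # Pivot: move the remaining column of largest norm to position j.
        p = j + int(np.argmax(np.linalg.norm(W[:, j:], axis=0)))
        W[:, [j, p]] = W[:, [p, j]]
        R[:, [j, p]] = R[:, [p, j]]
        perm[[j, p]] = perm[[p, j]]
        R[j, j] = np.linalg.norm(W[:, j])
        Q[:, j] = W[:, j] / R[j, j]       # assumes k <= rank(A), so R[j, j] > 0
        # Orthogonalize the trailing columns against the new basis vector.
        R[j, j + 1:] = Q[:, j] @ W[:, j + 1:]
        W[:, j + 1:] -= np.outer(Q[:, j], R[j, j + 1:])
    return Q, R, perm

def column_id(A, k):
    """Column ID: indices J and matrix Z such that A ~= A[:, J] @ Z."""
    _, R, perm = cpqr_truncated(A, k)
    T = np.linalg.solve(R[:, :k], R[:, k:])   # T = R11^{-1} R12
    Z = np.zeros((k, A.shape[1]))
    Z[:, perm[:k]] = np.eye(k)                # selected columns reproduce themselves
    Z[:, perm[k:]] = T
    return perm[:k], Z

# Usage on a synthetic matrix of exact rank 5.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 30))
J, Z = column_id(A, 5)
rel_err = np.linalg.norm(A - A[:, J] @ Z) / np.linalg.norm(A)
```

Because `A[:, J]` consists of actual columns of `A`, properties such as sparsity or non-negativity of those columns carry over to the factor, which is the structure-preserving aspect that motivates ID and CUR factorizations.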

preprint 2016 arXiv

RSVDPACK: An implementation of randomized algorithms for computing the singular value, interpolative, and CUR decompositions of matrices on multi-core and GPU architectures

RSVDPACK is a library of functions for computing low rank approximations of matrices. The library includes functions for computing standard (partial) factorizations such as the Singular Value Decomposition (SVD), and also so called "structure preserving" factorizations such as the Interpolative Decomposition (ID) and the CUR decomposition. The ID and CUR factorizations pick subsets of the rows/columns of a matrix to use as bases for its row/column space. Such factorizations preserve properties of the matrix such as sparsity or non-negativity, are helpful in data interpretation, and require in certain contexts less memory than a partial SVD. The package implements highly efficient computational algorithms based on randomized sampling, as described and analyzed in [N. Halko, P.G. Martinsson, J. Tropp, "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions," SIAM Review, 53(2), 2011], and subsequent papers. This manuscript presents some modifications to the basic algorithms that improve performance and ease of use. The library is written in C and supports both multi-core CPU and GPU architectures.

preprint 2015 arXiv

A randomized blocked algorithm for efficiently computing rank-revealing factorizations of matrices

This manuscript describes a technique for computing partial rank-revealing factorizations, such as a partial QR factorization or a partial singular value decomposition. The method takes as input a tolerance $\varepsilon$ and an $m\times n$ matrix $A$, and returns an approximate low rank factorization of $A$ that is accurate to within precision $\varepsilon$ in the Frobenius norm (or some other easily computed norm). The rank $k$ of the computed factorization (which is an output of the algorithm) is in all examples we examined very close to the theoretically optimal $\varepsilon$-rank. The proposed method is inspired by the Gram-Schmidt algorithm, and has the same $O(mnk)$ asymptotic flop count. However, the method relies on randomized sampling to avoid column pivoting, which allows it to be blocked, and hence accelerates practical computations by reducing communication. Numerical experiments demonstrate that the accuracy of the scheme is for every matrix that was tried at least as good as column-pivoted QR, and is sometimes much better. Computational speed is also improved substantially, in particular on GPU architectures.
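A blocked, tolerance-driven scheme of this kind can be sketched as follows. This is a minimal illustration under stated assumptions rather than the manuscript's algorithm: the function name `rand_qb_blocked`, the default block size, and the choice to form and update the residual explicitly are all hypothetical. The sketch builds `A ≈ Q @ B` one block of columns at a time and stops once the Frobenius norm of the residual drops below `eps`, so the rank is an output rather than an input.

```python
import numpy as np

def rand_qb_blocked(A, eps, block=10, seed=0):
    """Build Q (orthonormal columns) and B with ||A - Q @ B||_F <= eps, block by block."""
    rng = np.random.default_rng(seed)
    R = np.array(A, dtype=float)          # explicit residual A - Q @ B
    m, n = R.shape
    max_rank = min(m, n)
    Q = np.zeros((m, 0))
    B = np.zeros((0, n))
    while Q.shape[1] < max_rank and np.linalg.norm(R) > eps:
        b = min(block, max_rank - Q.shape[1])
        # Sample the range of the residual with a Gaussian test matrix.
        Omega = rng.standard_normal((n, b))
        Qi, _ = np.linalg.qr(R @ Omega)
        # Re-orthogonalize against earlier blocks to limit loss of orthogonality.
        Qi, _ = np.linalg.qr(Qi - Q @ (Q.T @ Qi))
        Bi = Qi.T @ R
        R -= Qi @ Bi                       # deflate the captured block
        Q = np.hstack([Q, Qi])
        B = np.vstack([B, Bi])
    return Q, B

# Usage: a 60 x 40 matrix with singular values 2^0, 2^-1, ..., 2^-39.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((60, 40)))
V, _ = np.linalg.qr(rng.standard_normal((40, 40)))
A = (U * 2.0 ** -np.arange(40)) @ V.T
Q, B = rand_qb_blocked(A, eps=1e-6)
err = np.linalg.norm(A - Q @ B)
```

Each pass uses only matrix-matrix products and a QR factorization of a thin block, with no column pivoting, which is what makes the approach amenable to blocking. A partial SVD can then be obtained from an SVD of the small matrix `B`.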

preprint 2015 arXiv

Compressing rank-structured matrices via randomized sampling

Randomized sampling has recently been proven a highly efficient technique for computing approximate factorizations of matrices that have low numerical rank. This paper describes an extension of such techniques to a wider class of matrices that are not themselves rank-deficient, but have off-diagonal blocks that are; specifically, the classes of so called Hierarchically Off-Diagonal Low Rank (HODLR) matrices and Hierarchically Block Separable (HBS) matrices. Such matrices arise frequently in numerical analysis and signal processing, in particular in the construction of fast methods for solving differential and integral equations numerically. These structures admit algebraic operations (matrix-vector multiplications, matrix factorizations, matrix inversion, etc.) to be performed very rapidly; but only once a data-sparse representation of the matrix has been constructed. How to rapidly compute this representation in the first place is much less well understood. The present paper demonstrates that if an $N\times N$ matrix can be applied to a vector in $O(N)$ time, and if the ranks of the off-diagonal blocks are bounded by an integer $k$, then the cost for constructing a HODLR representation is $O(k^{2}\,N\,(\log N)^{2})$, and the cost for constructing an HBS representation is $O(k^{2}\,N\,\log N)$. The point is that when legacy codes (based on, e.g., the Fast Multipole Method) can be used for the fast matrix-vector multiply, the proposed algorithm can be used to obtain the data-sparse representation of the matrix, and then well-established techniques for HODLR/HBS matrices can be used to invert or factor the matrix. The proposed scheme is also useful in simplifying the implementation of certain operations on rank-structured matrices such as the matrix-matrix multiplication, low-rank update, addition, etc.

preprint 2014 arXiv

An efficient and highly accurate solver for multi-body acoustic scattering problems involving rotationally symmetric scatterers

A numerical method for solving the equations modeling acoustic scattering is presented. The method is capable of handling several dozen scatterers, each of which is several wave-lengths long, on a personal work station. Even for geometries involving cavities, solutions accurate to seven digits or better were obtained. The method relies on a Boundary Integral Equation formulation of the scattering problem, discretized using a high-order accurate Nyström method. A hybrid iterative/direct solver is used in which a local scattering matrix for each body is computed, and then GMRES, accelerated by the Fast Multipole Method, is used to handle reflections between the scatterers. The main limitation of the method described is that it currently applies only to scattering bodies that are rotationally symmetric.

preprint 2013 arXiv

A high-order accurate accelerated direct solver for acoustic scattering from surfaces

We describe an accelerated direct solver for the integral equations which model acoustic scattering from curved surfaces. Surfaces are specified via a collection of smooth parameterizations given on triangles, a setting which generalizes the typical one of triangulated surfaces, and the integral equations are discretized via a high-order Nystrom method. This allows for rapid convergence in cases in which high-order surface information is available. The high-order discretization technique is coupled with a direct solver based on the recursive construction of scattering matrices. The result is a solver which often attains $O(N^{1.5})$ complexity in the number of discretization nodes $N$ and which is resistant to many of the pathologies which stymie iterative solvers in the numerical simulation of scattering. The performance of the algorithm is illustrated with numerical experiments which involve the simulation of scattering from a variety of domains, including one consisting of a collection of 1000 ellipsoids with randomly oriented semiaxes arranged in a grid, and a domain whose boundary has 12 curved edges and 8 corner points.

preprint 2013 arXiv

A spectrally accurate direct solution technique for frequency-domain scattering problems with variable media

This paper presents a direct solution technique for the scattering of time-harmonic waves from a bounded region of the plane in which the wavenumber varies smoothly in space. The method constructs the interior Dirichlet-to-Neumann (DtN) map for the bounded region via bottom-up recursive merges of (discretization of) certain boundary operators on a quadtree of boxes. These operators take the form of impedance-to-impedance (ItI) maps. Since ItI maps are unitary, this formulation is inherently numerically stable, and is immune to problems of artificial internal resonances. The ItI maps on the smallest (leaf) boxes are built by spectral collocation on tensor-product grids of Chebyshev nodes. At the top level the DtN map is recovered from the ItI map and coupled to a boundary integral formulation of the free space exterior problem, to give a provably second kind equation. Numerical results indicate that the scheme can solve challenging problems 70 wavelengths on a side to 9-digit accuracy with 4 million unknowns, in under 5 minutes on a desktop workstation. Each additional solve corresponding to a different incident wave (right-hand side) then requires only 0.04 seconds.

preprint 2013 arXiv

An O(N) algorithm for constructing the solution operator to 2D elliptic boundary value problems in the absence of body loads

The large sparse linear systems arising from the finite element or finite difference discretization of elliptic PDEs can be solved directly via, e.g., nested dissection or multifrontal methods. Such techniques reorder the nodes in the grid to reduce the asymptotic complexity of Gaussian elimination from $O(N^{2})$ to $O(N^{1.5})$ for typical problems in two dimensions. It has recently been demonstrated that the complexity can be further reduced to $O(N)$ by exploiting structure in the dense matrices that arise in such computations (using, e.g., $\mathcal{H}$-matrix arithmetic). This paper demonstrates that such accelerated nested dissection techniques become particularly effective for boundary value problems without body loads when the solution is sought for several different sets of boundary data, and the solution is required only near the boundary (as happens, e.g., in the computational modeling of scattering problems, or in engineering design of linearly elastic solids).

preprint 2013 arXiv

An O(N) Direct Solver for Integral Equations on the Plane

An efficient direct solver for volume integral equations with O(N) complexity for a broad range of problems is presented. The solver relies on hierarchical compression of the discretized integral operator, and exploits that off-diagonal blocks of certain dense matrices have numerically low rank. Technically, the solver is inspired by previously developed direct solvers for integral equations based on "recursive skeletonization" and "Hierarchically Semi-Separable" (HSS) matrices, but it improves on the asymptotic complexity of existing solvers by incorporating an additional level of compression. The resulting solver has optimal O(N) complexity for all stages of the computation, as demonstrated by both theoretical analysis and numerical examples. The computational examples further display good practical performance in terms of both speed and memory usage. In particular, it is demonstrated that even problems involving $10^{7}$ unknowns can be solved to precision $10^{-10}$ using a simple Matlab implementation of the algorithm executed on a single core.

preprint 2011 arXiv

A direct solver with O(N) complexity for integral equations on one-dimensional domains

An algorithm for the direct inversion of the linear systems arising from Nystrom discretization of integral equations on one-dimensional domains is described. The method typically has O(N) complexity when applied to boundary integral equations (BIEs) in the plane with non-oscillatory kernels such as those associated with the Laplace and Stokes' equations. The scaling coefficient suppressed by the "big-O" notation depends logarithmically on the requested accuracy. The method can also be applied to BIEs with oscillatory kernels such as those associated with the Helmholtz and Maxwell equations; it is efficient at long and intermediate wave-lengths, but will eventually become prohibitively slow as the wave-length decreases. To achieve linear complexity, rank deficiencies in the off-diagonal blocks of the coefficient matrix are exploited. The technique is conceptually related to the H and H^2 matrix arithmetic of Hackbusch and co-workers, and is closely related to previous work on Hierarchically Semi-Separable matrices.

preprint 2011 arXiv

A high-order accurate discretization scheme for variable coefficient elliptic PDEs in the plane with smooth solutions

A discretization scheme for variable coefficient elliptic PDEs in the plane is presented. The scheme is based on high-order Gaussian quadratures and is designed for problems with smooth solutions, such as scattering problems involving soft scatterers. The resulting system of linear equations is very well suited to efficient direct solvers such as nested dissection and the more recently proposed accelerated nested dissection schemes with O(N) complexity.

preprint 2011 arXiv

An algorithm for the principal component analysis of large data sets

Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy --- even on parallel processors --- unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently "out-of-core.") We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.

preprint 2010 arXiv

A Direct Solver for the Rapid Solution of Boundary Integral Equations on Axisymmetric Surfaces in Three Dimensions

A scheme for rapidly and accurately computing solutions to boundary integral equations (BIEs) on rotationally symmetric surfaces in three dimensions is presented. The scheme uses the Fourier transform to reduce the original BIE defined on a surface to a sequence of BIEs defined on a generating curve for the surface. It can handle loads that are not necessarily rotationally symmetric. Nystrom discretization is used to discretize the BIEs on the generating curve. The quadrature used is a high-order Gaussian rule that is modified near the diagonal to retain high-order accuracy for singular kernels. The reduction in dimensionality, along with the use of high-order accurate quadratures, leads to small linear systems that can be inverted directly via, e.g., Gaussian elimination. This makes the scheme particularly fast in environments involving multiple right hand sides. It is demonstrated that for BIEs associated with Laplace's equation, the kernel in the reduced equations can be evaluated very rapidly by exploiting recursion relations for Legendre functions. Numerical examples illustrate the performance of the scheme; in particular, it is demonstrated that for a BIE associated with Laplace's equation on a surface discretized using 320 000 points, the set-up phase of the algorithm takes 2 minutes on a standard desktop, and then solves can be executed in 0.5 seconds.

preprint 2010 arXiv

Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed---either explicitly or implicitly---to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis.
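The two-stage structure described in this abstract (random sampling to find a subspace, then deterministic post-processing of the compressed matrix) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the paper's reference implementation: the function name `randomized_svd`, the Gaussian test matrix, and the default oversampling of 10 are illustrative choices, and refinements such as power iterations are omitted.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD of A from a Gaussian random sketch."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(m, n, k + oversample)       # number of random samples
    # Stage A: random sampling identifies a subspace capturing most of A's action.
    Omega = rng.standard_normal((n, ell))
    Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for the sampled range
    # Stage B: compress A to that subspace and factor the small matrix deterministically.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

# Usage on a synthetic matrix of exact rank 8.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 100))
U, s, Vt = randomized_svd(A, k=8)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

The only passes over `A` are the two products `A @ Omega` and `Q.T @ A`; all remaining work is on matrices with `ell` columns or rows, which is where the savings over a full SVD come from when `k` is much smaller than the matrix dimensions.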
