Researcher profile

Per-Gunnar Martinsson

Per-Gunnar Martinsson contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2025arXiv

Randomized Algorithms for Low-Rank Matrix and Tensor Decompositions

This paper surveys randomized algorithms in numerical linear algebra for low-rank decompositions of matrices and tensors. The survey begins with a review of classical matrix algorithms that can be accelerated by randomized dimensionality reduction, such as the singular value decomposition (SVD) or interpolative (ID) and CUR decompositions. Recent advances in randomized dimensionality reduction are discussed, including new methods of fast matrix sketching and sampling techniques, which are incorporated into classical matrix algorithms for fast low-rank matrix approximations. The extension of randomized matrix algorithms to tensors is then explored for several low-rank tensor decompositions in the CP and Tucker formats, including the higher-order SVD, ID, and CUR decomposition.

preprint2022arXiv

Randomized Algorithms for Scientific Computing (RASC)

Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of that workshop, "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.

preprint2022arXiv

Simpler is better: A comparative study of randomized algorithms for computing the CUR decomposition

The CUR decomposition is a technique for low-rank approximation that selects small subsets of the columns and rows of a given matrix to use as bases for its column and rowspaces. It has recently attracted much interest, as it has several advantages over traditional low rank decompositions based on orthonormal bases. These include the preservation of properties such as sparsity or non-negativity, the ability to interpret data, and reduced storage requirements. The problem of finding the skeleton sets that minimize the norm of the residual error is known to be NP-hard, but classical pivoting schemes such as column pivoted QR work tend to work well in practice. When combined with randomized dimension reduction techniques, classical pivoting based methods become particularly effective, and have proven capable of very rapidly computing approximate CUR decompositions of large, potentially sparse, matrices. Another class of popular algorithms for computing CUR de-compositions are based on drawing the columns and rows randomly from the full index sets, using specialized probability distributions based on leverage scores. Such sampling based techniques are particularly appealing for very large scale problems, and are well supported by theoretical performance guarantees. This manuscript provides a comparative study of the various randomized algorithms for computing CUR decompositions that have recently been proposed. Additionally, it proposes some modifications and simplifications to the existing algorithms that leads to faster execution times.

preprint2022arXiv

Solving Linear Systems on a GPU with Hierarchically Off-Diagonal Low-Rank Approximations

We are interested in solving linear systems arising from three applications: (1) kernel methods in machine learning, (2) discretization of boundary integral equations from mathematical physics, and (3) Schur complements formed in the factorization of many large sparse matrices. The coefficient matrices are often data-sparse in the sense that their off-diagonal blocks have low numerical ranks; specifically, we focus on "hierarchically off-diagonal low-rank (HODLR)" matrices. We introduce algorithms for factorizing HODLR matrices and for applying the factorizations on a GPU. The algorithms leverage the efficiency of batched dense linear algebra, and they scale nearly linearly with the matrix size when the numerical ranks are fixed. The accuracy of the HODLR-matrix approximation is a tunable parameter, so we can construct high-accuracy fast direct solvers or low-accuracy robust preconditioners. Numerical results show that we can solve problems with several millions of unknowns in a couple of seconds on a single GPU.

preprint2020arXiv

An accelerated, high-order accurate direct solver for the Lippmann-Schwinger equation for acoustic scattering in the plane

An efficient direct solver for solving the Lippmann-Schwinger integral equation modeling acoustic scattering in the plane is presented. For a problem with $N$ degrees of freedom, the solver constructs an approximate inverse in $\mathcal{O}(N^{3/2})$ operations and then, given an incident field, can compute the scattered field in $\mathcal{O}(N \log N)$ operations. The solver is based on a previously published direct solver for integral equations that relies on rank-deficiencies in the off-diagonal blocks; specifically, the so-called Hierarchically Block Separable format is used. The particular solver described here has been reformulated in a way that improves numerical stability and robustness, and exploits the particular structure of the kernel in the Lippmann-Schwinger equation to accelerate the computation of an approximate inverse. The solver is coupled with a Nyström discretization on a regular square grid, using a quadrature method developed by Ran Duan and Vladimir Rokhlin that attains high-order accuracy despite the singularity in the kernel of the integral equation. A particularly efficient solver is obtained when the direct solver is run at four digits of accuracy, and is used as a preconditioner to GMRES, with each forwards application of the integral operators accelerated by the FFT. Extensive numerical experiments are presented that illustrate the high performance of the method in challenging environments. Using the $10^{\rm th}$-order accurate version of the Duan-Rokhlin quadrature rule, the scheme is capable of solving problems on domains that are over 500 wavelengths wide to residual error below $10^{-10}$ in a couple of hours on a workstation, using 26M degrees of freedom.

preprint2020arXiv

Computing rank-revealing factorizations of matrices stored out-of-core

This paper describes efficient algorithms for computing rank-revealing factorizations of matrices that are too large to fit in RAM, and must instead be stored on slow external memory devices such as solid-state or spinning disk hard drives (out-of-core or out-of-memory). Traditional algorithms for computing rank revealing factorizations, such as the column pivoted QR factorization, or techniques for computing a full singular value decomposition of a matrix, are very communication intensive. They are naturally expressed as a sequence of matrix-vector operations, which become prohibitively expensive when data is not available in main memory. Randomization allows these methods to be reformulated so that large contiguous blocks of the matrix can be processed in bulk. The paper describes two distinct methods. The first is a blocked version of column pivoted Householder QR, organized as a "left-looking" method to minimize the number of write operations (which are more expensive than read operations on a spinning disk drive). The second method results in a so called UTV factorization which expresses a matrix $A$ as $A = U T V^*$ where $U$ and $V$ are unitary, and $T$ is triangular. This method is organized as an algorithm-by-blocks, in which floating point operations overlap read and write operations. The second method incorporates power iterations, and is exceptionally good at revealing the numerical rank; it can often be used as a substitute for a full singular value decomposition. Numerical experiments demonstrate that the new algorithms are almost as fast when processing data stored on a hard drive as traditional algorithms are for data stored in main memory. To be precise, the computational time for fully factorizing an $n\times n$ matrix scales as $cn^{3}$, with a scaling constant $c$ that is only marginally larger when the matrix is stored out of core.

preprint2020arXiv

Corrected Trapezoidal Rules for Boundary Integral Equations in Three Dimensions

The manuscript describes a quadrature rule that is designed for the high order discretization of boundary integral equations (BIEs) using the Nyström method. The technique is designed for surfaces that can naturally be parameterized using a uniform grid on a rectangle, such as deformed tori, or channels with periodic boundary conditions. When a BIE on such a geometry is discretized using the Nyström method based on the Trapezoidal quadrature rule, the resulting scheme tends to converge only slowly, due to the singularity in the kernel function. The key finding of the manuscript is that the convergence order can be greatly improved by modifying only a very small number of elements in the coefficient matrix. Specifically, it is demonstrated that by correcting only the diagonal entries in the coefficient matrix, $O(h^{3})$ convergence can be attained for the single and double layer potentials associated with both the Laplace and the Helmholtz kernels. A nine-point correction stencil leads to an $O(h^5)$ scheme. The method proposed can be viewed as a generalization of the quadrature rule of Duan and Rokhlin, which was designed for the 2D Lippmann-Schwinger equation in the plane. The techniques proposed are supported by a rigorous error analysis that relies on Wigner-type limits involving the Epstein zeta function and its parametric derivatives.