Source author record

Andreas Frommer

Andreas Frommer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

hep-lat; math.NA; Numerical Analysis; Distributed, Parallel, and Cluster Computing; math-ph; math.MP

Catalog footprint

What is connected

18 works

6 topics

4 close collaborators

preprint, 2026, arXiv

Performance-Portable Optimization and Analysis of Multiple Right-Hand Sides in a Lattice QCD Solver

Managing the high computational cost of iterative solvers for sparse linear systems is a known challenge in scientific computing. Moreover, scientific applications often face memory bandwidth constraints, making it critical to optimize data locality and enhance the efficiency of data transport. We extend the lattice QCD solver DD-$\alpha$AMG to incorporate multiple right-hand sides (rhs) for both the Wilson-Dirac operator evaluation and the GMRES solver, with and without odd-even preconditioning. To optimize auto-vectorization, we introduce a flexible interface that supports various data layouts and implement a new data layout for better SIMD utilization. We evaluate our optimizations on both x86 and Arm clusters, demonstrating performance portability with similar speedups. A key contribution of this work is the performance analysis of our optimizations, which reveals the complexity introduced by architectural constraints and compiler behavior. Additionally, we explore different implementations leveraging a new matrix instruction set for Arm called SME and provide an early assessment of its potential benefits.

preprint, 2022, arXiv

A flexible short recurrence Krylov subspace method for matrices arising in the time integration of port Hamiltonian systems and ODEs/DAEs with a dissipative Hamiltonian

For several classes of mathematical models that yield linear systems, the splitting of the matrix into its Hermitian and skew-Hermitian parts is naturally related to properties of the underlying model. This is particularly so for discretizations of dissipative Hamiltonian ODEs, DAEs and port Hamiltonian systems where, in addition, the Hermitian part is positive definite or semi-definite. It is then possible to develop short recurrence optimal Krylov subspace methods in which the Hermitian part is used as a preconditioner. In this paper we develop new, right preconditioned variants of this approach which, as their crucial new feature, allow the systems with the Hermitian part to be solved only approximately in each iteration while keeping the short recurrences. This new class of methods is particularly efficient as it allows, for example, the use of a few steps of a multigrid solver or a (preconditioned) CG method for the Hermitian part in each iteration. We illustrate this with several numerical experiments for large scale systems.

preprint, 2022, arXiv

On the Convergence of Randomized and Greedy Relaxation Schemes for Solving Nonsingular Linear Systems of Equations

We extend results known for the randomized Gauss-Seidel and the Gauss-Southwell methods for the case of a Hermitian and positive definite matrix to certain classes of non-Hermitian matrices. We obtain convergence results for a whole range of parameters describing the probabilities in the randomized method or the greedy choice strategy in the Gauss-Southwell-type methods. We identify those choices which make our convergence bounds best possible. Our main tool is to use weighted l1-norms to measure the residuals. A major result is that the best convergence bounds that we obtain for the expected values in the randomized algorithm are as good as the best for the deterministic, but more costly algorithms of Gauss-Southwell type. Numerical experiments illustrate the convergence of the method and the bounds obtained. Comparisons with the randomized Kaczmarz method are also presented.
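
As a rough illustration of the two selection rules compared in this abstract, the sketch below relaxes one residual coordinate per step, choosing the index either greedily (largest residual entry, in the style of Gauss-Southwell) or uniformly at random. The 3x3 matrix, the function names and the uniform probabilities are assumptions made for illustration only; the paper's range of probabilities and its weighted l1-norms are not reproduced here.

```python
import random

def relax(A, b, pick, steps, seed=0):
    """Single-coordinate relaxation for A x = b; `pick` selects the index to relax."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the starting guess x = 0
    for _ in range(steps):
        i = pick(r, rng)
        delta = r[i] / A[i][i]      # correction that zeroes the i-th residual entry
        x[i] += delta
        for j in range(n):          # update the residual with column i of A
            r[j] -= delta * A[j][i]
    return x, sum(abs(v) for v in r)  # solution estimate and l1-norm of the residual

def greedy(r, rng):
    """Gauss-Southwell style choice: index of the largest residual entry."""
    return max(range(len(r)), key=lambda k: abs(r[k]))

def uniform(r, rng):
    """Randomized choice with equal probabilities (an illustrative assumption)."""
    return rng.randrange(len(r))

# Hypothetical test matrix: non-symmetric, strictly column diagonally dominant.
A = [[4.0, -1.0, 0.0],
     [-2.0, 5.0, -1.0],
     [0.0, -1.0, 3.0]]
b = [1.0, 2.0, 3.0]

x_gs, res_gs = relax(A, b, greedy, 60)
x_rand, res_rand = relax(A, b, uniform, 200)
```

For a strictly column diagonally dominant matrix such as this one, each relaxation step cannot increase the l1-norm of the residual, which is why that norm is returned as the progress measure. The greedy rule needs a search over the residual at every step, whereas the random rule does not; this is the cost difference the abstract alludes to.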

preprint, 2021, arXiv

Analysis of probing techniques for sparse approximation and trace estimation of decaying matrix functions

The computation of matrix functions $f(A)$, or related quantities like their trace, is an important but challenging task, in particular for large and sparse matrices $A$. In recent years, probing methods have become a frequently considered tool in this context, as they allow one to replace the computation of $f(A)$ or $\text{tr}(f(A))$ by the evaluation of (a small number of) quantities of the form $f(A)v$ or $v^Tf(A)v$, respectively. These tasks can then be solved efficiently by standard techniques such as Krylov subspace methods. It is well-known that probing methods are particularly efficient when $f(A)$ is approximately sparse, e.g., when the entries of $f(A)$ show a strong off-diagonal decay, but a rigorous error analysis is lacking so far. In this paper we develop new theoretical results on the existence of sparse approximations for $f(A)$ and error bounds for probing methods based on graph colorings. As a by-product, by carefully inspecting the proofs of these error bounds, we also gain new insights into when to stop the Krylov iteration used for approximating $f(A)v$ or $v^Tf(A)v$, thus allowing for a practically efficient implementation of the probing methods.
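
The idea of probing with a graph coloring can be sketched on a small dense example. Here the matrix `B` with entries 0.1^|i-j| is a hypothetical stand-in for a decaying $f(A)$, and the coloring `i % num_colors` is a distance coloring of a path graph; both are assumptions chosen for illustration, not the setting analyzed in the paper. In practice each quadratic form $v^T f(A) v$ would be approximated by a Krylov method rather than by an explicit matrix.

```python
def probing_trace(B, num_colors):
    """Estimate tr(B) using one quadratic form v^T B v per color class."""
    n = len(B)
    total = 0.0
    for c in range(num_colors):
        # Probing vector: ones on the nodes of color c, zeros elsewhere.
        v = [1.0 if i % num_colors == c else 0.0 for i in range(n)]
        Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        total += sum(v[i] * Bv[i] for i in range(n))
    return total

# Hypothetical matrix with exponential off-diagonal decay, standing in for f(A).
n = 8
B = [[0.1 ** abs(i - j) for j in range(n)] for i in range(n)]
exact = sum(B[i][i] for i in range(n))

est2 = probing_trace(B, 2)   # nodes of equal color are at least 2 apart
est4 = probing_trace(B, 4)   # nodes of equal color are at least 4 apart
est8 = probing_trace(B, n)   # one color per node: reduces to summing the diagonal
```

The error of each estimate is the sum of the off-diagonal entries that couple nodes of the same color, so it shrinks as the coloring separates equally colored nodes further, at the price of more quadratic forms.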

preprint, 2020, arXiv

Asynchronous Richardson iterations

We consider asynchronous versions of the first and second order Richardson methods for solving linear systems of equations. These methods depend on parameters whose values are chosen a priori. We explore the parameter values that can be proven to give convergence of the asynchronous methods. This is the first such analysis for asynchronous second order methods. We find that for the first order method, the optimal parameter value for the synchronous case also gives an asynchronously convergent method. For the second order method, the parameter ranges for which we can prove asynchronous convergence do not contain the optimal parameter values for the synchronous iteration. In practice, however, the asynchronous second order iterations may still converge using the optimal parameter values, or parameter values close to the optimal ones, despite this result. We explore this behavior with a multithreaded parallel implementation of the asynchronous methods.
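
For orientation, the synchronous first and second order Richardson iterations can be sketched as follows. This is only the synchronous baseline: the asynchronous variants and the multithreaded implementation studied in the paper are not modeled. The tridiagonal test matrix, the function names, and the particular parameter formulas (the classical optimal choices for a symmetric positive definite matrix with known extreme eigenvalues) are assumptions for illustration and may differ from the paper's notation.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def residual(A, b, x):
    return [bi - yi for bi, yi in zip(b, matvec(A, x))]

def richardson1(A, b, omega, iters):
    """First order: x <- x + omega * (b - A x)."""
    x = [0.0] * len(b)
    for _ in range(iters):
        r = residual(A, b, x)
        x = [xi + omega * ri for xi, ri in zip(x, r)]
    return x

def richardson2(A, b, alpha, beta, iters):
    """Second order: x_new = x + alpha * (b - A x) + beta * (x - x_old)."""
    x_old = [0.0] * len(b)
    x = list(x_old)
    for _ in range(iters):
        r = residual(A, b, x)
        x_new = [xi + alpha * ri + beta * (xi - xo)
                 for xi, ri, xo in zip(x, r, x_old)]
        x_old, x = x, x_new
    return x

# Hypothetical test problem: tridiag(-1, 2, -1), whose extreme eigenvalues are known.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
lam_min = 2.0 - 2.0 * math.cos(math.pi / (n + 1))
lam_max = 2.0 + 2.0 * math.cos(math.pi / (n + 1))

# Classical optimal parameters for the synchronous iterations (SPD case).
omega = 2.0 / (lam_min + lam_max)
s_min, s_max = math.sqrt(lam_min), math.sqrt(lam_max)
alpha = (2.0 / (s_max + s_min)) ** 2
beta = ((s_max - s_min) / (s_max + s_min)) ** 2

x1 = richardson1(A, b, omega, 200)
x2 = richardson2(A, b, alpha, beta, 200)
```

According to the abstract, the synchronously optimal `omega` remains a provably safe choice for the asynchronous first order method, while for the second order method the provable parameter ranges exclude the synchronously optimal values.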

preprint, 2016, arXiv

Adaptive Aggregation-based Domain Decomposition Multigrid for Twisted Mass Fermions

The Adaptive Aggregation-based Domain Decomposition Multigrid method (arXiv:1303.1377) is extended for two degenerate flavors of twisted mass fermions. By fine-tuning the parameters we achieve a speed-up on the order of a hundred times compared to the conjugate gradient algorithm for the physical value of the pion mass. A thorough analysis of the aggregation parameters is presented, which provides novel insight into multigrid methods for lattice QCD independently of the fermion discretization.

preprint, 2016, arXiv

DDalphaAMG for Twisted Mass Fermions

We present the Adaptive Aggregation-based Domain Decomposition Multigrid method extended to the twisted mass fermion discretization action. We show comparisons of results as a function of tuning the parameters that enter the twisted mass version of the DDalphaAMG library (https://github.com/sbacchio/DDalphaAMG). Moreover, we linked the DDalphaAMG library to the tmLQCD software package and give details on the performance of the multigrid solver during HMC simulations at the physical point.

preprint, 2015, arXiv

(Approximate) Low-Mode Averaging with a new Multigrid Eigensolver

We present a multigrid based eigensolver for computing low-modes of the Hermitian Wilson Dirac operator. For the non-Hermitian case multigrid methods have already replaced conventional Krylov subspace solvers in many lattice QCD computations. Since the $\gamma_5$-preserving aggregation based interpolation used in our multigrid method is valid for both the Hermitian and the non-Hermitian case, inversions of very ill-conditioned shifted systems with the Hermitian operator become feasible. This enables the use of multigrid within shift-and-invert type eigensolvers. We show numerical results from our MPI-C implementation of a Rayleigh quotient iteration with multigrid. For state-of-the-art lattice sizes and moderate numbers of desired low-modes we achieve speed-ups of an order of magnitude and more over PARPACK. We show results and develop strategies for using our eigensolver to calculate disconnected contributions to hadronic quantities that are noisy and still computationally challenging. Here, we explore the possible benefits of using our eigensolver for low-mode averaging and related methods with high and low accuracy eigenvectors. We develop a low-mode averaging type method using only a few of the smallest eigenvectors with low accuracy. This allows us to avoid expensive exact eigensolves while still benefitting from reduced statistical errors.

preprint, 2015, arXiv

On short recurrence Krylov type methods for linear systems with many right-hand sides

Block and global Krylov subspace methods have been proposed as methods adapted to the situation where one iteratively solves systems with the same matrix and several right hand sides. These methods are advantageous, since they allow one to cast the major part of the arithmetic in terms of matrix-block vector products, and since, in the block case, they take their iterates from a potentially richer subspace. In this paper we consider the most established Krylov subspace methods which rely on short recurrences, i.e. BiCG, QMR and BiCGStab. We propose modifications of their block variants which increase numerical stability, thus at least partly curing a problem previously observed by several authors. Moreover, we develop modifications of the "global" variants which almost halve the number of matrix-vector multiplications. We present a discussion as well as numerical evidence which both indicate that the additional work present in the block methods can be substantial, and that the new "economic" versions of the "global" BiCG and QMR method can be considered as good alternatives to the BiCGStab variants.

preprint, 2014, arXiv

Adaptive Aggregation Based Domain Decomposition Multigrid for the Lattice Wilson Dirac Operator

In lattice QCD computations a substantial amount of work is spent in solving discretized versions of the Dirac equation. Conventional Krylov solvers show critical slowing down for large system sizes and physically interesting parameter regions. We present a domain decomposition adaptive algebraic multigrid method used as a preconditioner to solve the "clover improved" Wilson discretization of the Dirac equation. This approach combines and improves two approaches, namely domain decomposition and adaptive algebraic multigrid, that have been used separately in lattice QCD before. We show in extensive numerical tests conducted with a parallel production code implementation that considerable speed-up over conventional Krylov subspace methods, domain decomposition methods and other hierarchical approaches for realistic system sizes can be achieved.

preprint, 2014, arXiv

Multigrid Preconditioning for the Overlap Operator in Lattice QCD

The overlap operator is a lattice discretization of the Dirac operator of quantum chromodynamics, the fundamental physical theory of the strong interaction between the quarks. As opposed to other discretizations it preserves the important physical property of chiral symmetry, at the expense of requiring much more effort when solving systems with this operator. We present a preconditioning technique based on another lattice discretization, the Wilson-Dirac operator. The mathematical analysis precisely describes the effect of this preconditioning in the case that the Wilson-Dirac operator is normal. Although this is not exactly the case in realistic settings, we show that current smearing techniques indeed drive the Wilson-Dirac operator towards normality, thus providing a motivation why our preconditioner works well in computational practice. Results of numerical experiments in physically relevant settings show that our preconditioning yields accelerations of up to one order of magnitude.

preprint, 2012, arXiv

A CG Method for Multiple Right Hand Sides and Multiple Shifts in Lattice QCD Calculations

We consider the task of computing solutions of linear systems that only differ by a shift with the identity matrix as well as linear systems with several different right hand sides. In the past Krylov subspace methods have been developed which exploit either the need for solutions to multiple right hand sides (e.g. deflation type methods and block methods) or multiple shifts (e.g. shifted CG) with some success. In this paper we present a block Krylov subspace method which, based on a block Lanczos process, exploits both features - shifts and multiple right hand sides - at once. Such situations arise, for example, in lattice QCD simulations within the Rational Hybrid Monte Carlo algorithm. We give numerical evidence that our method is superior to applying other iterative methods to each of the systems individually as well as, in some cases, to shifted or block Krylov subspace methods.

preprint, 2012, arXiv

Aggregation-based Multilevel Methods for Lattice QCD

In Lattice QCD computations a substantial amount of work is spent in solving the Dirac equation. In the recent past it has been observed that conventional Krylov solvers tend to critically slow down for large lattices and small quark masses. We present a Schwarz alternating procedure (SAP) multilevel method as a solver for the Clover improved Wilson discretization of the Dirac equation. This approach combines two components (SAP and algebraic multigrid) that have separately been used in lattice QCD before. In combination with a bootstrap setup procedure we show that considerable speed-up over conventional Krylov subspace methods for realistic configurations can be achieved.

preprint, 2012, arXiv

Deflation and Flexible SAP-Preconditioning of GMRES in Lattice QCD Simulation

The simulation of lattice QCD on massively parallel computers stimulated the development of scalable algorithms for the solution of sparse linear systems. We tackle the problem of the Wilson-Dirac operator inversion by combining a Schwarz alternating procedure (SAP) in multiplicative form with a flexible variant of the GMRES-DR algorithm. We show that restarted GMRES is not able to converge when the system is poorly conditioned. By adding deflation in the form of the FGMRES-DR algorithm, an important fraction of the information produced by the iterates is kept between successive restarts leading to convergence in cases in which FGMRES stagnates.

preprint, 2011, arXiv

Error Bounds for the Sign Function

The Overlap operator fulfills the Ginsparg-Wilson relation exactly and therefore represents an optimal discretization of the QCD Dirac operator with respect to chiral symmetry. When computing propagators or in HMC simulations, where one has to invert the overlap operator using some iterative solver, one has to approximate the action of the sign function of the (symmetrized) Wilson fermion matrix Q on a vector b in each iteration. This is usually done iteratively using a "primary" Lanczos iteration. In this process, it is very important to have good stopping criteria which allow one to reliably assess the quality of the approximation to the action of the sign function computed so far. In this work we show how to cheaply recover a secondary Lanczos process, starting at an arbitrary Lanczos vector of the primary process, and how to use this secondary process to efficiently obtain computable error estimates and error bounds for the Lanczos approximations to sign(Q)b, where the sign function is approximated by the Zolotarev rational approximation.

preprint, 2010, arXiv

Short-recurrence Krylov subspace methods for the overlap Dirac operator at nonzero chemical potential

The overlap operator in lattice QCD requires the computation of the sign function of a matrix, which is non-Hermitian in the presence of a quark chemical potential. In previous work we introduced an Arnoldi-based Krylov subspace approximation, which uses long recurrences. Even after the deflation of critical eigenvalues, the low efficiency of the method restricts its application to small lattices. Here we propose new short-recurrence methods which strongly enhance the efficiency of the computational method. Using rational approximations to the sign function we introduce two variants, based on the restarted Arnoldi process and on the two-sided Lanczos method, respectively, which become very efficient when combined with multishift solvers. Alternatively, in the variant based on the two-sided Lanczos method the sign function can be evaluated directly. We present numerical results which compare the efficiencies of a restarted Arnoldi-based method and the direct two-sided Lanczos approximation for various lattice sizes. We also show that our new methods gain substantially when combined with deflation.

preprint, 2009, arXiv

Krylov subspace methods and the sign function: multishifts and deflation in the non-Hermitian case

Rational approximations of the matrix sign function lead to multishift methods. For non-Hermitian matrices long recurrences can cause storage problems, which can be circumvented with restarts. Together with deflation we obtain efficient iterative methods, as we show in numerical experiments for the overlap Dirac operator at non-vanishing quark chemical potential for lattices up to size 10^4.

preprint, 1995, arXiv

Many Masses on One Stroke: Economic Computation of Quark Propagators

The computational effort in the calculation of Wilson fermion quark propagators in Lattice Quantum Chromodynamics can be considerably reduced by exploiting the Wilson fermion matrix structure in inversion algorithms based on the non-symmetric Lanczos process. We consider two such methods: QMR (quasi minimal residual) and BCG (biconjugate gradients). Based on the decomposition $M/\kappa = \mathbf{1}/\kappa - D$ of the Wilson mass matrix, using QMR, one can carry out inversions on a whole trajectory of masses simultaneously, merely at the computational expense of a single propagator computation. In other words, one has to compute the propagator corresponding to the lightest mass only, while all the heavier masses are given for free, at the price of extra storage. Moreover, the symmetry $\gamma_5 M = M^{\dagger} \gamma_5$ can be used to cut the computational effort in QMR and BCG by a factor of two. We show that both methods then become, in the critical regime of small quark masses, competitive with BiCGStab and significantly better than the standard MR method, with optimal relaxation factor, and CG as applied to the normal equations.
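
The claim that heavier masses come "for free" rests on a standard property of Krylov subspaces, their invariance under shifts by a multiple of the identity. The symbols $\sigma$, $m$ and $\mathcal{K}_m$ below are notation introduced here for illustration and are not taken from the abstract:

```latex
\mathcal{K}_m(A, b) = \operatorname{span}\{\, b,\ Ab,\ \dots,\ A^{m-1} b \,\},
\qquad
\mathcal{K}_m(A + \sigma I,\, b) = \mathcal{K}_m(A, b)
\quad \text{for every shift } \sigma .
```

Taking $A = -D$ and $\sigma = 1/\kappa$, the matrices $\mathbf{1}/\kappa - D$ for all values of $\kappa$ generate the same Krylov subspace from a common right-hand side, so a basis built from matrix-vector products with $D$ can in principle serve every mass at once. How the QMR iterates for the individual masses are updated from that shared basis is described in the paper itself.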
