Researcher profile

Martin Kronbichler

Martin Kronbichler contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
12 works
0 followers
8 topics
4 close collaborators

Published work

12 published items

preprint · 2022 · arXiv

Efficient distributed matrix-free multigrid methods on locally refined meshes for FEM computations

This work studies three multigrid variants for matrix-free finite-element computations on locally refined meshes: geometric local smoothing, geometric global coarsening, and polynomial global coarsening. We have integrated the algorithms into the same framework, the open-source finite-element library deal.II, which allows us to make fair comparisons regarding their implementation complexity, computational efficiency, and parallel scalability as well as to compare the measurements with theoretically derived performance models. Serial simulations and parallel weak and strong scaling on up to 147,456 CPU cores on 3,072 compute nodes are presented. The results obtained indicate that global coarsening algorithms show better parallel behavior for comparable smoothers due to the better load balance, particularly on the expensive fine levels. In the serial case, the costs of applying hanging-node constraints might be significant, leading to advantages of local smoothing, even though the number of solver iterations needed is slightly higher.
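For readers unfamiliar with the building blocks named in this abstract, the following is a minimal sketch of a matrix-free geometric multigrid V-cycle. It is not the deal.II implementation and does not handle locally refined meshes or hanging nodes. It assumes a uniform 1D grid for -u'' = f with zero Dirichlet boundaries, a weighted Jacobi smoother, full-weighting restriction, and linear interpolation. All function names are hypothetical.

```python
"""Matrix-free geometric multigrid V-cycle for -u'' = f on (0, 1), u(0) = u(1) = 0.

Assumption: the number of interior points is 2**k - 1 so every level can be halved.
"""


def apply_operator(u, h):
    """Apply the 3-point Laplacian stencil without storing a matrix."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out[i] = (2.0 * u[i] - left - right) / (h * h)
    return out


def residual(u, f, h):
    return [fi - ai for fi, ai in zip(f, apply_operator(u, h))]


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; the operator diagonal is 2 / h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u


def restrict(r):
    """Full weighting from 2m + 1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]


def prolongate(e, n_fine):
    """Linear interpolation from the coarse grid back to the fine grid."""
    out = [0.0] * n_fine
    for j, ej in enumerate(e):
        out[2 * j] += 0.5 * ej
        out[2 * j + 1] += ej
        out[2 * j + 2] += 0.5 * ej
    return out


def v_cycle(u, f, h):
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]  # exact solve on the coarsest grid
    u = smooth(u, f, h)  # pre-smoothing
    coarse_rhs = restrict(residual(u, f, h))
    coarse_err = v_cycle([0.0] * len(coarse_rhs), coarse_rhs, 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolongate(coarse_err, n))]
    return smooth(u, f, h)  # post-smoothing


def norm(v):
    return sum(x * x for x in v) ** 0.5


def solve(n=63, cycles=10):
    """Run V-cycles for f = 1 and return the solution and residual history."""
    h = 1.0 / (n + 1)
    f = [1.0] * n
    u = [0.0] * n
    history = [norm(residual(u, f, h))]
    for _ in range(cycles):
        u = v_cycle(u, f, h)
        history.append(norm(residual(u, f, h)))
    return u, history
```

Calling `solve()` returns the approximate solution together with the residual norm after each cycle, which makes it easy to see how quickly the residual shrinks. On a uniform grid every level covers the whole domain, so the distinction between local smoothing and global coarsening drawn in the abstract does not arise here.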

preprint · 2022 · arXiv

Enhancing data locality of the conjugate gradient method for high-order matrix-free finite-element implementations

This work investigates a variant of the conjugate gradient (CG) method and embeds it into the context of high-order finite-element schemes with fast matrix-free operator evaluation and cheap preconditioners like the matrix diagonal. Relying on a data-dependency analysis and appropriate enumeration of degrees of freedom, we interleave the vector updates and inner products in a CG iteration with the matrix-vector product with only minor organizational overhead. As a result, around 90% of the vector entries of the three active vectors of the CG method are transferred from slow main memory exactly once per iteration, with all additional accesses hitting fast cache memory. Node-level performance analyses and scaling studies on up to 147k cores show that the CG method with the proposed performance optimizations is around two times faster than a standard CG solver as well as optimized pipelined CG and s-step CG methods for large sizes that exceed processor caches, and provides similar performance near the strong scaling limit.
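As a point of reference, the sketch below shows the textbook diagonally preconditioned CG iteration with a matrix-free operator. It is the standard baseline, not the interleaved variant proposed in the paper. The comments mark the separate passes over the vectors (operator evaluation, inner products, vector updates) that, according to the abstract, the paper merges into the matrix-vector product. The 1D Laplacian stencil and all function names are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def apply_laplacian(u):
    """Matrix-free 1D stencil (-1, 2, -1) with zero Dirichlet boundaries."""
    n = len(u)
    return [2.0 * u[i]
            - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def preconditioned_cg(apply_a, inv_diag, b, tol=1e-10, max_iter=1000):
    """Standard CG with a diagonal (Jacobi) preconditioner.

    Returns the solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)                                   # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        ap = apply_a(p)                           # pass 1: operator evaluation
        alpha = rz / dot(p, ap)                   # pass 2: inner product
        x = [xi + alpha * pi for xi, pi in zip(x, p)]     # pass 3: update x
        r = [ri - alpha * api for ri, api in zip(r, ap)]  # pass 4: update r
        if dot(r, r) ** 0.5 <= tol * b_norm:      # pass 5: residual norm
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]        # pass 6: preconditioner
        rz_new = dot(r, z)                        # pass 7: inner product
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]      # pass 8: update p
        rz = rz_new
    return x, max_iter
```

In this form each iteration streams the vectors through memory several times. Once the vectors no longer fit into cache, that memory traffic is the cost the abstract's optimization targets.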

preprint · 2022 · arXiv

Stage-parallel fully implicit Runge-Kutta implementations with optimal multilevel preconditioners at the scaling limit

We present an implementation of a fully stage-parallel preconditioner for Radau IIA type fully implicit Runge-Kutta methods, which approximates the inverse of $A_Q$ from the Butcher tableau by the lower triangular matrix resulting from an LU decomposition and diagonalizes the system with as many blocks as stages. For the transformed system, we employ a block preconditioner where each block is distributed and solved by a subgroup of processes in parallel. To combine partial results, we use either a communication pattern resembling Cannon's algorithm or shared memory. A performance model and a large set of performance studies (including strong scaling runs with up to 150k processes on 3k compute nodes) conducted for a time-dependent heat problem, using matrix-free finite element methods, indicate that the stage-parallel implementation can reach higher throughputs when the block solvers operate at lower parallel efficiencies, which occurs near the scaling limit. The achievable speedup increases linearly with the number of stages and is bounded by the number of stages. Furthermore, we show that the presented stage-parallel concepts are also applicable to the case that $A_Q$ is directly diagonalized, which requires complex arithmetic or the solution of two-by-two blocks and sequentializes parts of the algorithm. As an alternative to distributing stages and assigning them to distinct processes, we discuss the possibility of batching operations from different stages together.

preprint · 2021 · arXiv

Efficient parallel 3D computation of the compressible Euler equations with an invariant-domain preserving second-order finite-element scheme

We discuss the efficient implementation of a high-performance second-order collocation-type finite-element scheme for solving the compressible Euler equations of gas dynamics on unstructured meshes. The solver is based on the convex limiting technique introduced by Guermond et al. (SIAM J. Sci. Comput. 40, A3211-A3239, 2018). As such it is invariant-domain preserving, i.e., the solver maintains important physical invariants and is guaranteed to be stable without the use of ad-hoc tuning parameters. This stability comes at the expense of a significantly more involved algorithmic structure that renders conventional high-performance discretizations challenging. We develop an algorithmic design that allows SIMD vectorization of the compute kernel, identify the main ingredients for good node-level performance, and report excellent weak and strong scaling of a hybrid thread/MPI parallelization.

preprint · 2021 · arXiv

On the implementation of a robust and efficient finite element-based parallel solver for the compressible Navier-Stokes equations

This paper describes in detail the implementation of a finite element technique for solving the compressible Navier-Stokes equations that is provably robust and demonstrates excellent performance on modern computer hardware. The method is second-order accurate in time and space. Robustness here means that the method is proved to be invariant domain preserving under the hyperbolic CFL time step restriction, and the method delivers results that are reproducible. The proposed technique is shown to be accurate on challenging 2D and 3D realistic benchmarks.

preprint · 2020 · arXiv

A weakly compressible hybridizable discontinuous Galerkin formulation for fluid-structure interaction problems

A scheme for the solution of fluid-structure interaction (FSI) problems with weakly compressible flows is proposed in this work. A novel hybridizable discontinuous Galerkin (HDG) method is derived for the discretization of the fluid equations, while the standard continuous Galerkin (CG) approach is adopted for the structural problem. The chosen HDG solver combines robustness of discontinuous Galerkin (DG) approaches in advection-dominated flows with higher-order accuracy and efficient implementations. Two coupling strategies are examined in this contribution, namely a partitioned Dirichlet-Neumann scheme in the context of hybrid HDG-CG discretizations and a monolithic approach based on Nitsche's method, exploiting the definition of the numerical flux and the trace of the solution to impose the coupling conditions. Numerical experiments show optimal convergence of the HDG and CG primal and mixed variables and superconvergence of the postprocessed fluid velocity. The robustness and the efficiency of the proposed weakly compressible formulation, in comparison to a fully incompressible one, are also highlighted on a selection of two- and three-dimensional FSI benchmark problems.

preprint · 2020 · arXiv

hyper.deal: An efficient, matrix-free finite-element library for high-dimensional partial differential equations

This work presents the efficient, matrix-free finite-element library hyper.deal for solving partial differential equations in two to six dimensions with high-order discontinuous Galerkin methods. It builds upon the low-dimensional finite-element library deal.II to create complex low-dimensional meshes and to operate on them individually. These meshes are combined via a tensor product on the fly and the library provides new special-purpose highly optimized matrix-free functions exploiting domain decomposition as well as shared memory via MPI-3.0 features. Both node-level performance analyses and strong/weak-scaling studies on up to 147,456 CPU cores confirm the efficiency of the implementation. Results of the library hyper.deal are reported for high-dimensional advection problems and for the solution of the Vlasov-Poisson equation in up to 6D phase space.

preprint · 2020 · arXiv

Numerical evidence of anomalous energy dissipation in incompressible Euler flows: Towards grid-converged results for the inviscid Taylor-Green problem

Providing evidence of finite-time singularities of the incompressible Euler equations in three space dimensions is still an unsolved problem. Likewise, the zeroth law of turbulence has not been proven to date by numerical experiments. We address this issue by high-resolution numerical simulations of the inviscid three-dimensional Taylor-Green vortex problem using a novel high-order discontinuous Galerkin discretization approach. Our main finding is that the kinetic energy evolution does not tend towards exact energy conservation for increasing spatial resolution of the numerical scheme, but instead converges to a solution with nonzero kinetic energy dissipation rate. This implies an energy dissipation anomaly in the absence of viscous dissipation according to Onsager's conjecture, and serves as an indication of finite-time singularities in incompressible inviscid flows. We demonstrate convergence to a dissipative solution for the three-dimensional inviscid Taylor-Green problem with a measured relative $L_2$-error of $0.27 \%$ for the temporal evolution of the kinetic energy and $3.52 \%$ for the kinetic energy dissipation rate.

preprint · 2020 · arXiv

Scalability of High-Performance PDE Solvers

Performance tests and analyses are critical to effective HPC software development and are central components in the design and implementation of computational algorithms for achieving faster simulations on existing and future computing architectures for large-scale application problems. In this paper, we explore performance and space-time trade-offs for important compute-intensive kernels of large-scale numerical solvers for PDEs that govern a wide range of physical applications. We consider a sequence of PDE-motivated bake-off problems designed to establish best practices for efficient high-order simulations across a variety of codes and platforms. We measure peak performance (degrees of freedom per second) on a fixed number of nodes and identify effective code optimization strategies for each architecture. In addition to peak performance, we identify the minimum time to solution at 80% parallel efficiency. The performance analysis is based on spectral and p-type finite elements but is equally applicable to a broad spectrum of numerical PDE discretizations, including finite difference, finite volume, and h-type finite elements.
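The two metrics mentioned in this abstract can be illustrated with a short calculation. The sketch below assumes the common strong-scaling definition of parallel efficiency relative to the smallest run, which may differ in detail from the definition used in the paper. The function names and the timing data in the usage note are hypothetical.

```python
def throughput(dofs, seconds):
    """Degrees of freedom processed per second."""
    return dofs / seconds


def strong_scaling_efficiency(base_nodes, base_time, nodes, time):
    """Efficiency relative to a baseline run; ideal scaling gives 1.0."""
    return (base_nodes * base_time) / (nodes * time)


def min_time_at_efficiency(runs, threshold=0.8):
    """Smallest time among runs whose efficiency is at least `threshold`.

    `runs` is a list of (nodes, seconds) pairs for a fixed problem size.
    The run with the fewest nodes serves as the baseline (assumption).
    """
    runs = sorted(runs)
    base_nodes, base_time = runs[0]
    eligible = [
        t for n, t in runs
        if strong_scaling_efficiency(base_nodes, base_time, n, t) >= threshold
    ]
    return min(eligible)
```

With made-up timings `[(1, 8.0), (2, 4.2), (4, 2.4), (8, 1.6)]`, the efficiencies are roughly 1.00, 0.95, 0.83 and 0.63, so `min_time_at_efficiency` returns 2.4 seconds. The 8-node run is faster but falls below the 80% threshold.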

preprint · 2020 · arXiv

The deal.II finite element library: design, features, and insights

deal.II is a state-of-the-art finite element library focused on generality, dimension-independent programming, parallelism, and extensibility. Herein, we outline its primary design considerations and its sophisticated features such as distributed meshes, $hp$-adaptivity, support for complex geometries, and matrix-free algorithms. But deal.II is more than just a software library: It is also a diverse and worldwide community of developers and users, as well as an educational platform. We therefore also discuss some of the technical and social challenges and lessons learned in running a large community software project over the course of two decades.

preprint · 2019 · arXiv

A hybridizable discontinuous Galerkin method for electromagnetics with a view on subsurface applications

Two Hybridizable Discontinuous Galerkin (HDG) schemes for the solution of Maxwell's equations in the time domain are presented. The first method is based on an electromagnetic diffusion equation, while the second is based on Faraday's and Maxwell-Ampère's laws. Both formulations include the diffusive term depending on the conductivity of the medium. The three-dimensional formulation of the electromagnetic diffusion equation in the framework of HDG methods, the introduction of the conduction current term and the choice of the electric field as hybrid variable in a mixed formulation are the key points of the current study. Numerical results are provided for validation purposes and convergence studies of spatial and temporal discretizations are carried out. The test cases include simulations in both dielectric and conductive media.

preprint · 2019 · arXiv

Hybrid multigrid methods for high-order discontinuous Galerkin discretizations

The present work develops hybrid multigrid methods for high-order discontinuous Galerkin discretizations of elliptic problems. Fast matrix-free operator evaluation on tensor product elements is used to devise a computationally efficient PDE solver. The multigrid hierarchy exploits all possibilities of geometric, polynomial, and algebraic coarsening, targeting engineering applications on complex geometries. Additionally, a transfer from discontinuous to continuous function spaces is performed within the multigrid hierarchy. This not only further reduces the size of the coarse-grid problem, but also leads to a discretization most suitable for state-of-the-art algebraic multigrid methods applied as coarse-grid solver. The relevant design choices regarding the selection of optimal multigrid coarsening strategies among the various possibilities are discussed with the metric of computational costs as the driving force for algorithmic selections. We find that a transfer to a continuous function space at highest polynomial degree (or on the finest mesh), followed by polynomial and geometric coarsening, shows the best overall performance. The success of this particular multigrid strategy is due to a significant reduction in iteration counts as compared to a transfer from discontinuous to continuous function spaces at lowest polynomial degree (or on the coarsest mesh). The coarsening strategy with transfer to a continuous function space on the finest level leads to a multigrid algorithm that is robust with respect to the penalty parameter of the SIPG method. Detailed numerical investigations are conducted for a series of examples ranging from academic test cases to more complex, practically relevant geometries. Performance comparisons to state-of-the-art methods from the literature demonstrate the versatility and computational efficiency of the proposed multigrid algorithms.