Researcher profile

Lukas Einkemmer

Lukas Einkemmer contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
8 works
0 followers
12 topics
4 close collaborators

Published work

8 published items

Preprint, 2023, arXiv

LeXInt: Package for Exponential Integrators employing Leja interpolation

We present publicly available software for exponential integrators that computes the $φ_l(z)$ functions using polynomial interpolation. Interpolation at Leja points has recently been shown to be competitive with the traditionally used Krylov subspace method. The developed framework facilitates easy adaptation into any Python software package for time integration.
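For readers unfamiliar with the $φ_l(z)$ functions named in this abstract, the sketch below evaluates them for a scalar argument from their standard series definition, $φ_l(z) = \sum_{k \ge 0} z^k / (k+l)!$. It is only an illustration of the definition: the function name `phi` and the `terms` parameter are assumptions made here, and a truncated series is not the Leja interpolation approach that the package itself uses.

```python
import math

def phi(l: int, z: complex, terms: int = 60) -> complex:
    """Evaluate phi_l(z) = sum_{k>=0} z**k / (k + l)! by truncating the series.

    Illustrative only: suitable for scalars of moderate size, not for the
    matrix arguments that exponential integrators need in practice.
    """
    total = 0.0
    term = 1.0 / math.factorial(l)  # k = 0 term: 1 / l!
    for k in range(terms):
        total += term
        term *= z / (k + l + 1)  # turn z**k/(k+l)! into z**(k+1)/(k+l+1)!
    return total

# phi_0 is the exponential; phi_1(z) = (exp(z) - 1) / z
print(phi(0, 1.0), math.e)
print(phi(1, 0.5), (math.exp(0.5) - 1.0) / 0.5)
```

The same values also satisfy the recurrence $φ_{l+1}(z) = (φ_l(z) - 1/l!)/z$, which gives a quick consistency check for any implementation.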

Preprint, 2022, arXiv

Efficient 6D Vlasov simulation using the dynamical low-rank framework Ensign

Running kinetic simulations using grid-based methods is extremely expensive due to the up to six-dimensional phase space. Recently, it has been shown that dynamical low-rank algorithms can drastically reduce the required computational effort, while still accurately resolving important physical features such as filamentation and Landau damping. In this paper, we propose a new second order projector-splitting dynamical low-rank algorithm for the full six-dimensional Vlasov--Poisson equations. An exponential integrator based Fourier spectral method is employed to obtain a numerical scheme that is CFL condition free but still fully explicit. The resulting method is implemented with the aid of Ensign, a software framework which facilitates the efficient implementation of dynamical low-rank algorithms on modern multi-core CPU as well as GPU based systems. Its usage and features are briefly described in the paper as well. The presented numerical results demonstrate that 6D simulations can be run on a single workstation and highlight the significant speedup that can be obtained using GPUs.

Preprint, 2022, arXiv

Efficient adaptive step size control for exponential integrators

Traditional step size controllers make the tacit assumption that the cost of a time step is independent of the step size. This is reasonable for explicit and implicit integrators that use direct solvers. In the context of exponential integrators, however, an iterative approach, such as Krylov methods or polynomial interpolation, to compute the action of the required matrix functions is usually employed. In this case, the assumption of constant cost is not valid. This is, in particular, a problem for higher-order exponential integrators, which are able to take relatively large time steps based on accuracy considerations. In this paper, we consider an adaptive step size controller for exponential Rosenbrock methods that determines the step size based on the premise of minimizing computational cost. The largest allowed step size, given by accuracy considerations, merely acts as a constraint. We test this approach on a range of nonlinear partial differential equations. Our results show significant improvements (up to a factor of 4 reduction in the computational cost) over the traditional step size controller for a wide range of tolerances.
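For context, the traditional controller that this abstract contrasts against is usually some variant of the textbook rule below, which rescales the step from the ratio of tolerance to estimated error. This is a minimal sketch of that classical rule only, not of the cost-minimizing controller proposed in the paper; the function name, the safety factor and the clamping bounds are assumptions chosen for illustration.

```python
def classical_step_size(h: float, err: float, tol: float, order: int,
                        safety: float = 0.9,
                        min_factor: float = 0.2,
                        max_factor: float = 5.0) -> float:
    """Textbook accuracy-based step size proposal.

    h     : step size just used
    err   : estimated local error of that step
    tol   : prescribed tolerance
    order : order of the scheme used for the error estimate
    The safety and clamping values are illustrative assumptions.
    """
    if err == 0.0:
        # No measurable error: grow by the largest allowed factor.
        return h * max_factor
    factor = safety * (tol / err) ** (1.0 / (order + 1))
    # Clamp the change so that the step size does not jump erratically.
    return h * min(max_factor, max(min_factor, factor))

# Error 16 times above the tolerance with a third-order estimate: the step shrinks.
print(classical_step_size(0.1, 1.6e-5, 1e-6, order=3))
```

Note that this rule looks only at accuracy. When each step is computed iteratively, the largest step it permits is not necessarily the cheapest one per unit of simulated time, which is the observation the abstract builds on.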

Preprint, 2022, arXiv

Exponential Integrators for Resistive Magnetohydrodynamics: Matrix-free Leja Interpolation and Efficient Adaptive Time Stepping

We propose a novel algorithm for the temporal integration of the resistive magnetohydrodynamics (MHD) equations. The approach is based on exponential Rosenbrock schemes in combination with Leja interpolation. It naturally preserves Gauss's law for magnetism and is unencumbered by the stability constraints observed for explicit methods. Remarkable progress has been achieved in designing exponential integrators and computing the required matrix functions efficiently. However, employing them in MHD simulations of realistic physical scenarios requires a matrix-free implementation. We show how an efficient algorithm based on Leja interpolation that only uses the right-hand side of the differential equation (i.e. matrix-free) can be constructed. We further demonstrate that it outperforms Krylov-based exponential integrators as well as explicit and implicit methods using test models of magnetic reconnection and the Kelvin--Helmholtz instability. Furthermore, an adaptive step-size strategy that gives excellent and predictable performance, particularly in the lenient- to intermediate-tolerance regime that is often of importance in practical applications, is employed.

Preprint, 2021, arXiv

A $μ$-mode integrator for solving evolution equations in Kronecker form

In this paper, we propose a $μ$-mode integrator for computing the solution of stiff evolution equations. The integrator is based on a $d$-dimensional splitting approach and uses exact (usually precomputed) one-dimensional matrix exponentials. We show that the action of the exponentials, i.e. the corresponding batched matrix-vector products, can be implemented efficiently on modern computer systems. We further explain how $μ$-mode products can be used to compute spectral transforms efficiently even if no fast transform is available. We illustrate the performance of the new integrator by solving, among others, three-dimensional linear and nonlinear Schrödinger equations, and we show that the $μ$-mode integrator can significantly outperform numerical methods well established in the field. We also discuss how to efficiently implement this integrator on both multi-core CPUs and GPUs. Finally, the numerical experiments show that using GPUs results in performance improvements between a factor of $10$ and $20$, depending on the problem.
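The idea of applying one-dimensional matrix exponentials direction by direction can be illustrated in two dimensions. The sketch below assumes a linear problem in Kronecker form, $u' = (A_1 \otimes I + I \otimes A_2)\,u$, with small symmetric test matrices, and checks that applying $\exp(tA_1)$ and $\exp(tA_2)$ along the two axes of the solution array agrees with exponentiating the full Kronecker-sum matrix. All names here (`sym_expm`, `laplacian_1d`, `mu_mode_step`) and the choice of test operator are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sym_expm(a: np.ndarray, t: float) -> np.ndarray:
    """exp(t * a) for a symmetric matrix a, via its eigendecomposition."""
    w, q = np.linalg.eigh(a)
    return (q * np.exp(t * w)) @ q.T  # q @ diag(exp(t * w)) @ q.T

def laplacian_1d(n: int) -> np.ndarray:
    """Second-difference matrix with unit grid spacing (hypothetical test operator)."""
    return -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)

def mu_mode_step(u: np.ndarray, e1: np.ndarray, e2: np.ndarray) -> np.ndarray:
    """Apply e1 along axis 0 and e2 along axis 1 of the 2D array u."""
    return e1 @ u @ e2.T

n1, n2, t = 5, 4, 0.3
a1, a2 = laplacian_1d(n1), laplacian_1d(n2)
u0 = np.random.default_rng(0).standard_normal((n1, n2))

# Direction-wise application: only n1 x n1 and n2 x n2 exponentials are needed.
u_mu = mu_mode_step(u0, sym_expm(a1, t), sym_expm(a2, t))

# Reference: exponential of the full (n1*n2) x (n1*n2) matrix acting on the
# row-major flattened state.
m = np.kron(a1, np.eye(n2)) + np.kron(np.eye(n1), a2)
u_ref = (sym_expm(m, t) @ u0.ravel()).reshape(n1, n2)

print(np.allclose(u_mu, u_ref))
```

The two results coincide because the two Kronecker terms commute, so the exponential of their sum factorizes exactly. The direction-wise version never forms the large matrix, which is what makes the batched matrix products mentioned in the abstract attractive on CPUs and GPUs.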

Preprint, 2020, arXiv

Semi-Lagrangian Vlasov simulation on GPUs

In this paper, our goal is to efficiently solve the Vlasov equation on GPUs. A semi-Lagrangian discontinuous Galerkin scheme is used for the discretization. Such kinetic computations are extremely expensive due to the high-dimensional phase space. The SLDG code, which is publicly available under the MIT license, abstracts the number of dimensions and uses a shared codebase for both GPU- and CPU-based simulations. We investigate the performance of the implementation on a range of both Tesla (V100, Titan V, K80) and consumer (GTX 1080 Ti) GPUs. Our implementation is typically able to achieve a performance of approximately 470 GB/s on a single GPU and 1600 GB/s on four V100 GPUs connected via NVLink. This results in a speedup of about a factor of ten (comparing a single GPU with a dual socket Intel Xeon Gold node) and approximately a factor of 35 (comparing a single node with and without GPUs). In addition, we investigate the effect of single precision computation on the performance of the SLDG code and demonstrate that a template-based, dimension-independent implementation can achieve good performance regardless of the dimensionality of the problem.

Preprint, 2019, arXiv

A low-rank projector-splitting integrator for the Vlasov--Maxwell equations with divergence correction

The Vlasov--Maxwell equations are used for the kinetic description of magnetized plasmas. As they are posed in an up to 3+3 dimensional phase space, solving this problem is extremely expensive from a computational point of view. In this paper, we exploit the low-rank structure in the solution of the Vlasov equation. More specifically, we consider the Vlasov--Maxwell system and propose a dynamic low-rank integrator. The key idea is to approximate the dynamics of the system by constraining it to a low-rank manifold. This is accomplished by a projection onto the tangent space. There, the dynamics is represented by the low-rank factors, which are determined by solving lower-dimensional partial differential equations. The proposed scheme performs well in numerical experiments and succeeds in capturing the main features of the plasma dynamics. We demonstrate this good behavior for a range of test problems. The coupling of the Vlasov equation with the Maxwell system, however, introduces additional challenges. In particular, the divergence of the electric field resulting from Maxwell's equations is not consistent with the charge density computed from the Vlasov equation. We propose a correction based on Lagrange multipliers which enforces Gauss' law up to machine precision.
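The low-rank constraint described in this abstract is commonly written as the ansatz below for the distribution function. The symbols $X_i$, $S_{ij}$, $V_j$ and the rank $r$ are notation assumed here for illustration and do not appear in the abstract.

$$
f(t, x, v) \approx \sum_{i=1}^{r} \sum_{j=1}^{r} X_i(t, x)\, S_{ij}(t)\, V_j(t, v)
$$

In this form the factors $X_i$ depend only on space and the factors $V_j$ only on velocity, so each is governed by an equation in at most three dimensions rather than the full 3+3 dimensional phase space.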

Preprint, 2019, arXiv

Performance optimization and modeling of fine-grained irregular communication in UPC

The UPC programming language offers parallelism via logically partitioned shared memory, which typically spans physically disjoint memory sub-systems. One convenient feature of UPC is its ability to automatically execute between-thread data movement, such that the entire content of a shared data array appears to be freely accessible by all the threads. This programmer friendliness, however, can come at the cost of substantial performance penalties. This is especially true when indirectly indexing the elements of a shared array, for which the induced between-thread data communication can be irregular and have a fine-grained pattern. In this paper we study performance enhancement strategies specifically targeting such fine-grained irregular communication in UPC. Starting from explicit thread privatization, continuing with block-wise communication, and arriving at message condensing and consolidation, we obtain considerable performance improvements for UPC programs that originally require fine-grained irregular communication. Besides the performance enhancement strategies, the main contribution of the present paper is to propose performance models for the different scenarios, in the form of quantifiable formulas that hinge on the actual volumes of various data movements plus a small number of easily obtainable hardware characteristic parameters. These performance models help to verify the enhancements obtained, while also providing insightful predictions of similar parallel implementations, not limited to UPC, that also involve between-thread or between-process irregular communication. As a further validation, we also apply our performance modeling methodology and hardware characteristic parameters to an existing UPC code for solving a 2D heat equation on a uniform mesh.