Source author record

Peter Messmer

Peter Messmer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

astro-ph.HE physics.comp-ph physics.plasm-ph

Catalog footprint

What is connected

2works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

GPU-Acceleration of the ELPA2 Distributed Eigensolver for Dense Symmetric and Hermitian Eigenproblems

The solution of eigenproblems is often a key computational bottleneck that limits the tractable system size of numerical algorithms, among them electronic structure theory in chemistry and in condensed matter physics. Large eigenproblems can easily exceed the capacity of a single compute node, thus must be solved on distributed-memory parallel computers. We here present GPU-oriented optimizations of the ELPA two-stage tridiagonalization eigensolver (ELPA2). On top of cuBLAS-based GPU offloading, we add a CUDA kernel to speed up the back-transformation of eigenvectors, which can be the computationally most expensive part of the two-stage tridiagonalization algorithm. We benchmark the performance of this GPU-accelerated eigensolver on two hybrid CPU-GPU architectures, namely a compute cluster based on Intel Xeon Gold CPUs and NVIDIA Volta GPUs, and the Summit supercomputer based on IBM POWER9 CPUs and NVIDIA Volta GPUs. Consistent with previous benchmarks on CPU-only architectures, the GPU-accelerated two-stage solver exhibits a parallel performance superior to the one-stage counterpart. Finally, we demonstrate the performance of the GPU-accelerated eigensolver developed in this work for routine semi-local KS-DFT calculations comprising thousands of atoms.

preprint2013arXiv

Apar-T: code, validation, and physical interpretation of particle-in-cell results

We present the parallel particle-in-cell (PIC) code Apar-T and, more importantly, address the fundamental question of the relations between the PIC model, the Vlasov-Maxwell theory, and real plasmas. First, we present four validation tests: spectra from simulations of thermal plasmas, linear growth rates of the relativistic tearing instability and of the filamentation instability, and non-linear filamentation merging phase. For the filamentation instability we show that the effective growth rates measured on the total energy can differ by more than 50% from the linear cold predictions and from the fastest modes of the simulation. Second, we detail a new method for initial loading of Maxwell-Jüttner particle distributions with relativistic bulk velocity and relativistic temperature, and explain why the traditional method with individual particle boosting fails. Third, we scrutinize the question of what description of physical plasmas is obtained by PIC models. These models rely on two building blocks: coarse-graining, i.e., grouping of the order of p~10^10 real particles into a single computer superparticle, and field storage on a grid with its subsequent finite superparticle size. We introduce the notion of coarse-graining dependent quantities, i.e., quantities depending on p. They derive from the PIC plasma parameter Lambda^{PIC}, which we show to scale as 1/p. We explore two implications. One is that PIC collision- and fluctuation-induced thermalization times are expected to scale with the number of superparticles per grid cell, and thus to be a factor p~10^10 smaller than in real plasmas. The other is that the level of electric field fluctuations scales as 1/Lambda^{PIC} ~ p. We provide a corresponding exact expression. Fourth, we compare the Vlasov-Maxwell theory, which describes a phase-space fluid with infinite Lambda, to the PIC model and its relatively small Lambda.