Source author record

Tobias Weinzierl

Tobias Weinzierl appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Mathematical Software; Distributed, Parallel, and Cluster Computing; math.NA; Numerical Analysis; astro-ph.CO; Performance

Catalog footprint

What is connected

9 works

6 topics

4 close collaborators

Preprint, 2022, arXiv

Spherical accretion of collisional gas in modified gravity I: self-similar solutions and a new cosmological hydrodynamical code

The spherical collapse scenario has great importance in cosmology since it captures several crucial aspects of structure formation. The presence of self-similar solutions in the Einstein-de Sitter (EdS) model greatly simplifies its analysis, making it a powerful tool to gain valuable insights into the real and more complicated physical processes involved in galaxy formation. While there has been a large body of research to incorporate various additional physical processes into spherical collapse, the effect of modified gravity (MG) models, which are popular alternatives to the $\Lambda$CDM paradigm to explain the cosmic acceleration, is still not well understood in this scenario. In this paper, we study the spherical accretion of collisional gas in a particular MG model, which is a rare case that also admits self-similar solutions. The model displays interesting behaviours caused by the enhanced gravity and a screening mechanism. Despite the strong effects of MG, we find that its self-similar solution agrees well with that of the EdS model. These results are used to assess a new cosmological hydrodynamical code for spherical collapse simulations introduced here, which is based on the hyperbolic partial differential equation engine ExaHyPE 2. Its good agreement with the theoretical predictions confirms the reliability of this code in modelling astrophysical processes in spherical collapse. We will use this code to study the evolution of gas in more realistic MG models in future work.

Preprint, 2021, arXiv

Task inefficiency patterns for a wave equation solver

The orchestration of complex algorithms demands high levels of automation to use modern hardware efficiently. Task-based programming with OpenMP 5.0 is a prominent candidate to accomplish this goal. We study OpenMP 5.0's tasking in the context of a wave equation solver (ExaHyPE) using three different architectures and runtimes. We describe several task-scheduling flaws present in currently available runtimes, demonstrate how they impact performance and show how to work around them. Finally, we propose extensions to the OpenMP standard.

Preprint, 2020, arXiv

Delayed approximate matrix assembly in multigrid with dynamic precisions

The accurate assembly of the system matrix is an important step in any code that solves partial differential equations on a mesh. We either explicitly set up a matrix, or we work in a matrix-free environment where we have to be able to quickly return matrix entries upon demand. Either way, the construction can become costly due to non-trivial material parameters entering the equations, multigrid codes requiring cascades of matrices that depend upon each other, or dynamic adaptive mesh refinement that necessitates the recomputation of matrix entries or the whole equation system throughout the solve. We propose that these constructions can be performed concurrently with the multigrid cycles. Initial geometric matrices and low accuracy integrations kickstart the multigrid, while improved assembly data is fed to the solver as and when it becomes available. The time to solution is improved as we eliminate an expensive preparation phase traditionally delaying the actual computation. We eliminate algorithmic latency. Furthermore, we desynchronise the assembly from the solution process. This anarchic increase of the concurrency level improves the scalability. Assembly routines are notoriously memory- and bandwidth-demanding. As we work with iteratively improving operator accuracies, we finally propose the use of a hierarchical, lossy compression scheme such that the memory footprint is brought down aggressively where the system matrix entries carry little information or are not yet available with high accuracy.

Preprint, 2020, arXiv

Enclave Tasking for Discontinuous Galerkin Methods on Dynamically Adaptive Meshes

High-order Discontinuous Galerkin (DG) methods promise to be an excellent discretisation paradigm for partial differential equation solvers by combining high arithmetic intensity with localised data access. They also facilitate dynamic adaptivity without the need for conformal meshes. A parallel evaluation of DG's weak formulation within a mesh traversal is non-trivial, as dependency graphs over dynamically adaptive meshes change, as causal constraints along resolution transitions have to be preserved, and as data sends along MPI domain boundaries have to be triggered in the correct order. We propose to process mesh elements subject to constraints with high priority or, where needed, serially throughout a traversal. The remaining cells form enclaves and are spawned into a task system. This introduces concurrency, mixes memory-intensive DG integrations with compute-bound Riemann solves, and overlaps computation and communication. We discuss implications on MPI and show that MPI parallelisation improves by a factor of three through enclave tasking, while we obtain an additional factor of two from shared memory if grids are dynamically adaptive.
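
The split into constrained cells and enclaves can be sketched in a few lines. This is a minimal illustration and not ExaHyPE's implementation: the names `traverse`, `is_skeleton`, `update` and `send_boundary_data` are hypothetical, and a Python thread pool stands in for the task system.

```python
from concurrent.futures import ThreadPoolExecutor

def traverse(cells, is_skeleton, update, send_boundary_data):
    """One mesh traversal: constrained cells run at once, the rest become tasks.

    All names are hypothetical. `is_skeleton(cell)` is assumed to flag cells
    next to a domain boundary or a resolution transition.
    """
    results = {}
    pending = {}
    with ThreadPoolExecutor() as pool:
        for cell in cells:
            if is_skeleton(cell):
                # Constrained cell: handled immediately, in traversal order.
                results[cell] = update(cell)
            else:
                # Enclave cell: spawned into the task system.
                pending[cell] = pool.submit(update, cell)
        # Boundary data is complete once the traversal ends, so the send can
        # be issued while enclave tasks may still be running.
        send_boundary_data(dict(results))
        for cell, future in pending.items():
            results[cell] = future.result()
    return results

# Usage: cells 0 and 7 touch the domain boundary, cells 1..6 form the enclave.
sent = []
out = traverse(range(8), lambda c: c in (0, 7), lambda c: c * c, sent.append)
```

In this sketch the send carries only the results of the constrained cells, which is what lets communication overlap with the enclave work.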

Preprint, 2020, arXiv

ExaHyPE: An Engine for Parallel Dynamically Adaptive Simulations of Wave Problems

ExaHyPE ("An Exascale Hyperbolic PDE Engine") is a software engine for solving systems of first-order hyperbolic partial differential equations (PDEs). Hyperbolic PDEs are typically derived from the conservation laws of physics and are useful in a wide range of application areas. Applications powered by ExaHyPE can be run on a student's laptop, but are also able to exploit thousands of processor cores on state-of-the-art supercomputers. The engine is able to dynamically increase the accuracy of the simulation using adaptive mesh refinement where required. Due to the robustness and shock capturing abilities of ExaHyPE's numerical methods, users of the engine can simulate linear and non-linear hyperbolic PDEs with very high accuracy. Users can tailor the engine to their particular PDE by specifying evolved quantities, fluxes, and source terms. A complete simulation code for a new hyperbolic PDE can often be realised within a few hours - a task that, traditionally, can take weeks, months, often years for researchers starting from scratch. In this paper, we showcase ExaHyPE's workflow and capabilities through real-world scenarios from our two main application areas: seismology and astrophysics.
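
The idea of tailoring a generic engine by supplying only fluxes and source terms can be illustrated with a toy solver. The sketch below is not ExaHyPE's API: `finite_volume_step` and its parameters are hypothetical, and a first-order finite-volume update with a Rusanov flux on a periodic 1D mesh is assumed in place of the engine's own numerical methods.

```python
def finite_volume_step(q, flux, max_wave_speed, source, dx, dt):
    """Advance the cell averages q by one time step on a periodic 1D mesh.

    The PDE-specific parts (flux, max_wave_speed, source) are supplied by the
    user; everything else is generic. All names are hypothetical.
    """
    def rusanov(ql, qr):
        # Central flux plus dissipation scaled by the fastest local wave.
        s = max(max_wave_speed(ql), max_wave_speed(qr))
        return 0.5 * (flux(ql) + flux(qr)) - 0.5 * s * (qr - ql)

    n = len(q)
    updated = []
    for i in range(n):
        f_left = rusanov(q[i - 1], q[i])          # q[-1] wraps around
        f_right = rusanov(q[i], q[(i + 1) % n])
        updated.append(q[i] - dt / dx * (f_right - f_left) + dt * source(q[i]))
    return updated

# User side: linear advection q_t + a q_x = 0 with a = 1 and no source term.
a = 1.0
q_new = finite_volume_step(
    [0.0, 1.0, 0.0, 0.0],
    flux=lambda q: a * q,
    max_wave_speed=lambda q: abs(a),
    source=lambda q: 0.0,
    dx=1.0,
    dt=1.0,
)
```

With `dt / dx = 1` the scheme reduces to exact upwinding for this equation, so the pulse moves one cell to the right.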

Preprint, 2020, arXiv

Lightweight Task Offloading Exploiting MPI Wait Times for Parallel Adaptive Mesh Refinement

Balancing the workload of sophisticated simulations is inherently difficult, since we have to balance both computational workload and memory footprint over meshes that can change at any time or yield unpredictable cost per mesh entity, while modern supercomputers and their interconnects start to exhibit fluctuating performance. We propose a novel lightweight balancing technique for MPI+X to accompany traditional, prediction-based load balancing. It is a reactive diffusion approach that uses online measurements of MPI idle time to migrate tasks temporarily from overloaded to underemployed ranks. Tasks are deployed to ranks which otherwise would wait, processed with high priority, and made available to the overloaded ranks again. This migration is non-persistent. Our approach hijacks idle time to do meaningful work and is totally non-blocking, asynchronous and distributed without a global data view. Tests with a seismic simulation code developed in the ExaHyPE engine uncover the method's potential. We found speed-ups of up to 2-3 for ill-balanced scenarios without logical modifications of the code base and show that the strategy is capable of reacting quickly to temporarily changing workload or node performance.
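
A toy model may help to picture the temporary migration. It is a sketch under strong assumptions and not the paper's algorithm: the function `offload_step` is hypothetical, the loads are known up front instead of being measured as MPI wait times, and the decision is taken centrally, whereas the paper stresses a distributed scheme without a global data view.

```python
def offload_step(task_costs_per_rank):
    """Plan temporary task migrations for one time step (toy model).

    Returns the effective load per rank and the list of migrations as
    (source rank, target rank, task cost). Hypothetical helper.
    """
    # Tasks still held by their owner. The caller's lists are not modified,
    # which mirrors the non-persistent nature of the migration.
    local = [sorted(costs) for costs in task_costs_per_rank]
    load = [sum(costs) for costs in local]
    migrations = []
    while True:
        # The critical rank never waits; the least loaded rank waits longest.
        critical = max(range(len(load)), key=load.__getitem__)
        idle = min(range(len(load)), key=load.__getitem__)
        if not local[critical]:
            break
        cost = local[critical][0]  # cheapest task the critical rank owns
        if load[idle] + cost >= load[critical]:
            break  # a further migration would not shorten the time step
        local[critical].pop(0)
        load[critical] -= cost
        load[idle] += cost
        migrations.append((critical, idle, cost))
    return load, migrations

# Usage: rank 0 is overloaded, ranks 1 and 2 would otherwise wait.
costs = [[4, 4, 4, 4], [1], [1]]
load, migrations = offload_step(costs)
```

Here the time step shrinks from 16 to 8 cost units, while task ownership in `costs` stays as it was.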

Preprint, 2020, arXiv

Stabilised Asynchronous Fast Adaptive Composite Multigrid using Additive Damping

Multigrid solvers face multiple challenges on parallel computers. Two fundamental ones read as follows: Multiplicative solvers issue coarse grid solves which exhibit low concurrency and many multigrid implementations suffer from an expensive coarse grid identification phase plus adaptive mesh refinement overhead. We propose a new additive multigrid variant for spacetrees, i.e. meshes as they are constructed from octrees and quadtrees: It is an additive scheme, i.e. all multigrid resolution levels are updated concurrently. This ensures a high concurrency level, while the transfer operators between the mesh levels can still be constructed algebraically. The novel flavour of the additive scheme is an augmentation of the solver with an additive, auxiliary damping parameter per grid level per vertex that is in turn constructed through the next coarser level---an idea which utilises smoothed aggregation principles or the motivation behind AFACx: Per level, we solve an additional equation whose purpose is to damp too aggressive solution updates per vertex which would otherwise, in combination with all the other levels, yield an overcorrection and, eventually, oscillations. This additional equation is constructed additively as well, i.e. is once more solved concurrently to all other equations. This yields improved stability, closer to what is seen with multiplicative schemes, while pipelining techniques help us to write down the additive solver with single-touch semantics for dynamically adaptive meshes.

Preprint, 2020, arXiv

TeaMPI -- Replication-based Resilience without the (Performance) Pain

In an era where we cannot afford to checkpoint frequently, replication is a generic way forward to construct numerical simulations that can continue to run even if hardware parts fail. Yet, replication often is not employed on larger scales, as naïvely mirroring a computation once effectively halves the machine size, and as keeping replicated simulations consistent with each other is not trivial. We demonstrate for the ExaHyPE engine -- a task-based solver for hyperbolic equation systems -- that it is possible to realise resiliency without major code changes on the user side, while we introduce a novel algorithmic idea where replication reduces the time-to-solution. The redundant CPU cycles are not burned "for nothing". Our work employs a weakly consistent data model where replicas run independently yet inform each other through heartbeat messages whether they are still up and running. Our key performance idea is to let the tasks of the replicated simulations share some of their outcomes, while we shuffle the actual task execution order per replica. This way, replicated ranks can skip some local computations and automatically start to synchronise with each other. Our experiments with a production-level seismic wave-equation solver provide evidence that this novel concept has the potential to make replication affordable for large-scale simulations in high-performance computing.
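
The effect of sharing task outcomes between replicas with different execution orders can be sketched as a small simulation. It is an illustration only and not teaMPI's implementation: `run_replicas` is a hypothetical name, the replicas advance in lock-step rounds instead of asynchronously, and heartbeats and failures are not modelled.

```python
def run_replicas(orders, compute):
    """Simulate replicas that share task outcomes (toy lock-step model).

    `orders[r]` is the task execution order of replica r. Returns the shared
    outcomes and the number of tasks each replica computed itself.
    """
    shared = {}                      # task id -> outcome, visible to everyone
    computed = [0] * len(orders)
    position = [0] * len(orders)
    while any(p < len(o) for p, o in zip(position, orders)):
        published = {}
        for r, order in enumerate(orders):
            # Skip tasks whose outcome already arrived from another replica.
            while position[r] < len(order) and order[position[r]] in shared:
                position[r] += 1
            if position[r] < len(order):
                task = order[position[r]]
                published[task] = compute(task)
                computed[r] += 1
                position[r] += 1
        # Outcomes become visible to the other replicas after the round.
        shared.update(published)
    return shared, computed

# Usage: two replicas walk through six tasks in opposite order.
tasks = list(range(6))
outcomes, computed = run_replicas([tasks, tasks[::-1]], lambda t: t * t)
```

Each replica computes only half of the tasks in this setup, so the two replicas together do about as much work as a single unreplicated run.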

Preprint, 2016, arXiv

Form Follows Function -- Do algorithms and applications challenge or drag behind the hardware evolution?

We summarise some of the key statements made at the workshop Form Follows Function at ISC High Performance 2016. The summary highlights what type of co-design the presented projects experience; often in the absence of an explicit co-design agenda. Their software development picks up hardware trends but it also influences the hardware development. Observations illustrate that this cycle is not always optimal for both sides as it is not proactively steered. Key statements characterise ideas on how it might be possible to integrate both hardware and software creation closer to the best of both worlds---again even without classic co-design in mind where new pieces of hardware are created. The workshop finally identified three development idioms that might help to improve software and system design with respect to emerging hardware.
