Researcher profile

Benjamin Hazelwood

Benjamin Hazelwood contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust: 13 (Unverified) | Verification: L1 | Unclaimed author
2 works
0 followers
3 topics
4 close collaborators


Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.


Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.


Published work

2 published items

Preprint | 2020 | arXiv

Enclave Tasking for Discontinuous Galerkin Methods on Dynamically Adaptive Meshes

High-order Discontinuous Galerkin (DG) methods promise to be an excellent discretisation paradigm for partial differential equation solvers by combining high arithmetic intensity with localised data access. They also facilitate dynamic adaptivity without the need for conformal meshes. A parallel evaluation of DG's weak formulation within a mesh traversal is non-trivial, as dependency graphs over dynamically adaptive meshes change, as causal constraints along resolution transitions have to be preserved, and as data sends along MPI domain boundaries have to be triggered in the correct order. We propose to process mesh elements subject to constraints with high priority or, where needed, serially throughout a traversal. The remaining cells form enclaves and are spawned into a task system. This introduces concurrency, mixes memory-intensive DG integrations with compute-bound Riemann solves, and overlaps computation and communication. We discuss implications for MPI and show that MPI parallelisation improves by a factor of three through enclave tasking, while we obtain an additional factor of two from shared memory if grids are dynamically adaptive.
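A minimal Python sketch of the enclave split described above, not the authors' implementation. It assumes a hypothetical `Cell` record whose two flags (`at_resolution_transition`, `at_mpi_boundary`) mark the constrained cells; `update_cell` is a placeholder for the per-cell DG work, and the `sends` list merely records the order in which boundary data would be sent. Constrained cells are updated on the traversing thread in traversal order, while every other cell is submitted to a thread pool as an enclave task.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cell:
    # Hypothetical cell record; the flags mark the constraints named in the abstract.
    cell_id: int
    at_resolution_transition: bool = False
    at_mpi_boundary: bool = False


def is_constrained(cell: Cell) -> bool:
    # Cells on resolution transitions or MPI boundaries must keep their order.
    return cell.at_resolution_transition or cell.at_mpi_boundary


def update_cell(cell: Cell) -> int:
    # Placeholder for the DG integration and Riemann solves of one cell.
    return cell.cell_id


def traverse(cells: List[Cell], pool: ThreadPoolExecutor) -> Tuple[List[int], List[int], List[int]]:
    skeleton_results: List[int] = []
    sends: List[int] = []  # order in which boundary data would be sent
    enclaves: List[Future] = []
    for cell in cells:
        if is_constrained(cell):
            # Constrained cell: processed immediately, in traversal order.
            skeleton_results.append(update_cell(cell))
            if cell.at_mpi_boundary:
                sends.append(cell.cell_id)  # stand-in for triggering an MPI send
        else:
            # Unconstrained cell: spawned as an enclave task that runs concurrently.
            enclaves.append(pool.submit(update_cell, cell))
    # Wait for all enclave tasks before the traversal counts as complete.
    enclave_results = [f.result() for f in enclaves]
    return skeleton_results, sends, enclave_results
```

In CPython the global interpreter lock serialises pure-Python work, so this only illustrates the control flow (ordered constrained cells, concurrently spawned enclaves), not the speedups the abstract reports.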

Preprint | 2020 | arXiv

TeaMPI -- Replication-based Resilience without the (Performance) Pain

In an era where we cannot afford to checkpoint frequently, replication is a generic way forward to construct numerical simulations that can continue to run even if hardware parts fail. Yet, replication is often not employed on larger scales, as naïvely mirroring a computation once effectively halves the machine size, and as keeping replicated simulations consistent with each other is not trivial. We demonstrate for the ExaHyPE engine -- a task-based solver for hyperbolic equation systems -- that it is possible to realise resiliency without major code changes on the user side, while we introduce a novel algorithmic idea where replication reduces the time-to-solution. The redundant CPU cycles are not burned "for nothing". Our work employs a weakly consistent data model where replicas run independently yet inform each other through heartbeat messages whether they are still up and running. Our key performance idea is to let the tasks of the replicated simulations share some of their outcomes, while we shuffle the actual task execution order per replica. This way, replicated ranks can skip some local computations and automatically start to synchronise with each other. Our experiments with a production-level seismic wave-equation solver provide evidence that this novel concept has the potential to make replication affordable for large-scale simulations in high-performance computing.
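A single-process Python sketch of the outcome-sharing idea, under simplifying assumptions that are not from the paper: replicas are simulated by round-robin stepping rather than separate MPI ranks, every outcome is shared and arrives instantly, and heartbeats are not modelled. `compute`, `run_replicas`, and the shared dictionary are hypothetical names. Each replica shuffles the same task list with its own seed and skips any task whose outcome another replica has already published.

```python
import random
from typing import Dict, List, Sequence, Tuple


def compute(task_id: int) -> int:
    # Placeholder for an expensive task; the result is deterministic per task.
    return task_id * task_id


def run_replicas(task_ids: Sequence[int], seeds: Sequence[int]) -> Tuple[List[Dict[int, int]], List[int]]:
    # Each replica executes the same tasks, shuffled with its own seed.
    orders = []
    for seed in seeds:
        order = list(task_ids)
        random.Random(seed).shuffle(order)
        orders.append(order)

    shared: Dict[int, int] = {}  # outcomes published by any replica
    results: List[Dict[int, int]] = [{} for _ in seeds]
    computed = [0] * len(seeds)  # tasks each replica actually executed

    # Round-robin stepping stands in for the replicas running concurrently.
    for step in range(len(task_ids)):
        for r, order in enumerate(orders):
            task = order[step]
            if task in shared:
                results[r][task] = shared[task]  # skip: reuse the published outcome
            else:
                value = compute(task)
                shared[task] = value  # publish for the other replicas
                results[r][task] = value
                computed[r] += 1
    return results, computed
```

With two differently seeded replicas, each task is computed exactly once in total, yet both replicas end up holding every outcome. Because real outcome messages arrive with a delay, replicas running identical orders would presumably compute the same tasks at the same time; shuffling the order per replica gives the shared outcomes a chance to arrive before they are needed.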