Researcher profile

Thomas Sterling

Thomas Sterling contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - Baseline
5works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2020arXiv

Fully-Asynchronous Fully-Implicit Variable-Order Variable-Timestep Simulation of Neural Networks

State-of-the-art simulations of detailed neural models follow the Bulk Synchronous Parallel execution model. Execution is divided in equidistant communication intervals, equivalent to the shortest synaptic delay in the network. Neurons stepping is performed independently, with collective communication guiding synchronization and exchange of synaptic events. The interpolation step size is fixed and chosen based on some prior knowledge of the fastest possible dynamics in the system. However, simulations driven by stiff dynamics or a wide range of time scales - such as multiscale simulations of neural networks - struggle with fixed step interpolation methods, yielding excessive computation of intervals of quasi-constant activity, inaccurate interpolation of periods of high volatility solution, and being incapable of handling unknown or distinct time constants. A common alternative is the usage of adaptive stepping methods, however they have been deemed inefficient in parallel executions due to computational load imbalance at the synchronization barriers that characterize the BSP execution model. We introduce a distributed fully-asynchronous execution model that removes global synchronization, allowing for longer variable timestep interpolations. Asynchronicity is provided by active point-to-point communication notifying neurons' time advancement to synaptic connectivities. Time stepping is driven by scheduled neuron advancements based on synaptic delays across neurons, yielding an "exhaustive yet not speculative" adaptive-step execution. Execution benchmarks on 64 Cray XE6 compute nodes demonstrate a reduced number of interpolation steps, higher numerical accuracy and lower time to solution, compared to state-of-the-art methods. Efficiency is shown to be activity-dependent, with scaling of the algorithm demonstrated on a simulation of a laboratory experiment.

preprint2012arXiv

Neutron Star Evolutions using Tabulated Equations of State with a New Execution Model

The addition of nuclear and neutrino physics to general relativistic fluid codes allows for a more realistic description of hot nuclear matter in neutron star and black hole systems. This additional microphysics requires that each processor have access to large tables of data, such as equations of state, and in large simulations the memory required to store these tables locally can become excessive unless an alternative execution model is used. In this work we present relativistic fluid evolutions of a neutron star obtained using a message driven multi-threaded execution model known as ParalleX. These neutron star simulations would require substantial memory overhead dedicated entirely to the equation of state table if using a more traditional execution model. We introduce a ParalleX component based on Futures for accessing large tables of data, including out-of-core sized tables, which does not require substantial memory overhead and effectively hides any increased network latency.

preprint2011arXiv

Adaptive Mesh Refinement for Astrophysics Applications with ParalleX

Several applications in astrophysics require adequately resolving many physical and temporal scales which vary over several orders of magnitude. Adaptive mesh refinement techniques address this problem effectively but often result in constrained strong scaling performance. The ParalleX execution model is an experimental execution model that aims to expose new forms of program parallelism and eliminate any global barriers present in a scaling-impaired application such as adaptive mesh refinement. We present two astrophysics applications using the ParalleX execution model: a tabulated equation of state component for neutron star evolutions and a cosmology model evolution. Performance and strong scaling results from both simulations are presented. The tabulated equation of state data are distributed with transparent access over the nodes of the cluster. This allows seamless overlapping of computation with the latencies introduced by the remote access to the table. Because of the expected size increases to the equation of state table, this type of table partitioning for neutron star simulations is essential while the implementation is greatly simplified by ParalleX semantics.

preprint2011arXiv

An Application Driven Analysis of the ParalleX Execution Model

Exascale systems, expected to emerge by the end of the next decade, will require the exploitation of billion-way parallelism at multiple hierarchical levels in order to achieve the desired sustained performance. The task of assessing future machine performance is approached by identifying the factors which currently challenge the scalability of parallel applications. It is suggested that the root cause of these challenges is the incoherent coupling between the current enabling technologies, such as Non-Uniform Memory Access of present multicore nodes equipped with optional hardware accelerators and the decades older execution model, i.e., the Communicating Sequential Processes (CSP) model best exemplified by the message passing interface (MPI) application programming interface. A new execution model, ParalleX, is introduced as an alternative to the CSP model. In this paper, an overview of the ParalleX execution model is presented along with details about a ParalleX-compliant runtime system implementation called High Performance ParalleX (HPX). Scaling and performance results for an adaptive mesh refinement numerical relativity application developed using HPX are discussed. The performance results of this HPX-based application are compared with a counterpart MPI-based mesh refinement code. The overheads associated with HPX are explored and hardware solutions are introduced for accelerating the runtime system.

preprint2011arXiv

Improving the scalability of parallel N-body applications with an event driven constraint based execution model

The scalability and efficiency of graph applications are significantly constrained by conventional systems and their supporting programming models. Technology trends like multicore, manycore, and heterogeneous system architectures are introducing further challenges and possibilities for emerging application domains such as graph applications. This paper explores the space of effective parallel execution of ephemeral graphs that are dynamically generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The workloads are expressed using the semantics of an Exascale computing execution model called ParalleX. For comparison, results using conventional execution model semantics are also presented. We find improved load balancing during runtime and automatic parallelism discovery improving efficiency using the advanced semantics for Exascale computing.