Source author record

Pedro Gonnet

Pedro Gonnet appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Distributed, Parallel, and Cluster Computing; astro-ph.IM; math.NA; Numerical Analysis; astro-ph.CO; Mathematical Software; Computation and Language; Data Structures and Algorithms; Machine Learning; physics.comp-ph

Catalog footprint

What is connected

10 works

10 topics

4 close collaborators


Preprint · 2020 · arXiv

A Hybrid MPI+Threads Approach to Particle Group Finding Using Union-Find

The Friends-of-Friends (FoF) algorithm is a standard technique used in cosmological $N$-body simulations to identify structures. Its goal is to find clusters of particles (called groups) linked by chains of pairwise separations no larger than a cut-off radius. $N$-body simulations typically use most of the memory present on a node, leaving very little free for a FoF algorithm to run on-the-fly. We propose a new method that utilises the common Union-Find data structure and a hybrid MPI+threads approach. The algorithm can also be expressed elegantly in a task-based formalism if such a framework is used in the rest of the application. We have implemented our algorithm in the open-source cosmological code SWIFT. Our implementation displays excellent strong- and weak-scaling behaviour on realistic problems and compares favourably with other methods commonly used in the $N$-body community, with a speed-up of 18x.
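To make the Union-Find idea concrete, here is a minimal serial sketch of FoF grouping. It is not the paper's hybrid MPI+threads implementation. It assumes a brute-force O(N^2) pair search, no periodic boundaries, and the hypothetical names `UnionFind` and `friends_of_friends`. Every pair of particles within the linking length is merged into one set, so each resulting set is exactly one FoF group.

```python
import math
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, i):
        # Walk to the root, pointing each visited node at its grandparent.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return
        if self.size[ri] < self.size[rj]:
            ri, rj = rj, ri  # attach the smaller tree under the larger one
        self.parent[rj] = ri
        self.size[ri] += self.size[rj]


def friends_of_friends(positions, linking_length):
    """Return FoF groups as sorted lists of particle indices.

    Hypothetical helper: brute-force O(N^2) pair search, no periodic boundaries.
    """
    uf = UnionFind(len(positions))
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= linking_length:
                uf.union(i, j)  # i and j are friends: merge their groups
    groups = defaultdict(list)
    for i in range(len(positions)):
        groups[uf.find(i)].append(i)
    return sorted(groups.values())
```

For example, with particles at x = 0, 0.5 and 1.0 on a line and a linking length of 0.6, all three fall into one group even though the outer two are 1.0 apart. This chaining behaviour is what distinguishes FoF from a simple radius cut.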

Preprint · 2020 · arXiv

Fast Multi-language LSTM-based Online Handwriting Recognition

We describe an online handwriting system that is able to support 102 languages using a deep neural network architecture. This new system has completely replaced our previous Segment-and-Decode-based system and reduced the error rate by 20%-40% relative for most languages. Further, we report new state-of-the-art results on IAM-OnDB for both the open and closed dataset setting. The system combines methods from sequence recognition with a new input encoding using Bézier curves. This leads to up to 10x faster recognition times compared to our previous system. Through a series of experiments we determine the optimal configuration of our models and report the results of our setup on a number of additional public datasets.

Preprint · 2016 · arXiv

QuickSched: Task-based parallelism with dependencies and conflicts

This paper describes QuickSched, a compact and efficient open-source C-language library for task-based shared-memory parallel programming. QuickSched extends the standard dependency-only scheme of task-based programming with the concept of task conflicts, i.e. sets of tasks that can be executed in any order, yet not concurrently. These conflicts are modelled using exclusively lockable hierarchical resources. The scheduler itself prioritizes tasks along the critical path of execution and is shown to perform and scale well on a 64-core parallel shared-memory machine for two example problems: a tiled QR decomposition and a task-based Barnes-Hut tree code.
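The difference between dependencies and conflicts can be shown with a small simulation. This is a hedged sketch, not QuickSched's C API. The names `Task`, `critical_path_weights` and `schedule` are hypothetical. The sketch also makes several simplifying assumptions: all tasks take unit time, resources are flat named locks rather than hierarchical ones, and the dependency graph is acyclic. In each round, ready tasks are taken in order of critical-path length. A task is skipped if one of its resources is already locked in that round, so conflicting tasks may run in any order but never together.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical unit-time task: `unlocks` lists dependent tasks, `locks` names
    resources the task needs exclusively (tasks sharing a resource conflict)."""
    name: str
    unlocks: list = field(default_factory=list)
    locks: set = field(default_factory=set)
    wait: int = 0  # number of unfinished dependencies


def critical_path_weights(tasks):
    """Length of the longest dependency chain starting at each task (assumes a DAG)."""
    memo = {}

    def weight(t):
        if t.name not in memo:
            memo[t.name] = 1 + max((weight(u) for u in t.unlocks), default=0)
        return memo[t.name]

    return {t.name: weight(t) for t in tasks}


def schedule(tasks, n_workers):
    """Simulate execution in unit-time rounds; return the task names run in each round."""
    for t in tasks:
        t.wait = 0
    for t in tasks:
        for u in t.unlocks:
            u.wait += 1
    weight = critical_path_weights(tasks)
    remaining = {t.name for t in tasks}
    rounds = []
    while remaining:
        # Ready tasks, longest critical path first (stable sort keeps input order on ties).
        ready = sorted(
            (t for t in tasks if t.name in remaining and t.wait == 0),
            key=lambda t: -weight[t.name],
        )
        held, batch = set(), []
        for t in ready:
            if len(batch) == n_workers:
                break
            if t.locks & held:
                continue  # conflict: a needed resource is already locked this round
            held |= t.locks
            batch.append(t)
        for t in batch:
            remaining.discard(t.name)
            for u in t.unlocks:
                u.wait -= 1  # release dependents
        rounds.append([t.name for t in batch])
    return rounds
```

Suppose tasks A, B and C all lock the same resource, D depends on A, and E is independent. With four workers, A runs first because it heads the longest chain. E runs alongside it, B and D run next, and C runs last. A, B and C never share a round.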

Preprint · 2016 · arXiv

SWIFT: Using task-based parallelism, fully asynchronous communication, and graph partition-based domain decomposition for strong scaling on more than 100,000 cores

We present a new open-source cosmological code, called SWIFT, designed to solve the equations of hydrodynamics using a particle-based approach (Smoothed Particle Hydrodynamics) on hybrid shared/distributed-memory architectures. SWIFT was designed from the bottom up to provide excellent strong scaling on both commodity clusters (Tier-2 systems) and Top100 supercomputers (Tier-0 systems), without relying on architecture-specific features or specialized accelerator hardware. This performance is due to three main computational approaches: (1) Task-based parallelism for shared-memory parallelism, which provides fine-grained load balancing and thus strong scaling on large numbers of cores. (2) Graph-based domain decomposition, which uses the task graph to decompose the simulation domain such that the work, rather than just the data (as is the case with most partitioning schemes), is equally distributed across all nodes. (3) Fully dynamic and asynchronous communication, in which communication is modelled as just another task in the task-based scheme, sending data whenever it is ready and deferring tasks that rely on data from other nodes until it arrives. In order to use these approaches, the code had to be re-written from scratch, and the algorithms therein adapted to the task-based paradigm. As a result, we can show upwards of 60% parallel efficiency for moderate-sized problems when increasing the number of cores 512-fold, on both x86-based and Power8-based architectures.

Preprint · 2015 · arXiv

SWIFT: task-based hydrodynamics and gravity for cosmological simulations

Simulations of galaxy formation follow the gravitational and hydrodynamical interactions between gas, stars and dark matter through cosmic time. The huge dynamic range of such calculations severely limits the strong-scaling behaviour of the community codes in use, with load imbalance, cache inefficiencies and poor vectorisation limiting performance. The new SWIFT code exploits task-based parallelism designed for many-core compute nodes interacting via MPI using asynchronous communication to improve speed and scaling. A graph-based domain decomposition schedules interdependent tasks over available resources. Strong-scaling tests on realistic particle distributions yield excellent parallel efficiency, and efficient cache usage provides a large speed-up compared to current codes even on a single core. SWIFT is designed to be easy to use by shielding the astronomer from computational details such as the construction of the tasks or MPI communication. The techniques and algorithms used in SWIFT may benefit other computational physics areas as well, for example that of compressible hydrodynamics. For details of this open-source project, see www.swiftsim.com.

Preprint · 2014 · arXiv

Efficient and Scalable Algorithms for Smoothed Particle Hydrodynamics on Hybrid Shared/Distributed-Memory Architectures

This paper describes a new fast and implicitly parallel approach to neighbour-finding in multi-resolution Smoothed Particle Hydrodynamics (SPH) simulations. This new approach is based on hierarchical cell decompositions and sorted interactions, within a task-based formulation. It is shown to be faster than traditional tree-based codes, and to scale better than domain decomposition-based approaches on hybrid shared/distributed-memory parallel architectures, e.g. clusters of multi-cores, achieving a $40\times$ speedup over the Gadget-2 simulation code.

Preprint · 2013 · arXiv

SWIFT: Fast algorithms for multi-resolution SPH on multi-core architectures

This paper describes a novel approach to neighbour-finding in Smoothed Particle Hydrodynamics (SPH) simulations with large dynamic range in smoothing length. This approach is based on hierarchical cell decompositions, sorted interactions, and a task-based formulation. It is shown to be faster than traditional tree-based codes, and to scale better than domain decomposition-based approaches on shared-memory parallel architectures such as multi-cores.
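The "sorted interactions" idea rests on one observation. For particles in two neighbouring cells projected onto a unit axis u, |(x_i - x_j)·u| ≤ |x_i - x_j|. After sorting one cell's particles along u, any particle whose projection lies more than h away can therefore be skipped without computing a distance. The following minimal sketch follows that spirit under simplifying assumptions: a single fixed search radius h instead of per-particle smoothing lengths, a caller-supplied axis, and the hypothetical name `sorted_pair_interactions`. It is not SWIFT's implementation.

```python
import bisect
import math


def sorted_pair_interactions(pos_i, pos_j, axis, h):
    """Return sorted index pairs (a, b) with |pos_i[a] - pos_j[b]| < h.

    Hypothetical helper: fixed radius h, axis is any non-zero 3-vector.
    """
    norm = math.sqrt(sum(c * c for c in axis))
    u = [c / norm for c in axis]

    def proj(p):
        return sum(pk * uk for pk, uk in zip(p, u))

    # Sort the particles of cell j by their projection onto the axis.
    order_j = sorted(range(len(pos_j)), key=lambda k: proj(pos_j[k]))
    keys_j = [proj(pos_j[k]) for k in order_j]
    pairs = []
    for a, p in enumerate(pos_i):
        d = proj(p)
        # Projected separation bounds the true distance, so only this window can match.
        lo = bisect.bisect_left(keys_j, d - h)
        hi = bisect.bisect_right(keys_j, d + h)
        for b in order_j[lo:hi]:
            if math.dist(p, pos_j[b]) < h:
                pairs.append((a, b))
    return sorted(pairs)
```

The result matches a brute-force double loop. For cells that are far apart along u, most of cell j falls outside each window, and the number of full distance evaluations drops accordingly.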

Preprint · 2010 · arXiv

A Review of Error Estimation in Adaptive Quadrature

The most critical component of any adaptive numerical quadrature routine is the estimation of the integration error. Since the publication of the first algorithms in the 1960s, many error estimation schemes have been presented, evaluated and discussed. This paper presents a review of existing error estimation techniques and discusses their differences and their common features. Some common shortcomings of these algorithms are discussed and a new general error estimation technique is presented.
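One of the classic error estimators such a review covers compares Simpson's rule on an interval with Simpson's rule on its two halves. If the integrand is smooth enough that Simpson's error scales as h^4, the difference divided by 15 estimates the error of the refined result. The interval is bisected until that estimate meets the local tolerance. The sketch below shows this older scheme, not the new estimator the paper proposes. The function name, default tolerance and depth limit are illustrative assumptions. The smoothness assumption is exactly the kind of premise that can fail in practice.

```python
import math


def adaptive_simpson(f, a, b, tol=1e-10, max_depth=50):
    """Classic recursive adaptive Simpson quadrature (illustrative sketch)."""

    def simpson(fa, fm, fb, lo, hi):
        return (hi - lo) / 6 * (fa + 4 * fm + fb)

    def recurse(lo, hi, fa, fm, fb, whole, tol, depth):
        m = (lo + hi) / 2
        flm, frm = f((lo + m) / 2), f((m + hi) / 2)
        left = simpson(fa, flm, fm, lo, m)
        right = simpson(fm, frm, fb, m, hi)
        # Error estimate: refined minus coarse result, scaled by 1/15 (assumes h^4 error).
        err = (left + right - whole) / 15
        if depth >= max_depth or abs(err) <= tol:
            return left + right + err  # add the estimate back (Richardson extrapolation)
        # Split the tolerance between the two halves and recurse.
        return (recurse(lo, m, fa, flm, fm, left, tol / 2, depth + 1)
                + recurse(m, hi, fm, frm, fb, right, tol / 2, depth + 1))

    fa, fm, fb = f(a), f((a + b) / 2), f(b)
    return recurse(a, b, fa, fm, fb, simpson(fa, fm, fb, a, b), tol, 0)
```

For instance, `adaptive_simpson(math.sin, 0.0, math.pi)` returns a value very close to 2.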

Preprint · 2010 · arXiv

Efficient Construction, Update and Downdate Of The Coefficients Of Interpolants Based On Polynomials Satisfying A Three-Term Recurrence Relation

In this paper, we consider methods to compute the coefficients of interpolants relative to a basis of polynomials satisfying a three-term recurrence relation. Two new algorithms are presented: the first constructs the coefficients of the interpolation incrementally and can be used to update the coefficients whenever a node is added to or removed from the interpolation. The second algorithm, which constructs the interpolation coefficients by decomposing the Vandermonde-like matrix iteratively, cannot be used to update or downdate an interpolation, yet is more numerically stable than the first algorithm and is more efficient when the coefficients of multiple interpolations are to be computed over the same set of nodes.
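To show what "coefficients relative to a basis satisfying a three-term recurrence" means, here is a baseline sketch using Legendre polynomials. They obey (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x). The sketch fills the Vandermonde-like matrix V[i][k] = P_k(x_i) with that recurrence and solves V c = f directly by Gaussian elimination. This is the straightforward O(n^3) approach that must be redone from scratch whenever a node changes. It is not either of the paper's algorithms. The choice of Legendre basis and all function names are assumptions for illustration.

```python
def legendre_values(x, n):
    """Return [P_0(x), ..., P_{n-1}(x)] using the Legendre three-term recurrence."""
    vals = [1.0, x][:n]
    for k in range(1, n - 1):
        vals.append(((2 * k + 1) * x * vals[k] - k * vals[k - 1]) / (k + 1))
    return vals


def solve(A, b):
    """Gaussian elimination with partial pivoting on a copy of the square system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def interpolation_coefficients(nodes, values):
    """Coefficients c with sum_k c[k] P_k(x_i) = values[i] at every node x_i."""
    V = [legendre_values(x, len(nodes)) for x in nodes]  # Vandermonde-like matrix
    return solve(V, values)


def evaluate(coeffs, x):
    """Evaluate the interpolant sum_k coeffs[k] P_k(x)."""
    return sum(c * p for c, p in zip(coeffs, legendre_values(x, len(coeffs))))
```

Interpolating x^2 at the nodes -1, 0 and 1 gives the coefficients 1/3, 0 and 2/3, since x^2 = (P_0 + 2 P_2)/3.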

Preprint · 2010 · arXiv

Increasing the Reliability of Adaptive Quadrature Using Explicit Interpolants

We present two new adaptive quadrature routines. Both routines differ from previously published algorithms in many aspects, most significantly in how they represent the integrand, how they treat non-numerical values of the integrand, how they deal with improper divergent integrals and how they estimate the integration error. The main focus of these improvements is to increase the reliability of the algorithms without significantly impacting their efficiency. Both algorithms are implemented in Matlab and tested using both the "families" suggested by Lyness and Kaganove and the battery test used by Gander and Gautschi and Kahaner. They are shown to be more reliable, albeit in some cases less efficient, than other commonly-used adaptive integrators.
