Source author record

David Goz

David Goz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

astro-ph.IM · astro-ph.GA · Performance · Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

6 works

4 topics

4 close collaborators

preprint · 2026 · arXiv

Accelerating cosmological simulations on GPUs: a step towards sustainability and green-awareness

The increasing complexity and scale of cosmological N-body simulations, driven by astronomical surveys like Euclid, call for a paradigm shift towards more sustainable and energy-efficient high-performance computing (HPC). The rising energy consumption of supercomputing facilities poses a significant environmental and financial challenge. In this work, we build upon a recently developed GPU implementation of pinocchio, a widely used tool for the fast generation of dark matter (DM) halo catalogues, to investigate energy consumption. Using a different resource configuration, we confirmed the time-to-solution behavior observed in a companion study, and we use these runs to compare time-to-solution with energy-to-solution. By profiling the code on various HPC platforms with a newly developed implementation of the Power Measurement Toolkit (PMT), we demonstrate an 8x reduction in energy-to-solution and 8x speed-up in time-to-solution compared to the CPU-only version. Taken together, these gains translate into an overall efficiency improvement of up to 64x. Our results show that the GPU-accelerated pinocchio not only achieves substantial speed-up, making the generation of large-scale mock catalogues more tractable, but also significantly reduces the energy footprint of the simulations. This work represents a step towards "green-aware" scientific computing in cosmology, proving that performance and sustainability can be simultaneously achieved.
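The abstract above does not name the metric behind the combined 64x figure. One plausible reading, offered here only as an assumption, is the energy-delay product (energy-to-solution multiplied by time-to-solution), for which an 8x energy reduction and an 8x speed-up multiply to 64x. The sketch below illustrates that arithmetic; the function names and the baseline numbers are hypothetical and are not taken from the paper.

```python
def energy_delay_product(energy_joules: float, time_seconds: float) -> float:
    """Energy-delay product (EDP): energy-to-solution times time-to-solution."""
    return energy_joules * time_seconds


def edp_improvement(energy_reduction: float, speedup: float) -> float:
    """Factor by which EDP shrinks when energy drops by `energy_reduction`
    and runtime drops by `speedup` (both given as ratios, e.g. 8.0 for 8x)."""
    return energy_reduction * speedup


# Hypothetical baseline values, chosen only to make the ratios visible.
cpu_energy, cpu_time = 800.0, 80.0          # assumed CPU-only run
gpu_energy, gpu_time = cpu_energy / 8, cpu_time / 8  # 8x less energy, 8x faster

ratio = energy_delay_product(cpu_energy, cpu_time) / energy_delay_product(
    gpu_energy, gpu_time
)
print(ratio, edp_improvement(8.0, 8.0))
```

Under this assumption the two factors are independent, so the combined gain is simply their product; a different efficiency metric would give a different combined figure.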

preprint · 2023 · arXiv

High Performance W-stacking for Imaging Radio Astronomy Data: a Parallel and Accelerated Solution

Current and upcoming radio-interferometers are expected to produce volumes of data of increasing size that need to be processed in order to generate the corresponding sky brightness distributions through imaging. This represents an outstanding computational challenge, especially when large fields of view and/or high resolution observations are processed. We have investigated the adoption of modern High Performance Computing systems specifically addressing the gridding, FFT-transform and w-correction of imaging, combining parallel and accelerated solutions. We have demonstrated that the code we have developed can support datasets and images of any size compatible with the available hardware, efficiently scaling up to thousands of cores or hundreds of GPUs, keeping the time to solution below one hour even when images of the order of billions or tens of billions of pixels are generated. In addition, portability has been targeted as a primary objective, both in terms of usability on different computing platforms and in terms of performance. The presented results have been obtained on two different state-of-the-art High Performance Computing architectures.

preprint · 2020 · arXiv

Gadget3 on GPUs with OpenACC

We present preliminary results of a GPU porting of all main Gadget3 modules (gravity computation, SPH density computation, SPH hydrodynamic force, and thermal conduction) using OpenACC directives. Here we assign one GPU to each MPI rank and exploit both the host and accelerator capabilities by overlapping computations on the CPUs and GPUs: while GPUs asynchronously compute interactions between particles within their MPI ranks, CPUs perform tree-walks and MPI communications of neighbouring particles. We profile various portions of the code to understand the origin of our speedup, where we find that a peak speedup is not achieved because of time-steps with few active particles. We run a hydrodynamic cosmological simulation from the Magneticum project, with $2\cdot10^{7}$ particles, where we find a final total speedup of $\approx 2$. We also present the results of an encouraging scaling test of a preliminary gravity-only OpenACC porting, run in the context of the EuroHack17 event, where the prototype of the porting proved to keep a constant speedup up to $1024$ GPUs.

preprint · 2020 · arXiv

Performance and energy footprint assessment of FPGAs and GPUs on HPC systems using Astrophysics application

New challenges in Astronomy and Astrophysics (AA) are urging the need for a large number of exceptionally computationally intensive simulations. "Exascale" (and beyond) computational facilities are mandatory to address the size of theoretical problems and data coming from the new generation of observational facilities in AA. Currently, the High Performance Computing (HPC) sector is undergoing a profound phase of innovation, in which the primary challenge to the achievement of the "Exascale" is the power consumption. The goal of this work is to give some insight into the performance and energy footprint of contemporary architectures for a real astrophysical application in an HPC context. We use a state-of-the-art N-body application that we re-engineered and optimized to exploit the heterogeneous underlying hardware fully. We quantitatively evaluate the impact of computation on energy consumption when running on four different platforms. Two of them represent the current HPC systems (Intel-based and equipped with NVIDIA GPUs), one is a micro-cluster based on ARM-MPSoC, and one is a "prototype towards Exascale" equipped with ARM-MPSoCs tightly coupled with FPGAs. We investigate the behavior of the different devices where the high-end GPUs excel in terms of time-to-solution while MPSoC-FPGA systems outperform GPUs in power consumption. Our experience reveals that considering FPGAs for computationally intensive applications seems very promising, as their performance is improving to meet the requirements of scientific applications. This work can be a reference for future platform development for astrophysics applications where computationally intensive calculations are required.

preprint · 2014 · arXiv

Properties of barred spiral disks in hydrodynamical cosmological simulations

We present a quantification of the properties of bars in two N-body+SPH cosmological simulations of spiral galaxies, named GA and AqC. The initial conditions were obtained using the zoom-in technique and represent two dark matter (DM) halos of $2-3\times10^{12}\ {\rm M}_\odot$, available at two different resolutions. The resulting galaxies are presented in the companion paper of Murante et al. (2014). We find that the GA galaxy has a bar of length $8.8$ kpc, present at the two resolution levels, although with a slightly different strength. Classical bar signatures (e.g. pattern of streaming motions, high $m=2$ Fourier mode with roughly constant phase) are consistently found at both resolutions. Though a close encounter with a merging satellite at $z\sim0.6$ (mass ratio $1:50$) causes a strong, transient spiral pattern and some heating of the disk, we find that bar instability is due to a secular process, caused by a low Toomre parameter $Q\lesssim1$ due to accumulation of mass in the disk. The AqC galaxy has a slightly different history: it suffers a similar tidal disturbance due to a merging satellite at $z\sim0.5$ but with a mass ratio of $1:32$, that triggers a bar in the high-resolution simulation, while at low resolution the merging is found to take place at a later time, so that both secular evolution and merging are plausible triggers for bar instability.

preprint · 2014 · arXiv

Simulating realistic disk galaxies with a novel sub-resolution ISM model

We present results of cosmological simulations of disk galaxies carried out with the GADGET-3 TreePM+SPH code, where star formation and stellar feedback are described using our MUlti Phase Particle Integrator (MUPPI) model. This description is based on a simple multi-phase model of the interstellar medium at unresolved scales, where mass and energy flows among the components are explicitly followed by solving a system of ordinary differential equations. Thermal energy from SNe is injected into the local hot phase, so that it is not promptly radiated away. A kinetic feedback prescription generates the massive outflows needed to avoid the over-production of stars. We use two sets of zoomed-in initial conditions of isolated cosmological halos with masses $(2-3)\times10^{12}\ {\rm M}_\odot$, both available at several resolution levels. In all cases we obtain spiral galaxies with small bulge-over-total stellar mass ratios ($B/T \approx 0.2$), extended stellar and gas disks, flat rotation curves and realistic values of stellar masses. Gas profiles are relatively flat, molecular gas is found to dominate at the centre of galaxies, with star formation rates following the observed Schmidt-Kennicutt relation. Stars kinematically belonging to the bulge form early, while disk stars show a clear inside-out formation pattern and mostly form after redshift $z=2$. However, the baryon conversion efficiencies in our simulations differ from the relation given by Moster et al. (2010) at a $3\sigma$ level, thus indicating that our stellar disks are still too massive for the Dark Matter halo in which they reside. Results are found to be remarkably stable against resolution. This further demonstrates the feasibility of carrying out simulations producing a realistic population of galaxies within representative cosmological volumes, at a relatively modest resolution.
