Source author record

Maciej Cytowski

Maciej Cytowski appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: astro-ph.CO; astro-ph.GA; astro-ph.IM; Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

preprint · 2026 · arXiv

Accelerating cosmological simulations on GPUs: a step towards sustainability and green-awareness

The increasing complexity and scale of cosmological N-body simulations, driven by astronomical surveys like Euclid, call for a paradigm shift towards more sustainable and energy-efficient high-performance computing (HPC). The rising energy consumption of supercomputing facilities poses a significant environmental and financial challenge. In this work, we build upon a recently developed GPU implementation of pinocchio, a widely used tool for the fast generation of dark matter (DM) halo catalogues, to investigate energy consumption. Using a different resource configuration, we confirm the time-to-solution behavior observed in a companion study, and we use these runs to compare time-to-solution with energy-to-solution. By profiling the code on various HPC platforms with a newly developed implementation of the Power Measurement Toolkit (PMT), we demonstrate an 8x reduction in energy-to-solution and an 8x speed-up in time-to-solution compared to the CPU-only version. Taken together, these gains translate into an overall efficiency improvement of up to 64x. Our results show that the GPU-accelerated pinocchio not only achieves substantial speed-up, making the generation of large-scale mock catalogues more tractable, but also significantly reduces the energy footprint of the simulations. This work represents a step towards "green-aware" scientific computing in cosmology, proving that performance and sustainability can be simultaneously achieved.
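The relation between the 8x figures and the 64x figure can be illustrated with a short sketch. It assumes that "overall efficiency" refers to the product of energy-to-solution and time-to-solution (an energy-delay product); the abstract does not state this definition. The power traces, function names and numbers below are hypothetical, and the sketch does not use the PMT API.

```python
def energy_to_solution(times_s, power_w):
    """Integrate sampled power (watts) over time (seconds) with the
    trapezoidal rule and return the energy in joules."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching time/power samples")
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j


def energy_delay_product(energy_j, time_s):
    """Energy-delay product: lower is better (assumed efficiency metric)."""
    return energy_j * time_s


# Hypothetical power traces, chosen only to reproduce 8x ratios.
cpu_times, cpu_power = [0.0, 40.0, 80.0], [400.0, 400.0, 400.0]
gpu_times, gpu_power = [0.0, 5.0, 10.0], [400.0, 400.0, 400.0]

cpu_energy = energy_to_solution(cpu_times, cpu_power)
gpu_energy = energy_to_solution(gpu_times, gpu_power)
cpu_time = cpu_times[-1] - cpu_times[0]
gpu_time = gpu_times[-1] - gpu_times[0]

speedup = cpu_time / gpu_time
energy_gain = cpu_energy / gpu_energy
edp_gain = (energy_delay_product(cpu_energy, cpu_time)
            / energy_delay_product(gpu_energy, gpu_time))

print(speedup, energy_gain, edp_gain)
```

Under this assumed metric, the two gains multiply, which is one way an 8x speed-up and an 8x energy reduction can combine into a 64x figure.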

preprint · 2016 · arXiv

The Copernicus Complexio: a high-resolution view of the small-scale Universe

We introduce Copernicus Complexio (COCO), a high-resolution cosmological N-body simulation of structure formation in the $\Lambda{\rm CDM}$ model. COCO follows an approximately spherical region of radius $\sim 17.4h^{-1}\,{\rm Mpc}$ embedded in a much larger periodic cube that is followed at lower resolution. The high resolution volume has a particle mass of $1.135\times10^5h^{-1}{\rm M}_{\odot}$ (60 times higher mass resolution than the Millennium-II simulation). COCO gives the dark matter halo mass function over eight orders of magnitude in halo mass; it forms $\sim 60$ haloes of galactic size, each resolved with about 10 million particles. We confirm the power-law character of the subhalo mass function, $\bar{N}(>\mu)\propto\mu^{-s}$, down to a reduced subhalo mass $M_{sub}/M_{200}\equiv\mu=10^{-6}$, with a best-fit power-law index, $s=0.94$, for hosts of mass $\langle M_{200}\rangle=10^{12}h^{-1}{\rm M}_{\odot}$. The concentration-mass relation of COCO haloes deviates from a single power law for masses $M_{200}<\textrm{a few}\times 10^{8}h^{-1}{\rm M}_{\odot}$, where it flattens, in agreement with results by Sanchez-Conde et al. The host mass invariance of the reduced maximum circular velocity function of subhaloes, $\nu\equiv V_{max}/V_{200}$, hinted at in previous simulations, is clearly demonstrated over five orders of magnitude in host mass. Similarly, we find that the average, normalised radial distribution of subhaloes is approximately universal (i.e. independent of subhalo mass), as previously suggested by the Aquarius simulations of individual haloes. Finally, we find that at fixed physical subhalo size, subhaloes in lower mass hosts typically have lower central densities than those in higher mass hosts.
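As a rough illustration of what the quoted power-law index implies, the sketch below evaluates the ratio of cumulative subhalo counts between two reduced masses for $\bar{N}(>\mu)\propto\mu^{-s}$. The function name is hypothetical, and the normalisation of the mass function is not given in the abstract, so only ratios are computed.

```python
def subhalo_count_ratio(mu_low, mu_high, s=0.94):
    """Return N(>mu_low) / N(>mu_high) for a cumulative subhalo mass
    function N(>mu) proportional to mu**(-s). The default s is the
    best-fit index quoted in the abstract."""
    if mu_low <= 0 or mu_high <= 0:
        raise ValueError("reduced masses must be positive")
    return (mu_low / mu_high) ** (-s)


# Going down one decade in reduced mass, from 1e-5 to 1e-6.
per_decade = subhalo_count_ratio(1e-6, 1e-5)
print(round(per_decade, 2))
```

With $s=0.94$, each decade lower in $\mu$ increases the cumulative count by a factor of $10^{0.94}$, slightly less than ten.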

preprint · 2014 · arXiv

Towards Autotuning of OpenMP Applications on Multicore Architectures

In this paper we describe an autotuning tool for the optimization of OpenMP applications on highly multicore and multithreaded architectures. Our work was motivated by in-depth performance analysis of scientific applications and synthetic benchmarks on the IBM Power 775 architecture. The tool provides automatic code instrumentation of OpenMP parallel regions. Based on measurements of chosen hardware performance counters, the tool decides on the number of parallel threads that should be used for execution of chosen code fragments.
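The selection step described above can be sketched as a simple search over candidate thread counts. This is not the tool's implementation: the abstract says the decision is based on hardware performance counters collected from instrumented OpenMP regions, whereas this sketch substitutes a synthetic cost model. The names `choose_thread_count` and `synthetic_cost` and all parameters are hypothetical.

```python
def choose_thread_count(measure_cost, candidates):
    """Return the candidate thread count with the lowest measured cost.

    measure_cost is any callable mapping a thread count to a cost; in a
    real autotuner it would wrap a counter-based measurement of one
    parallel region."""
    best_threads, best_cost = None, float("inf")
    for n_threads in candidates:
        cost = measure_cost(n_threads)
        if cost < best_cost:
            best_threads, best_cost = n_threads, cost
    if best_threads is None:
        raise ValueError("no candidate thread counts given")
    return best_threads


def synthetic_cost(n_threads, work=100.0, overhead_per_thread=1.0):
    """Stand-in for a measurement: perfectly divisible work plus an
    overhead that grows linearly with the number of threads."""
    return work / n_threads + overhead_per_thread * n_threads


best = choose_thread_count(synthetic_cost, [1, 2, 4, 8, 16, 32])
print(best)
```

The point of the sketch is that the best thread count for a region need not be the largest one available: once the per-thread overhead outweighs the shrinking share of work, adding threads increases the cost.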