Researcher profile

Vincent R. Pascuzzi

Vincent R. Pascuzzi contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
11 topics
4 close collaborators

Published work

9 published items

preprint, 2022, arXiv

AI-coupled HPC Workflows

Increasingly, scientific discovery requires sophisticated and scalable workflows. Workflows have become the "new applications," wherein multi-scale computing campaigns comprise multiple and heterogeneous executable tasks. In particular, the introduction of AI/ML models into the traditional HPC workflows has been an enabler of highly accurate modeling, typically reducing computational needs compared to traditional methods. This chapter discusses various modes of integrating AI/ML models into HPC computations, resulting in diverse types of AI-coupled HPC workflows. The increasing need for coupling AI/ML and HPC across scientific domains is motivated, and then exemplified by a number of production-grade use cases for each mode. We additionally discuss the primary challenges of extreme-scale AI-coupled HPC campaigns (task heterogeneity, adaptivity, performance) and several framework and middleware solutions which aim to address them. While both HPC workflow and AI/ML computing paradigms are independently effective, we highlight how their integration, and ultimate convergence, is leading to significant improvements in scientific performance across a range of domains, ultimately resulting in scientific explorations otherwise unattainable.

preprint, 2022, arXiv

Benchmarking a Proof-of-Concept Performance Portable SYCL-based Fast Fourier Transformation Library

In this paper, we present an early version of a SYCL-based FFT library, capable of running on all major vendor hardware, including CPUs and GPUs from AMD, ARM, Intel and NVIDIA. Although preliminary, the aim of this work is to seed further developments for a rich set of features for calculating FFTs. It has the advantage over existing portable FFT libraries in that it is single-source, and therefore removes the complexities that arise due to abundant use of preprocessor macros and auto-generated kernels to target different architectures. We exercise two SYCL-enabled compilers, Codeplay ComputeCpp and Intel's open-source LLVM project, to evaluate performance portability of our SYCL-based FFT on various heterogeneous architectures. The current limitations of our library are that it supports only one-dimensional FFTs up to $2^{11}$ in length and base-2 input sequences. We compare our results with highly optimized vendor-specific FFT libraries and provide a detailed analysis to demonstrate a fair level of performance, as well as potential sources of performance bottlenecks.
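To make the stated constraints concrete (one-dimensional transforms, power-of-two lengths, at most $2^{11}$ points), here is a minimal radix-2 FFT sketch in plain Python. It is not the SYCL library described in the abstract and says nothing about its kernels or performance; the names `fft_radix2`, `dft_naive` and `MAX_LEN` are hypothetical and chosen only for illustration.

```python
import cmath

# Assumption: mirrors the length limit quoted in the abstract (2^11 points).
MAX_LEN = 2 ** 11


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT of a 1-D sequence."""
    n = len(x)
    # Radix-2 splitting only works when the length is a power of two.
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    if n > MAX_LEN:
        raise ValueError("length exceeds 2**11")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # transform of even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd half (butterfly step).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_naive(x):
    """O(n^2) reference DFT, used only to cross-check the FFT."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Comparing `fft_radix2` against `dft_naive` on a short input is a simple way to check correctness before measuring anything; a production library would use an iterative, in-place formulation rather than recursion.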

preprint, 2022, arXiv

Computationally Efficient Zero Noise Extrapolation for Quantum Gate Error Mitigation

Zero noise extrapolation (ZNE) is a widely used technique for gate error mitigation on near-term quantum computers because it can be implemented in software and does not require knowledge of the quantum computer noise parameters. Traditional ZNE requires a significant resource overhead in terms of quantum operations. A recent proposal using a targeted (or random) instead of fixed identity insertion method (RIIM versus FIIM) requires significantly fewer quantum gates for the same formal precision. We start by showing that RIIM can allow for ZNE to be deployed on deeper circuits than FIIM, but requires many more measurements to maintain the same statistical uncertainty. We develop two extensions to FIIM and RIIM. The List Identity Insertion Method (LIIM) allows mitigating the error from certain CNOT gates, typically those with the largest error. The Set Identity Insertion Method (SIIM) naturally interpolates between the measurement-efficient FIIM and the gate-efficient RIIM, allowing one to trade fewer CNOT gates for more measurements. Finally, we investigate a way to boost the number of measurements, namely to run ZNE in parallel, utilizing as many quantum devices as are available. We explore the performance of RIIM in a parallel setting where there is a non-trivial spread in noise across sets of qubits within or across quantum computers.
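The extrapolation step common to ZNE schemes can be illustrated with generic Richardson extrapolation: measure an expectation value at several amplified noise levels and evaluate the interpolating polynomial at zero noise. The sketch below is not the RIIM, LIIM or SIIM procedure from the abstract; the function name `richardson_zero` is hypothetical, and the odd scale factors 1, 3, 5 are an assumption standing in for circuits in which each gate is repeated an odd number of times.

```python
def richardson_zero(scales, values):
    """Extrapolate measured values to noise scale 0.

    scales: distinct noise amplification factors (assumed, e.g. 1, 3, 5)
    values: expectation values measured at those factors
    Evaluates the Lagrange interpolating polynomial at scale 0.
    """
    if len(scales) != len(values):
        raise ValueError("scales and values must have the same length")
    if len(set(scales)) != len(scales):
        raise ValueError("scale factors must be distinct")
    estimate = 0.0
    for i, (s_i, v_i) in enumerate(zip(scales, values)):
        # Lagrange basis weight of point i, evaluated at scale 0.
        weight = 1.0
        for j, s_j in enumerate(scales):
            if j != i:
                weight *= s_j / (s_j - s_i)
        estimate += weight * v_i
    return estimate


# Synthetic data (not from the paper): a quadratic noise response
# whose zero-noise value is exactly 1.0.
def toy_response(scale):
    return 1.0 - 0.1 * scale + 0.01 * scale ** 2


scales = [1, 3, 5]
mitigated = richardson_zero(scales, [toy_response(s) for s in scales])
```

With three scale factors the extrapolation is exact for any quadratic response, so `mitigated` recovers the toy zero-noise value up to rounding. On real hardware each measured value carries statistical uncertainty, which the extrapolation weights amplify; that is the measurement cost the abstract refers to.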

preprint, 2022, arXiv

Detector and Beamline Simulation for Next-Generation High Energy Physics Experiments

The success of high energy physics programs relies heavily on accurate detector simulations and beam interaction modeling. The increasingly complex detector geometries and beam dynamics require sophisticated techniques in order to meet the demands of current and future experiments. Common software tools used today are unable to fully utilize modern computational resources, while data-recording rates are often orders of magnitude larger than what can be produced via simulation. In this paper, we describe the state of high energy physics detector and beamline simulations, their current and future needs, and related challenges, and we propose a number of possible ways to address them.

preprint, 2022, arXiv

Portability: A Necessary Approach for Future Scientific Software

Today's world of scientific software for High Energy Physics (HEP) is powered by x86 code, while the future will be much more reliant on accelerators like GPUs and FPGAs. The portable parallelization strategies (PPS) project of the High Energy Physics Center for Computational Excellence (HEP/CCE) is investigating portability techniques that will allow an algorithm to be coded once and executed on a variety of hardware products from many vendors, especially including accelerators. We think that without these solutions, the scientific success of our experiments and endeavors is in danger, as software development could become expert-driven and costly in order to run on the available hardware infrastructure. We think the best solution for the community would be an extension to the C++ standard with a very low entry bar for users, supporting all hardware forms and vendors. We are very far from that ideal, though. We argue that in the future, as a community, we need to request and work on portability solutions and strive to reach this ideal.

preprint, 2021, arXiv

Achieving near native runtime performance and cross-platform performance portability for random number generation through SYCL interoperability

High-performance computing (HPC) is a major driver accelerating scientific research and discovery, from quantum simulations to medical therapeutics. While the increasing availability of HPC resources is in many cases pivotal to successful science, even the largest collaborations lack the computational expertise required for maximal exploitation of current hardware capabilities. The need to maintain multiple platform-specific codebases further complicates matters, potentially adding constraints on machines that can be utilized. Fortunately, numerous programming models are under development that aim to facilitate portable codes for heterogeneous computing. One in particular is SYCL, an open standard, C++-based single-source programming paradigm. Among SYCL's features is interoperability, a mechanism through which applications and third-party libraries coordinate sharing data and execute collaboratively. In this paper, we leverage the SYCL programming model to demonstrate cross-platform performance portability across heterogeneous resources. We detail our NVIDIA and AMD random number generator extensions to the oneMKL open-source interfaces library. Performance portability is measured relative to platform-specific baseline applications executed on four major hardware platforms using two different compilers supporting SYCL. The utility of our extensions is exemplified in a real-world setting via a high-energy physics simulation application. We show that the performance of implementations that capitalize on SYCL interoperability is on par with native implementations, attesting to the cross-platform performance portability of a SYCL-based approach to scientific codes.

preprint, 2021, arXiv

Mitigating depolarizing noise on quantum computers with noise-estimation circuits

A significant problem for current quantum computers is noise. While there are many distinct noise channels, the depolarizing noise model often appropriately describes average noise for large circuits involving many qubits and gates. We present a method to mitigate the depolarizing noise by first estimating its rate with a noise-estimation circuit and then correcting the output of the target circuit using the estimated rate. The method is experimentally validated on the simulation of the Heisenberg model. We find that our approach in combination with readout-error correction, randomized compiling, and zero-noise extrapolation produces results close to exact results even for circuits containing hundreds of CNOT gates.
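The two-step idea in this abstract (estimate the depolarizing rate, then rescale the target result) can be sketched numerically. This is a simplified illustration, not the paper's procedure: it assumes a single global depolarizing channel, under which a traceless observable's expectation value is multiplied by a common factor (1 - p), and it assumes the estimation circuit has a known ideal outcome. The function names and all numbers are hypothetical.

```python
def estimate_survival(measured_est, ideal_est):
    """Estimate the factor (1 - p) from a noise-estimation circuit.

    measured_est: expectation value observed on the estimation circuit
    ideal_est: its known noiseless expectation value (assumed nonzero)
    """
    if ideal_est == 0:
        raise ValueError("ideal value of the estimation circuit must be nonzero")
    return measured_est / ideal_est


def mitigate(measured_target, survival):
    """Undo the global rescaling on the target circuit's result."""
    if survival <= 0:
        raise ValueError("survival factor must be positive")
    return measured_target / survival


# Synthetic example (not data from the paper): depolarizing rate p = 0.25.
p = 0.25
ideal_target = 0.6                            # unknown in practice
measured_target = (1 - p) * ideal_target      # what the device would report
measured_est = (1 - p) * 1.0                  # estimation circuit, ideal value 1.0

survival = estimate_survival(measured_est, 1.0)
corrected = mitigate(measured_target, survival)
```

In this noiseless toy setting `corrected` matches `ideal_target` up to rounding. The correction is only as good as the assumption that the estimation and target circuits experience the same effective depolarizing rate, and dividing by a small survival factor amplifies statistical error.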

preprint, 2021, arXiv

Porting HEP Parameterized Calorimeter Simulation Code to GPUs

High Energy Physics (HEP) experiments, such as those at the Large Hadron Collider (LHC), traditionally consume large amounts of CPU cycles for detector simulations and data analysis, but rarely use compute accelerators such as GPUs. As the LHC is upgraded to allow for higher luminosity, resulting in much higher data rates, relying purely on CPUs may not provide enough computing power to support the simulation and data analysis needs. As a proof of concept, we investigate the feasibility of porting a HEP parameterized calorimeter simulation code to GPUs. We have chosen to use FastCaloSim, the ATLAS fast parameterized calorimeter simulation. While FastCaloSim is fast enough that it does not impose a bottleneck in detector simulations overall, significant speed-ups in the processing of large samples can be achieved from GPU parallelization at both the particle (intra-event) and event levels; this is especially beneficial in conditions expected at the high-luminosity LHC, where extremely high per-event particle multiplicities will result from the many simultaneous proton-proton collisions. We report our experience with porting FastCaloSim to NVIDIA GPUs using CUDA. A preliminary Kokkos implementation of FastCaloSim for portability to other parallel architectures is also described.

preprint, 2018, arXiv

A Roadmap for HEP Software and Computing R&D for the 2020s

Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.