Source author record

Paolo Rech

Paolo Rech appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Emerging Technologies; Hardware Architecture; Machine Learning; Neural and Evolutionary Computing; quant-ph

Catalog footprint

What is connected

3 works

7 topics

4 close collaborators

Preprint, 2022, arXiv

Fault-Aware Design and Training to Enhance DNNs Reliability with Zero-Overhead

Deep Neural Networks (DNNs) enable a wide range of technological advancements, from clinical imaging to predictive industrial maintenance and autonomous driving. However, recent findings indicate that transient hardware faults can dramatically corrupt a model's predictions. For instance, the radiation-induced misprediction probability can be high enough to impede the safe deployment of DNN models at scale, which calls for efficient and effective hardening solutions. In this work, we propose to tackle the reliability issue at both training and model design time. First, we show that vanilla models are highly affected by transient faults, which can induce a performance drop of up to 37%. We then provide three zero-overhead solutions, based on DNN redesign and retraining, that can improve DNN reliability to transient faults by up to one order of magnitude. We complement our work with extensive ablation studies that quantify the performance gain of each hardening component.
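As a rough illustration of why a single transient fault can be so damaging, the sketch below flips one bit in the 32-bit floating-point encoding of a weight. The abstract does not specify a fault model, so the single-bit flip, the example weight, and the helper name flip_bit are assumptions made here for illustration, not the authors' method.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, stored as an IEEE-754 float32, with one bit flipped.

    Hypothetical helper: bit 0 is the lowest mantissa bit, bits 23-30 are
    the exponent, and bit 31 is the sign.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit, then reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


weight = 0.5  # illustrative value, not taken from any model
for bit in (0, 30, 31):
    print(f"bit {bit:2d}: {weight} -> {flip_bit(weight, bit)!r}")
```

A flip in a low mantissa bit barely changes the weight, while a flip in the top exponent bit turns 0.5 into a value on the order of 1e38, which is the kind of corruption that can dominate a network's output.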

Preprint, 2022, arXiv

QuFI: a Quantum Fault Injector to Measure the Reliability of Qubits and Quantum Circuits

Quantum computing is a new technology that is expected to revolutionize the computation paradigm in the next few years. Qubits exploit the properties of quantum physics to increase the parallelism and speed of computation. Unfortunately, besides being intrinsically noisy, qubits have also been shown to be highly susceptible to external sources of faults, such as ionizing radiation. The latest discoveries highlight a much higher radiation sensitivity in qubits than in traditional transistors and identify a fault model much more complex than a bit-flip. We propose a framework to identify a quantum circuit's sensitivity to radiation-induced faults and the probability that a fault in a qubit propagates to the output. Based on the latest studies and radiation experiments performed on real quantum machines, we model a transient fault in a qubit as a phase shift with a parametrized magnitude. Additionally, our framework can inject faults in multiple qubits, tuning the phase shift magnitude based on the proximity of each qubit to the particle strike location. As we show in the paper, the proposed fault injector is highly flexible and can be used on both quantum circuit simulators and real quantum machines. We report the findings of more than 285M injections on the Qiskit simulator and 53K injections on real IBM machines. We consider three quantum algorithms and identify the faults and qubits that are most likely to impact the output. We also consider the dependence of fault propagation on circuit scale, showing that the reliability profile of some quantum algorithms is scale-dependent, with the impact of radiation-induced faults increasing as the number of qubits grows. Finally, we consider multi-qubit faults, showing that they are much more critical than single faults. The fault injector and the data presented in this paper are available in a public repository to allow further analysis.
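The idea of a fault as a phase shift of parametrized magnitude can be sketched on a single qubit with plain Python. This is a minimal sketch, not QuFI's code or API: the test circuit (Hadamard, fault, Hadamard), the function names, and the choice of a single phase angle phi are all assumptions made here for illustration. Without a fault the circuit always returns 0, so the probability of measuring 1 is the probability that the fault reaches the output.

```python
import cmath
import math

S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]


def apply(gate, state):
    """Multiply a 2x2 gate by a 2-element state vector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]


def phase_fault(phi):
    """Hypothetical fault model: multiply the |1> amplitude by exp(i*phi)."""
    return [[1, 0], [0, cmath.exp(1j * phi)]]


def wrong_output_probability(phi):
    """Probability of measuring 1 when a phase fault hits between two Hadamards."""
    state = [1, 0]                          # start in |0>
    state = apply(HADAMARD, state)          # superposition (|0> + |1>) / sqrt(2)
    state = apply(phase_fault(phi), state)  # injected fault
    state = apply(HADAMARD, state)          # fault-free result would be |0>
    return abs(state[1]) ** 2


for phi in (0.0, math.pi / 4, math.pi / 2, math.pi):
    print(f"phi = {phi:.3f} rad -> P(wrong output) = {wrong_output_probability(phi):.3f}")
```

In this toy circuit the error probability works out to sin^2(phi / 2), so small shifts are mostly masked and a shift of pi always corrupts the result, which shows why the fault magnitude has to be a parameter rather than a fixed bit-flip.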

Preprint, 2020, arXiv

Estimating Silent Data Corruption Rates Using a Two-Level Model

Architects of high-performance and safety-critical systems must accurately evaluate the application-level silent data corruption (SDC) rates that soft errors cause in processors. Such an evaluation requires following error propagation all the way from particle strikes on low-level state up to the program output. Existing approaches that rely on low-level simulations with fault injection cannot evaluate full applications because of their slow speeds, while application-level accelerated fault testing in accelerated particle beams is often impractical. We present a new two-level methodology for application resilience evaluation that overcomes these challenges. The proposed approach decomposes application failure rate estimation into (1) identifying how particle strikes in low-level unprotected state manifest at the architecture level, and (2) measuring how such architecture-level manifestations propagate to the program output. We demonstrate the effectiveness of this approach on GPU architectures. We also show that using just one of the two steps can overestimate SDC rates and produce different trends; the composition of the two is needed for accurate reliability modeling.
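One way to read the two-step decomposition is as a weighted sum: step (1) gives the probability that a strike shows up as each kind of architecture-level manifestation, and step (2) gives the probability that each manifestation becomes an SDC. The sketch below assumes that reading. The function name, the manifestation labels, and every number are hypothetical placeholders, not results or categories from the paper.

```python
def sdc_probability(manifestation_probs, propagation_probs):
    """Compose the two levels into one per-strike SDC probability.

    manifestation_probs: P(manifestation | strike), from step (1).
    propagation_probs:   P(SDC | manifestation), from step (2).
    """
    return sum(
        prob * propagation_probs[kind]
        for kind, prob in manifestation_probs.items()
    )


# Hypothetical placeholder inputs, for illustration only.
step1 = {"masked": 0.70, "single_bit_register": 0.25, "multi_bit_register": 0.05}
step2 = {"masked": 0.00, "single_bit_register": 0.40, "multi_bit_register": 0.80}

composed = sdc_probability(step1, step2)
# Using step (2) alone, as if every strike were a single-bit register error.
step2_only = step2["single_bit_register"]

print(f"two-level estimate: {composed:.3f}")
print(f"step (2) only:      {step2_only:.3f}")
```

With these placeholder inputs, skipping step (1) ignores the strikes that never reach architectural state and so gives a larger estimate, which is consistent with the abstract's point that one step alone can overestimate SDC rates.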