Researcher profile

Luca Carloni

Luca Carloni contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2022arXiv

Real-time Inference with 2D Convolutional Neural Networks on Field Programmable Gate Arrays for High-rate Particle Imaging Detectors

We present a custom implementation of a 2D Convolutional Neural Network (CNN) as a viable application for real-time data selection in high-resolution and high-rate particle imaging detectors, making use of hardware acceleration in high-end Field Programmable Gate Arrays (FPGAs). To meet FPGA resource constraints, a two-layer CNN is optimized for accuracy and latency with KerasTuner, and network \textit{quantization} is further used to minimize the computing resource utilization of the network. We use "High Level Synthesis for Machine Learning" (\textit{hls4ml}) tools to test CNN deployment on a Xilinx UltraScale+ FPGA, which is a proposed FPGA technology for the front-end readout system of the future Deep Underground Neutrino Experiment (DUNE) far detector. We evaluate network accuracy and estimate latency and hardware resource usage, and comment on the feasibility of applying CNNs for real-time data selection within the proposed DUNE data acquisition system.

preprint2021arXiv

DB4HLS: A Database of High-Level Synthesis Design Space Explorations

High-Level Synthesis (HLS) frameworks allow to easily specify a large number of variants of the same hardware design by only acting on optimization directives. Nonetheless, the hardware synthesis of implementations for all possible combinations of directive values is impractical even for simple designs. Addressing this shortcoming, many HLS Design Space Exploration (DSE) strategies have been proposed to devise directive settings leading to high-quality implementations while limiting the number of synthesis runs. All these works require considerable efforts to validate the proposed strategies and/or to build the knowledge base employed to tune abstract models, as both tasks mandate the syntheses of large collections of implementations. Currently, such data gathering is performed ad-hoc, a) leading to a lack of standardization, hampering comparisons between DSE alternatives, and b) posing a very high burden to researchers willing to develop novel DSE strategies. Against this backdrop, we here introduce DB4HLS, a database of exhaustive HLS explorations comprising more than 100000 design points collected over 4 years of synthesis time. The open structure of DB4HLS allows the incremental integration of new DSEs, which can be easily defined with a dedicated domain-specific language. We think that of our database, available at https://www.db4hls.inf.usi.ch/, will be a valuable tool for the research community investigating automated strategies for the optimization of HLS-based hardware designs.

preprint2021arXiv

Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition

Hardware accelerators (HAs) are essential building blocks for fast and energy-efficient computing systems. Accelerator Quick Error Detection (A-QED) is a recent formal technique which uses Bounded Model Checking for pre-silicon verification of HAs. A-QED checks an HA for self-consistency, i.e., whether identical inputs within a sequence of operations always produce the same output. Under modest assumptions, A-QED is both sound and complete. However, as is well-known, large design sizes significantly limit the scalability of formal verification, including A-QED. We overcome this scalability challenge through a new decomposition technique for A-QED, called A-QED with Decomposition (A-QED$^2$). A-QED$^2$ systematically decomposes an HA into smaller, functional sub-modules, called sub-accelerators, which are then verified independently using A-QED. We prove completeness of A-QED$^2$; in particular, if the full HA under verification contains a bug, then A-QED$^2$ ensures detection of that bug during A-QED verification of the corresponding sub-accelerators. Results on over 100 (buggy) versions of a wide variety of HAs with millions of logic gates demonstrate the effectiveness and practicality of A-QED$^2$.