Researcher profile

Nhan Tran

Nhan Tran contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
19works
0followers
16topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

19 published item(s)

preprint2026arXiv

Surrogate Neural Architecture Codesign Package (SNAC-Pack)

Neural architecture search (NAS) is a powerful approach for automating model design, but existing methods often optimize for accuracy alone or rely on proxy metrics such as bit operations (BOPs) that correlate poorly with hardware cost. This gap is particularly large for FPGA deployment, where cost is dominated by a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency. We present the Surrogate Neural Architecture Codesign Package (SNAC-Pack), an open-source AutoML framework for hardware-aware neural architecture codesign and end-to-end FPGA deployment. SNAC-Pack runs a multi-objective global search with Optuna and NSGA-II, loading trials to a shared SQLite store that enables parallel workers across compute nodes. A hardware surrogate model outputs per-trial resource and latency estimates, avoiding the synthesis cost that would otherwise dominate the search loop. A local search stage then applies quantization-aware training (QAT) together with iterative magnitude pruning in a combined compression loop, after which the final model is synthesized to FPGA firmware via the hls4ml Python library. A YAML configuration and an optional agentic frontend let users run the pipeline on new datasets without modifying the framework. We demonstrate SNAC-Pack on jet classification at the Large Hadron Collider and superconducting qubit readout, discovering compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization and, in the qubit readout case, reducing the design space exploration process from months of manual fine-tuning to hours of automated search.

preprint2026arXiv

Towards a Self-Driving Trigger at the LHC: Adaptive Response in Real Time

Real-time data filtering and selection -- or trigger -- systems at high-throughput scientific facilities such as the experiments at the Large Hadron Collider (LHC) must process extremely high-rate data streams under stringent bandwidth, latency, and storage constraints. Yet these systems are typically designed as static, hand-tuned menus of selection criteria grounded in prior knowledge and simulation. In this work, we further explore the concept of a self-driving trigger, an autonomous data-filtering framework that reallocates resources and adjusts thresholds dynamically in real-time to optimize signal efficiency, rate stability, and computational cost as instrumentation and environmental conditions evolve. We introduce a benchmark ecosystem to emulate realistic collider scenarios and demonstrate real-time optimization of a menu including canonical energy sum triggers as well as modern anomaly-detection algorithms that target non-standard event topologies using machine learning. Using simulated data streams and publicly available collision data from the Compact Muon Solenoid (CMS) experiment, we demonstrate the capability to dynamically and automatically optimize trigger performance under specific cost objectives without manual retuning. Our adaptive strategy shifts trigger design from static menus with heuristic tuning to intelligent, automated, data-driven control, unlocking greater flexibility and discovery potential in future high-energy physics analyses.

preprint2023arXiv

Beyond PID Controllers: PPO with Neuralized PID Policy for Proton Beam Intensity Control in Mu2e

We introduce a novel Proximal Policy Optimization (PPO) algorithm aimed at addressing the challenge of maintaining a uniform proton beam intensity delivery in the Muon to Electron Conversion Experiment (Mu2e) at Fermi National Accelerator Laboratory (Fermilab). Our primary objective is to regulate the spill process to ensure a consistent intensity profile, with the ultimate goal of creating an automated controller capable of providing real-time feedback and calibration of the Spill Regulation System (SRS) parameters on a millisecond timescale. We treat the Mu2e accelerator system as a Markov Decision Process suitable for Reinforcement Learning (RL), utilizing PPO to reduce bias and enhance training stability. A key innovation in our approach is the integration of a neuralized Proportional-Integral-Derivative (PID) controller into the policy function, resulting in a significant improvement in the Spill Duty Factor (SDF) by 13.6%, surpassing the performance of the current PID controller baseline by an additional 1.6%. This paper presents the preliminary offline results based on a differentiable simulator of the Mu2e accelerator. It paves the groundwork for real-time implementations and applications, representing a crucial step towards automated proton beam intensity control for the Mu2e experiment.

preprint2023arXiv

Differentiable Earth Mover's Distance for Data Compression at the High-Luminosity LHC

The Earth mover's distance (EMD) is a useful metric for image recognition and classification, but its usual implementations are not differentiable or too slow to be used as a loss function for training other algorithms via gradient descent. In this paper, we train a convolutional neural network (CNN) to learn a differentiable, fast approximation of the EMD and demonstrate that it can be used as a substitute for computing-intensive EMD implementations. We apply this differentiable approximation in the training of an autoencoder-inspired neural network (encoder NN) for data compression at the high-luminosity LHC at CERN. The goal of this encoder NN is to compress the data while preserving the information related to the distribution of energy deposits in particle detectors. We demonstrate that the performance of our encoder NN trained using the differentiable EMD CNN surpasses that of training with loss functions based on mean squared error.

preprint2022arXiv

Dark Sector Physics at High-Intensity Experiments

Is Dark Matter part of a Dark Sector? The possibility of a dark sector neutral under Standard Model (SM) forces furnishes an attractive explanation for the existence of Dark Matter (DM), and is a compelling new-physics direction to explore in its own right, with potential relevance to fundamental questions as varied as neutrino masses, the hierarchy problem, and the Universe's matter-antimatter asymmetry. Because dark sectors are generically weakly coupled to ordinary matter, and because they can naturally have MeV-to-GeV masses and respect the symmetries of the SM, they are only mildly constrained by high-energy collider data and precision atomic measurements. Yet upcoming and proposed intensity-frontier experiments will offer an unprecedented window into the physics of dark sectors, highlighted as a Priority Research Direction in the 2018 Dark Matter New Initiatives (DMNI) BRN report. Support for this program -- in the form of dark-sector analyses at multi-purpose experiments, realization of the intensity-frontier experiments receiving DMNI funds, an expansion of DMNI support to explore the full breadth of DM and visible final-state signatures (especially long-lived particles) called for in the BRN report, and support for a robust dark-sector theory effort -- will enable comprehensive exploration of low-mass thermal DM milestones, and greatly enhance the potential of intensity-frontier experiments to discover dark-sector particles decaying back to SM particles.

preprint2022arXiv

DarkQuest: A dark sector upgrade to SpinQuest at the 120 GeV Fermilab Main Injector

Expanding the mass range and techniques by which we search for dark matter is an important part of the worldwide particle physics program. Accelerator-based searches for dark matter and dark sector particles are a uniquely compelling part of this program as a way to both create and detect dark matter in the laboratory and explore the dark sector by searching for mediators and excited dark matter particles. This paper focuses on developing the DarkQuest experimental concept and gives an outlook on related enhancements collectively referred to as LongQuest. DarkQuest is a proton fixed-target experiment with leading sensitivity to an array of visible dark sector signatures in the MeV-GeV mass range. Because it builds off of existing accelerator and detector infrastructure, it offers a powerful but modest-cost experimental initiative that can be realized on a short timescale.

preprint2022arXiv

FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning

Applications of machine learning (ML) are growing by the day for many unique and challenging scientific applications. However, a crucial challenge facing these applications is their need for ultra low-latency and on-detector ML capabilities. Given the slowdown in Moore's law and Dennard scaling, coupled with the rapid advances in scientific instrumentation that is resulting in growing data rates, there is a need for ultra-fast ML at the extreme edge. Fast ML at the edge is essential for reducing and filtering scientific data in real-time to accelerate science experimentation and enable more profound insights. To accelerate real-time scientific edge ML hardware and software solutions, we need well-constrained benchmark tasks with enough specifications to be generically applicable and accessible. These benchmarks can guide the design of future edge ML hardware for scientific applications capable of meeting the nanosecond and microsecond level latency requirements. To this end, we present an initial set of scientific ML benchmarks, covering a variety of ML and embedded system techniques.

preprint2022arXiv

Jets and Jet Substructure at Future Colliders

Even though jet substructure was not an original design consideration for the Large Hadron Collider (LHC) experiments, it has emerged as an essential tool for the current physics program. We examine the role of jet substructure on the motivation for and design of future energy frontier colliders. In particular, we discuss the need for a vibrant theory and experimental research and development program to extend jet substructure physics into the new regimes probed by future colliders. Jet substructure has organically evolved with a close connection between theorists and experimentalists and has catalyzed exciting innovations in both communities. We expect such developments will play an important role in the future energy frontier physics program.

preprint2022arXiv

Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classification benchmark tasks. The resulting hardware implementations are quantized, configurable, spatial dataflow architectures tailored for speed and efficiency and introduce new generic optimizations and common workflows developed as a part of this work. The full workflow is presented from quantization-aware training to FPGA implementation. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms. The resulting submissions achieve latencies as low as 20 $μ$s and energy consumption as low as 30 $μ$J per inference. We demonstrate how emerging ML benchmarks on heterogeneous hardware platforms can catalyze collaboration and the development of new techniques and more accessible tools.

preprint2022arXiv

Physics Community Needs, Tools, and Resources for Machine Learning

Machine learning (ML) is becoming an increasingly important component of cutting-edge physics research, but its computational requirements present significant challenges. In this white paper, we discuss the needs of the physics community regarding ML across latency and throughput regimes, the tools and resources that offer the possibility of addressing these needs, and how these can be best utilized and accessed in the coming years.

preprint2022arXiv

Physics Opportunities for the Fermilab Booster Replacement

This white paper presents opportunities afforded by the Fermilab Booster Replacement and its various options. Its goal is to inform the design process of the Booster Replacement about the accelerator needs of the various options, allowing the design to be versatile and enable, or leave the door open to, as many options as possible. The physics themes covered by the paper include searches for dark sectors and new opportunities with muons.

preprint2022arXiv

QONNX: Representing Arbitrary-Precision Quantized Neural Networks

We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantize-clip-dequantize (QCDQ) format. We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc -- in order to represent uniform quantization. By keeping the QONNX IR high-level and flexible, we enable targeting a wider variety of platforms. We also present utilities for working with QONNX, as well as examples of its usage in the FINN and hls4ml toolchains. Finally, we introduce the QONNX model zoo to share low-precision quantized neural networks.

preprint2022arXiv

Smart sensors using artificial intelligence for on-detector electronics and ASICs

Cutting edge detectors push sensing technology by further improving spatial and temporal resolution, increasing detector area and volume, and generally reducing backgrounds and noise. This has led to a explosion of more and more data being generated in next-generation experiments. Therefore, the need for near-sensor, at the data source, processing with more powerful algorithms is becoming increasingly important to more efficiently capture the right experimental data, reduce downstream system complexity, and enable faster and lower-power feedback loops. In this paper, we discuss the motivations and potential applications for on-detector AI. Furthermore, the unique requirements of particle physics can uniquely drive the development of novel AI hardware and design tools. We describe existing modern work for particle physics in this area. Finally, we outline a number of areas of opportunity where we can advance machine learning techniques, codesign workflows, and future microelectronics technologies which will accelerate design, performance, and implementations for next generation experiments.

preprint2021arXiv

Distance-Weighted Graph Neural Networks on FPGAs for Real-Time Particle Reconstruction in High Energy Physics

Graph neural networks have been shown to achieve excellent performance for several crucial tasks in particle physics, such as charged particle tracking, jet tagging, and clustering. An important domain for the application of these networks is the FGPA-based first layer of real-time data filtering at the CERN Large Hadron Collider, which has strict latency and resource constraints. We discuss how to design distance-weighted graph networks that can be executed with a latency of less than 1$μ\mathrm{s}$ on an FPGA. To do so, we consider a representative task associated to particle reconstruction and identification in a next-generation calorimeter operating at a particle collider. We use a graph network architecture developed for such purposes, and apply additional simplifications to match the computing constraints of Level-1 trigger systems, including weight quantization. Using the $\mathtt{hls4ml}$ library, we convert the compressed models into firmware to be implemented on an FPGA. Performance of the synthesized models is presented both in terms of inference accuracy and resource usage.

preprint2020arXiv

Fast inference of Boosted Decision Trees in FPGAs for particle physics

We describe the implementation of Boosted Decision Trees in the hls4ml library, which allows the translation of a trained model into FPGA firmware through an automated conversion process. Thanks to its fully on-chip implementation, hls4ml performs inference of Boosted Decision Tree models with extremely low latency. With a typical latency less than 100 ns, this solution is suitable for FPGA-based real-time processing, such as in the Level-1 Trigger system of a collider experiment. These developments open up prospects for physicists to deploy BDTs in FPGAs for identifying the origin of jets, better reconstructing the energies of muons, and enabling better selection of rare signal processes.

preprint2020arXiv

Improving Long Handwritten Text Line Recognition with Convolutional Multi-way Associative Memory

Convolutional Recurrent Neural Networks (CRNNs) excel at scene text recognition. Unfortunately, they are likely to suffer from vanishing/exploding gradient problems when processing long text images, which are commonly found in scanned documents. This poses a major challenge to goal of completely solving Optical Character Recognition (OCR) problem. Inspired by recently proposed memory-augmented neural networks (MANNs) for long-term sequential modeling, we present a new architecture dubbed Convolutional Multi-way Associative Memory (CMAM) to tackle the limitation of current CRNNs. By leveraging recent memory accessing mechanisms in MANNs, our architecture demonstrates superior performance against other CRNN counterparts in three real-world long text OCR datasets.

preprint2020arXiv

Lepton-Nucleus Cross Section Measurements for DUNE with the LDMX Detector

We point out that the LDMX (Light Dark Matter eXperiment) detector design, conceived to search for sub-GeV dark matter, will also have very advantageous characteristics to pursue electron-nucleus scattering measurements of direct relevance to the neutrino program at DUNE and elsewhere. These characteristics include a 4-GeV electron beam, a precision tracker, electromagnetic and hadronic calorimeters with near 2$π$ azimuthal acceptance from the forward beam axis out to $\sim$40$^\circ$ angle, and low reconstruction energy threshold. LDMX thus could provide (semi)exclusive cross section measurements, with detailed information about final-state electrons, pions, protons, and neutrons. We compare the predictions of two widely used neutrino generators (GENIE, GiBUU) in the LDMX region of acceptance to illustrate the large modeling discrepancies in electron-nucleus interactions at DUNE-like kinematics. We argue that discriminating between these predictions is well within the capabilities of the LDMX detector.

preprint2018arXiv

Fast inference of deep neural networks in FPGAs for particle physics

Recent results at the Large Hadron Collider (LHC) have pointed to enhanced physics capabilities through the improvement of the real-time event processing techniques. Machine learning methods are ubiquitous and have proven to be very powerful in LHC physics, and particle physics as a whole. However, exploration of the use of such techniques in low-latency, low-power FPGA hardware has only just begun. FPGA-based trigger and data acquisition (DAQ) systems have extremely low, sub-microsecond latency requirements that are unique to particle physics. We present a case study for neural network inference in FPGAs focusing on a classifier for jet substructure which would enable, among many other physics scenarios, searches for new dark sector particles and novel measurements of the Higgs boson. While we focus on a specific example, the lessons are far-reaching. We develop a package based on High-Level Synthesis (HLS) called hls4ml to build machine learning models in FPGAs. The use of HLS increases accessibility across a broad user community and allows for a drastic decrease in firmware development time. We map out FPGA resource usage and latency versus neural network hyperparameters to identify the problems in particle physics that would benefit from performing neural network inference with FPGAs. For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.