Source author record

Sebastian Dittmeier

Sebastian Dittmeier appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

3 works
4 topics
4 close collaborators

Published work

3 published items

preprint, 2026, arXiv

AIE4ML: An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines

Efficient AI inference on AMD's Versal AI Engine (AIE) is challenging due to tightly coupled VLIW execution, explicit datapaths, and local memory management. Prior work focused on first-generation AIE kernel optimizations, without tackling full neural network execution across the 2D array. In this work, we present AIE4ML, the first comprehensive framework for converting AI models automatically into optimized firmware targeting the AIE-ML generation devices, also with forward compatibility for the newer AIE-MLv2 architecture. At the single-kernel level, we attain performance close to the architectural peak. At the graph and system levels, we provide a structured parallelization method that can scale across the 2D AIE-ML fabric and exploit its dedicated memory tiles to stay entirely on-chip throughout the model execution. As a demonstration, we designed a generalized and highly efficient linear-layer implementation with intrinsic support for fused bias addition and ReLU activation. Also, as our framework necessitates the generation of multi-layer implementations, our approach systematically derives deterministic, compact, and topology-optimized placements tailored to the physical 2D grid of the device through a novel graph placement and search algorithm. Finally, the framework seamlessly accepts quantized models imported from high-level tools such as hls4ml or PyTorch while preserving bit-exactness. In layer scaling benchmarks, we achieve up to 98.6% efficiency relative to the single-kernel baseline, utilizing 296 of 304 AIE tiles (97.4%) of the device with entirely on-chip data movement. With evaluations across real-world model topologies, we demonstrate that AIE4ML delivers GPU-class throughput under microsecond latency constraints, making it a practical companion for ultra-low-latency environments such as trigger systems in particle physics experiments.

preprint, 2016, arXiv

Feasibility studies for a wireless 60 GHz tracking detector readout

The amount of data produced by highly granular silicon tracking detectors in high energy physics experiments poses a major challenge to readout systems. At high collision rates, e.g. at LHC experiments, only a small fraction of data can be read out with currently used technologies. To cope with the requirements of future or upgraded experiments, new data transfer techniques are required which offer high data rates at low power and low material budget. Wireless technologies operating in the 60 GHz band or at higher frequencies offer high data rates and are thus a promising upcoming alternative to conventional data transmission via electrical cables or optical fibers. Using wireless technology, the number of cables and connectors in detectors can be significantly reduced. Tracking detectors profit most from a reduced material budget as fewer secondary particle interactions (multiple Coulomb scattering, energy loss, etc.) improve the tracking performance in general. We present feasibility studies regarding the integration of the wireless technology at 60 GHz into a silicon tracking detector. Spare silicon strip modules of the ATLAS experiment are measured to be opaque in the 60 GHz range. The reduction of cross talk between links is studied. An estimate of the maximum achievable link density is given. It is shown that wireless links can be placed as close as 2 cm next to each other for a layer distance of 10 cm by exploiting one or several of the following measures: highly directive antennas, absorbers, linear polarization and frequency channeling. Combining these measures, a data rate area density of up to 11 Tb/(s $\cdot$ m$^2$) seems feasible. In addition, two types of silicon sensors are tested under mm-wave irradiation. No deterioration of the performance of either prototype is observed.

preprint, 2016, arXiv

The MuPix System-on-Chip for the Mu3e Experiment

Mu3e is a novel experiment searching for charged lepton flavor violation in the rare decay $\mu^+ \rightarrow e^+e^-e^+$. Decay vertex position, decay time and particle momenta have to be precisely measured in order to reject both accidental and physics background. A silicon pixel tracker based on $50\,\mu$m thin high voltage monolithic active pixel sensors (HV-MAPS) in a 1 T solenoidal magnetic field provides precise vertex and momentum information. The MuPix chip combines pixel sensor cells with integrated analog electronics and a periphery with a complete digital readout. The MuPix7 is the first HV-MAPS prototype implementing all functionalities of the final sensor including a readout state machine and high speed serialization with 1.25 Gbit/s data output, allowing for a streaming readout in parallel to the data taking. The observed efficiency of the MuPix7 chip including the full readout system is $\geq 99\%$ in a high rate test beam.