Source author record

Rainer Leupers

Rainer Leupers appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Hardware Architecture; Emerging Technologies; Cryptography and Security; Distributed, Parallel, and Cluster Computing; eess.SY; Machine Learning; Mathematical Software; Neural and Evolutionary Computing; Operating Systems; Programming Languages; Software Engineering; Systems and Control

Catalog footprint

What is connected

15 works

12 topics

4 close collaborators

preprint, 2026, arXiv

NQC2: A Non-Intrusive QEMU Code Coverage Plugin

Code coverage analysis has become a standard approach in software development, facilitating the assessment of test suite effectiveness, the identification of under-tested code segments, and the discovery of performance bottlenecks. When code coverage of software for embedded systems needs to be measured, conventional approaches quickly reach their limits. A commonly used approach involves instrumenting the source files with added code that collects and dumps coverage information at runtime. This inserted code usually relies on the existence of an operating system and a file system to dump the collected data. These features are not available for bare-metal programs that are executed on embedded systems. To overcome this issue, we present NQC2, a plugin for QEMU. NQC2 extracts coverage information from QEMU at runtime and stores it in a file on the host machine. This approach is even compatible with modified QEMU versions and does not require target-software instrumentation. NQC2 outperforms a comparable approach from Xilinx by up to 8.5x.

preprint, 2022, arXiv

A Temperature Independent Readout Circuit for ISFET-Based Sensor Applications

The ion-sensitive field-effect transistor (ISFET) is an emerging technology that has received much attention in numerous research areas, including biochemistry, medicine, and security applications. However, compared to other types of sensors, the complexity of ISFETs makes it more challenging to achieve a sensitive, fast, and repeatable response. Therefore, various readout circuits have been developed to improve the performance of ISFETs, especially to eliminate the temperature effect. This paper presents a new approach for a temperature-independent readout circuit that uses the threshold voltage differences of an ISFET-MOSFET pair. The Linear Technology Simulation Program with Integrated Circuit Emphasis (LTspice) is used to analyze the ISFET performance based on the proposed readout circuit characteristics. A macro-model is used to model ISFET behavior, including the first-level SPICE model for the MOSFET part and Verilog-A to model the surface potential, reference electrode, and electrolyte of the ISFET to determine the relationships between variables. In this way, the behavior of the ISFET is monitored through the output voltage of the readout circuit as the electrolyte's potential of hydrogen (pH) changes in the simulation. The proposed readout circuit has a temperature coefficient of 11.9 ppm/°C for a temperature range of 0-100 °C and pH between 1 and 13. The proposed ISFET readout circuit outperforms other designs in terms of simplicity and does not require an additional sensor.
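To make the ppm/°C figure concrete, the sketch below computes a temperature coefficient from a sweep of output voltages. It assumes the common "box method" definition (voltage span divided by mean voltage and temperature span); the abstract does not state which definition the authors use, and the function name and sample values are hypothetical.

```python
def temperature_coefficient_ppm(voltages, temps_c):
    """Box-method temperature coefficient in ppm/°C.

    Assumed definition (not taken from the paper):
        TC = (Vmax - Vmin) / (Vmean * (Tmax - Tmin)) * 1e6
    """
    if len(voltages) != len(temps_c) or len(voltages) < 2:
        raise ValueError("need at least two matching voltage/temperature samples")
    t_span = max(temps_c) - min(temps_c)
    if t_span == 0:
        raise ValueError("temperature sweep must cover a non-zero range")
    v_span = max(voltages) - min(voltages)
    v_mean = sum(voltages) / len(voltages)
    return v_span / (v_mean * t_span) * 1e6


# Hypothetical sweep: output drifts by 1 mV around 1 V over 0-100 °C.
tc = temperature_coefficient_ppm([1.0, 1.0005, 1.001], [0, 50, 100])
print(f"{tc:.2f} ppm/°C")
```

Under this definition, a 1 mV drift on a roughly 1 V output over a 100 °C range corresponds to about 10 ppm/°C, which is the same order of magnitude as the value reported above.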

preprint, 2022, arXiv

Designing ML-Resilient Locking at Register-Transfer Level

Various logic-locking schemes have been proposed to protect hardware from intellectual property piracy and malicious design modifications. Since traditional locking techniques are applied on the gate-level netlist after logic synthesis, they have no semantic knowledge of the design function. Data-driven, machine-learning (ML) attacks can uncover the design flaws within gate-level locking. Recent proposals on register-transfer level (RTL) locking have access to semantic hardware information. We investigate the resilience of ASSURE, a state-of-the-art RTL locking method, against ML attacks. We used the lessons learned to derive two ML-resilient RTL locking schemes built to reinforce ASSURE locking. We developed ML-driven security metrics to evaluate the schemes against an RTL adaptation of the state-of-the-art, ML-based SnapShot attack.

preprint, 2022, arXiv

EmuNoC: Hybrid Emulation for Fast and Flexible Network-on-Chip Prototyping on FPGAs

Networks-on-Chip (NoCs) have recently become widely used, from multi-core CPUs to edge-AI accelerators. Emulation on FPGAs promises to accelerate their RTL modeling compared to slow simulations. However, realistic test stimuli are challenging to generate in hardware for diverse applications. In other words, a design framework that is both fast and flexible is required. The most promising solution is hybrid emulation, in which parts of the design are simulated in software and the other parts are emulated in hardware. This paper proposes a novel hybrid emulation framework called EmuNoC. We introduce a clock-synchronization method and software-only packet generation that improve the emulation speed by 36.3x to 79.3x over state-of-the-art frameworks while retaining the flexibility of a pure-software interface for stimuli simulation. We also increased the area efficiency to model an NoC with up to 169 routers on a single FPGA, while previous frameworks only achieved 64 routers.

preprint, 2022, arXiv

PA-PUF: A Novel Priority Arbiter PUF

This paper proposes a novel 3-input arbiter-based physically unclonable function (PUF) design. First, a 3-input priority arbiter is designed using a simple arbiter, two 2:1 multiplexers, and an XOR logic gate. The priority arbiter has an equal probability of 0s and 1s at the output, which results in excellent uniformity (49.45%) when retrieving the PUF response. Second, a new PUF design based on the priority arbiter (PA-PUF) is presented. The PA-PUF design is evaluated for uniqueness, non-linearity, and uniformity against the standard tests. The proposed PA-PUF design is configurable in challenge-response pairs through an arbitrary number of feed-forward priority arbiters introduced to the design. We demonstrate, through extensive experiments, reliability of 100% after applying error correction techniques and uniqueness of 49.63%. Finally, the design is compared with the literature to evaluate its implementation efficiency, where it is found to be clearly superior to the state of the art.
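The uniformity and uniqueness percentages quoted above are standard PUF quality metrics. The sketch below shows how they are commonly computed from response bit vectors, assuming the usual definitions (share of 1-bits in a response, and mean pairwise inter-chip Hamming distance); it is not the authors' evaluation code, and the sample responses are made up.

```python
from itertools import combinations


def uniformity(response):
    """Percentage of 1-bits in a single response; the ideal value is 50%."""
    return 100.0 * sum(response) / len(response)


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length responses differ."""
    return sum(x != y for x, y in zip(a, b))


def uniqueness(responses):
    """Mean pairwise inter-chip Hamming distance in percent; ideal is 50%.

    `responses` holds one bit list per chip, all produced by the same
    challenge and all of the same length.
    """
    n_bits = len(responses[0])
    pairs = list(combinations(responses, 2))
    total = sum(hamming_distance(a, b) / n_bits for a, b in pairs)
    return 100.0 * total / len(pairs)


# Hypothetical 8-bit responses from three chips to the same challenge.
chips = [
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 1],
]
print(uniformity(chips[0]), uniqueness(chips))
```

Values close to 50% on both metrics indicate that responses are balanced and that different chips are hard to tell apart from a random guess, which is why the reported 49.45% and 49.63% are described as excellent.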

preprint, 2022, arXiv

pHGen: A pH-Based Key Generation Mechanism Using ISFETs

Digital keys are a fundamental component of many hardware- and software-based security mechanisms. However, digital keys are limited to binary values and easily exploitable when stored in standard memories. In this paper, based on emerging technologies, we introduce pHGen, a potential-of-hydrogen (pH)-based key generation mechanism that leverages chemical reactions in the form of a potential change in ion-sensitive field-effect transistors (ISFETs). The threshold voltage of ISFETs is manipulated corresponding to a known pH buffer solution (key) in which the transistors are immersed. To read the chemical information effectively via ISFETs, we designed a readout circuit for stable operation and detection of voltage thresholds. To demonstrate the applicability of the proposed key generation, we utilize pHGen for logic locking -- a hardware integrity protection scheme. The proposed key-generation method breaks the limits of binary values and provides the first steps toward the utilization of multi-valued voltage thresholds of ISFETs controlled by chemical information. The pHGen approach is expected to be a turning point for using more sophisticated bio-based analog keys for securing next-generation electronics.

preprint, 2022, arXiv

X-Fault: Impact of Faults on Binary Neural Networks in Memristor-Crossbar Arrays with Logic-in-Memory Computation

Memristor-based crossbar arrays represent a promising emerging memory technology to replace conventional memories by offering a high density and enabling computing-in-memory (CIM) paradigms. While analog computing provides the best performance, non-idealities and ADC/DAC conversion limit memristor-based CIM. Logic-in-Memory (LIM) presents another flavor of CIM, in which the memristors are used in a binary manner to implement logic gates. Since binary neural networks (BNNs) use binary logic gates as the dominant operation, they can benefit from the massively parallel execution of binary operations and better resilience to variations of the memristors. Although conventional neural networks have been thoroughly investigated, the impact of faults on memristor-based BNNs remains unclear. Therefore, we analyze the impact of faults on logic gates in memristor-based crossbar arrays for BNNs. We propose a simulation framework that simulates different traditional faults to examine the accuracy loss of BNNs on memristive crossbar arrays. In addition, we compare different logic families based on the robustness and feasibility to accelerate AI applications.
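As a rough illustration of why gate-level faults matter for BNNs, the sketch below models a single binarized neuron as an XNOR-popcount followed by a threshold, and lets individual gate outputs be forced to a fixed value. XNOR-popcount and stuck-at faults are assumptions chosen for illustration; the abstract does not specify the gate types or fault models used in the X-Fault framework, and all names here are hypothetical.

```python
def xnor_popcount(inputs, weights, stuck_at=None):
    """Count positions where input and weight bits agree (XNOR = 1).

    `stuck_at` maps a gate index to a forced output (0 or 1), modelling a
    faulty gate whose result ignores its operands.
    """
    stuck_at = stuck_at or {}
    count = 0
    for i, (x, w) in enumerate(zip(inputs, weights)):
        out = 1 if x == w else 0      # fault-free XNOR result
        out = stuck_at.get(i, out)    # override if this gate is faulty
        count += out
    return count


def binary_neuron(inputs, weights, threshold, stuck_at=None):
    """Fire (1) when the XNOR-popcount reaches the threshold, else 0."""
    return 1 if xnor_popcount(inputs, weights, stuck_at) >= threshold else 0


x = [1, 0, 1, 1]
w = [1, 1, 1, 0]
print(binary_neuron(x, w, threshold=3))                   # fault-free
print(binary_neuron(x, w, threshold=3, stuck_at={1: 1}))  # gate 1 stuck at 1
```

In this toy case a single stuck-at-1 gate raises the popcount from 2 to 3 and flips the neuron output, which is the kind of effect an accuracy-loss study has to aggregate over many gates and layers.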

preprint, 2021, arXiv

An Investigation on Inherent Robustness of Posit Data Representation

As the dimensions and operating voltages of computer electronics shrink to cope with consumers' demand for higher performance and lower power consumption, circuit sensitivity to soft errors increases dramatically. Recently, a new data type called the posit has been proposed in the literature. Posit arithmetic has advantages over arithmetic compliant with the IEEE 754-2008 technical standard, such as higher numerical accuracy, higher speed, and simpler hardware design. In this paper, we present a comparative robustness study of 32-bit posit and 32-bit IEEE 754-2008 compliant representations. First, we present a theoretical analysis of IEEE 754 compliant numbers and posit numbers for single and double bit flips. Then, we conduct exhaustive fault injection experiments that show considerable inherent resilience in the posit format compared to the classical IEEE 754 compliant representation. To show a relevant use case of fault-tolerant applications, we perform experiments on a set of machine-learning applications. In more than 95% of the exhaustive fault injection exploration, the posit representation is less impacted by faults than the IEEE 754 compliant floating-point representation. Moreover, in 100% of the tested machine-learning applications, the accuracy of posit-implemented systems is higher than that of the classical floating-point-based ones.
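A single bit flip of the kind studied here can be reproduced for the IEEE 754 side with Python's standard `struct` module. The sketch below covers only 32-bit IEEE 754 floats; posit decoding is omitted, and the helper names are hypothetical rather than taken from the paper.

```python
import math
import struct


def flip_bit_float32(value, bit):
    """Return `value` (as an IEEE 754 binary32) with one bit inverted.

    Bit 31 is the sign, bits 30-23 the exponent, bits 22-0 the fraction.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit index must be in 0..31")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def relative_error(original, faulty):
    """Relative error of the faulty value; infinite for NaN or infinity.

    `original` must be non-zero.
    """
    if not math.isfinite(faulty):
        return math.inf
    return abs(faulty - original) / abs(original)


# Sweep every bit position of 1.0 and report the damage of each flip.
for bit in range(31, -1, -1):
    faulty = flip_bit_float32(1.0, bit)
    print(bit, faulty, relative_error(1.0, faulty))
```

The sweep shows how uneven the impact is in IEEE 754: a flip in the low fraction bits is almost harmless, while a flip in the top exponent bit turns 1.0 into infinity. That asymmetry is the behavior a robustness comparison against posits has to quantify.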

preprint, 2021, arXiv

ANDROMEDA: An FPGA Based RISC-V MPSoC Exploration Framework

With the growing demands of consumer electronic products, computational requirements are increasing exponentially. To meet the applications' computational needs, computer architects are trying to pack as many cores as possible onto a single die to accelerate the execution of application code. In a multiprocessor system-on-chip (MPSoC), striking a balance among the number of cores, memory subsystems, and network-on-chip parameters is essential to attain the desired performance. In this paper, we present ANDROMEDA, a RISC-V based framework that allows us to explore different configurations of an MPSoC and observe the performance penalties and gains. We emulate the various MPSoC configurations on the Synopsys HAPS-80D Dual FPGA platform. Using STREAM, matrix multiply, and N-body simulations as benchmarks, we demonstrate our framework's efficacy in quickly identifying the right parameters for efficient execution of these benchmarks.

preprint, 2021, arXiv

Architecture, Dataflow and Physical Design Implications of 3D-ICs for DNN-Accelerators

The everlasting demand for higher computing power for deep neural networks (DNNs) drives the development of parallel computing architectures. 3D integration, in which chips are integrated and connected vertically, can further increase performance because it introduces another level of spatial parallelism. Therefore, we analyze dataflows, performance, area, power, and temperature of such 3D-DNN-accelerators. Monolithic and TSV-based stacked 3D-ICs are compared against 2D-ICs. We identify workload properties and architectural parameters for efficient 3D-ICs and achieve up to 9.14x speedup of 3D vs. 2D. We discuss area-performance trade-offs. We demonstrate applicability, as the 3D-IC draws power similar to that of 2D-ICs and is not thermally limited.

preprint, 2021, arXiv

Brightening the Optical Flow through Posit Arithmetic

As new technologies are invented, their commercial viability needs to be carefully examined along with their technical merits and demerits. The posit data format, proposed as a drop-in replacement for IEEE 754 float format, is one such invention that requires extensive theoretical and experimental study to identify products that can benefit from the advantages of posits for specific market segments. In this paper, we present an extensive empirical study of posit-based arithmetic vis-à-vis IEEE 754 compliant arithmetic for the optical flow estimation method called Lucas-Kanade (LuKa). First, we use SoftPosit and SoftFloat format emulators to perform an empirical error analysis of the LuKa method. Our study shows that the average error in LuKa with SoftPosit is an order of magnitude lower than LuKa with SoftFloat. We then present the integration of the hardware implementation of a posit adder and multiplier in a RISC-V open-source platform. We make several recommendations, along with the analysis of LuKa in the RISC-V context, for future generation platforms incorporating posit arithmetic units.

preprint, 2021, arXiv

NeuroHammer: Inducing Bit-Flips in Memristive Crossbar Memories

Emerging non-volatile memory (NVM) technologies offer unique advantages in energy efficiency, latency, and features such as computing-in-memory. Consequently, emerging NVM technologies are considered an ideal substrate for computation and storage in future-generation neuromorphic platforms. These technologies need to be evaluated for fundamental reliability and security issues. In this paper, we present NeuroHammer, a security threat in ReRAM crossbars caused by thermal crosstalk between memory cells. We demonstrate that bit-flips can be deliberately induced in ReRAM devices in a crossbar by systematically writing adjacent memory cells. A simulation flow is developed to evaluate NeuroHammer and the impact of physical parameters on the effectiveness of the attack. Finally, we discuss the security implications in the context of possible attack scenarios.

preprint, 2020, arXiv

Dataflow Aware Mapping of Convolutional Neural Networks Onto Many-Core Platforms With Network-on-Chip Interconnect

Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research over the past years. Increasingly sophisticated hardware accelerators are proposed that exploit, for example, the sparsity in computations and make use of reduced-precision arithmetic to scale down the energy consumption. However, future platforms require more than just energy efficiency: scalability is becoming an increasingly important factor. The effort required for physical implementation grows with the size of the accelerator, making it more difficult to meet target constraints. Using many-core platforms consisting of several homogeneous cores can alleviate these limitations with regard to physical implementation at the expense of an increased dataflow mapping effort. While the dataflow in CNNs is deterministic and can therefore be optimized offline, finding a suitable scheme that minimizes both runtime and off-chip memory accesses is a challenging task, which becomes even more complex if an interconnect system is involved. This work presents an automated mapping strategy starting at the single-core level with different optimization targets for minimal runtime and minimal off-chip memory accesses. The strategy is then extended towards a suitable many-core mapping scheme and evaluated using a scalable system-level simulation with a network-on-chip interconnect. Design space exploration is performed by mapping the well-known CNNs AlexNet and VGG-16 to platforms of different core counts and computational power per core in order to investigate the trade-offs. Our mapping strategy and system setup are scaled from the single-core level up to 128 cores, thereby showing the limits of the selected approach.

preprint, 2013, arXiv

EURETILE 2010-2012 summary: first three years of activity of the European Reference Tiled Experiment

This is the summary of the first three years of activity of the EURETILE FP7 project 247846. EURETILE investigates and implements brain-inspired and fault-tolerant foundational innovations to the system architecture of massively parallel tiled computer architectures and the corresponding programming paradigm. The execution targets are a many-tile HW platform and a many-tile simulator. A set of SW process to HW tile mapping candidates is generated by the holistic SW tool-chain using a combination of analytic and bio-inspired methods. The hardware-dependent software is then generated, providing OS services with maximum efficiency and minimal overhead. The many-tile simulator collects profiling data, closing the loop of the SW tool-chain. Fine-grain parallelism inside processes is exploited by optimized intra-tile compilation techniques, but the project focus is above the level of the elementary tile. The elementary HW tile is a multi-processor, which includes a fault-tolerant Distributed Network Processor (for inter-tile communication) and ASIP accelerators. Furthermore, EURETILE investigates and implements the innovations for equipping the elementary HW tile with high-bandwidth, low-latency brain-like inter-tile communication emulating 3 levels of connection hierarchy, namely neural columns, cortical areas and cortex, and develops a dedicated cortical simulation benchmark: DPSNN-STDP (Distributed Polychronous Spiking Neural Net with synaptic Spiking Time Dependent Plasticity). EURETILE builds on the multi-tile HW paradigm and SW tool-chain developed by the FET-ACA SHAPES Integrated Project (2006-2009).

preprint, 2010, arXiv

A Scalable VLSI Architecture for Soft-Input Soft-Output Depth-First Sphere Decoding

Multiple-input multiple-output (MIMO) wireless transmission imposes huge challenges on the design of efficient hardware architectures for iterative receivers. A major challenge is soft-input soft-output (SISO) MIMO demapping, often approached by sphere decoding (SD). In this paper, we introduce what is, to the best of our knowledge, the first VLSI architecture for SISO SD applying a single tree-search approach. Compared with a soft-output-only base architecture similar to the one proposed by Studer et al. in IEEE J-SAC 2008, the architectural modifications for soft input still allow a one-node-per-cycle execution. For a 4x4 16-QAM system, the area increases by 57% and the operating frequency degrades by only 34%.
