Researcher profile

Hussam Amrouch

Hussam Amrouch contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2022arXiv

AppGNN: Approximation-Aware Functional Reverse Engineering using Graph Neural Networks

The globalization of the Integrated Circuit (IC) market is attracting an ever-growing number of partners, while remarkably lengthening the supply chain. Thereby, security concerns, such as those imposed by functional Reverse Engineering (RE), have become quintessential. RE leads to disclosure of confidential information to competitors, potentially enabling the theft of intellectual property. Traditional functional RE methods analyze a given gate-level netlist through employing pattern matching towards reconstructing the underlying basic blocks, and hence, reverse engineer the circuit's function. In this work, we are the first to demonstrate that applying Approximate Computing (AxC) principles to circuits significantly improves the resiliency against RE. This is attributed to the increased complexity in the underlying pattern-matching process. The resiliency remains effective even for Graph Neural Networks (GNNs) that are presently one of the most powerful state-of-the-art techniques in functional RE. Using AxC, we demonstrate a substantial reduction in GNN average classification accuracy-- from 98% to a mere 53%. To surmount the challenges introduced by AxC in RE, we propose the highly promising AppGNN platform, which enables GNNs (still being trained on exact circuits) to: (i) perform accurate classifications, and (ii) reverse engineer the circuit functionality, notwithstanding the applied approximation technique. AppGNN accomplishes this by implementing a novel graph-based node sampling approach that mimics generic approximation methodologies, requiring zero knowledge of the targeted approximation type. We perform an extensive evaluation and show that, using our method, we can improve the classification accuracy from 53% to 81% when classifying approximate adder circuits that have been generated using evolutionary algorithms, which our method is oblivious of.

preprint2022arXiv

Automated Design Approximation to Overcome Circuit Aging

Transistor aging phenomena manifest themselves as degradations in the main electrical characteristics of transistors. Over time, they result in a significant increase of cell propagation delay, leading to errors due to timing violations, since the operating frequency becomes unsustainable as the circuit ages. Conventional techniques employ timing guardbands to mitigate aging-induced delay increase, which leads to considerable performance losses from the beginning of the circuit's lifetime. Leveraging the inherent error resilience of a vast number of application domains, approximate computing was recently introduced as an aging mitigation mechanism. In this work, we present the first automated framework for generating aging-aware approximate circuits. Our framework, by applying directed gate-level netlist approximation, induces a small functional error and recovers the delay degradation due to aging. As a result, our optimized circuits eliminate aging-induced timing errors. Experimental evaluation over a variety of arithmetic circuits and image processing benchmarks demonstrates that for an average error of merely $5\times10^{-3}$, our framework completely eliminates aging-induced timing guardbands. Compared to the respective baseline circuits without timing guardbands (i.e., iso-performance evaluation), the error of the circuits generated by our framework is $1208$x smaller.

preprint2022arXiv

GNN4REL: Graph Neural Networks for Predicting Circuit Reliability Degradation

Process variations and device aging impose profound challenges for circuit designers. Without a precise understanding of the impact of variations on the delay of circuit paths, guardbands, which keep timing violations at bay, cannot be correctly estimated. This problem is exacerbated for advanced technology nodes, where transistor dimensions reach atomic levels and established margins are severely constrained. Hence, traditional worst-case analysis becomes impractical, resulting in intolerable performance overheads. Contrarily, process-variation/aging-aware static timing analysis (STA) equips designers with accurate statistical delay distributions. Timing guardbands that are small, yet sufficient, can then be effectively estimated. However, such analysis is costly as it requires intensive Monte-Carlo simulations. Further, it necessitates access to confidential physics-based aging models to generate the standard-cell libraries required for STA. In this work, we employ graph neural networks (GNNs) to accurately estimate the impact of process variations and device aging on the delay of any path within a circuit. Our proposed GNN4REL framework empowers designers to perform rapid and accurate reliability estimations without accessing transistor models, standard-cell libraries, or even STA; these components are all incorporated into the GNN model via training by the foundry. Specifically, GNN4REL is trained on a FinFET technology model that is calibrated against industrial 14nm measurement data. Through our extensive experiments on EPFL and ITC-99 benchmarks, as well as RISC-V processors, we successfully estimate delay degradations of all paths -- notably within seconds -- with a mean absolute error down to 0.01 percentage points.

preprint2021arXiv

Control Variate Approximation for DNN Accelerators

In this work, we introduce a control variate approximation technique for low error approximate Deep Neural Network (DNN) accelerators. The control variate technique is used in Monte Carlo methods to achieve variance reduction. Our approach significantly decreases the induced error due to approximate multiplications in DNN inference, without requiring time-exhaustive retraining compared to state-of-the-art. Leveraging our control variate method, we use highly approximated multipliers to generate power-optimized DNN accelerators. Our experimental evaluation on six DNNs, for Cifar-10 and Cifar-100 datasets, demonstrates that, compared to the accurate design, our control variate approximation achieves same performance and 24% power reduction for a merely 0.16% accuracy loss.

preprint2021arXiv

Positive/Negative Approximate Multipliers for DNN Accelerators

Recent Deep Neural Networks (DNNs) managed to deliver superhuman accuracy levels on many AI tasks. Several applications rely more and more on DNNs to deliver sophisticated services and DNN accelerators are becoming integral components of modern systems-on-chips. DNNs perform millions of arithmetic operations per inference and DNN accelerators integrate thousands of multiply-accumulate units leading to increased energy requirements. Approximate computing principles are employed to significantly lower the energy consumption of DNN accelerators at the cost of some accuracy loss. Nevertheless, recent research demonstrated that complex DNNs are increasingly sensitive to approximation. Hence, the obtained energy savings are often limited when targeting tight accuracy constraints. In this work, we present a dynamically configurable approximate multiplier that supports three operation modes, i.e., exact, positive error, and negative error. In addition, we propose a filter-oriented approximation method to map the weights to the appropriate modes of the approximate multiplier. Our mapping algorithm balances the positive with the negative errors due to the approximate multiplications, aiming at maximizing the energy reduction while minimizing the overall convolution error. We evaluate our approach on multiple DNNs and datasets against state-of-the-art approaches, where our method achieves 18.33% energy gains on average across 7 NNs on 4 different datasets for a maximum accuracy drop of only 1%.

preprint2021arXiv

Reliability-Aware Quantization for Anti-Aging NPUs

Transistor aging is one of the major concerns that challenges designers in advanced technologies. It profoundly degrades the reliability of circuits during its lifetime as it slows down transistors resulting in errors due to timing violations unless large guardbands are included, which leads to considerable performance losses. When it comes to Neural Processing Units (NPUs), where increasing the inference speed is the primary goal, such performance losses cannot be tolerated. In this work, we are the first to propose a reliability-aware quantization to eliminate aging effects in NPUs while completely removing guardbands. Our technique delivers a graceful inference accuracy degradation over time while compensating for the aging-induced delay increase of the NPU. Our evaluation, over ten state-of-the-art neural network architectures trained on the ImageNet dataset, demonstrates that for an entire lifetime of 10 years, the average accuracy loss is merely 3%. In the meantime, our technique achieves 23% higher performance due to the elimination of the aging guardband.

preprint2020arXiv

Power Side-Channel Attacks in Negative Capacitance Transistor (NCFET)

Side-channel attacks have empowered bypassing of cryptographic components in circuits. Power side-channel (PSC) attacks have received particular traction, owing to their non-invasiveness and proven effectiveness. Aside from prior art focused on conventional technologies, this is the first work to investigate the emerging Negative Capacitance Transistor (NCFET) technology in the context of PSC attacks. We implement a CAD flow for PSC evaluation at design-time. It leverages industry-standard design tools, while also employing the widely-accepted correlation power analysis (CPA) attack. Using standard-cell libraries based on the 7nm FinFET technology for NCFET and its counterpart CMOS setup, our evaluation reveals that NCFET-based circuits are more resilient to the classical CPA attack, due to the considerable effect of negative capacitance on the switching power. We also demonstrate that the thicker the ferroelectric layer, the higher the resiliency of the NCFET-based circuit, which opens new doors for optimization and trade-offs.

preprint2020arXiv

Run-Time Accuracy Reconfigurable Stochastic Computing for Dynamic Reliability and Power Management

In this paper, we propose a novel accuracy-reconfigurable stochastic computing (ARSC) framework for dynamic reliability and power management. Different than the existing stochastic computing works, where the accuracy versus power/energy trade-off is carried out in the design time, the new ARSC design can change accuracy or bit-width of the data in the run-time so that it can accommodate the long-term aging effects by slowing the system clock frequency at the cost of accuracy while maintaining the throughput of the computing. We validate the ARSC concept on a discrete cosine transformation (DCT) and inverse DCT designs for image compressing/decompressing applications, which are implemented on Xilinx Spartan-6 family XC6SLX45 platform. Experimental results shows that the new design can easily mitigate the long-term aging induced effects by accuracy trade-off while maintaining the throughput of the whole computing process using simple frequency scaling. We further show that one-bit precision loss for input data, which translated to 3.44dB of the accuracy loss in term of Peak Signal to Noise Ratio for images, we can sufficiently compensate the NBTI induced aging effects in 10 years while maintaining the pre-aging computing throughput of 7.19 frames per second. At the same time, we can save 74\% power consumption by 10.67dB of accuracy loss. The proposed ARSC computing framework also allows much aggressive frequency scaling, which can lead to order of magnitude power savings compared to the traditional dynamic voltage and frequency scaling (DVFS) techniques.

preprint2020arXiv

SoftWear: Software-Only In-Memory Wear-Leveling for Non-Volatile Main Memory

Several emerging technologies for byte-addressable non-volatile memory (NVM) have been considered to replace DRAM as the main memory in computer systems during the last years. The disadvantage of a lower write endurance, compared to DRAM, of NVM technologies like Phase-Change Memory (PCM) or Ferroelectric RAM (FeRAM) has been addressed in the literature. As a solution, in-memory wear-leveling techniques have been proposed, which aim to balance the wear-level over all memory cells to achieve an increased memory lifetime. Generally, to apply such advanced aging-aware wear-leveling techniques proposed in the literature, additional special hardware is introduced into the memory system to provide the necessary information about the cell age and thus enable aging-aware wear-leveling decisions. This paper proposes software-only aging-aware wear-leveling based on common CPU features and does not rely on any additional hardware support from the memory subsystem. Specifically, we exploit the memory management unit (MMU), performance counters, and interrupts to approximate the memory write counts as an aging indicator. Although the software-only approach may lead to slightly worse wear-leveling, it is applicable on commonly available hardware. We achieve page-level coarse-grained wear-leveling by approximating the current cell age through statistical sampling and performing physical memory remapping through the MMU. This method results in non-uniform memory usage patterns within a memory page. Hence, we further propose a fine-grained wear-leveling in the stack region of C / C++ compiled software. By applying both wear-leveling techniques, we achieve up to $78.43\%$ of the ideal memory lifetime, which is a lifetime improvement of more than a factor of $900$ compared to the lifetime without any wear-leveling.