Source author record

Anand Raghunathan

Anand Raghunathan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Emerging Technologies Computer Vision Hardware Architecture Artificial Intelligence Computation and Language cond-mat.dis-nn cond-mat.mes-hall Cryptography and Security eess.SP eess.SY Neural and Evolutionary Computing Other Computer Science Systems and Control

Catalog footprint

What is connected

16works

14topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A3D: Agentic AI flow for autonomous Accelerator Design

Accelerating applications through the design of hardware accelerators can significantly enhance system performance and energy efficiency. Despite advances, such as high-level synthesis (HLS), designing accelerators for complex applications still remains highly labor-intensive, demanding considerable expertise in understanding workloads to be accelerated, hardware design, micro-architecture, and EDA tool usage, posing challenges for application domain experts. Therefore, most accelerator solutions are limited to applications with a regular predictable dataflow. Advances in AI have enabled agents that perform autonomous planning, reasoning, execution and reflection, leading to unprecedented potential for automation through agentic AI. We present A3D, an Agentic AI flow for end-to-end Automation of hardware Accelerator Design. A3D automates workload analysis, performance bottleneck identification, code refactoring for HLS compatibility and micro-architecture generation. A3D also generates diverse accelerator designs by automatically exploring the speed-area tradeoff space. Recent efforts have explored the use of AI for specific tasks such as design space exploration in HLS, leaving several tasks to still be performed manually. A3D addresses the challenges in applying modern LLMs to accelerator design by judiciously partitioning tasks among specialist agents, orchestrating process loops with specialist and verifier agents, utilizing pre-existing and custom tools, and employing agentic RAG for codebase and proprietary EDA tool documentation exploration. Our implementation of A3D, using commercial components like Claude Sonnet 4.5 and the Catapult HLS tool, demonstrates its effectiveness by generating accelerator designs with no human intervention from complex scientific applications like LAMMPS (molecular dynamics simulation) and QMCPACK (quantum chemistry).

preprint2022arXiv

A Co-design view of Compute in-Memory with Non-Volatile Elements for Neural Networks

Deep Learning neural networks are pervasive, but traditional computer architectures are reaching the limits of being able to efficiently execute them for the large workloads of today. They are limited by the von Neumann bottleneck: the high cost in energy and latency incurred in moving data between memory and the compute engine. Today, special CMOS designs address this bottleneck. The next generation of computing hardware will need to eliminate or dramatically mitigate this bottleneck. We discuss how compute-in-memory can play an important part in this development. Here, a non-volatile memory based cross-bar architecture forms the heart of an engine that uses an analog process to parallelize the matrix vector multiplication operation, repeatedly used in all neural network workloads. The cross-bar architecture, at times referred to as a neuromorphic approach, can be a key hardware element in future computing machines. In the first part of this review we take a co-design view of the design constraints and the demands it places on the new materials and memory devices that anchor the cross-bar architecture. In the second part, we review what is knows about the different new non-volatile memory materials and devices suited for compute in-memory, and discuss the outlook and challenges.

preprint2022arXiv

AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models

Transformers have greatly advanced the state-of-the-art in Natural Language Processing (NLP) in recent years, but present very large computation and storage requirements. We observe that the design process of Transformers (pre-train a foundation model on a large dataset in a self-supervised manner, and subsequently fine-tune it for different downstream tasks) leads to task-specific models that are highly over-parameterized, adversely impacting both accuracy and inference efficiency. We propose AxFormer, a systematic framework that applies accuracy-driven approximations to create optimized transformer models for a given downstream task. AxFormer combines two key optimizations -- accuracy-driven pruning and selective hard attention. Accuracy-driven pruning identifies and removes parts of the fine-tuned transformer that hinder performance on the given downstream task. Sparse hard-attention optimizes attention blocks in selected layers by eliminating irrelevant word aggregations, thereby helping the model focus only on the relevant parts of the input. In effect, AxFormer leads to models that are more accurate, while also being faster and smaller. Our experiments on GLUE and SQUAD tasks show that AxFormer models are up to 4.5% more accurate, while also being up to 2.5X faster and up to 3.2X smaller than conventional fine-tuned models. In addition, we demonstrate that AxFormer can be combined with previous efforts such as distillation or quantization to achieve further efficiency gains.

preprint2022arXiv

STeP-CiM: Strain-enabled Ternary Precision Computation-in-Memory based on Non-Volatile 2D Piezoelectric Transistors

We propose 2D Piezoelectric FET (PeFET) based compute-enabled non-volatile memory for ternary deep neural networks (DNNs). PeFETs consist of a material with ferroelectric and piezoelectric properties coupled with Transition Metal Dichalcogenide channel. We utilize (a) ferroelectricity to store binary bits (0/1) in the form of polarization (-P/+P) and (b) polarization dependent piezoelectricity to read the stored state by means of strain-induced bandgap change in Transition Metal Dichalcogenide channel. The unique read mechanism of PeFETs enables us to expand the traditional association of +P (-P) with low (high) resistance states to their dual high (low) resistance depending on read voltage. Specifically, we demonstrate that +P (-P) stored in PeFETs can be dynamically configured in (a) a low (high) resistance state for positive read voltages and (b) their dual high (low) resistance states for negative read voltages, without afflicting a read disturb. Such a feature, which we name as Polarization Preserved Piezoelectric Effect Reversal with Dual Voltage Polarity (PiER), is unique to PeFETs and has not been shown in hitherto explored memories. We leverage PiER to propose a Strain-enabled Ternary Precision Computation-in-Memory (STeP-CiM) cell with capabilities of computing the scalar product of the stored weight and input, both of which are represented with signed ternary precision. Further, using multi word-line assertion of STeP-CiM cells, we achieve massively parallel computation of dot products of signed ternary inputs and weights. Our array level analysis shows 91% lower delay and improvements of 15% and 91% in energy for in-memory multiply-and-accumulate operations compared to near-memory design approaches based on SRAM and PeFET respectively. STeP-CiM exhibits upto 8.91x improvement in performance and 6.07x average improvement in energy over SRAM/PeFET based near-memory design.

preprint2021arXiv

HW/SW Framework for Improving the Safety of Implantable and Wearable Medical Devices

Implantable and wearable medical devices (IWMDs) are widely used for the monitoring and therapy of an increasing range of medical conditions. Improvements in medical devices, enabled by advances in low-power processors, more complex firmware, and wireless connectivity, have greatly improved therapeutic outcomes and patients' quality-of-life. However, security attacks, malfunctions and sometimes user errors have raised great concerns regarding the safety of IWMDs. In this work, we present a HW/SW (Hardware/Software) framework for improving the safety of IWMDs, wherein a set of safety rules and a rule check mechanism are used to monitor both the extrinsic state (the patient's physiological parameters sensed by the IWMD) and the internal state of the IWMD (I/O activities of the microcontroller) to infer unsafe operations that may be triggered by user errors, software bugs, or security attacks. We discuss how this approach can be realized in the context of a artificial pancreas with wireless connectivity and implement a prototype to demonstrate its effectiveness in improving safety at modest overheads.

preprint2021arXiv

TxSim:Modeling Training of Deep Neural Networks on Resistive Crossbar Systems

Resistive crossbars have attracted significant interest in the design of Deep Neural Network (DNN) accelerators due to their ability to natively execute massively parallel vector-matrix multiplications within dense memory arrays. However, crossbar-based computations face a major challenge due to a variety of device and circuit-level non-idealities, which manifest as errors in the vector-matrix multiplications and eventually degrade DNN accuracy. To address this challenge, there is a need for tools that can model the functional impact of non-idealities on DNN training and inference. Existing efforts towards this goal are either limited to inference, or are too slow to be used for large-scale DNN training. We propose TxSim, a fast and customizable modeling framework to functionally evaluate DNN training on crossbar-based hardware considering the impact of non-idealities. The key features of TxSim that differentiate it from prior efforts are: (i) It comprehensively models non-idealities during all training operations (forward propagation, backward propagation, and weight update) and (ii) it achieves computational efficiency by mapping crossbar evaluations to well-optimized BLAS routines and incorporates speedup techniques to further reduce simulation time with minimal impact on accuracy. TxSim achieves orders-of-magnitude improvement in simulation speed over prior works, and thereby makes it feasible to evaluate training of large-scale DNNs on crossbars. Our experiments using TxSim reveal that the accuracy degradation in DNN training due to non-idealities can be substantial (3%-10%) for large-scale DNNs, underscoring the need for further research in mitigation techniques. We also analyze the impact of various device and circuit-level parameters and the associated non-idealities to provide key insights that can guide the design of crossbar-based DNN training accelerators.

preprint2020arXiv

EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness against Adversarial Attacks

Ensuring robustness of Deep Neural Networks (DNNs) is crucial to their adoption in safety-critical applications such as self-driving cars, drones, and healthcare. Notably, DNNs are vulnerable to adversarial attacks in which small input perturbations can produce catastrophic misclassifications. In this work, we propose EMPIR, ensembles of quantized DNN models with different numerical precisions, as a new approach to increase robustness against adversarial attacks. EMPIR is based on the observation that quantized neural networks often demonstrate much higher robustness to adversarial attacks than full precision networks, but at the cost of a substantial loss in accuracy on the original (unperturbed) inputs. EMPIR overcomes this limitation to achieve the 'best of both worlds', i.e., the higher unperturbed accuracies of the full precision models combined with the higher robustness of the low precision models, by composing them in an ensemble. Further, as low precision DNN models have significantly lower computational and storage requirements than full precision models, EMPIR models only incur modest compute and memory overheads compared to a single full-precision model (<25% in our evaluations). We evaluate EMPIR across a suite of DNNs for 3 different image recognition tasks (MNIST, CIFAR-10 and ImageNet) and under 4 different adversarial attacks. Our results indicate that EMPIR boosts the average adversarial accuracies by 42.6%, 15.2% and 10.5% for the DNN models trained on the MNIST, CIFAR-10 and ImageNet datasets respectively, when compared to single full-precision models, without sacrificing accuracy on the unperturbed inputs.

preprint2020arXiv

Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks

Modern deep networks have millions to billions of parameters, which leads to high memory and energy requirements during training as well as during inference on resource-constrained edge devices. Consequently, pruning techniques have been proposed that remove less significant weights in deep networks, thereby reducing their memory and computational requirements. Pruning is usually performed after training the original network, and is followed by further retraining to compensate for the accuracy loss incurred during pruning. The prune-and-retrain procedure is repeated iteratively until an optimum tradeoff between accuracy and efficiency is reached. However, such iterative retraining adds to the overall training complexity of the network. In this work, we propose a dynamic pruning-while-training procedure, wherein we prune filters of the convolutional layers of a deep network during training itself, thereby precluding the need for separate retraining. We evaluate our dynamic pruning-while-training approach with three different pre-existing pruning strategies, viz. mean activation-based pruning, random pruning, and L1 normalization-based pruning. Our results for VGG-16 trained on CIFAR10 shows that L1 normalization provides the best performance among all the techniques explored in this work with less than 1% drop in accuracy after pruning 80% of the filters compared to the original network. We further evaluated the L1 normalization based pruning mechanism on CIFAR100. Results indicate that pruning while training yields a compressed network with almost no accuracy loss after pruning 50% of the filters compared to the original network and ~5% loss for high pruning rates (>80%). The proposed pruning methodology yields 41% reduction in the number of computations and memory accesses during training for CIFAR10, CIFAR100 and ImageNet compared to training with retraining for 10 epochs .

preprint2020arXiv

RxNN: A Framework for Evaluating Deep Neural Networks on Resistive Crossbars

Resistive crossbars designed with non-volatile memory devices have emerged as promising building blocks for Deep Neural Network (DNN) hardware, due to their ability to compactly and efficiently realize vector-matrix multiplication (VMM), the dominant computational kernel in DNNs. However, a key challenge with resistive crossbars is that they suffer from a range of device and circuit level non-idealities such as interconnect parasitics, peripheral circuits, sneak paths, and process variations. These non-idealities can lead to errors in VMMs, eventually degrading the DNN's accuracy. It is therefore critical to study the impact of crossbar non-idealities on the accuracy of large-scale DNNs. However, this is challenging because existing device and circuit models are too slow to use in application-level evaluations. We present RxNN, a fast and accurate simulation framework to evaluate large-scale DNNs on resistive crossbar systems. RxNN splits and maps the computations involved in each DNN layer into crossbar operations, and evaluates them using a Fast Crossbar Model (FCM) that accurately captures the errors arising due to crossbar non-idealities while being four-to-five orders of magnitude faster than circuit simulation. FCM models a crossbar-based VMM operation using three stages - non-linear models for the input and output peripheral circuits (DACs and ADCs), and an equivalent non-ideal conductance matrix for the core crossbar array. We implement RxNN by extending the Caffe machine learning framework and use it to evaluate a suite of six large-scale DNNs developed for the ImageNet Challenge. Our experiments reveal that resistive crossbar non-idealities can lead to significant accuracy degradations (9.6%-32%) for these large-scale DNNs. To the best of our knowledge, this work is the first quantitative evaluation of the accuracy of large-scale DNNs on resistive crossbar based hardware.

preprint2020arXiv

Sparsity Turns Adversarial: Energy and Latency Attacks on Deep Neural Networks

Adversarial attacks have exposed serious vulnerabilities in Deep Neural Networks (DNNs) through their ability to force misclassifications through human-imperceptible perturbations to DNN inputs. We explore a new direction in the field of adversarial attacks by suggesting attacks that aim to degrade the computational efficiency of DNNs rather than their classification accuracy. Specifically, we propose and demonstrate sparsity attacks, which adversarial modify a DNN's inputs so as to reduce sparsity (or the presence of zero values) in its internal activation values. In resource-constrained systems, a wide range of hardware and software techniques have been proposed that exploit sparsity to improve DNN efficiency. The proposed attack increases the execution time and energy consumption of sparsity-optimized DNN implementations, raising concern over their deployment in latency and energy-critical applications. We propose a systematic methodology to generate adversarial inputs for sparsity attacks by formulating an objective function that quantifies the network's activation sparsity, and minimizing this function using iterative gradient-descent techniques. We launch both white-box and black-box versions of adversarial sparsity attacks on image recognition DNNs and demonstrate that they decrease activation sparsity by up to 1.82x. We also evaluate the impact of the attack on a sparsity-optimized DNN accelerator and demonstrate degradations up to 1.59x in latency, and also study the performance of the attack on a sparsity-optimized general-purpose processor. Finally, we evaluate defense techniques such as activation thresholding and input quantization and demonstrate that the proposed attack is able to withstand them, highlighting the need for further efforts in this new direction within the field of adversarial machine learning.

preprint2020arXiv

TiM-DNN: Ternary in-Memory accelerator for Deep Neural Networks

The use of lower precision has emerged as a popular technique to optimize the compute and storage requirements of complex Deep Neural Networks (DNNs). In the quest for lower precision, recent studies have shown that ternary DNNs (which represent weights and activations by signed ternary values) represent a promising sweet spot, achieving accuracy close to full-precision networks on complex tasks. We propose TiM-DNN, a programmable in-memory accelerator that is specifically designed to execute ternary DNNs. TiM-DNN supports various ternary representations including unweighted {-1,0,1}, symmetric weighted {-a,0,a}, and asymmetric weighted {-a,0,b} ternary systems. The building blocks of TiM-DNN are TiM tiles -- specialized memory arrays that perform massively parallel signed ternary vector-matrix multiplications with a single access. TiM tiles are in turn composed of Ternary Processing Cells (TPCs), bit-cells that function as both ternary storage units and signed ternary multiplication units. We evaluate an implementation of TiM-DNN in 32nm technology using an architectural simulator calibrated with SPICE simulations and RTL synthesis. We evaluate TiM-DNN across a suite of state-of-the-art DNN benchmarks including both deep convolutional and recurrent neural networks. A 32-tile instance of TiM-DNN achieves a peak performance of 114 TOPs/s, consumes 0.9W power, and occupies 1.96mm2 chip area, representing a 300X and 388X improvement in TOPS/W and TOPS/mm2, respectively, compared to an NVIDIA Tesla V100 GPU. In comparison to specialized DNN accelerators, TiM-DNN achieves 55X-240X and 160X-291X improvement in TOPS/W and TOPS/mm2, respectively. Finally, when compared to a well-optimized near-memory accelerator for ternary DNNs, TiM-DNN demonstrates 3.9x-4.7x improvement in system-level energy and 3.2x-4.2x speedup, underscoring the potential of in-memory computing for ternary DNNs.

preprint2016arXiv

Energy-Efficient Object Detection using Semantic Decomposition

Machine-learning algorithms offer immense possibilities in the development of several cognitive applications. In fact, large scale machine-learning classifiers now represent the state-of-the-art in a wide range of object detection/classification problems. However, the network complexities of large-scale classifiers present them as one of the most challenging and energy intensive workloads across the computing spectrum. In this paper, we present a new approach to optimize energy efficiency of object detection tasks using semantic decomposition to build a hierarchical classification framework. We observe that certain semantic information like color/texture are common across various images in real-world datasets for object detection applications. We exploit these common semantic features to distinguish the objects of interest from the remaining inputs (non-objects of interest) in a dataset at a lower computational effort. We propose a 2-stage hierarchical classification framework, with increasing levels of complexity, wherein the first stage is trained to recognize the broad representative semantic features relevant to the object of interest. The first stage rejects the input instances that do not have the representative features and passes only the relevant instances to the second stage. Our methodology thus allows us to reject certain information at lower complexity and utilize the full computational effort of a network only on a smaller fraction of inputs to perform detection. We use color and texture as distinctive traits to carry out several experiments for object detection. Our experiments on the Caltech101/CIFAR10 dataset show that the proposed method yields 1.93x/1.46x improvement in average energy, respectively, over the traditional single classifier model.

preprint2016arXiv

Yield, Area and Energy Optimization in Stt-MRAMs using failure aware ECC

Spin Transfer Torque MRAMs are attractive due to their non-volatility, high density and zero leakage. However, STT-MRAMs suffer from poor reliability due to shared read and write paths. Additionally, conflicting requirements for data retention and write-ability (both related to the energy barrier height of the magnet) makes design more challenging. Furthermore, the energy barrier height depends on the physical dimensions of the free layer. Any variations in the dimensions of the free layer lead to variations in the energy barrier height. In order to address poor reliability of STT-MRAMs, usage of Error Correcting Codes (ECC) have been proposed. Unlike traditional CMOS memory technologies, ECC is expected to correct both soft and hard errors in STT_MRAMs. To achieve acceptable yield with low write power, stronger ECC is required, resulting in increased number of encoded bits and degraded memory efficiency. In this paper, we propose Failure aware ECC (FaECC), which masks permanent faults while maintaining the same correction capability for soft errors without increased encoded bits. Furthermore, we investigate the impact of process variations on run-time reliability of STT-MRAMs. We provide an analysis on the impact of process variations on the life-time of the free layer and retention failures. In order to analyze the effectiveness of our methodology, we developed a cross-layer simulation framework that consists of device, circuit and array level analysis of STT-MRAM memory arrays. Our results show that using FaECC relaxes the requirements on the energy barrier height, which reduces the write energy and results in smaller access transistor size and memory array area. Keywords: STT-MRAM, reliability, Error Correcting Codes, ECC, magnetic memory

preprint2015arXiv

Exploring Spin-Transfer-Torque Devices for Logic Applications

As CMOS nears the end of the projected scaling roadmap, significant effort has been devoted to the search for new materials and devices that can realize memory and logic. Spintronics, is one of the promising directions for the Post-CMOS era. While the potential of spintronic memories is relatively well known, realizing logic remains an open and critical challenge. All Spin Logic (ASL) is a recently proposed logic style that realizes Boolean logic using spin-transfer-torque (STT) devices based on the principle of non-local spin torque. ASL has advantages such as density, non-volatility, and low operating voltage. However, it also suffers from drawbacks such as low speed and static power dissipation. Recent work has shown that, in the context of simple arithmetic circuits (adders, multipliers), the efficiency of ASL can be greatly improved using techniques that utilize its unique characteristics. An evaluation of ASL across a broad range of circuits, considering the known optimization techniques, is an important next step in determining its viability. In this work, we propose a systematic methodology for the synthesis of ASL circuits. Our methodology performs various optimizations that benefit ASL, such as intra-cycle power gating, stacking of ASL nanomagnets, and fine-grained logic pipelining. We utilize the proposed methodology to evaluate the suitability of ASL implementations for a wide range of benchmarks viz. random combinational and sequential logic, digital signal processing circuits, and the Leon SPARC3 general-purpose processor. Based on our evaluation, we identify (i) the large current requirement of nanomagnets at fast switching speeds, (ii) the static power dissipation in the all-metallic devices, and (iii) the short spin flip length in interconnects as key bottlenecks that limit the competitiveness of ASL.

preprint2014arXiv

STT-SNN: A Spin-Transfer-Torque Based Soft-Limiting Non-Linear Neuron for Low-Power Artificial Neural Networks

Recent years have witnessed growing interest in the use of Artificial Neural Networks (ANNs) for vision, classification, and inference problems. An artificial neuron sums N weighted inputs and passes the result through a non-linear transfer function. Large-scale ANNs impose very high computing requirements for training and classification, leading to great interest in the use of post-CMOS devices to realize them in an energy efficient manner. In this paper, we propose a spin-transfer-torque (STT) device based on Domain Wall Motion (DWM) magnetic strip that can efficiently implement a Soft-limiting Non-linear Neuron (SNN) operating at ultra-low supply voltage and current. In contrast to previous spin-based neurons that can only realize hard-limiting transfer functions, the proposed STT-SNN displays a continuous resistance change with varying input current, and can therefore be employed to implement a soft-limiting neuron transfer function. Soft-limiting neurons are greatly preferred to hard-limiting ones due to their much improved modeling capacity, which leads to higher network accuracy and lower network complexity. We also present an ANN hardware design employing the proposed STT-SNNs and Memristor Crossbar Arrays (MCA) as synapses. The ultra-low voltage operation of the magneto metallic STT-SNN enables the programmable MCA-synapses, computing analog-domain weighted summation of input voltages, to also operate at ultra-low voltage. We modeled the STT-SNN using micro-magnetic simulation and evaluated them using an ANN for character recognition. Comparisons with analog and digital CMOS neurons show that STT-SNNs can achieve more than two orders of magnitude lower energy consumption.

preprint2007arXiv

Hardware Accelerated Power Estimation

In this paper, we present power emulation, a novel design paradigm that utilizes hardware acceleration for the purpose of fast power estimation. Power emulation is based on the observation that the functions necessary for power estimation (power model evaluation, aggregation, etc.) can be implemented as hardware circuits. Therefore, we can enhance any given design with "power estimation hardware", map it to a prototyping platform, and exercise it with any given test stimuli to obtain power consumption estimates. Our empirical studies with industrial designs reveal that power emulation can achieve significant speedups (10X to 500X) over state-of-the-art commercial register-transfer level (RTL) power estimation tools.

Anand Raghunathan

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

A3D: Agentic AI flow for autonomous Accelerator Design

A Co-design view of Compute in-Memory with Non-Volatile Elements for Neural Networks

AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models

STeP-CiM: Strain-enabled Ternary Precision Computation-in-Memory based on Non-Volatile 2D Piezoelectric Transistors

HW/SW Framework for Improving the Safety of Implantable and Wearable Medical Devices

TxSim:Modeling Training of Deep Neural Networks on Resistive Crossbar Systems

EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness against Adversarial Attacks

Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks

RxNN: A Framework for Evaluating Deep Neural Networks on Resistive Crossbars

Sparsity Turns Adversarial: Energy and Latency Attacks on Deep Neural Networks

TiM-DNN: Ternary in-Memory accelerator for Deep Neural Networks

Energy-Efficient Object Detection using Semantic Decomposition

Yield, Area and Energy Optimization in Stt-MRAMs using failure aware ECC

Exploring Spin-Transfer-Torque Devices for Logic Applications

STT-SNN: A Spin-Transfer-Torque Based Soft-Limiting Non-Linear Neuron for Low-Power Artificial Neural Networks

Hardware Accelerated Power Estimation