Source author record

Deliang Fan

Deliang Fan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Emerging Technologies Machine Learning cond-mat.dis-nn Cryptography and Security Computer Vision Hardware Architecture Artificial Intelligence cond-mat.mes-hall Neural and Evolutionary Computing cond-mat.mtrl-sci

Catalog footprint

What is connected

21works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

AdpSplit: Error-Driven Adaptive Splitting for Faster Geometry Discovery in 3D Gaussian Splatting

Adaptive density control in 3D Gaussian Splatting (3DGS) repeatedly grows the Gaussian population through fixed-cardinality random splitting to discover useful scene structure. However, in vanilla 3DGS, its binary split operator requires many densification rounds to expose fine details, making it a bottleneck for efficient training schedules with fewer iterations. We introduce AdpSplit, an error-driven adaptive split operator that determines the number of split children and initializes the child parameters from L1-pixel-error region statistics, enabling fewer densification iterations, thus reduced training time, while preserving the rendering quality of full-schedule training. Across the MipNeRF360, Deep-Blending, and Tanks&Temples datasets, AdpSplit reduces the training time of multiple accelerated 3DGS pipelines by 9.2%-22.3% as a simple drop-in replacement for the standard split operator. With FastGS, AdpSplit matches the full-schedule PSNR on MipNeRF360 while reducing training time by 16.4%, corresponding to a 12.6x acceleration over vanilla 3DGS.

preprint2022arXiv

ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning

This work aims to tackle Model Inversion (MI) attack on Split Federated Learning (SFL). SFL is a recent distributed training scheme where multiple clients send intermediate activations (i.e., feature map), instead of raw data, to a central server. While such a scheme helps reduce the computational load at the client end, it opens itself to reconstruction of raw data from intermediate activation by the server. Existing works on protecting SFL only consider inference and do not handle attacks during training. So we propose ResSFL, a Split Federated Learning Framework that is designed to be MI-resistant during training. It is based on deriving a resistant feature extractor via attacker-aware training, and using this extractor to initialize the client-side model prior to standard SFL training. Such a method helps in reducing the computational complexity due to use of strong inversion model in client-side adversarial training as well as vulnerability of attacks launched in early training epochs. On CIFAR-100 dataset, our proposed framework successfully mitigates MI attack on a VGG-11 model with a high reconstruction Mean-Square-Error of 0.050 compared to 0.005 obtained by the baseline system. The framework achieves 67.5% accuracy (only 1% accuracy drop) with very low computation overhead. Code is released at: https://github.com/zlijingtao/ResSFL.

preprint2022arXiv

TRGP: Trust Region Gradient Projection for Continual Learning

Catastrophic forgetting is one of the major challenges in continual learning. To address this issue, some existing methods put restrictive constraints on the optimization space of the new task for minimizing the interference to old tasks. However, this may lead to unsatisfactory performance for the new task, especially when the new task is strongly correlated with old tasks. To tackle this challenge, we propose Trust Region Gradient Projection (TRGP) for continual learning to facilitate the forward knowledge transfer based on an efficient characterization of task correlation. Particularly, we introduce a notion of `trust region' to select the most related old tasks for the new task in a layer-wise and single-shot manner, using the norm of gradient projection onto the subspace spanned by task inputs. Then, a scaled weight projection is proposed to cleverly reuse the frozen weights of the selected old tasks in the trust region through a layer-wise scaling matrix. By jointly optimizing the scaling matrices and the model, where the model is updated along the directions orthogonal to the subspaces of old tasks, TRGP can effectively prompt knowledge transfer without forgetting. Extensive experiments show that our approach achieves significant improvement over related state-of-the-art methods.

preprint2021arXiv

NeurObfuscator: A Full-stack Obfuscation Tool to Mitigate Neural Architecture Stealing

Neural network stealing attacks have posed grave threats to neural network model deployment. Such attacks can be launched by extracting neural architecture information, such as layer sequence and dimension parameters, through leaky side-channels. To mitigate such attacks, we propose NeurObfuscator, a full-stack obfuscation tool to obfuscate the neural network architecture while preserving its functionality with very limited performance overhead. At the heart of this tool is a set of obfuscating knobs, including layer branching, layer widening, selective fusion and schedule pruning, that increase the number of operators, reduce/increase the latency, and number of cache and DRAM accesses. A genetic algorithm-based approach is adopted to orchestrate the combination of obfuscating knobs to achieve the best obfuscating effect on the layer sequence and dimension parameters so that the architecture information cannot be successfully extracted. Results on sequence obfuscation show that the proposed tool obfuscates a ResNet-18 ImageNet model to a totally different architecture (with 44 layer difference) without affecting its functionality with only 2% overall latency overhead. For dimension obfuscation, we demonstrate that an example convolution layer with 64 input and 128 output channels can be obfuscated to generate a layer with 207 input and 93 output channels with only a 2% latency overhead.

preprint2021arXiv

RADAR: Run-time Adversarial Weight Attack Detection and Accuracy Recovery

Adversarial attacks on Neural Network weights, such as the progressive bit-flip attack (PBFA), can cause a catastrophic degradation in accuracy by flipping a very small number of bits. Furthermore, PBFA can be conducted at run time on the weights stored in DRAM main memory. In this work, we propose RADAR, a Run-time adversarial weight Attack Detection and Accuracy Recovery scheme to protect DNN weights against PBFA. We organize weights that are interspersed in a layer into groups and employ a checksum-based algorithm on weights to derive a 2-bit signature for each group. At run time, the 2-bit signature is computed and compared with the securely stored golden signature to detect the bit-flip attacks in a group. After successful detection, we zero out all the weights in a group to mitigate the accuracy drop caused by malicious bit-flips. The proposed scheme is embedded in the inference computation stage. For the ResNet-18 ImageNet model, our method can detect 9.6 bit-flips out of 10 on average. For this model, the proposed accuracy recovery scheme can restore the accuracy from below 1% caused by 10 bit flips to above 69%. The proposed method has extremely low time and storage overhead. System-level simulation on gem5 shows that RADAR only adds <1% to the inference time, making this scheme highly suitable for run-time attack detection and mitigation.

preprint2021arXiv

T-BFA: Targeted Bit-Flip Adversarial Weight Attack

Traditional Deep Neural Network (DNN) security is mostly related to the well-known adversarial input example attack. Recently, another dimension of adversarial attack, namely, attack on DNN weight parameters, has been shown to be very powerful. As a representative one, the Bit-Flip-based adversarial weight Attack (BFA) injects an extremely small amount of faults into weight parameters to hijack the executing DNN function. Prior works of BFA focus on un-targeted attack that can hack all inputs into a random output class by flipping a very small number of weight bits stored in computer memory. This paper proposes the first work of targeted BFA based (T-BFA) adversarial weight attack on DNNs, which can intentionally mislead selected inputs to a target output class. The objective is achieved by identifying the weight bits that are highly associated with classification of a targeted output through a class-dependent weight bit ranking algorithm. Our proposed T-BFA performance is successfully demonstrated on multiple DNN architectures for image classification tasks. For example, by merely flipping 27 out of 88 million weight bits of ResNet-18, our T-BFA can misclassify all the images from 'Hen' class into 'Goose' class (i.e., 100 % attack success rate) in ImageNet dataset, while maintaining 59.35 % validation accuracy. Moreover, we successfully demonstrate our T-BFA attack in a real computer prototype system running DNN computation, with Ivy Bridge-based Intel i7 CPU and 8GB DDR3 memory.

preprint2020arXiv

A Progressive Sub-Network Searching Framework for Dynamic Inference

Many techniques have been developed, such as model compression, to make Deep Neural Networks (DNNs) inference more efficiently. Nevertheless, DNNs still lack excellent run-time dynamic inference capability to enable users trade-off accuracy and computation complexity (i.e., latency on target hardware) after model deployment, based on dynamic requirements and environments. Such research direction recently draws great attention, where one realization is to train the target DNN through a multiple-term objective function, which consists of cross-entropy terms from multiple sub-nets. Our investigation in this work show that the performance of dynamic inference highly relies on the quality of sub-net sampling. With objective to construct a dynamic DNN and search multiple high quality sub-nets with minimal searching cost, we propose a progressive sub-net searching framework, which is embedded with several effective techniques, including trainable noise ranking, channel group and fine-tuning threshold setting, sub-nets re-selection. The proposed framework empowers the target DNN with better dynamic inference capability, which outperforms prior works on both CIFAR-10 and ImageNet dataset via comprehensive experiments on different network structures. Taken ResNet18 as an example, our proposed method achieves much better dynamic inference accuracy compared with prior popular Universally-Slimmable-Network by 4.4%-maximally and 2.3%-averagely in ImageNet dataset with the same model size.

preprint2020arXiv

DeepHammer: Depleting the Intelligence of Deep Neural Networks through Targeted Chain of Bit Flips

Security of machine learning is increasingly becoming a major concern due to the ubiquitous deployment of deep learning in many security-sensitive domains. Many prior studies have shown external attacks such as adversarial examples that tamper with the integrity of DNNs using maliciously crafted inputs. However, the security implication of internal threats (i.e., hardware vulnerability) to DNN models has not yet been well understood. In this paper, we demonstrate the first hardware-based attack on quantized deep neural networks-DeepHammer-that deterministically induces bit flips in model weights to compromise DNN inference by exploiting the rowhammer vulnerability. DeepHammer performs aggressive bit search in the DNN model to identify the most vulnerable weight bits that are flippable under system constraints. To trigger deterministic bit flips across multiple pages within reasonable amount of time, we develop novel system-level techniques that enable fast deployment of victim pages, memory-efficient rowhammering and precise flipping of targeted bits. DeepHammer can deliberately degrade the inference accuracy of the victim DNN system to a level that is only as good as random guess, thus completely depleting the intelligence of targeted DNN systems. We systematically demonstrate our attacks on real systems against 12 DNN architectures with 4 different datasets and different application domains. Our evaluation shows that DeepHammer is able to successfully tamper DNN inference behavior at run-time within a few minutes. We further discuss several mitigation techniques from both algorithm and system levels to protect DNNs against such attacks. Our work highlights the need to incorporate security mechanisms in future deep learning system to enhance the robustness of DNN against hardware-based deterministic fault injections.

preprint2020arXiv

KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning

Deep Neural Networks (DNN) could forget the knowledge about earlier tasks when learning new tasks, and this is known as \textit{catastrophic forgetting}. While recent continual learning methods are capable of alleviating the catastrophic problem on toy-sized datasets, some issues still remain to be tackled when applying them in real-world problems. Recently, the fast mask-based learning method (e.g. piggyback \cite{mallya2018piggyback}) is proposed to address these issues by learning only a binary element-wise mask in a fast manner, while keeping the backbone model fixed. However, the binary mask has limited modeling capacity for new tasks. A more recent work \cite{hung2019compacting} proposes a compress-grow-based method (CPG) to achieve better accuracy for new tasks by partially training backbone model, but with order-higher training cost, which makes it infeasible to be deployed into popular state-of-the-art edge-/mobile-learning. The primary goal of this work is to simultaneously achieve fast and high-accuracy multi task adaption in continual learning setting. Thus motivated, we propose a new training method called \textit{kernel-wise Soft Mask} (KSM), which learns a kernel-wise hybrid binary and real-value soft mask for each task, while using the same backbone model. Such a soft mask can be viewed as a superposition of a binary mask and a properly scaled real-value tensor, which offers a richer representation capability without low-level kernel support to meet the objective of low hardware overhead. We validate KSM on multiple benchmark datasets against recent state-of-the-art methods (e.g. Piggyback, Packnet, CPG, etc.), which shows good improvement in both accuracy and training cost.

preprint2020arXiv

MERAM: Non-Volatile Cache Memory Based on Magneto-Electric FETs

Magneto-Electric FET (MEFET) is a recently developed post-CMOS FET, which offers intriguing characteristics for high speed and low-power design in both logic and memory applications. In this paper, for the first time, we propose a non-volatile 2T-1MEFET memory bit-cell with separate read and write paths. We show that with proper co-design at the device, cell and array levels, such a design is a promising candidate for fast non-volatile cache memory, termed as MERAM. To further evaluate its performance in memory system, we, for the first time, build a device-to-architecture cross-layer evaluation framework based on an experimentally-calibrated MEFET device model to quantitatively analyze and benchmark the proposed MERAM design with other memory technologies, including both volatile memory (i.e. SRAM, eDRAM) and other popular non-volatile emerging memory (i.e. ReRAM, STT-MRAM, and SOT-MRAM). The experiment results show that MERAM has a high state distinguishability with almost 36x magnitude difference in sense current. Results for the PARSEC benchmark suite indicate that as an L2 cache alternative, MERAM reduces Energy Area Latency (EAT) product on average by ~98\% and ~70\% compared with typical 6T SRAM and 2T SOT-MRAM platforms, respectively.

preprint2020arXiv

Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?

Large deep neural network (DNN) models pose the key challenge to energy efficiency due to the significantly higher energy consumption of off-chip DRAM accesses than arithmetic or SRAM operations. It motivates the intensive research on model compression with two main approaches. Weight pruning leverages the redundancy in the number of weights and can be performed in a non-structured, which has higher flexibility and pruning rate but incurs index accesses due to irregular weights, or structured manner, which preserves the full matrix structure with lower pruning rate. Weight quantization leverages the redundancy in the number of bits in weights. Compared to pruning, quantization is much more hardware-friendly, and has become a "must-do" step for FPGA and ASIC implementations. This paper provides a definitive answer to the question for the first time. First, we build ADMM-NN-S by extending and enhancing ADMM-NN, a recently proposed joint weight pruning and quantization framework. Second, we develop a methodology for fair and fundamental comparison of non-structured and structured pruning in terms of both storage and computation efficiency. Our results show that ADMM-NN-S consistently outperforms the prior art: (i) it achieves 348x, 36x, and 8x overall weight pruning on LeNet-5, AlexNet, and ResNet-50, respectively, with (almost) zero accuracy loss; (ii) we demonstrate the first fully binarized (for all layers) DNNs can be lossless in accuracy in many cases. These results provide a strong baseline and credibility of our study. Based on the proposed comparison framework, with the same accuracy and quantization, the results show that non-structrued pruning is not competitive in terms of both storage and computation efficiency. Thus, we conclude that non-structured pruning is considered harmful. We urge the community not to continue the DNN inference acceleration for non-structured sparsity.

preprint2020arXiv

PANDA: Processing-in-MRAM Accelerated De Bruijn Graph based DNA Assembly

Spurred by widening gap between data processing speed and data communication speed in Von-Neumann computing architectures, some bioinformatic applications have harnessed the computational power of Processing-in-Memory (PIM) platforms. However, the performance of PIMs unavoidably diminishes when dealing with such complex applications seeking bulk bit-wise comparison or addition operations. In this work, we present an efficient Processing-in-MRAM Accelerated De Bruijn Graph based DNA Assembly platform named PANDA based on an optimized and hardware-friendly genome assembly algorithm. PANDA is able to assemble large-scale DNA sequence data-set from all-pair overlaps. We first design PANDA platform that exploits MRAM as a computational memory and converts it to a potent processing unit for genome assembly. PANDA can execute not only efficient bulk bit-wise X(N)OR-based comparison/addition operations heavily required for the genome assembly task but a full-set of 2-/3-input logic operations inside MRAM chip. We then develop a highly parallel and step-by-step hardware-friendly DNA assembly algorithm for PANDA that only requires the developed in-memory logic operations. The platform is then configured with a novel data partitioning and mapping technique that provides local storage and processing to fully utilize the algorithm-level's parallelism. The cross-layer simulation results demonstrate that PANDA reduces the run time and power, respectively, by a factor of 18 and 11 compared with CPU. Besides, speed-ups of up-to 2-4x can be obtained over recent processing-in-MRAM platforms to perform the same task.

preprint2020arXiv

TBT: Targeted Neural Network Attack with Bit Trojan

Security of modern Deep Neural Networks (DNNs) is under severe scrutiny as the deployment of these models become widespread in many intelligence-based applications. Most recently, DNNs are attacked through Trojan which can effectively infect the model during the training phase and get activated only through specific input patterns (i.e, trigger) during inference. In this work, for the first time, we propose a novel Targeted Bit Trojan(TBT) method, which can insert a targeted neural Trojan into a DNN through the bit-flip attack. Our algorithm efficiently generates a trigger specifically designed to locate certain vulnerable bits of DNN weights stored in main memory (i.e., DRAM). The objective is that once the attacker flips these vulnerable bits, the network still operates with normal inference accuracy with benign input. However, when the attacker activates the trigger by embedding it with any input, the network is forced to classify all inputs to a certain target class. We demonstrate that flipping only several vulnerable bits identified by our method, using available bit-flip techniques (i.e, row-hammer), can transform a fully functional DNN model into a Trojan-infected model. We perform extensive experiments of CIFAR-10, SVHN and ImageNet datasets on both VGG-16 and Resnet-18 architectures. Our proposed TBT could classify 92 % of test images to a target class with as little as 84 bit-flips out of 88 million weight bits on Resnet-18 for CIFAR10 dataset.

preprint2014arXiv

Design and Synthesis of Ultra Low Energy Spin-Memristor Threshold Logic

A threshold logic gate (TLG) performs weighted sum of multiple inputs and compares the sum with a threshold. We propose Spin-Memeristor Threshold Logic (SMTL) gates, which employ memristive cross-bar array (MCA) to perform current-mode summation of binary inputs, whereas, the low-voltage fast-switching spintronic threshold devices (STD) carry out the threshold operation in an energy efficient manner. Field programmable SMTL gate arrays can operate at a small terminal voltage of ~50mV, resulting in ultra-low power consumption in gates as well as programmable interconnect networks. We evaluate the performance of SMTL using threshold logic synthesis. Results for common benchmarks show that SMTL based programmable logic hardware can be more than 100x energy efficient than state of the art CMOS FPGA.

preprint2014arXiv

Hierarchical Temporal Memory Based on Spin-Neurons and Resistive Memory for Energy-Efficient Brain-Inspired Computing

Hierarchical temporal memory (HTM) tries to mimic the computing in cerebral-neocortex. It identifies spatial and temporal patterns in the input for making inferences. This may require large number of computationally expensive tasks like, dot-product evaluations. Nano-devices that can provide direct mapping for such primitives are of great interest. In this work we show that the computing blocks for HTM can be mapped using low-voltage, fast-switching, magneto-metallic spin-neurons combined with emerging resistive cross-bar network (RCN). Results show possibility of more than 200x lower energy as compared to 45nm CMOS ASIC design

preprint2014arXiv

STT-SNN: A Spin-Transfer-Torque Based Soft-Limiting Non-Linear Neuron for Low-Power Artificial Neural Networks

Recent years have witnessed growing interest in the use of Artificial Neural Networks (ANNs) for vision, classification, and inference problems. An artificial neuron sums N weighted inputs and passes the result through a non-linear transfer function. Large-scale ANNs impose very high computing requirements for training and classification, leading to great interest in the use of post-CMOS devices to realize them in an energy efficient manner. In this paper, we propose a spin-transfer-torque (STT) device based on Domain Wall Motion (DWM) magnetic strip that can efficiently implement a Soft-limiting Non-linear Neuron (SNN) operating at ultra-low supply voltage and current. In contrast to previous spin-based neurons that can only realize hard-limiting transfer functions, the proposed STT-SNN displays a continuous resistance change with varying input current, and can therefore be employed to implement a soft-limiting neuron transfer function. Soft-limiting neurons are greatly preferred to hard-limiting ones due to their much improved modeling capacity, which leads to higher network accuracy and lower network complexity. We also present an ANN hardware design employing the proposed STT-SNNs and Memristor Crossbar Arrays (MCA) as synapses. The ultra-low voltage operation of the magneto metallic STT-SNN enables the programmable MCA-synapses, computing analog-domain weighted summation of input voltages, to also operate at ultra-low voltage. We modeled the STT-SNN using micro-magnetic simulation and evaluated them using an ANN for character recognition. Comparisons with analog and digital CMOS neurons show that STT-SNNs can achieve more than two orders of magnitude lower energy consumption.

preprint2013arXiv

Energy-Efficient and Robust Associative Computing with Electrically Coupled Dual Pillar Spin-Torque Oscillators

Dynamics of coupled spin-torque oscillators can be exploited for non-Boolean information processing. However, the feasibility of coupling large number of STOs with energy-efficiency and sufficient robustness towards parameter-variation and thermal-noise, may be critical for such computing applications. In this work, the impacts of parameter-variation and thermal-noise on two different coupling mechanisms for STOs, namely, magnetic-coupling and electrical-coupling are analyzed. Magnetic coupling is simulated using dipolar-field interactions. For electricalcoupling we employed global RF-injection. In this method, multiple STOs are phase-locked to a common RF-signal that is injected into the STOs along with the DC bias. Results for variation and noise analysis indicate that electrical-coupling can be significantly more robust as compared to magnetic-coupling. For room-temperature simulations, appreciable phase-lock was retained among tens of electrically coupled STOs for up to 20% 3s random variations in critical device parameters. The magnetic-coupling technique however failed to retain locking beyond ~3% 3s parameter-variations, even for small-size STO clusters with near-neighborhood connectivity. We propose and analyze Dual-Pillar STO (DP-STO) for low-power computing using the proposed electrical coupling method. We observed that DP-STO can better exploit the electrical-coupling technique due to separation between the biasing RF signal and its own RF output.

preprint2013arXiv

Exploring Boolean and Non-Boolean Computing Applications of Spin Torque Devices

In this paper we discuss the potential of emerging spintorque devices for computing applications. Recent proposals for spinbased computing schemes may be differentiated as all-spin vs. hybrid, programmable vs. fixed, and, Boolean vs. non-Boolean. All spin logic-styles may offer high area-density due to small form-factor of nano-magnetic devices. However, circuit and system-level design techniques need to be explored that leverage the specific spin-device characteristics to achieve energy-efficiency, performance and reliability comparable to those of CMOS. The non-volatility of nanomagnets can be exploited in the design of energy and area-efficient programmable logic. In such logic-styles, spin-devices may play the dual-role of computing as well as memory-elements that provide field-programmability. Spin-based threshold logic design is presented as an example (dynamic resisitve threshold logic and magnetic threshold logic). Emerging spintronic phenomena may lead to ultralow- voltage, current-mode, spin-torque switches that can offer attractive computing capabilities, beyond digital switches. Such devices may be suitable for non-Boolean data-processing applications which involve analog processing. Integration of such spin-torque devices with charge-based devices like CMOS and resistive memory can lead to highly energy-efficient information processing hardware for applications like pattern-matching, neuromorphic-computing, image-processing and data-conversion. Towards the end, we discuss the possibility of applying emerging spin-torque switches in the design of energy-efficient global interconnects, for future chip multiprocessors.

preprint2013arXiv

Ultra Low Power Associative Computing with Spin Neurons and Resistive Crossbar Memory

Emerging resistive-crossbar memory (RCM) technology can be promising for computationally-expensive analog pattern-matching tasks. However, the use of CMOS analog-circuits with RCM would result in large power-consumption and poor scalability, thereby eschewing the benefits of RCM-based computation. We propose the use of low-voltage, fast-switching, magneto-metallic spin-neurons for ultra low-power non-Boolean computing with RCM. We present the design of analog associative memory for face recognition using RCM, where, substituting conventional analog circuits with spin-neurons can achieve ~100x lower power. This makes the proposed design ~1000x more energy-efficient than a 45nm-CMOS digital ASIC, thereby significantly enhancing the prospects of RCM based computational hardware.

preprint2013arXiv

Ultra-low Energy, High Performance and Programmable Magnetic Threshold Logic

We propose magnetic threshold-logic (MTL) design based on non-volatile spin-torque switches. A threshold logic gate (TLG) performs summation of multiple inputs multiplied by a fixed set of weights and compares the sum with a threshold. MTL employs resistive states of magnetic tunnel junctions as programmable input weights, while, a low-voltage domain-wall shift based spin-torque switch is used for thresholding operation. The resulting MTL gate acts as a low-power, configurable logic unit and can be used to build fully pipelined, high-performance programmable computing blocks. Multiple stages in such a MTL design can be connected using energy-efficient ultralow swing programmable interconnect networks based on resistive switches. Owing to memory-based compact logic and interconnect design and low-voltage, high-speed spintorque based threshold operation, MTL can achieve more than two orders of magnitude improvement in energy-delay product as compared to look-up table based CMOS FPGA.

preprint2013arXiv

Ultra-low Energy, High-Performance Dynamic Resistive Threshold Logic

We propose dynamic resistive threshold-logic (DRTL) design based on non-volatile resistive memory. A threshold logic gate (TLG) performs summation of multiple inputs multiplied by a fixed set of weights and compares the sum with a threshold. DRTL employs resistive memory elements to implement the weights and the thresholds, while a compact dynamic CMOS latch is used for the comparison operation. The resulting DRTL gate acts as a low-power, configurable dynamic logic unit and can be used to build fully pipelined, high-performance programmable computing blocks. Multiple stages in such a DRTL design can be connected using energy-efficient low swing programmable interconnect networks based on resistive switches. Owing to memory-based compact logic and interconnect design and highspeed dynamic-pipelined operation, DRTL can achieve more than two orders of magnitude improvement in energy-delay product as compared to look-up table based CMOS FPGA.

Deliang Fan

What is connected

Connect this record

See the researcher in context

Building this map preview

21 published item(s)

AdpSplit: Error-Driven Adaptive Splitting for Faster Geometry Discovery in 3D Gaussian Splatting

ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning

TRGP: Trust Region Gradient Projection for Continual Learning

NeurObfuscator: A Full-stack Obfuscation Tool to Mitigate Neural Architecture Stealing

RADAR: Run-time Adversarial Weight Attack Detection and Accuracy Recovery

T-BFA: Targeted Bit-Flip Adversarial Weight Attack

A Progressive Sub-Network Searching Framework for Dynamic Inference

DeepHammer: Depleting the Intelligence of Deep Neural Networks through Targeted Chain of Bit Flips

KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning

MERAM: Non-Volatile Cache Memory Based on Magneto-Electric FETs

Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?

PANDA: Processing-in-MRAM Accelerated De Bruijn Graph based DNA Assembly

TBT: Targeted Neural Network Attack with Bit Trojan

Design and Synthesis of Ultra Low Energy Spin-Memristor Threshold Logic

Hierarchical Temporal Memory Based on Spin-Neurons and Resistive Memory for Energy-Efficient Brain-Inspired Computing

STT-SNN: A Spin-Transfer-Torque Based Soft-Limiting Non-Linear Neuron for Low-Power Artificial Neural Networks

Energy-Efficient and Robust Associative Computing with Electrically Coupled Dual Pillar Spin-Torque Oscillators

Exploring Boolean and Non-Boolean Computing Applications of Spin Torque Devices

Ultra Low Power Associative Computing with Spin Neurons and Resistive Crossbar Memory

Ultra-low Energy, High Performance and Programmable Magnetic Threshold Logic

Ultra-low Energy, High-Performance Dynamic Resistive Threshold Logic