Source author record

Sarma Vrudhula

Sarma Vrudhula appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Hardware Architecture Emerging Technologies Machine Learning Computer Vision Cryptography and Security Data Structures and Algorithms eess.SP Neural and Evolutionary Computing

Catalog footprint

What is connected

7works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A Novel ASIC Design Flow using Weight-Tunable Binary Neurons as Standard Cells

In this paper, we describe a design of a mixed signal circuit for a binary neuron (a.k.a perceptron, threshold logic gate) and a methodology for automatically embedding such cells in ASICs. The binary neuron, referred to as an FTL (flash threshold logic) uses floating gate or flash transistors whose threshold voltages serve as a proxy for the weights of the neuron. Algorithms for mapping the weights to the flash transistor threshold voltages are presented. The threshold voltages are determined to maximize both the robustness of the cell and its speed. The performance, power, and area of a single FTL cell are shown to be significantly smaller (79.4%), consume less power (61.6%), and operate faster (40.3%) compared to conventional CMOS logic equivalents. Also included are the architecture and the algorithms to program the flash devices of an FTL. The FTL cells are implemented as standard cells, and are designed to allow commercial synthesis and P&R tools to automatically use them in synthesis of ASICs. Substantial reductions in area and power without sacrificing performance are demonstrated on several ASIC benchmarks by the automatic embedding of FTL cells. The paper also demonstrates how FTL cells can be used for fixing timing errors after fabrication.

preprint2020arXiv

Enabling Incremental Knowledge Transfer for Object Detection at the Edge

Object detection using deep neural networks (DNNs) involves a huge amount of computation which impedes its implementation on resource/energy-limited user-end devices. The reason for the success of DNNs is due to having knowledge over all different domains of observed environments. However, we need a limited knowledge of the observed environment at inference time which can be learned using a shallow neural network (SHNN). In this paper, a system-level design is proposed to improve the energy consumption of object detection on the user-end device. An SHNN is deployed on the user-end device to detect objects in the observing environment. Also, a knowledge transfer mechanism is implemented to update the SHNN model using the DNN knowledge when there is a change in the object domain. DNN knowledge can be obtained from a powerful edge device connected to the user-end device through LAN or Wi-Fi. Experiments demonstrate that the energy consumption of the user-end device and the inference time can be improved by 78% and 71% compared with running the deep model on the user-end device.

preprint2019arXiv

ELSA: A Throughput-Optimized Design of an LSTM Accelerator for Energy-Constrained Devices

The next significant step in the evolution and proliferation of artificial intelligence technology will be the integration of neural network (NN) models within embedded and mobile systems. This calls for the design of compact, energy efficient NN models in silicon. In this paper, we present a scalable ASIC design of an LSTM accelerator named ELSA, that is suitable for energy-constrained devices. It includes several architectural innovations to achieve small area and high energy efficiency. To reduce the area and power consumption of the overall design, the compute-intensive units of ELSA employ approximate multiplications and still achieve high performance and accuracy. The performance is further improved through efficient synchronization of the elastic pipeline stages to maximize the utilization. The paper also includes a performance model of ELSA, as a function of the hidden nodes and time steps, permitting its use for the evaluation of any LSTM application. ELSA was implemented in RTL and was synthesized and placed and routed in 65nm technology. Its functionality is demonstrated for language modeling-a common application of LSTM. ELSA is compared against a baseline implementation of an LSTM accelerator with standard functional units and without any of the architectural innovations of ELSA. The paper demonstrates that ELSA can achieve significant improvements in power, area and energy-efficiency when compared to the baseline design and several ASIC implementations reported in the literature, making it suitable for use in embedded systems and real-time applications.

preprint2019arXiv

Threshold Logic in a Flash

This paper describes a novel design of a threshold logic gate (a binary perceptron) and its implementation as a standard cell. This new cell structure, referred to as flash threshold logic (FTL), uses floating gate (flash) transistors to realize the weights associated with a threshold function. The threshold voltages of the flash transistors serve as a proxy for the weights. An FTL cell can be equivalently viewed as a multi-input, edge-triggered flipflop which computes a threshold function on a clock edge. Consequently, it can be used in the automatic synthesis of ASICs. The use of flash transistors in the FTL cell allows programming of the weights after fabrication, thereby preventing discovery of its function by a foundry or by reverse engineering. This paper focuses on the design and characteristics of the FTL cell. We present a novel method for programming the weights of an FTL cell for a specified threshold function using a modified perceptron learning algorithm. The algorithm is further extended to select weights to maximize the robustness of the design in the presence of process variations. The FTL circuit was designed in 40nm technology and simulations with layout-extracted parasitics included, demonstrate significant improvements in the area (79.7%), power (61.1%), and performance (42.5%) when compared to the equivalent implementations of the same function in conventional static CMOS design. Weight selection targeting robustness is demonstrated using Monte Carlo simulations. The paper also shows how FTL cells can be used for fixing timing errors after fabrication.

preprint2016arXiv

Digital IP Protection Using Threshold Voltage Control

This paper proposes a method to completely hide the functionality of a digital standard cell. This is accomplished by a differential threshold logic gate (TLG). A TLG with $n$ inputs implements a subset of Boolean functions of $n$ variables that are linear threshold functions. The output of such a gate is one if and only if an integer weighted linear arithmetic sum of the inputs equals or exceeds a given integer threshold. We present a novel architecture of a TLG that not only allows a single TLG to implement a large number of complex logic functions, which would require multiple levels of logic when implemented using conventional logic primitives, but also allows the selection of that subset of functions by assignment of the transistor threshold voltages to the input transistors. To obfuscate the functionality of the TLG, weights of some inputs are set to zero by setting their device threshold to be a high $V_t$. The threshold voltage of the remaining transistors is set to low $V_t$ to increase their transconductance. The function of a TLG is not determined by the cell itself but rather the signals that are connected to its inputs. This makes it possible to hide the support set of the function by essentially removing some variable from the support set of the function by selective assignment of high and low $V_t$ to the input transistors. We describe how a standard cell library of TLGs can be mixed with conventional standard cells to realize complex logic circuits, whose function can never be discovered by reverse engineering. A 32-bit Wallace tree multiplier and a 28-bit 4-tap filter were synthesized on an ST 65nm process, placed and routed, then simulated including extracted parastics with and without obfuscation. Both obfuscated designs had much lower area (25%) and much lower dynamic power (30%) than their nonobfuscated CMOS counterparts, operating at the same frequency.

preprint2016arXiv

Efficient Enumeration of Unidirectional Cuts for Technology Mapping of Boolean Networks

In technology mapping, enumeration of subcircuits or cuts to be replaced by a standard cell is an important step that decides both the quality of the solution and execution speed. In this work, we view cuts as set of edges instead of as set of nodes and based on it, provide a classification of cuts. It is shown that if enumeration is restricted to a subclass of cuts called unidirectional cuts, the quality of solution does not degrade. We also show that such cuts are equivalent to a known class of cuts called strong line cuts first proposed in [14]. We propose an efficient enumeration method based on a novel graph pruning algorithm that utilizes network flow to approximate minimum strong line cut. The runtimes for the proposed enumeration method are shown to be quite practical for enumeration of a large number of cuts.

preprint2007arXiv

Stochastic Power Grid Analysis Considering Process Variations

In this paper, we investigate the impact of interconnect and device process variations on voltage fluctuations in power grids. We consider random variations in the power grid's electrical parameters as spatial stochastic processes and propose a new and efficient method to compute the stochastic voltage response of the power grid. Our approach provides an explicit analytical representation of the stochastic voltage response using orthogonal polynomials in a Hilbert space. The approach has been implemented in a prototype software called OPERA (Orthogonal Polynomial Expansions for Response Analysis). Use of OPERA on industrial power grids demonstrated speed-ups of up to two orders of magnitude. The results also show a significant variation of about $\pm$ 35% in the nominal voltage drops at various nodes of the power grids and demonstrate the need for variation-aware power grid analysis.

Sarma Vrudhula

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

A Novel ASIC Design Flow using Weight-Tunable Binary Neurons as Standard Cells

Enabling Incremental Knowledge Transfer for Object Detection at the Edge

ELSA: A Throughput-Optimized Design of an LSTM Accelerator for Energy-Constrained Devices

Threshold Logic in a Flash

Digital IP Protection Using Threshold Voltage Control

Efficient Enumeration of Unidirectional Cuts for Technology Mapping of Boolean Networks

Stochastic Power Grid Analysis Considering Process Variations