Source author record

Amit Ranjan Trivedi

Amit Ranjan Trivedi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.SP · eess.SY · Systems and Control · eess.IV · Hardware Architecture · Machine Learning

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

preprint · 2021 · arXiv

MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference using Memory-immersed Data Conversion and Multiplication-free Operators

We propose a co-design approach for compute-in-memory inference for deep neural networks (DNNs). We use multiplication-free function approximators based on the $\ell_1$ norm along with a co-adapted processing array and compute flow. Using this approach, we overcome many deficiencies in the current art of in-SRAM DNN processing, such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high-precision analog-to-digital converters (ADCs), limited support for multi-bit precision weights, and limited vector-scale parallelism. Our co-adapted implementation extends seamlessly to multi-bit precision weights, does not require DACs, and extends easily to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC), where we exploit the parasitic capacitance of the bit lines of the SRAM array as a capacitive DAC. Since the dominant area overhead in an SA-ADC comes from its capacitive DAC, exploiting the intrinsic parasitics of the SRAM array allows a low-area implementation of a within-SRAM SA-ADC. Our 8$\times$62 SRAM macro, which requires a 5-bit ADC, achieves $\sim$105 tera operations per second per Watt (TOPS/W) with 8-bit input/weight processing at 45 nm CMOS.
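The successive-approximation conversion mentioned in the abstract can be illustrated with a purely behavioral model. The sketch below is generic textbook SAR logic, not the paper's circuit: it assumes an ideal capacitive DAC and comparator, and the function name `sar_adc`, the reference voltage `v_ref`, and the unipolar input range are hypothetical choices made for illustration. Only the 5-bit resolution is taken from the abstract.

```python
def sar_adc(v_in, v_ref=1.0, bits=5):
    """Ideal successive-approximation conversion: a binary search over codes.

    Assumptions (not from the source): ideal DAC, ideal comparator,
    unipolar input in [0, v_ref).
    """
    code = 0
    for bit in reversed(range(bits)):        # MSB first
        trial = code | (1 << bit)            # tentatively set this bit
        v_dac = v_ref * trial / (1 << bits)  # ideal capacitive-DAC output
        if v_in >= v_dac:                    # comparator decision
            code = trial                     # keep the bit, else leave it cleared
    return code


if __name__ == "__main__":
    for v in (0.0, 0.26, 0.5, 0.99):
        print(f"{v:.2f} V -> code {sar_adc(v)}")
```

Each conversion takes one comparison per bit, so a 5-bit result needs five DAC settling and compare steps. In the approach the abstract describes, the DAC role is played by bit-line parasitic capacitance rather than a dedicated capacitor array; this model does not capture that or any analog non-ideality.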

preprint · 2020 · arXiv

$MC^2RAM$: Markov Chain Monte Carlo Sampling in SRAM for Fast Bayesian Inference

This work discusses the implementation of Markov Chain Monte Carlo (MCMC) sampling from an arbitrary Gaussian mixture model (GMM) within SRAM. We show a novel SRAM architecture that embeds random number generators (RNGs), digital-to-analog converters (DACs), and analog-to-digital converters (ADCs) so that SRAM arrays can be used for high-performance Metropolis-Hastings (MH) algorithm-based MCMC sampling. Most of the expensive computations are performed within the SRAM and can be parallelized for high-speed sampling. Our iterative compute flow minimizes data movement during sampling. We characterize the power-performance trade-off of our design by simulating it in 45 nm CMOS technology. For a two-dimensional, two-mixture GMM, the implementation consumes ~91 µW of power per sampling iteration and produces 500 samples in 2000 clock cycles on average at a 1 GHz clock frequency. Our study offers insights into how low-level hardware non-idealities can affect high-level sampling characteristics, and recommends ways to optimally operate SRAM within area/power constraints for high-performance sampling.
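As a software reference point for the Metropolis-Hastings sampling the abstract describes, here is a minimal sketch for a two-dimensional, two-component GMM. It says nothing about the in-SRAM compute flow. The mixture parameters in `COMPONENTS`, the isotropic covariances, the Gaussian random-walk proposal, and the `step` size are all assumptions chosen for illustration and do not come from the paper.

```python
import math
import random

# Hypothetical 2-D, two-component mixture: (weight, mean, isotropic std).
COMPONENTS = [(0.5, (-2.0, 0.0), 1.0), (0.5, (2.0, 0.0), 1.0)]


def gmm_density(x, y):
    """Mixture density at (x, y), assuming isotropic Gaussian components."""
    total = 0.0
    for weight, (mx, my), std in COMPONENTS:
        sq_dist = (x - mx) ** 2 + (y - my) ** 2
        total += weight * math.exp(-sq_dist / (2 * std**2)) / (2 * math.pi * std**2)
    return total


def metropolis_hastings(n_samples, step=1.0, seed=0):
    """Random-walk MH; returns the chain and the fraction of accepted proposals."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    p_current = gmm_density(x, y)
    samples, accepted = [], 0
    for _ in range(n_samples):
        # Symmetric proposal, so the acceptance ratio is p_proposed / p_current.
        xp = x + rng.gauss(0.0, step)
        yp = y + rng.gauss(0.0, step)
        p_proposed = gmm_density(xp, yp)
        # Accept with probability min(1, p_proposed / p_current).
        if rng.random() * p_current < p_proposed:
            x, y, p_current = xp, yp, p_proposed
            accepted += 1
        samples.append((x, y))  # a rejected proposal repeats the current state
    return samples, accepted / n_samples


if __name__ == "__main__":
    chain, rate = metropolis_hastings(2000)
    print(f"{len(chain)} samples, acceptance rate {rate:.2f}")
```

Each iteration needs random numbers, two density evaluations' worth of arithmetic, and one comparison, which are the operations the abstract describes moving into the SRAM array alongside embedded RNGs and data converters.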

preprint · 2020 · arXiv

Low Power Unsupervised Anomaly Detection by Non-Parametric Modeling of Sensor Statistics

This work presents AEGIS, a novel mixed-signal framework for real-time anomaly detection by examining sensor stream statistics. AEGIS uses Kernel Density Estimation (KDE)-based non-parametric density estimation to generate a real-time statistical model of the sensor data stream. The likelihood of a sensor data point is estimated from the generated statistical model to detect outliers. We present a CMOS Gilbert Gaussian cell-based design to realize Gaussian kernels for KDE. For outlier detection, the decision boundary is defined in terms of the kernel standard deviation ($\sigma_{Kernel}$) and the likelihood threshold ($P_{Thres}$). We adopt a sliding window to update the detection model in real time. We use a time-series dataset provided by Yahoo to benchmark the performance of AEGIS. An F1-score higher than 0.87 is achieved by optimizing parameters such as the length of the sliding window and the decision thresholds, which are programmable in AEGIS. The discussed architecture is designed in a 45 nm technology node, and our approach consumes $\sim$75 $\mu$W of power on average at a sampling rate of 2 MHz while using ten recent inlier samples for density estimation. The full version of this research has been published in IEEE TVLSI.
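The detection rule in the abstract (Gaussian-kernel KDE over a sliding window of recent inliers, compared against a likelihood threshold) can be sketched in software as follows. This is a rough digital illustration, not the mixed-signal AEGIS design. The function names, the warm-up policy of treating the first samples as inliers, and the numeric values of `sigma_kernel` and `p_thres` are assumptions. The window of ten inlier samples follows the abstract.

```python
import math
from collections import deque


def kde_likelihood(x, window, sigma_kernel):
    """Average of Gaussian kernels centered on the windowed samples."""
    norm = 1.0 / (sigma_kernel * math.sqrt(2 * math.pi))
    total = sum(math.exp(-0.5 * ((x - s) / sigma_kernel) ** 2) for s in window)
    return norm * total / len(window)


def detect_anomalies(stream, window_len=10, sigma_kernel=0.5, p_thres=0.05):
    """Flag samples whose KDE likelihood falls below p_thres.

    Assumption: the first window_len samples are treated as inliers
    to seed the model.
    """
    window = deque(maxlen=window_len)  # oldest inlier drops out automatically
    flags = []
    for x in stream:
        if len(window) < window_len:
            window.append(x)
            flags.append(False)
            continue
        is_outlier = kde_likelihood(x, window, sigma_kernel) < p_thres
        flags.append(is_outlier)
        if not is_outlier:
            window.append(x)  # only inliers update the density model
    return flags


if __name__ == "__main__":
    data = [0.1 * (i % 5) for i in range(20)] + [10.0, 0.2]
    print([i for i, flagged in enumerate(detect_anomalies(data)) if flagged])
```

Keeping outliers out of the window prevents a burst of anomalous readings from being absorbed into the model, at the cost of slower adaptation if the underlying signal genuinely shifts. The window length and threshold trade off these behaviors, which is consistent with the abstract's note that both are tuned.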
