Source author record

Partha Pratim Pande

Partha Pratim Pande appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Emerging Technologies Distributed, Parallel, and Cluster Computing Hardware Architecture eess.SY Machine Learning Networking and Internet Architecture Systems and Control

Catalog footprint

What is connected

4works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

ReaLPrune: ReRAM Crossbar-aware Lottery Ticket Pruned CNNs

Training machine learning (ML) models at the edge (on-chip training on end user devices) can address many pressing challenges including data privacy/security, increase the accessibility of ML applications to different parts of the world by reducing the dependence on the communication fabric and the cloud infrastructure, and meet the real-time requirements of AR/VR applications. However, existing edge platforms do not have sufficient computing capabilities to support complex ML tasks such as training large CNNs. ReRAM-based architectures offer high-performance yet energy efficient computing platforms for on-chip CNN training/inferencing. However, ReRAM-based architectures are not scalable with the size of the CNN. Larger CNNs have more weights, which requires more ReRAM cells that cannot be integrated in a single chip. Moreover, training larger CNNs on-chip will require higher power, which cannot be afforded by these smaller devices. Pruning is an effective way to solve this problem. However, existing pruning techniques are either targeted for inferencing only, or they are not crossbar-aware. This leads to sub-optimal hardware savings and performance benefits for CNN training on ReRAM-based architectures. In this paper, we address this problem by proposing a novel crossbar-aware pruning strategy, referred as ReaLPrune, which can prune more than 90% of CNN weights. The pruned model can be trained from scratch without any accuracy loss. Experimental results indicate that ReaLPrune reduces hardware requirements by 77.2% and accelerates CNN training by ~20X compared to unpruned CNNs. ReaLPrune also outperforms other crossbar-aware pruning techniques in terms of both performance and hardware savings. In addition, ReaLPrune is equally effective for diverse datasets and more complex CNNs

preprint2021arXiv

ReGraphX: NoC-enabled 3D Heterogeneous ReRAM Architecture for Training Graph Neural Networks

Graph Neural Network (GNN) is a variant of Deep Neural Networks (DNNs) operating on graphs. However, GNNs are more complex compared to traditional DNNs as they simultaneously exhibit features of both DNN and graph applications. As a result, architectures specifically optimized for either DNNs or graph applications are not suited for GNN training. In this work, we propose a 3D heterogeneous manycore architecture for on-chip GNN training to address this problem. The proposed architecture, ReGraphX, involves heterogeneous ReRAM crossbars to fulfill the disparate requirements of both DNN and graph computations simultaneously. The ReRAM-based architecture is complemented with a multicast-enabled 3D NoC to improve the overall achievable performance. We demonstrate that ReGraphX outperforms conventional GPUs by up to 3.5X (on an average 3X) in terms of execution time, while reducing energy consumption by as much as 11X.

preprint2020arXiv

An Energy-Aware Online Learning Framework for Resource Management in Heterogeneous Platforms

Mobile platforms must satisfy the contradictory requirements of fast response time and minimum energy consumption as a function of dynamically changing applications. To address this need, system-on-chips (SoC) that are at the heart of these devices provide a variety of control knobs, such as the number of active cores and their voltage/frequency levels. Controlling these knobs optimally at runtime is challenging for two reasons. First, the large configuration space prohibits exhaustive solutions. Second, control policies designed offline are at best sub-optimal since many potential new applications are unknown at design-time. We address these challenges by proposing an online imitation learning approach. Our key idea is to construct an offline policy and adapt it online to new applications to optimize a given metric (e.g., energy). The proposed methodology leverages the supervision enabled by power-performance models learned at runtime. We demonstrate its effectiveness on a commercial mobile platform with 16 diverse benchmarks. Our approach successfully adapts the control policy to an unknown application after executing less than 25% of its instructions.

preprint2016arXiv

Design-Space Exploration and Optimization of an Energy-Efficient and Reliable 3D Small-world Network-on-Chip

A three-dimensional (3D) Network-on-Chip (NoC) enables the design of high performance and low power many-core chips. Existing 3D NoCs are inadequate for meeting the ever-increasing performance requirements of many-core processors since they are simple extensions of regular 2D architectures and they do not fully exploit the advantages provided by 3D integration. Moreover, the anticipated performance gain of a 3D NoC-enabled many-core chip may be compromised due to the potential failures of through-silicon-vias (TSVs) that are predominantly used as vertical interconnects in a 3D IC. To address these problems, we propose a machine-learning-inspired predictive design methodology for energy-efficient and reliable many-core architectures enabled by 3D integration. We demonstrate that a small-world network-based 3D NoC (3D SWNoC) performs significantly better than its 3D MESH-based counterparts. On average, the 3D SWNoC shows 35% energy-delay-product (EDP) improvement over 3D MESH for the PARSEC and SPLASH2 benchmarks considered in this work. To improve the reliability of 3D NoC, we propose a computationally efficient spare-vertical link (sVL) allocation algorithm based on a state-space search formulation. Our results show that the proposed sVL allocation algorithm can significantly improve the reliability as well as the lifetime of 3D SWNoC.