Researcher profile

Abhimanyu Bambhaniya

Abhimanyu Bambhaniya contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 11 - Unverified
Verification L1
Unclaimed author
1 work
0 followers
1 topic
4 close collaborators

Published work

1 published item

Preprint, 2022, arXiv

Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity

Recently, numerous sparse hardware accelerators for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications have been proposed. A common characteristic among all of these accelerators is that they target tensor algebra (typically matrix multiplications); yet dozens of new accelerators are proposed for every new application. The motivation is that the size and sparsity of the workloads heavily influence which architecture is best for memory and computation efficiency. To satisfy the growing demand for efficient computation across a spectrum of workloads in large data centers, we propose deploying a flexible 'heterogeneous' accelerator, which contains many 'sub-accelerators' (smaller specialized accelerators) working together. To this end, we propose: (1) HARD TACO, a quick and productive C++ to RTL design flow that generates many types of sub-accelerators for sparse and dense computations, enabling fair design-space exploration, (2) AESPA, a heterogeneous sparse accelerator design template constructed from the sub-accelerators generated by HARD TACO, and (3) a suite of scheduling strategies to map tensor kernels onto heterogeneous sparse accelerators with high efficiency and utilization. With optimized scheduling, AESPA achieves 1.96X higher performance and 7.9X better energy-delay product (EDP) than a homogeneous EIE-like accelerator on our diverse workload suite.
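
For readers unfamiliar with the energy-delay product quoted in the abstract, the following is a minimal sketch of how the metric is conventionally computed (energy multiplied by delay, lower is better) and how the two quoted ratios relate to each other. The function names are hypothetical and are not taken from the paper. The derived energy factor is an illustration only: it assumes that "performance" is the inverse of delay on the same workload and that the suite-level ratios compose multiplicatively, neither of which the abstract states.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product of one run; lower is better."""
    return energy_joules * delay_seconds


def implied_energy_gain(speedup: float, edp_gain: float) -> float:
    """Energy reduction factor implied by a speedup and an EDP improvement.

    Assumes speedup  = baseline_delay / new_delay
    and     edp_gain = baseline_edp   / new_edp,
    both measured on the same workload. Under those assumptions
    edp_gain = energy_gain * speedup, so energy_gain = edp_gain / speedup.
    """
    if speedup <= 0 or edp_gain <= 0:
        raise ValueError("ratios must be positive")
    return edp_gain / speedup


# Ratios quoted in the abstract. The energy factor below is derived
# under the stated assumptions; it is not a figure reported by the authors.
energy_gain = implied_energy_gain(speedup=1.96, edp_gain=7.9)
print(f"implied energy reduction: {energy_gain:.2f}x")
```

Read this way, the EDP figure folds the speedup and an energy saving into a single number, which is why it is larger than the performance ratio alone.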