Researcher profile

Anuj Pathania

Anuj Pathania contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2022arXiv

CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems

Processing cores and the accompanying main memory working in tandem enable the modern processors. Dissipating heat produced from computation, memory access remains a significant problem for processors. Therefore, processor thermal management continues to be an active research topic. Most thermal management research takes place using simulations, given the challenges of measuring temperature in real processors. Since core and memory are fabricated on separate packages in most existing processors, with the memory having lower power densities, thermal management research in processors has primarily focused on the cores. Memory bandwidth limitations associated with 2D processors lead to high-density 2.5D and 3D packaging technology. 2.5D packaging places cores and memory on the same package. 3D packaging technology takes it further by stacking layers of memory on the top of cores themselves. Such packagings significantly increase the power density, making processors prone to heating. Therefore, mitigating thermal issues in high-density processors (packaged with stacked memory) becomes an even more pressing problem. However, given the lack of thermal modeling for memories in existing interval thermal simulation toolchains, they are unsuitable for studying thermal management for high-density processors. To address this issue, we present CoMeT, the first integrated Core and Memory interval Thermal simulation toolchain. CoMeT comprehensively supports thermal simulation of high- and low-density processors corresponding to four different core-memory configurations - off-chip DDR memory, off-chip 3D memory, 2.5D, and 3D. CoMeT supports several novel features that facilitate overlying system research. Compared to an equivalent state-of-the-art core-only toolchain, CoMeT adds only a ~5% simulation-time overhead. The source code of CoMeT has been made open for public use under the MIT license.

preprint2020arXiv

High-Throughput CNN Inference on Embedded ARM big.LITTLE Multi-Core Processors

IoT Edge intelligence requires Convolutional Neural Network (CNN) inference to take place in the edge devices itself. ARM big.LITTLE architecture is at the heart of prevalent commercial edge devices. It comprises of single-ISA heterogeneous cores grouped into multiple homogeneous clusters that enable power and performance trade-offs. All cores are expected to be simultaneously employed in inference to attain maximal throughput. However, high communication overhead involved in parallelization of computations from convolution kernels across clusters is detrimental to throughput. We present an alternative framework called Pipe-it that employs pipelined design to split convolutional layers across clusters while limiting parallelization of their respective kernels to the assigned cluster. We develop a performance-prediction model that utilizes only the convolutional layer descriptors to predict the execution time of each layer individually on all permitted core configurations (type and count). Pipe-it then exploits the predictions to create a balanced pipeline using an efficient design space exploration algorithm. Pipe-it on average results in a 39% higher throughput than the highest antecedent throughput.

preprint2020arXiv

Neural Network Inference on Mobile SoCs

The ever-increasing demand from mobile Machine Learning (ML) applications calls for evermore powerful on-chip computing resources. Mobile devices are empowered with heterogeneous multi-processor Systems-on-Chips (SoCs) to process ML workloads such as Convolutional Neural Network (CNN) inference. Mobile SoCs house several different types of ML capable components on-die, such as CPU, GPU, and accelerators. These different components are capable of independently performing inference but with very different power-performance characteristics. In this article, we provide a quantitative evaluation of the inference capabilities of the different components on mobile SoCs. We also present insights behind their respective power-performance behavior. Finally, we explore the performance limit of the mobile SoCs by synergistically engaging all the components concurrently. We observe that a mobile SoC provides up to 2x improvement with parallel inference when all its components are engaged, as opposed to engaging only one component.