Source author record

Udit Gupta

Udit Gupta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Hardware Architecture Machine Learning Cryptography and Security Artificial Intelligence Networking and Internet Architecture Performance

Catalog footprint

What is connected

12works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

CarbonClarity: Understanding and Addressing Uncertainty in Embodied Carbon for Sustainable Computing

Embodied carbon footprint modeling has become an area of growing interest due to its significant contribution to carbon emissions in computing. However, the deterministic nature of the existing models fail to account for the spatial and temporal variability in the semiconductor supply chain. The absence of uncertainty modeling limits system designers' ability to make informed, carbon-aware decisions. We introduce CarbonClarity, a probabilistic framework designed to model embodied carbon footprints through distributions that reflect uncertainties in energy-per-area, gas-per-area, yield, and carbon intensity across different technology nodes. Our framework enables a deeper understanding of how design choices, such as chiplet architectures and new vs. old technology node selection, impact emissions and their associated uncertainties. For example, we show that the gap between the mean and 95th percentile of embodied carbon per cm$^2$ can reach up to 1.6X for the 7nm technology node. Additionally, we demonstrate through case studies that: (i) CarbonClarity is a valuable resource for device provisioning, help maintaining performance under a tight carbon budget; and (ii) chiplet technology and mature nodes not only reduce embodied carbon but also significantly lower its associated uncertainty, achieving an 18% reduction in the 95th percentile compared to monolithic designs for the mobile application.

preprint2022arXiv

Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation

Personalized recommendation is an important class of deep-learning applications that powers a large collection of internet services and consumes a considerable amount of datacenter resources. As the scale of production-grade recommendation systems continues to grow, optimizing their serving performance and efficiency in a heterogeneous datacenter is important and can translate into infrastructure capacity saving. In this paper, we propose Hercules, an optimized framework for personalized recommendation inference serving that targets diverse industry-representative models and cloud-scale heterogeneous systems. Hercules performs a two-stage optimization procedure - offline profiling and online serving. The first stage searches the large under-explored task scheduling space with a gradient-based search algorithm achieving up to 9.0x latency-bounded throughput improvement on individual servers; it also identifies the optimal heterogeneous server architecture for each recommendation workload. The second stage performs heterogeneity-aware cluster provisioning to optimize resource mapping and allocation in response to fluctuating diurnal loads. The proposed cluster scheduler in Hercules achieves 47.7% cluster capacity saving and reduces the provisioned power by 23.7% over a state-of-the-art greedy scheduler.

preprint2022arXiv

Sustainable AI: Environmental Implications, Challenges and Opportunities

This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, we capture the operational and manufacturing carbon footprint of AI computing and present an end-to-end analysis for what and how hardware-software design and at-scale optimization can help reduce the overall carbon footprint of AI. Based on the industry experience and lessons learned, we share the key challenges and chart out important development directions across the many dimensions of AI. We hope the key messages and insights presented in this paper can inspire the community to advance the field of AI in an environmentally-responsible manner.

preprint2021arXiv

RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference

Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables that have billions of parameters requiring large memory capacities. Unfortunately, large and fast DRAM-based memories levy high infrastructure costs. Conventional SSD-based storage solutions offer an order of magnitude larger capacity, but have worse read latency and bandwidth, degrading inference performance. RecSSD is a near data processing based SSD memory system customized for neural recommendation inference that reduces end-to-end model inference latency by 2X compared to using COTS SSDs across eight industry-representative models.

preprint2020arXiv

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

Neural personalized recommendation is the corner-stone of a wide collection of cloud services and products, constituting significant compute demand of the cloud infrastructure. Thus, improving the execution efficiency of neural recommendation directly translates into infrastructure capacity saving. In this paper, we devise a novel end-to-end modeling infrastructure, DeepRecInfra, that adopts an algorithm and system co-design methodology to custom-design systems for recommendation use cases. Leveraging the insights from the recommendation characterization, a new dynamic scheduler, DeepRecSched, is proposed to maximize latency-bounded throughput by taking into account characteristics of inference query size and arrival patterns, recommendation model architectures, and underlying hardware systems. By doing so, system throughput is doubled across the eight industry-representative recommendation models. Finally, design, deployment, and evaluation in at-scale production datacenter shows over 30% latency reduction across a wide variety of recommendation models running on hundreds of machines.

preprint2020arXiv

MLPerf Training Benchmark

Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase the time to solution, training is stochastic and time to solution exhibits high variance, and software and hardware systems are so diverse that fair benchmarking with the same binary, code, and even hyperparameters is difficult. We therefore present MLPerf, an ML benchmark that overcomes these challenges. Our analysis quantitatively evaluates MLPerf's efficacy at driving performance and scalability improvements across two rounds of results from multiple vendors.

preprint2020arXiv

The Architectural Implications of Facebook's DNN-based Personalized Recommendation

The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks. However, despite the importance of these models and the amount of compute cycles they consume, relatively little research attention has been devoted to systems for recommendation. To facilitate research and to advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation. In addition to releasing a set of open-source workloads, we conduct in-depth analysis that underpins future system design and optimization for at-scale recommendation: Inference latency varies by 60% across three Intel server generations, batching and co-location of inferences can drastically improve latency-bounded throughput, and the diverse composition of recommendation models leads to different optimization strategies.

preprint2019arXiv

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate personalized recommendation inference. The in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator- and data-level parallelism lead to memory bandwidth saturation, limiting recommendation inference performance. We propose RecNMP which provides a scalable solution to improve system throughput, supporting a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. Several hardware/software co-optimization techniques such as memory-side caching, table-aware packet scheduling, and hot entry profiling are studied, resulting in up to 9.8x memory latency speedup over a highly-optimized baseline. Overall, RecNMP offers 4.2x throughput improvement and 45.8% memory energy savings.

preprint2015arXiv

Application of Multi factor authentication in Internet of Things domain

Authentication forms the gateway to any secure system. Together with integrity, confidentiality and authorization it helps in preventing any sort of intrusions into the system. Up until a few years back password based authentication was the most common form of authentication to any secure network. But with the advent of more sophisticated technologies this form of authentication although still widely used has become insecure. Furthermore, with the rise of 'Internet of Things' where the number of devices would grow manifold it would be infeasible for user to remember innumerable passwords. Therefore, it's important to address this concern by devising ways in which multiple forms of authentication would be required to gain access to any smart devices and at the same time its usability would be high. In this paper, a methodology is discussed as to what kind of authentication mechanisms could be deployed in internet of things (IOT).

preprint2015arXiv

Comparison between security majors in virtual machine and linux containers

Virtualization started to gain traction in the domain of information technology in the early 2000s when managing resource distribution was becoming an uphill task for developers. As a result, tools like VMWare, Hyper V (hypervisor) started making inroads into the software repository on different operating systems. VMWare and Hyper V could support multiple virtual machines running on them with each having their own isolated environment. Due to this isolation, the security aspects of virtual machines (VMs) did not differ much from that of physical machines (having a dedicated operating system on hardware). The advancement made in the domain of linux containers (LXC) has taken virtualization to an altogether different level where resource utilization by various applications has been further optimized. But the container security has assumed primary importance amongst the researchers today and this paper is inclined towards providing a brief overview about comparisons between security of container and VMs.

preprint2015arXiv

Monitoring in IOT enabled devices

As network size continues to grow exponentially, there has been a proportionate increase in the number of nodes in the corresponding network. With the advent of Internet of things (IOT), it is assumed that many more devices will be connected to the existing network infrastructure. As a result, monitoring is expected to get more complex for administrators as networks tend to become more heterogeneous. Moreover, the addressing for IOTs would be more complex given the scale at which devices will be added to the network and hence monitoring is bound to become an uphill task due to management of larger range of addresses. This paper will throw light on what kind of monitoring mechanisms can be deployed in internet of things (IOTs) and their overall effectiveness.

preprint2015arXiv

Secure management of logs in internet of things

Ever since the advent of computing, managing data has been of extreme importance. With innumerable devices getting added to network infrastructure, there has been a proportionate increase in the data which needs to be stored. With the advent of Internet of Things (IOT) it is anticipated that billions of devices will be a part of the internet in another decade. Since those devices will be communicating with each other on a regular basis with little or no human intervention, plethora of real time data will be generated in quick time which will result in large number of log files. Apart from complexity pertaining to storage, it will be mandatory to maintain confidentiality and integrity of these logs in IOT enabled devices. This paper will provide a brief overview about how logs can be efficiently and securely stored in IOT devices.

Udit Gupta

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

CarbonClarity: Understanding and Addressing Uncertainty in Embodied Carbon for Sustainable Computing

Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation

Sustainable AI: Environmental Implications, Challenges and Opportunities

RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

MLPerf Training Benchmark

The Architectural Implications of Facebook's DNN-based Personalized Recommendation

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

Application of Multi factor authentication in Internet of Things domain

Comparison between security majors in virtual machine and linux containers

Monitoring in IOT enabled devices

Secure management of logs in internet of things