Source author record

Ramesh K. Sitaraman

Ramesh K. Sitaraman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Distributed, Parallel, and Cluster Computing · Networking and Internet Architecture · Data Structures and Algorithms · Computer Vision · eess.IV · Multimedia · Systems and Control

Catalog footprint

What is connected

8 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

CATRF: Codec-Adaptive TriPlane Radiance Fields for Volumetric Content Delivery

Volumetric media promises next-generation content delivery applications, but its bandwidth demand remains a key bottleneck. Implicit and hybrid volumetric representations reduce model sizes, yet still require careful coding to reach 2D video-like bitrates. We present CATRF, a standard-codec-in-the-loop compression framework for plane-factorized radiance fields. During training, we quantize and pack 2D feature planes into codec-friendly canvases, run a standard codec roundtrip (JPEG/VP9/HEVC/AV1), then unpack and dequantize the decoded features before volume rendering. We use a straight-through estimator (STE) to insert the non-differentiable, standard codec pipeline into the training loop, allowing radiance-field features to adapt directly to the real, client-side codec distortions without introducing any learned codec parameters. On both static and dynamic benchmarks, CATRF consistently achieves a better rate-distortion trade-off over codec-agnostic and learned-codec-in-the-loop baselines, and also outperforms recent compressed 3DGS methods in both compression efficiency and decoding speed. These results highlight a practical path toward low-bitrate, compression-resilient volumetric representations for free-viewpoint video streaming.
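The codec-in-the-loop idea can be illustrated with a toy sketch. It assumes a scalar feature list instead of packed 2D feature planes, and it uses a bit-truncation function as a stand-in for the real codec roundtrip (JPEG/VP9/HEVC/AV1 are not invoked here). All function names and parameters are hypothetical. The only point is the straight-through estimator: the forward pass goes through the non-differentiable roundtrip, while the backward pass treats that roundtrip as the identity.

```python
def quantize(x, lo, hi, levels=256):
    """Clamp a float to [lo, hi] and map it to an integer code in [0, levels - 1]."""
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * (levels - 1))


def dequantize(code, lo, hi, levels=256):
    """Map an integer code back to a float in [lo, hi]."""
    return lo + code / (levels - 1) * (hi - lo)


def codec_roundtrip(codes):
    """Stand-in for a lossy codec (assumption): drop the 3 lowest bits of each code."""
    return [(c >> 3) << 3 for c in codes]


def forward(features, lo=-1.0, hi=1.0):
    """Quantize, run the (stand-in) codec roundtrip, then dequantize."""
    codes = [quantize(x, lo, hi) for x in features]
    decoded = codec_roundtrip(codes)
    return [dequantize(c, lo, hi) for c in decoded]


def ste_step(features, targets, lr=0.1):
    """One gradient step on a squared-error loss using a straight-through estimator."""
    decoded = forward(features)
    # STE: the loss is evaluated on the decoded features, but the gradient is
    # passed to the raw features as if the roundtrip were the identity map.
    grads = [2.0 * (d - t) for d, t in zip(decoded, targets)]
    return [x - lr * g for x, g in zip(features, grads)]
```

Calling `ste_step` repeatedly moves the raw features so that their decoded values approach the targets, which is the sense in which features adapt to codec distortion in this toy setting.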

preprint · 2026 · arXiv

HADIS: Hybrid Adaptive Diffusion Model Serving for Efficient Text-to-Image Generation

Text-to-image diffusion models have achieved remarkable visual quality but incur high computational costs, making latency-aware, scalable deployment challenging. To address this, we advocate a hybrid architecture that achieves query awareness when serving diffusion models. Unlike existing query-aware serving systems that cascade lightweight and heavyweight models with a fixed configuration, our hybrid architecture first routes each query directly to a suitable model variant, then reroutes it to a cascaded heavyweight model only if necessary. We theoretically analyze conditions for the hybrid architecture to outperform non-hybrid alternatives in latency and response quality. Building on this architecture, we design HADIS, a hybrid serving system for latency-aware diffusion models that jointly optimizes cascade model selection, query routing, and resource allocation. To reduce the complexity of resource management, HADIS uses an offline profiling phase to produce a Pareto-optimal cascade configuration table. At runtime, HADIS selects the best cascade configuration and GPU allocation given latency and workload constraints. Empirical evaluations on real-world traces demonstrate that HADIS improves response quality by up to 35% while reducing latency violation rates by 2.7-45$\times$ compared to state-of-the-art model serving systems.

preprint · 2020 · arXiv

BOLA: Near-Optimal Bitrate Adaptation for Online Videos

Modern video players employ complex algorithms to adapt the bitrate of the video that is shown to the user. Bitrate adaptation requires a tradeoff between reducing the probability that the video freezes (rebuffers) and enhancing the quality of the video. A bitrate that is too high leads to frequent rebuffering, while a bitrate that is too low leads to poor video quality. Video providers segment videos into short segments and encode each segment at multiple bitrates. The video player adaptively chooses the bitrate of each segment to download, possibly choosing different bitrates for successive segments. We formulate bitrate adaptation as a utility-maximization problem and devise an online control algorithm called BOLA that uses Lyapunov optimization to minimize rebuffering and maximize video quality. We prove that BOLA achieves a time-average utility that is within an additive term O(1/V) of the optimal value, for a control parameter V related to the video buffer size. Further, unlike prior work, BOLA does not require prediction of available network bandwidth. We empirically validate BOLA in a simulated network environment using a collection of network traces. We show that BOLA achieves near-optimal utility and in many cases significantly higher utility than current state-of-the-art algorithms. Our work has immediate impact on real-world video players and for the evolving DASH standard for video transmission. We also implemented an updated version of BOLA that is now part of the standard reference player dash.js and is used in production by several video providers such as Akamai, BBC, CBS, and Orange.
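The abstract does not spell out the per-segment decision rule. The sketch below follows the rule as it is commonly described for the basic variant of BOLA: for each available bitrate, compute a score from the buffer level and a logarithmic utility, then pick the bitrate with the highest positive score, or wait if no score is positive. The function name, the parameter names, and the example values are assumptions made for illustration. Bitrates are used in place of segment sizes, which is equivalent when all segments have the same duration.

```python
import math


def bola_choose(bitrates_bps, buffer_segments, V, gamma_p):
    """Return the index of the bitrate to download next, or None to pause.

    bitrates_bps    -- available encoding bitrates (proportional to segment size)
    buffer_segments -- current buffer level, measured in segments
    V, gamma_p      -- control parameters (V relates to the buffer size)
    """
    lowest = min(bitrates_bps)
    best_index, best_score = None, 0.0
    for m, rate in enumerate(bitrates_bps):
        utility = math.log(rate / lowest)  # utility of the lowest bitrate is 0
        score = (V * (utility + gamma_p) - buffer_segments) / rate
        if score > best_score:
            best_index, best_score = m, score
    return best_index


# Hypothetical ladder and parameters, chosen only to show the behaviour.
ladder = [1_000_000, 2_500_000, 5_000_000]
choices = [bola_choose(ladder, q, V=0.93, gamma_p=5.0) for q in (0.0, 4.7, 5.0, 10.0)]
```

With these example values the choice rises from the lowest bitrate to the highest as the buffer fills, and no download is scheduled once the buffer is full enough. No bandwidth prediction appears anywhere in the rule, which matches the claim in the abstract.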

preprint · 2016 · arXiv

Go-With-The-Winner: Client-Side Server Selection for Content Delivery

Content delivery networks deliver much of the web and video content in the world by deploying a large distributed network of servers. We model and analyze a simple paradigm for client-side server selection that is commonly used in practice where each user independently measures the performance of a set of candidate servers and selects the one that performs the best. For web (resp., video) delivery, we propose and analyze a simple algorithm where each user randomly chooses two or more candidate servers and selects the server that provided the best hit rate (resp., bit rate). We prove that the algorithm converges quickly to an optimal state where all users receive the best hit rate (resp., bit rate), with high probability. We also show that if each user chose just one random server instead of two, some users receive a hit rate (resp., bit rate) that tends to zero. We simulate our algorithm and evaluate its performance with varying choices of parameters, system load, and content popularity.
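A single selection round of this paradigm can be sketched as follows. This is a minimal illustration, not the authors' simulator: the function name, the `measure` callback, and the example hit rates are hypothetical, and the repeated rounds and load dynamics that the convergence proof concerns are not modeled.

```python
import random


def go_with_the_winner(servers, measure, k=2, rng=random):
    """Sample k distinct candidate servers at random and keep the best performer.

    measure -- callable returning the observed hit rate (web) or bit rate (video)
    """
    candidates = rng.sample(servers, k)
    return max(candidates, key=measure)


# Hypothetical observed hit rates for four servers.
hit_rate = {"s1": 0.62, "s2": 0.91, "s3": 0.40, "s4": 0.75}
choice = go_with_the_winner(list(hit_rate), hit_rate.get, k=2, rng=random.Random(0))
```

With `k=1` the user has no comparison to make and simply keeps whatever server it drew, which is the single-choice case the abstract contrasts against.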

preprint · 2013 · arXiv

Dynamic Provisioning in Next-Generation Data Centers with On-site Power Production

The critical need for clean and economical sources of energy is transforming data centers that are primarily energy consumers to also energy producers. We focus on minimizing the operating costs of next-generation data centers that can jointly optimize the energy supply from on-site generators and the power grid, and the energy demand from servers as well as power conditioning and cooling systems. We formulate the cost minimization problem and present an offline optimal algorithm. For "on-grid" data centers that use only the grid, we devise a deterministic online algorithm that achieves the best possible competitive ratio of $2-\alpha_{s}$, where $\alpha_{s}$ is a normalized look-ahead window size. For "hybrid" data centers that have on-site power generation in addition to the grid, we develop an online algorithm that achieves a competitive ratio of at most $\frac{P_{\max}(2-\alpha_{s})}{c_{o}+c_{m}/L}\left[1+2\frac{P_{\max}-c_{o}}{P_{\max}(1+\alpha_{g})}\right]$, where $\alpha_{s}$ and $\alpha_{g}$ are normalized look-ahead window sizes, $P_{\max}$ is the maximum grid power price, and $L$, $c_{o}$, and $c_{m}$ are parameters of an on-site generator. Using extensive workload traces from Akamai with the corresponding grid power prices, we simulate our offline and online algorithms in a realistic setting. Our offline (resp., online) algorithm achieves a cost reduction of 25.8% (resp., 20.7%) for a hybrid data center and 12.3% (resp., 7.3%) for an on-grid data center. The cost reductions are quite significant and make a strong case for a joint optimization of energy supply and energy demand in a data center. A hybrid data center provides about 13% additional cost reduction over an on-grid data center representing the additional cost benefits that on-site power generation provides over using the grid alone.
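To make the two competitive-ratio expressions easier to read, the sketch below evaluates them numerically. The function names and the example parameter values are hypothetical and are not taken from the paper; the code only transcribes the two formulas stated in the abstract.

```python
def on_grid_ratio(alpha_s):
    """Competitive ratio 2 - alpha_s of the online algorithm for on-grid data centers."""
    return 2.0 - alpha_s


def hybrid_ratio_bound(p_max, c_o, c_m, L, alpha_s, alpha_g):
    """Upper bound on the competitive ratio for hybrid data centers.

    p_max            -- maximum grid power price
    c_o, c_m, L      -- parameters of the on-site generator
    alpha_s, alpha_g -- normalized look-ahead window sizes
    """
    leading = p_max * (2.0 - alpha_s) / (c_o + c_m / L)
    bracket = 1.0 + 2.0 * (p_max - c_o) / (p_max * (1.0 + alpha_g))
    return leading * bracket


# Illustrative values only (assumptions, not data from the paper).
example = hybrid_ratio_bound(p_max=2.0, c_o=1.0, c_m=2.0, L=2.0, alpha_s=0.0, alpha_g=1.0)
```

For these made-up inputs the leading factor is 2 and the bracket is 1.5, so the bound evaluates to 3. In both expressions a larger look-ahead window lowers the value.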

preprint · 2012 · arXiv

Optimizing MapReduce for Highly Distributed Environments

MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are available in a single central location, however, no longer holds for many emerging applications in commercial, scientific and social networking domains, where the data is generated in a geographically distributed manner. Further, the computational resources needed for carrying out the data analysis may be distributed across multiple data centers or community resources such as Grids. In this paper, we develop a modeling framework to capture MapReduce execution in a highly distributed environment comprising distributed data sources and distributed computational resources. This framework is flexible enough to capture several design choices and performance optimizations for MapReduce execution. We propose a model-driven optimization that has two key features: (i) it is end-to-end as opposed to myopic optimizations that may only make locally optimal but globally suboptimal decisions, and (ii) it can control multiple MapReduce phases to achieve low runtime, as opposed to single-phase optimizations that may control only individual phases. Our model results show that our optimization can provide nearly 82% and 64% reduction in execution time over myopic and single-phase optimizations, respectively. We have modified Hadoop to implement our model outputs, and using three different MapReduce applications over an 8-node emulated PlanetLab testbed, we show that our optimized Hadoop execution plan achieves 31-41% reduction in runtime over a vanilla Hadoop execution. Our model-driven optimization also provides several insights into the choice of techniques and execution parameters based on application and platform characteristics.

preprint · 2011 · arXiv

Algorithms for Constructing Overlay Networks For Live Streaming

We present a polynomial time approximation algorithm for constructing an overlay multicast network for streaming live media events over the Internet. The class of overlay networks constructed by our algorithm include networks used by Akamai Technologies to deliver live media events to a global audience with high fidelity. We construct networks consisting of three stages of nodes. The nodes in the first stage are the entry points that act as sources for the live streams. Each source forwards each of its streams to one or more nodes in the second stage that are called reflectors. A reflector can split an incoming stream into multiple identical outgoing streams, which are then sent on to nodes in the third and final stage that act as sinks and are located in edge networks near end-users. As the packets in a stream travel from one stage to the next, some of them may be lost. A sink combines the packets from multiple instances of the same stream (by reordering packets and discarding duplicates) to form a single instance of the stream with minimal loss. Our primary contribution is an algorithm that constructs an overlay network that provably satisfies capacity and reliability constraints to within a constant factor of optimal, and minimizes cost to within a logarithmic factor of optimal. Further in the common case where only the transmission costs are minimized, we show that our algorithm produces a solution that has cost within a factor of 2 of optimal. We also implement our algorithm and evaluate it on realistic traces derived from Akamai's live streaming network. Our empirical results show that our algorithm can be used to efficiently construct large-scale overlay networks in practice with near-optimal cost.
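The combining step performed at a sink can be sketched in a few lines. This is a simplified illustration that assumes each packet carries a sequence number and that every instance of the stream is available as a list of `(sequence_number, payload)` pairs; the function name and data layout are hypothetical and do not describe Akamai's implementation.

```python
def combine_instances(instances):
    """Merge several lossy copies of one stream into a single ordered stream.

    instances -- iterable of lists of (sequence_number, payload) pairs
    """
    merged = {}
    for instance in instances:
        for seq, payload in instance:
            # Keep the first copy of each packet and discard later duplicates.
            merged.setdefault(seq, payload)
    # Reorder by sequence number.
    return [(seq, merged[seq]) for seq in sorted(merged)]


# Two copies of the same stream, each missing a different packet.
via_reflector_a = [(1, "p1"), (3, "p3"), (4, "p4")]
via_reflector_b = [(2, "p2"), (1, "p1"), (4, "p4")]
stream = combine_instances([via_reflector_a, via_reflector_b])
```

A packet is lost at the sink only if it is missing from every instance, which is why sending a stream through more than one reflector reduces end-to-end loss.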

preprint · 2011 · arXiv

Energy-Aware Load Balancing in Content Delivery Networks

Internet-scale distributed systems such as content delivery networks (CDNs) operate hundreds of thousands of servers deployed in thousands of data center locations around the globe. Since the energy costs of operating such a large IT infrastructure are a significant fraction of the total operating costs, we argue for redesigning CDNs to incorporate energy optimizations as a first-order principle. We propose techniques to turn off CDN servers during periods of low load while seeking to balance three key design goals: maximize energy reduction, minimize the impact on client-perceived service availability (SLAs), and limit the frequency of on-off server transitions to reduce wear-and-tear and its impact on hardware reliability. We propose an optimal offline algorithm and an online algorithm to extract energy savings both at the level of local load balancing within a data center and global load balancing across data centers. We evaluate our algorithms using real production workload traces from a large commercial CDN. Our results show that it is possible to reduce the energy consumption of a CDN by more than 55% while ensuring a high level of availability that meets customer SLA requirements and incurring an average of one on-off transition per server per day. Further, we show that keeping even 10% of the servers as hot spares helps absorb load spikes due to global flash crowds with little impact on availability SLAs. Finally, we show that redistributing load across proximal data centers can enhance service availability significantly, but has only a modest impact on energy savings.
