Source author record

Scott Shenker

Scott Shenker appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Networking and Internet Architecture Distributed, Parallel, and Cluster Computing Databases Artificial Intelligence Data Structures and Algorithms Genomics Logic in Computer Science Machine Learning Operating Systems

Catalog footprint

What is connected

17works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving

Mixture-of-experts (MoE) architectures have turned LLM serving into a cluster-scale workload in which communication consumes a considerable portion of LLM serving runtime. This has prompted industry to invest heavily in expensive high-bandwidth scale-up networks. We question whether such costly infrastructure is strictly necessary. We present the first systematic cross-layer analysis of network cost-effectiveness for MoE LLM serving, comparing four representative XPU (e.g., GPU/TPU) topologies (scale-up, scale-out, 3D torus, and 3D full-mesh). We find that lower-cost switchless topologies are more cost-effective than the scale-up topology across all serving scenarios explored, improving cost-effectiveness by 20.6-56.2%. In particular, the 3D full-mesh topology is Pareto-optimal in terms of the performance-cost tradeoff. We also find that current scale-up link bandwidths are over-provisioned: reducing the link bandwidth improves throughput per cost by up to 27%. A forward-looking analysis of upcoming GPU generations indicates that the cost-performance advantage of switchless networks will likely persist.

preprint2026arXiv

SkyNomad: On Using Multi-Region Spot Instances to Minimize AI Batch Job Cost

AI batch jobs such as model training, inference pipelines, and data analytics require substantial GPU resources and often need to finish before a deadline. Spot instances offer 3-10x lower cost than on-demand instances, but their unpredictable availability makes meeting deadlines difficult. Existing systems either rely solely on spot instances and risk deadline violations, or operate in simplified single-region settings. These approaches overlook substantial spatial and temporal heterogeneity in spot availability, lifetimes, and prices. We show that exploiting such heterogeneity to access more spot capacity is the key to reduce the job execution cost. We present SkyNomad, a multi-region scheduling system that maximizes spot usage and minimizes cost while guaranteeing deadlines. SkyNomad uses lightweight probing to estimate availability, predicts spot lifetimes, accounts for migration cost, and unifies regional characteristics and deadline pressure into a monetary cost model that guides scheduling decisions. Our evaluation shows that SkyNomad achieves 1.25-3.96x cost savings in real cloud deployments and performs within 10% cost differences of an optimal policy in simulation, while consistently meeting deadlines.

preprint2022arXiv

3PO: Programmed Far-Memory Prefetching for Oblivious Applications

Using memory located on remote machines, or far memory, as a swap space is a promising approach to meet the increasing memory demands of modern datacenter applications. Operating systems have long relied on prefetchers to mask the increased latency of fetching pages from swap space to main memory. Unfortunately, with traditional prefetching heuristics, performance still degrades when applications use far memory. In this paper we propose a new prefetching technique for far-memory applications. We focus our efforts on memory-intensive, oblivious applications whose memory access patterns are independent of their inputs, such as matrix multiplication. For this class of applications we observe that we can perfectly prefetch pages without relying on heuristics. However, prefetching perfectly without requiring significant application modifications is challenging. In this paper we describe the design and implementation of 3PO, a system that provides pre-planned prefetching for general oblivious applications. We demonstrate that 3PO can accelerate applications, e.g., running them 30-150% faster than with Linux's prefetcher with 20% local memory. We also use 3PO to understand the fundamental software overheads of prefetching in a paging-based system, and the minimum performance penalty that they impose when we run applications under constrained local memory.

preprint2022arXiv

Blockaid: Data Access Policy Enforcement for Web Applications

Modern web applications serve large amounts of sensitive user data, access to which is typically governed by data-access policies. Enforcing such policies is crucial to preventing improper data access, and prior work has proposed many enforcement mechanisms. However, these prior methods either alter application semantics or require adopting a new programming model; the former can result in unexpected application behavior, while the latter cannot be used with existing web frameworks. Blockaid is an access-policy enforcement system that preserves application semantics and is compatible with existing web frameworks. It intercepts database queries from the application, attempts to verify that each query is policy-compliant, and blocks queries that are not. It verifies policy compliance using SMT solvers and generalizes and caches previous compliance decisions for better performance. We show that Blockaid supports existing web applications while requiring minimal code changes and adding only modest overheads.

preprint2022arXiv

The Sky Above The Clouds

Technology ecosystems often undergo significant transformations as they mature. For example, telephony, the Internet, and PCs all started with a single provider, but in the United States each is now served by a competitive market that uses comprehensive and universal technology standards to provide compatibility. This white paper presents our view on how the cloud ecosystem, barely over fifteen years old, could evolve as it matures.

preprint2020arXiv

How to Train your DNN: The Network Operator Edition

Deep Neural Nets have hit quite a crest, But physical networks are where they must rest, And here we put them all to the test, To see which network optimization is best.

preprint2016arXiv

Exploring the Limits of Static Failover Routing

We present and study the Static-Routing-Resiliency problem, motivated by routing on the Internet: Given a graph $G$, a unique destination vertex $d$, and an integer constant $c>0$, does there exist a static and destination-based routing scheme such that the correct delivery of packets from any source $s$ to the destination $d$ is guaranteed so long as (1) no more than $c$ edges fail and (2) there exists a physical path from $s$ to $d$? We embark upon a systematic exploration of this fundamental question in a variety of models (deterministic routing, randomized routing, with packet-duplication, with packet-header-rewriting) and present both positive and negative results that relate the edge-connectivity of a graph, i.e., the minimum number of edges whose deletion partitions $G$, to its resiliency.

preprint2016arXiv

Recursive SDN for Carrier Networks

Control planes for global carrier networks should be programmable (so that new functionality can be easily introduced) and scalable (so they can handle the numerical scale and geographic scope of these networks). Neither traditional control planes nor new SDN-based control planes meet both of these goals. In this paper, we propose a framework for recursive routing computations that combines the best of SDN (programmability) and traditional networks (scalability through hierarchy) to achieve these two desired properties. Through simulation on graphs of up to 10,000 nodes, we evaluate our design's ability to support a variety of routing and traffic engineering solutions, while incorporating a fast failure recovery mechanism.

preprint2016arXiv

Verifying Reachability in Networks with Mutable Datapaths

Recent work has made great progress in verifying the forwarding correctness of networks . However, these approaches cannot be used to verify networks containing middleboxes, such as caches and firewalls, whose forwarding behavior depends on previously observed traffic. We explore how to verify reachability properties for networks that include such "mutable datapath" elements. We want our verification results to hold not just for the given network, but also in the presence of failures. The main challenge lies in scaling the approach to handle large and complicated networks, We address by developing and leveraging the concept of slices, which allow network-wide verification to only require analyzing small portions of the network. We show that with slices the time required to verify an invariant on many production networks is independent of the size of the network itself.

preprint2015arXiv

Universal Packet Scheduling

In this paper we address a seemingly simple question: Is there a universal packet scheduling algorithm? More precisely, we analyze (both theoretically and empirically) whether there is a single packet scheduling algorithm that, at a network-wide level, can match the results of any given scheduling algorithm. We find that in general the answer is "no". However, we show theoretically that the classical Least Slack Time First (LSTF) scheduling algorithm comes closest to being universal and demonstrate empirically that LSTF can closely, though not perfectly, replay a wide range of scheduling algorithms in realistic network settings. We then evaluate whether LSTF can be used {\em in practice} to meet various network-wide objectives by looking at three popular performance metrics (mean FCT, tail packet delays, and fairness); we find that LSTF performs comparable to the state-of-the-art for each of them.

preprint2014arXiv

A cost-benefit analysis of low latency via added utilization

Several recently proposed techniques achieve latency reduction by trading it off for some amount of additional bandwidth usage. But how would one quantify whether the tradeoff is actually beneficial in a given system? We develop an economic cost vs. benefit analysis for answering this question. We use the analysis to derive a benchmark for wide-area client-server applications, and demonstrate how it can be applied to reason about a particular latency saving technique --- redundant DNS requests.

preprint2014arXiv

Verifying Isolation Properties in the Presence of Middleboxes

Great progress has been made recently in verifying the correctness of router forwarding tables. However, these approaches do not work for networks containing middleboxes such as caches and firewalls whose forwarding behavior depends on previously observed traffic. We explore how to verify isolation properties in networks that include such "dynamic datapath" elements using model checking. Our work leverages recent advances in SMT solvers, and the main challenge lies in scaling the approach to handle large and complicated networks. While the straightforward application of model checking to this problem can only handle very small networks (if at all), our approach can verify simple realistic invariants on networks containing 30,000 middleboxes in a few minutes.

preprint2013arXiv

Low latency via redundancy

Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency --- especially the tail of the latency distribution --- can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result which completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.

preprint2012arXiv

On the Resilience of Routing Tables

Many modern network designs incorporate "failover" paths into routers' forwarding tables. We initiate the theoretical study of the conditions under which such resilient routing tables can guarantee delivery of packets.

preprint2012arXiv

Shark: SQL and Rich Analytics at Scale

Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently recovers from failures mid-query. This allows Shark to run SQL queries up to 100x faster than Apache Hive, and machine learning programs up to 100x faster than Hadoop. Unlike previous systems, Shark shows that it is possible to achieve these speedups while retaining a MapReduce-like execution engine, and the fine-grained fault tolerance properties that such engines provide. It extends such an engine in several ways, including column-oriented in-memory storage and dynamic mid-query replanning, to effectively execute SQL. The result is a system that matches the speedups reported for MPP analytic databases over MapReduce, while offering fault tolerance properties and complex analytics capabilities that they lack.

preprint2012arXiv

Slick Packets

Source-controlled routing has been proposed as a way to improve flexibility of future network architectures, as well as simplifying the data plane. However, if a packet specifies its path, this precludes fast local re-routing within the network. We propose SlickPackets, a novel solution that allows packets to slip around failures by specifying alternate paths in their headers, in the form of compactly-encoded directed acyclic graphs. We show that this can be accomplished with reasonably small packet headers for real network topologies, and results in responsiveness to failures that is competitive with past approaches that require much more state within the network. Our approach thus enables fast failure response while preserving the benefits of source-controlled routing.

preprint2011arXiv

Faster and More Accurate Sequence Alignment with SNAP

We present the Scalable Nucleotide Alignment Program (SNAP), a new short and long read aligner that is both more accurate (i.e., aligns more reads with fewer errors) and 10-100x faster than state-of-the-art tools such as BWA. Unlike recent aligners based on the Burrows-Wheeler transform, SNAP uses a simple hash index of short seed sequences from the genome, similar to BLAST's. However, SNAP greatly reduces the number and cost of local alignment checks performed through several measures: it uses longer seeds to reduce the false positive locations considered, leverages larger memory capacities to speed index lookup, and excludes most candidate locations without fully computing their edit distance to the read. The result is an algorithm that scales well for reads from one hundred to thousands of bases long and provides a rich error model that can match classes of mutations (e.g., longer indels) that today's fast aligners ignore. We calculate that SNAP can align a dataset with 30x coverage of a human genome in less than an hour for a cost of $2 on Amazon EC2, with higher accuracy than BWA. Finally, we describe ongoing work to further improve SNAP.

Scott Shenker

What is connected

Connect this record

See the researcher in context

Building this map preview

17 published item(s)

Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving

SkyNomad: On Using Multi-Region Spot Instances to Minimize AI Batch Job Cost

3PO: Programmed Far-Memory Prefetching for Oblivious Applications

Blockaid: Data Access Policy Enforcement for Web Applications

The Sky Above The Clouds

How to Train your DNN: The Network Operator Edition

Exploring the Limits of Static Failover Routing

Recursive SDN for Carrier Networks

Verifying Reachability in Networks with Mutable Datapaths

Universal Packet Scheduling

A cost-benefit analysis of low latency via added utilization

Verifying Isolation Properties in the Presence of Middleboxes

Low latency via redundancy

On the Resilience of Routing Tables

Shark: SQL and Rich Analytics at Scale

Slick Packets

Faster and More Accurate Sequence Alignment with SNAP