Source author record

Aurojit Panda

Aurojit Panda appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Networking and Internet Architecture Distributed, Parallel, and Cluster Computing Machine Learning Databases Hardware Architecture Logic in Computer Science Operating Systems Software Engineering

Catalog footprint

What is connected

13works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication

As Byzantine Fault Tolerant (BFT) protocols begin to be used in permissioned blockchains for user-facing applications such as payments, it is crucial that they provide low latency. In pursuit of low latency, some recently proposed BFT consensus protocols employ a leaderless optimistic fast path, in which clients broadcast their requests directly to replicas without first serializing requests at a leader, resulting in an end-to-end commit latency of 2 message delays ($2Δ$) during fault-free, synchronous periods. However, such a fast path only works if there is no contention: concurrent contending requests can cause replicas to diverge if they receive conflicting requests in different orders, triggering costly recovery procedures. In this work, we present Aspen, a leaderless BFT protocol that achieves a near-optimal latency of $2Δ+ \varepsilon$, where $\varepsilon$ indicates a short waiting delay. Aspen removes the no-contention condition by utilizing a best-effort sequencing layer based on loosely synchronized clocks and network delay estimates. Aspen requires $n = 3f + 2p + 1$ replicas to cope with up to $f$ Byzantine nodes. The $2p$ extra nodes allow Aspen's fast path to proceed even if up to $p$ replicas diverge due to unpredictable network delays. When its optimistic conditions do not hold, Aspen falls back to PBFT-style protocol, guaranteeing safety and liveness under partial synchrony. In experiments with wide-area distributed replicas, Aspen commits requests in less than 75 ms, a 1.2 to 3.3$\times$ improvement compared to previous protocols, while supporting 19,000 requests per second.

preprint2022arXiv

3PO: Programmed Far-Memory Prefetching for Oblivious Applications

Using memory located on remote machines, or far memory, as a swap space is a promising approach to meet the increasing memory demands of modern datacenter applications. Operating systems have long relied on prefetchers to mask the increased latency of fetching pages from swap space to main memory. Unfortunately, with traditional prefetching heuristics, performance still degrades when applications use far memory. In this paper we propose a new prefetching technique for far-memory applications. We focus our efforts on memory-intensive, oblivious applications whose memory access patterns are independent of their inputs, such as matrix multiplication. For this class of applications we observe that we can perfectly prefetch pages without relying on heuristics. However, prefetching perfectly without requiring significant application modifications is challenging. In this paper we describe the design and implementation of 3PO, a system that provides pre-planned prefetching for general oblivious applications. We demonstrate that 3PO can accelerate applications, e.g., running them 30-150% faster than with Linux's prefetcher with 20% local memory. We also use 3PO to understand the fundamental software overheads of prefetching in a paging-based system, and the minimum performance penalty that they impose when we run applications under constrained local memory.

preprint2022arXiv

Blockaid: Data Access Policy Enforcement for Web Applications

Modern web applications serve large amounts of sensitive user data, access to which is typically governed by data-access policies. Enforcing such policies is crucial to preventing improper data access, and prior work has proposed many enforcement mechanisms. However, these prior methods either alter application semantics or require adopting a new programming model; the former can result in unexpected application behavior, while the latter cannot be used with existing web frameworks. Blockaid is an access-policy enforcement system that preserves application semantics and is compatible with existing web frameworks. It intercepts database queries from the application, attempts to verify that each query is policy-compliant, and blocks queries that are not. It verifies policy compliance using SMT solvers and generalizes and caches previous compliance decisions for better performance. We show that Blockaid supports existing web applications while requiring minimal code changes and adding only modest overheads.

preprint2022arXiv

Isolation mechanisms for high-speed packet-processing pipelines

Data-plane programmability is now mainstream. As we find more use cases, deployments need to be able to run multiple packet-processing modules in a single device. These are likely to be developed by independent teams, either within the same organization or from multiple organizations. Therefore, we need isolation mechanisms to ensure that modules on the same device do not interfere with each other. This paper presents Menshen, an extension of the Reconfigurable Match Tables (RMT) pipeline that enforces isolation between different packet-processing modules. Menshen is comprised of a set of lightweight hardware primitives and an extension to the open source P4-16 reference compiler that act in conjunction to meet this goal. We have prototyped Menshen on two FPGA platforms (NetFPGA and Corundum). We show that our design provides isolation, and allows new modules to be loaded without impacting the ones already running. Finally, we demonstrate that feasibility of implementing Menshen on ASICs by using the FreePDK45nm technology library and the Synopsys DC synthesis software, showing that our design meets timing at a 1GHz clock frequency and needs approximately 6% additional chip area. We have open sourced the code for Menshen's hardware and software at https://isolation.quest/.

preprint2022arXiv

Measuring the Effect of Training Data on Deep Learning Predictions via Randomized Experiments

We develop a new, principled algorithm for estimating the contribution of training data points to the behavior of a deep learning model, such as a specific prediction it makes. Our algorithm estimates the AME, a quantity that measures the expected (average) marginal effect of adding a data point to a subset of the training data, sampled from a given distribution. When subsets are sampled from the uniform distribution, the AME reduces to the well-known Shapley value. Our approach is inspired by causal inference and randomized experiments: we sample different subsets of the training data to train multiple submodels, and evaluate each submodel's behavior. We then use a LASSO regression to jointly estimate the AME of each data point, based on the subset compositions. Under sparsity assumptions ($k \ll N$ datapoints have large AME), our estimator requires only $O(k\log N)$ randomized submodel trainings, improving upon the best prior Shapley value estimators.

preprint2022arXiv

NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers

Deep-learning (DL) compilers such as TVM and TensorRT are increasingly being used to optimize deep neural network (DNN) models to meet performance, resource utilization and other requirements. Bugs in these compilers can result in models whose semantics differ from the original ones, producing incorrect results that corrupt the correctness of downstream applications. However, finding bugs in these compilers is challenging due to their complexity. In this work, we propose a new fuzz testing approach for finding bugs in deep-learning compilers. Our core approach consists of (i) generating diverse yet valid DNN test models that can exercise a large part of the compiler's transformation logic using light-weight operator specifications; (ii) performing gradient-based search to find model inputs that avoid any floating-point exceptional values during model execution, reducing the chance of missed bugs or false alarms; and (iii) using differential testing to identify bugs. We implemented this approach in NNSmith which has found 72 new bugs for TVM, TensorRT, ONNXRuntime, and PyTorch to date. Of these 58 have been confirmed and 51 have been fixed by their respective project maintainers.

preprint2020arXiv

How to Train your DNN: The Network Operator Edition

Deep Neural Nets have hit quite a crest, But physical networks are where they must rest, And here we put them all to the test, To see which network optimization is best.

preprint2016arXiv

Exploring the Limits of Static Failover Routing

We present and study the Static-Routing-Resiliency problem, motivated by routing on the Internet: Given a graph $G$, a unique destination vertex $d$, and an integer constant $c>0$, does there exist a static and destination-based routing scheme such that the correct delivery of packets from any source $s$ to the destination $d$ is guaranteed so long as (1) no more than $c$ edges fail and (2) there exists a physical path from $s$ to $d$? We embark upon a systematic exploration of this fundamental question in a variety of models (deterministic routing, randomized routing, with packet-duplication, with packet-header-rewriting) and present both positive and negative results that relate the edge-connectivity of a graph, i.e., the minimum number of edges whose deletion partitions $G$, to its resiliency.

preprint2016arXiv

Recursive SDN for Carrier Networks

Control planes for global carrier networks should be programmable (so that new functionality can be easily introduced) and scalable (so they can handle the numerical scale and geographic scope of these networks). Neither traditional control planes nor new SDN-based control planes meet both of these goals. In this paper, we propose a framework for recursive routing computations that combines the best of SDN (programmability) and traditional networks (scalability through hierarchy) to achieve these two desired properties. Through simulation on graphs of up to 10,000 nodes, we evaluate our design's ability to support a variety of routing and traffic engineering solutions, while incorporating a fast failure recovery mechanism.

preprint2016arXiv

Verifying Reachability in Networks with Mutable Datapaths

Recent work has made great progress in verifying the forwarding correctness of networks . However, these approaches cannot be used to verify networks containing middleboxes, such as caches and firewalls, whose forwarding behavior depends on previously observed traffic. We explore how to verify reachability properties for networks that include such "mutable datapath" elements. We want our verification results to hold not just for the given network, but also in the presence of failures. The main challenge lies in scaling the approach to handle large and complicated networks, We address by developing and leveraging the concept of slices, which allow network-wide verification to only require analyzing small portions of the network. We show that with slices the time required to verify an invariant on many production networks is independent of the size of the network itself.

preprint2014arXiv

Verifying Isolation Properties in the Presence of Middleboxes

Great progress has been made recently in verifying the correctness of router forwarding tables. However, these approaches do not work for networks containing middleboxes such as caches and firewalls whose forwarding behavior depends on previously observed traffic. We explore how to verify isolation properties in networks that include such "dynamic datapath" elements using model checking. Our work leverages recent advances in SMT solvers, and the main challenge lies in scaling the approach to handle large and complicated networks. While the straightforward application of model checking to this problem can only handle very small networks (if at all), our approach can verify simple realistic invariants on networks containing 30,000 middleboxes in a few minutes.

preprint2012arXiv

BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data

In this paper, we present BlinkDB, a massively parallel, sampling-based approximate query engine for running ad-hoc, interactive SQL queries on large volumes of data. The key insight that BlinkDB builds on is that one can often make reasonable decisions in the absence of perfect answers. For example, reliably detecting a malfunctioning server using a distributed collection of system logs does not require analyzing every request processed by the system. Based on this insight, BlinkDB allows one to trade-off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas that differentiate it from previous work in this area: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional, multi-resolution samples from original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy and/or response time requirements. We have built an open-source version of BlinkDB and validated its effectiveness using the well-known TPC-H benchmark as well as a real-world analytic workload derived from Conviva Inc. Our experiments on a 100 node cluster show that BlinkDB can answer a wide range of queries from a real-world query trace on up to 17 TBs of data in less than 2 seconds (over 100\times faster than Hive), within an error of 2 - 10%.

preprint2012arXiv

On the Resilience of Routing Tables

Many modern network designs incorporate "failover" paths into routers' forwarding tables. We initiate the theoretical study of the conditions under which such resilient routing tables can guarantee delivery of packets.

Aurojit Panda

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication

3PO: Programmed Far-Memory Prefetching for Oblivious Applications

Blockaid: Data Access Policy Enforcement for Web Applications

Isolation mechanisms for high-speed packet-processing pipelines

Measuring the Effect of Training Data on Deep Learning Predictions via Randomized Experiments

NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers

How to Train your DNN: The Network Operator Edition

Exploring the Limits of Static Failover Routing

Recursive SDN for Carrier Networks

Verifying Reachability in Networks with Mutable Datapaths

Verifying Isolation Properties in the Presence of Middleboxes

BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data

On the Resilience of Routing Tables