Source author record

Vyas Sekar

Topics: Networking and Internet Architecture; Cryptography and Security; Machine Learning; Systems and Control; Databases; Distributed, Parallel, and Cluster Computing; eess.SY; Artificial Intelligence; cs.CY; Data Structures and Algorithms; math.OC; Multimedia; physics.soc-ph; Social and Information Networks

Catalog footprint

20 works, 14 topics, 4 close collaborators

Preprint, 2026, arXiv

AHA: Scalable Alternative History Analysis for Operational Timeseries Applications

Many operational systems collect high-dimensional timeseries data about users/systems on key performance metrics. For instance, ISPs, content distribution networks, and video delivery services collect quality of experience metrics for user sessions associated with metadata (e.g., location, device, ISP). Over such historical data, operators and data analysts often need to run retrospective analysis; e.g., analyze anomaly detection algorithms, experiment with different configurations for alerts, evaluate new algorithms, and so on. We refer to this class of workloads as alternative history analysis for operational datasets. We show that in such settings, traditional data processing solutions (e.g., data warehouses, sampling, sketching, big-data systems) either pose high operational costs or do not guarantee accurate replay. We design and implement a system, called AHA (Alternative History Analytics), that overcomes both challenges to provide cost efficiency and fidelity for high-dimensional data. The design of AHA is based on analytical and empirical insights about such workloads: 1) the decomposability of underlying statistics; 2) sparsity in terms of the number of active subpopulations over attribute-value combinations; and 3) the efficiency structure of aggregation operations in modern analytics databases. Using multiple real-world datasets as well as case studies on production pipelines at a large video analytics company, we show that AHA provides 100% accuracy for a broad range of downstream tasks and up to 85x lower total cost of ownership (i.e., compute + storage) compared to conventional methods.
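The first two insights in this abstract (decomposable statistics and sparse active subpopulations) can be illustrated with a small sketch. This is not AHA's implementation. The `Summary` class, the session tuples, and the `rollup` helper are hypothetical names chosen for illustration, assuming a statistic counts as decomposable when the summary of a union can be computed from the summaries of its parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Per-subpopulation summary made of decomposable parts (illustrative)."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(values):
        values = list(values)
        return Summary(len(values), sum(values), min(values), max(values))

    def merge(self, other):
        # Each field of the union is computable from the two parts alone.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        return self.total / self.count


# Hypothetical sessions: (country, device, metric value).
sessions = [
    ("US", "tv", 2.0),
    ("US", "phone", 4.0),
    ("US", "tv", 6.0),
    ("DE", "phone", 8.0),
]


def summarize_by_leaf(rows):
    """Store one summary per attribute combination that actually occurs."""
    groups = {}
    for country, device, value in rows:
        groups.setdefault((country, device), []).append(value)
    return {key: Summary.of(vals) for key, vals in groups.items()}


def rollup(leaves, country):
    """Answer a coarser query (all devices in a country) from leaf summaries."""
    parts = [s for (c, _), s in leaves.items() if c == country]
    merged = parts[0]
    for part in parts[1:]:
        merged = merged.merge(part)
    return merged


leaves = summarize_by_leaf(sessions)
us = rollup(leaves, "US")
print(us.count, us.mean, us.minimum, us.maximum)
```

Only three of the four possible (country, device) combinations appear in `leaves`, so storage follows the active subpopulations and not the full attribute cross product. The rolled-up summary for "US" matches what a pass over the raw rows would give.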

Preprint, 2023, arXiv

CANE: A Cascade-Control Approach for Network-Assisted Video QoE Management

Prior efforts have shown that network-assisted schemes can improve the Quality-of-Experience (QoE) and QoE fairness when multiple video players compete for bandwidth. However, realizing network-assisted schemes in practice is challenging, as: i) the network has limited visibility into the client players' internal state and actions; ii) players' actions may nullify or negate the network's actions; and iii) the players' objectives might be conflicting. To address these challenges, we formulate network-assisted QoE optimization through a cascade control abstraction. This informs the design of CANE, a practical network-assisted QoE framework. CANE uses machine learning techniques to approximate each player's behavior as a black-box model and model predictive control to achieve a near-optimal solution. We evaluate CANE through realistic simulations and show that CANE improves multiplayer QoE fairness by ~50% compared to pure client-side adaptive bitrate algorithms and by ~20% compared to uniform traffic shaping.

Preprint, 2022, arXiv

Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams

Today's large-scale services (e.g., video streaming platforms, data centers, sensor grids) need diverse real-time summary statistics across multiple subpopulations of multidimensional datasets. However, state-of-the-art frameworks do not offer general and accurate analytics in real time at reasonable costs. The root cause is the combinatorial explosion of data subpopulations and the diversity of summary statistics we need to monitor simultaneously. We present Hydra, an efficient framework for multidimensional analytics that presents a novel combination of using a "sketch of sketches" to avoid the overhead of monitoring exponentially-many subpopulations and universal sketching to ensure accurate estimates for multiple statistics. We build Hydra as an Apache Spark plugin and address practical system challenges to minimize overheads at scale. Across multiple real-world and synthetic multidimensional datasets, we show that Hydra can achieve robust error bounds and is an order of magnitude more efficient in terms of operational cost and memory footprint than existing frameworks (e.g., Spark, Druid) while ensuring interactive estimation times.

Preprint, 2022, arXiv

On the Privacy Properties of GAN-generated Samples

The privacy implications of generative adversarial networks (GANs) are a topic of great interest, leading to several recent algorithms for training GANs with privacy guarantees. By drawing connections to the generalization properties of GANs, we prove that under some assumptions, GAN-generated samples inherently satisfy some (weak) privacy guarantees. First, we show that if a GAN is trained on m samples and used to generate n samples, the generated samples are (epsilon, delta)-differentially-private for (epsilon, delta) pairs where delta scales as O(n/m). We show that under some special conditions, this upper bound is tight. Next, we study the robustness of GAN-generated samples to membership inference attacks. We model membership inference as a hypothesis test in which the adversary must determine whether a given sample was drawn from the training dataset or from the underlying data distribution. We show that this adversary can achieve an area under the ROC curve that scales no better than O(m^{-1/4}).

Preprint, 2022, arXiv

RareGAN: Generating Samples for Rare Classes

We study the problem of learning generative adversarial networks (GANs) for a rare class of an unlabeled dataset subject to a labeling budget. This problem is motivated by practical applications in domains including security (e.g., synthesizing packets for DNS amplification attacks), systems and networking (e.g., synthesizing workloads that trigger high resource usage), and machine learning (e.g., generating images from a rare class). Existing approaches are unsuitable, either requiring fully-labeled datasets or sacrificing the fidelity of the rare class for that of the common classes. We propose RareGAN, a novel synthesis of three key ideas: (1) extending conditional GANs to use labeled and unlabeled data for better generalization; (2) an active learning approach that requests the most useful labels; and (3) a weighted loss function to favor learning the rare class. We show that RareGAN achieves a better fidelity-diversity tradeoff on the rare class than prior work across different applications, budgets, rare class fractions, GAN losses, and architectures.

Preprint, 2021, arXiv

Pareto GAN: Extending the Representational Power of GANs to Heavy-Tailed Distributions

Generative adversarial networks (GANs) are often billed as "universal distribution learners", but precisely what distributions they can represent and learn is still an open question. Heavy-tailed distributions are prevalent in many different domains such as financial risk-assessment, physics, and epidemiology. We observe that existing GAN architectures do a poor job of matching the asymptotic behavior of heavy-tailed distributions, a problem that we show stems from their construction. Additionally, when faced with the infinite moments and large distances between outlier points that are characteristic of heavy-tailed distributions, common loss functions produce unstable or near-zero gradients. We address these problems with the Pareto GAN. A Pareto GAN leverages extreme value theory and the functional properties of neural networks to learn a distribution that matches the asymptotic behavior of the marginal distributions of the features. We identify issues with standard loss functions and propose the use of alternative metric spaces that enable stable and efficient learning. Finally, we evaluate our proposed approach on a variety of heavy-tailed datasets.

Preprint, 2021, arXiv

Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Limited data access is a longstanding barrier to data-driven research and development in the networked systems community. In this work, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge. As a specific target, our focus in this paper is on time series datasets with metadata (e.g., packet loss rate measurements with corresponding ISPs). We identify key challenges of existing GAN approaches for such workloads with respect to fidelity (e.g., long-term dependencies, complex multidimensional relationships, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity). To improve fidelity, we design a custom workflow called DoppelGANger (DG) and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DG achieves up to 43% better fidelity than baseline models. Although we do not resolve the privacy problem in this work, we identify fundamental challenges with both classical notions of privacy and recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges. By shedding light on the promise and challenges, we hope our work can rekindle the conversation on workflows for data sharing.

Preprint, 2020, arXiv

Fighting Fire with Light: A Case for Defending DDoS Attacks Using the Optical Layer

The DDoS attack landscape is growing at an unprecedented pace. Inspired by the recent advances in optical networking, we make a case for optical layer-aware DDoS defense (O-LAD) in this paper. Our approach leverages the optical layer to isolate attack traffic rapidly via dynamic reconfiguration of (backup) wavelengths using ROADMs, bridging the gap between (a) evolution of the DDoS attack landscape and (b) innovations in the optical layer (e.g., reconfigurable optics). We show that the physical separation of traffic profiles allows finer-grained handling of suspicious flows and offers better performance for benign traffic in the face of an attack. We present preliminary results modeling throughput and latency for legitimate flows while scaling the strength of attacks. We also identify a number of open problems for the security, optical, and systems communities: modeling diverse DDoS attacks (e.g., fixed vs. variable rate, detectable vs. undetectable), building a full-fledged defense system with optical advancements (e.g., OpenConfig), and optical layer-aware defenses for a broader class of attacks (e.g., network reconnaissance).

Preprint, 2020, arXiv

Unleashing In-network Computing on Scientific Workloads

Many recent efforts have shown that in-network computing can benefit various datacenter applications. In this paper, we explore a relatively less-explored domain which we argue can benefit from in-network computing: scientific workloads in high-performance computing. By analyzing canonical examples of HPC applications, we observe unique opportunities and challenges for exploiting in-network computing to accelerate scientific workloads. In particular, we find that the dynamic and demanding nature of scientific workloads is the major obstacle to the adoption of in-network approaches which are mostly open-loop and lack runtime feedback. In this paper, we present NSinC (Network-accelerated ScIeNtific Computing), an architecture for fully unleashing the potential benefits of in-network computing for scientific workloads by providing closed-loop runtime feedback to in-network acceleration services. We outline key challenges in realizing this vision and a preliminary design to enable acceleration for scientific applications.

Preprint, 2016, arXiv

NetMemex: Providing Full-Fidelity Traffic Archival

NetMemex explores efficient network traffic archival without any loss of information. Unlike NetFlow-like aggregation, NetMemex allows retrieving the entire packet data including full payload, which makes it useful in forensic analysis, networked and distributed system research, and network administration. Different from packet trace dumps, NetMemex performs sophisticated data compression for small storage space use and optimizes the data layout for fast query processing. NetMemex takes advantage of high-speed random access of flash drives and inexpensive storage space of hard disk drives. These efforts lead to a cost-effective yet high-performance full traffic archival system. We demonstrate that NetMemex can record full-fidelity traffic at near-Gbps rates using a single commodity machine, handling common queries at up to 90.1 K queries/second, at a low storage cost comparable to conventional hard disk-only traffic archival solutions.

Preprint, 2016, arXiv

On the Efficiency and Fairness of Multiplayer HTTP-based Adaptive Video Streaming

User-perceived quality-of-experience (QoE) is critical in internet video delivery systems. Extensive prior work has studied the design of client-side bitrate adaptation algorithms to maximize single-player QoE. However, multiplayer QoE fairness becomes critical as the growth of video traffic makes it more likely that multiple players share a bottleneck in the network. Despite several recent proposals, there is still a series of open questions. In this paper, we bring the problem space to light from a control theory perspective by formalizing the multiplayer QoE fairness problem and addressing two key questions in the broader problem space. First, we derive the sufficient conditions of convergence to steady state QoE fairness under a TCP-based bandwidth sharing scheme. Based on the insight from this analysis that in-network active bandwidth allocation is needed, we propose a non-linear MPC-based, router-assisted bandwidth allocation algorithm that treats each player as a closed-loop system. We use trace-driven simulation to show the improvement over existing approaches. We identify several research directions enabled by the control theoretic modeling and envision that control theory can play an important role in guiding real system design in adaptive video streaming.

Preprint, 2016, arXiv

Shedding Light on the Adoption of Let's Encrypt

Let's Encrypt is a new entrant in the Certificate Authority ecosystem that offers free and automated certificate signing. It is visionary in its commitment to Certificate Transparency. In this paper, we shed light on the adoption patterns of Let's Encrypt "in the wild" and inform the future design and deployment of this exciting development in the security landscape. We analyze acquisition patterns of certificates as well as their usage and deployment trends in the real world. To this end, we analyze data from Certificate Transparency Logs containing records of more than 18 million certificates. We also leverage other sources like Censys, Alexa's historic records, Geolocation databases, and VirusTotal. We also perform active HTTPS measurements on the domains owning Let's Encrypt certificates. Our analysis of certificate acquisition shows that (1) the impact of Let's Encrypt is particularly visible in Western Europe; (2) Let's Encrypt has the potential to democratize HTTPS adoption in countries that are recent entrants to Internet adoption; (3) there is anecdotal evidence of popular domains quitting their previously untrustworthy or expensive CAs in order to transition to Let's Encrypt; and (4) there is a "heavy tailed" behavior where a small number of domains acquire a large number of certificates. With respect to usage, we find that: (1) only 54% of domains actually use the Let's Encrypt certificates they have procured; (2) there are many non-trivial incidents of server misconfigurations; and (3) there is early evidence of use of Let's Encrypt certificates for typosquatting and for malware-laden sites.

Preprint, 2015, arXiv

A New Approach to DDoS Defense using SDN and NFV

Networks today rely on expensive and proprietary hardware appliances, which are deployed at fixed locations, for DDoS defense. This introduces key limitations with respect to flexibility (e.g., complex routing to get traffic to these "chokepoints") and elasticity in handling changing attack patterns. We observe an opportunity to address these limitations using new networking paradigms such as software-defined networking (SDN) and network functions virtualization (NFV). Based on this observation, we design and implement Bohatei, an elastic and flexible DDoS defense system. In designing Bohatei, we address key challenges of scalability, responsiveness, and adversary-resilience. We have implemented defenses for several well-known DDoS attacks in Bohatei. Our evaluations show that Bohatei is scalable (handling 500 Gbps attacks), responsive (mitigating attacks within one minute), and resilient to dynamic adversaries.

Preprint, 2015, arXiv

Accelerating the Development of Software-Defined Network Optimization Applications Using SOL

Software-defined networking (SDN) can enable diverse network management applications such as traffic engineering, service chaining, network function outsourcing, and topology reconfiguration. Realizing the benefits of SDN for these applications, however, entails addressing complex network optimizations that are central to these problems. Unfortunately, such optimization problems require significant manual effort and expertise to express and non-trivial computation and/or carefully crafted heuristics to solve. Our vision is to simplify the deployment of SDN applications using general high-level abstractions for capturing optimization requirements from which we can efficiently generate optimal solutions. To this end, we present SOL, a framework that demonstrates that it is indeed possible to simultaneously achieve generality and efficiency. The insight underlying SOL is that SDN applications can be recast within a unifying path-based optimization abstraction, from which it efficiently generates near-optimal solutions, and device configurations to implement those solutions. We illustrate the generality of SOL by prototyping diverse and new applications. We show that SOL simplifies the development of SDN-based network optimization applications and provides comparable or better scalability than custom optimization solutions.

Preprint, 2015, arXiv

Analyzing TCP Throughput Stability and Predictability with Implications for Adaptive Video Streaming

Recent work suggests that TCP throughput stability and predictability within a video viewing session can inform the design of better video bitrate adaptation algorithms. Despite a rich tradition of Internet measurement, however, our understanding of throughput stability and predictability is quite limited. To bridge this gap, we present a measurement study of throughput stability using a large-scale dataset from a video service provider. Drawing on this analysis, we propose a simple-but-effective prediction mechanism based on a hidden Markov model and demonstrate that it outperforms other approaches. We also show the practical implications in improving the user experience of adaptive video streaming.

Preprint, 2015, arXiv

DDA: Cross-Session Throughput Prediction with Applications to Video Bitrate Selection

User experience of video streaming could be greatly improved by selecting a high-yet-sustainable initial video bitrate, and it is therefore critical to accurately predict throughput before a video session starts. Inspired by previous studies that show similarity among throughput of similar sessions (e.g., those sharing the same bottleneck link), we argue for a cross-session prediction approach, where throughput measured on other sessions is used to predict the throughput of a new session. In this paper, we study the challenges of cross-session throughput prediction, develop an accurate throughput predictor called DDA, and evaluate the performance of the predictor with real-world datasets. We show that DDA can predict throughput more accurately than simple predictors and conventional machine learning algorithms; e.g., DDA's 80th-percentile prediction error is more than 50% lower than that of other algorithms. We also show that this improved accuracy enables video players to select a higher sustainable initial bitrate; e.g., compared to initial bitrate without prediction, DDA leads to 4x higher average bitrate.
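The general idea of cross-session prediction can be sketched with a deliberately naive baseline. This is not DDA, whose design the abstract does not describe. The feature choice (ISP and city), the median rule, the fallback, and the bitrate ladder are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical past sessions: (isp, city, measured throughput in Mbps).
history = [
    ("isp_a", "pittsburgh", 4.0),
    ("isp_a", "pittsburgh", 6.0),
    ("isp_a", "pittsburgh", 5.0),
    ("isp_b", "pittsburgh", 20.0),
    ("isp_b", "seattle", 12.0),
]


def predict_throughput(history, isp, city):
    """Median throughput of past sessions with the same (isp, city).

    Falls back to the median over all sessions when nothing matches.
    """
    matching = [t for i, c, t in history if (i, c) == (isp, city)]
    pool = matching or [t for _, _, t in history]
    return median(pool)


def pick_initial_bitrate(predicted_mbps, ladder_mbps):
    """Highest rung not exceeding the prediction, else the lowest rung."""
    feasible = [b for b in ladder_mbps if b <= predicted_mbps]
    return max(feasible) if feasible else min(ladder_mbps)


ladder = [0.5, 1.5, 3.0, 6.0]  # hypothetical bitrate ladder in Mbps
predicted = predict_throughput(history, "isp_a", "pittsburgh")
initial = pick_initial_bitrate(predicted, ladder)
print(predicted, initial)
```

A player without any prediction would typically start at a conservative low rung. Here the throughput of similar earlier sessions lets the new session start at a higher rung that is still below the predicted throughput.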

Preprint, 2015, arXiv

Scalable Testing of Context-Dependent Policies over Stateful Data Planes with Armstrong

Network operators today spend significant manual effort in ensuring and checking that the network meets their intended policies. While recent work in network verification has made giant strides to reduce this effort, it focuses on simple reachability properties and cannot handle context-dependent policies (e.g., how many connections has a host spawned) that operators realize using stateful network functions (NFs). Together, these introduce new expressiveness and scalability challenges that fall outside the scope of existing network verification mechanisms. To address these challenges, we present Armstrong, a system that enables operators to test whether a network with stateful data plane elements correctly implements a given context-dependent policy. Our design makes three key contributions to address expressiveness and scalability: (1) an abstract I/O unit for modeling network I/O that encodes policy-relevant context information; (2) a practical representation of complex NFs via an ensemble of finite state machines abstraction; and (3) a scalable application of symbolic execution to tackle state space explosion. We demonstrate that Armstrong is several orders of magnitude faster than existing mechanisms.

Preprint, 2014, arXiv

A Framework to Quantify the Benefits of Network Functions Virtualization in Cellular Networks

Network functions virtualization (NFV) is an appealing vision that promises to dramatically reduce capital and operating expenses for cellular providers. However, existing efforts in this space leave open broad issues about how NFV deployments should be instantiated or how they should be provisioned. In this paper, we present an initial attempt at a framework that will help network operators systematically evaluate the potential benefits that different points in the NFV design space can offer.

Preprint, 2014, arXiv

Stratos: A Network-Aware Orchestration Layer for Virtual Middleboxes in Clouds

Enterprises want their in-cloud services to leverage the performance and security benefits that middleboxes offer in traditional deployments. Such virtualized deployments create new opportunities (e.g., flexible scaling) as well as new challenges (e.g., dynamics, multiplexing) for middlebox management tasks such as service composition and provisioning. Unfortunately, enterprises lack systematic tools to efficiently compose and provision in-the-cloud middleboxes and thus fall short of achieving the benefits that cloud-based deployments can offer. To this end, we present the design and implementation of Stratos, an orchestration layer for virtual middleboxes. Stratos provides efficient and correct composition in the presence of dynamic scaling via software-defined networking mechanisms. It ensures efficient and scalable provisioning by combining middlebox-specific traffic engineering, placement, and horizontal scaling strategies. We demonstrate the effectiveness of Stratos using an experimental prototype testbed and large-scale simulations.

Preprint, 2012, arXiv

Evolution of Social-Attribute Networks: Measurements, Modeling, and Implications using Google+

Understanding social network structure and evolution has important implications for many aspects of network and system design including provisioning, bootstrapping trust and reputation systems via social networks, and defenses against Sybil attacks. Several recent results suggest that augmenting the social network structure with user attributes (e.g., location, employer, communities of interest) can provide a more fine-grained understanding of social networks. However, there have been few studies to provide a systematic understanding of these effects at scale. We bridge this gap using a unique dataset collected as the Google+ social network grew over time since its release in late June 2011. We observe novel phenomena with respect to both standard social network metrics and new attribute-related metrics (that we define). We also observe interesting evolutionary patterns as Google+ went from a bootstrap phase to a steady invitation-only stage before a public release. Based on our empirical observations, we develop a new generative model to jointly reproduce the social structure and the node attributes. Using theoretical analysis and empirical evaluations, we show that our model can accurately reproduce the social and attribute structure of real social networks. We also demonstrate that our model provides more accurate predictions for practical application contexts.
