Source author record

Margo Seltzer

Margo Seltzer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Machine Learning Artificial Intelligence Computation cs.CY Distributed, Parallel, and Cluster Computing Networking and Internet Architecture

Catalog footprint

What is connected

6works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Fast Sparse Decision Tree Optimization via Reference Ensembles

Sparse decision tree optimization has been one of the most fundamental problems in AI since its inception and is a challenge at the core of interpretable machine learning. Sparse decision tree optimization is computationally hard, and despite steady effort since the 1960's, breakthroughs have only been made on the problem within the past few years, primarily on the problem of finding optimal sparse decision trees. However, current state-of-the-art algorithms often require impractical amounts of computation time and memory to find optimal or near-optimal trees for some real-world datasets, particularly those having several continuous-valued features. Given that the search spaces of these decision tree optimization problems are massive, can we practically hope to find a sparse decision tree that competes in accuracy with a black box machine learning model? We address this problem via smart guessing strategies that can be applied to any optimal branch-and-bound-based decision tree algorithm. We show that by using these guesses, we can reduce the run time by multiple orders of magnitude, while providing bounds on how far the resulting trees can deviate from the black box's accuracy and expressive power. Our approach enables guesses about how to bin continuous features, the size of the tree, and lower bounds on the error for the optimal decision tree. Our experiments show that in many cases we can rapidly construct sparse decision trees that match the accuracy of black box models. To summarize: when you are having trouble optimizing, just guess.

preprint2022arXiv

Gridiron: A Technique for Augmenting Cloud Workloads with Network Bandwidth Requirements

Cloud applications use more than just server resources, they also require networking resources. We propose a new technique to model network bandwidth demand of networked cloud applications. Our technique, Gridiron, augments VM workload traces from Azure cloud with network bandwidth requirements. The key to the Gridiron technique is to derive inter-VM network bandwidth requirements using Amdahl's second law. As a case study, we use Gridiron to generate realistic traces with network bandwidth demands for a distributed machine learning training application. Workloads generated with Gridiron allow datacenter operators to estimate the network bandwidth demands of cloud applications and enable more realistic cloud resource scheduler evaluation.

preprint2020arXiv

The EuroSys 2020 Online Conference: Experience and lessons learned

The 15th European Conference on Computer Systems (EuroSys'20) was organized as a virtual (online) conference on April 27-30, 2020. The main EuroSys'20 track took place April 28-30, 2020, preceded by five workshops (EdgeSys'20, EuroDW'20, EuroSec'20, PaPoC'20, SPMA'20) on April 27, 2020. The decision to hold a virtual (online) conference was taken in early April 2020, after consultations with the EuroSys community and internal discussions about potential options, eventually allowing about three weeks for the organization. This paper describes the choices we made to organize EuroSys'20 as a virtual (online) conference, the challenges we addressed, and the lessons learned.

preprint2020arXiv

UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats

Advanced Persistent Threats (APTs) are difficult to detect due to their "low-and-slow" attack patterns and frequent use of zero-day exploits. We present UNICORN, an anomaly-based APT detector that effectively leverages data provenance analysis. From modeling to detection, UNICORN tailors its design specifically for the unique characteristics of APTs. Through extensive yet time-efficient graph analysis, UNICORN explores provenance graphs that provide rich contextual and historical information to identify stealthy anomalous activities without pre-defined attack signatures. Using a graph sketching technique, it summarizes long-running system execution with space efficiency to combat slow-acting attacks that take place over a long time span. UNICORN further improves its detection capability using a novel modeling approach to understand long-term behavior as the system evolves. Our evaluation shows that UNICORN outperforms an existing state-of-the-art APT detection system and detects real-life APT scenarios with high accuracy.

preprint2020arXiv

Xanthus: Push-button Orchestration of Host Provenance Data Collection

Host-based anomaly detectors generate alarms by inspecting audit logs for suspicious behavior. Unfortunately, evaluating these anomaly detectors is hard. There are few high-quality, publicly-available audit logs, and there are no pre-existing frameworks that enable push-button creation of realistic system traces. To make trace generation easier, we created Xanthus, an automated tool that orchestrates virtual machines to generate realistic audit logs. Using Xanthus' simple management interface, administrators select a base VM image, configure a particular tracing framework to use within that VM, and define post-launch scripts that collect and save trace data. Once data collection is finished, Xanthus creates a self-describing archive, which contains the VM, its configuration parameters, and the collected trace data. We demonstrate that Xanthus hides many of the tedious (yet subtle) orchestration tasks that humans often get wrong; Xanthus avoids mistakes that lead to non-replicable experiments.

preprint2014arXiv

Accelerating MCMC via Parallel Predictive Prefetching

We present a general framework for accelerating a large class of widely used Markov chain Monte Carlo (MCMC) algorithms. Our approach exploits fast, iterative approximations to the target density to speculatively evaluate many potential future steps of the chain in parallel. The approach can accelerate computation of the target distribution of a Bayesian inference problem, without compromising exactness, by exploiting subsets of data. It takes advantage of whatever parallel resources are available, but produces results exactly equivalent to standard serial execution. In the initial burn-in phase of chain evaluation, it achieves speedup over serial evaluation that is close to linear in the number of available cores.

Margo Seltzer

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Fast Sparse Decision Tree Optimization via Reference Ensembles

Gridiron: A Technique for Augmenting Cloud Workloads with Network Bandwidth Requirements

The EuroSys 2020 Online Conference: Experience and lessons learned

UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats

Xanthus: Push-button Orchestration of Host Provenance Data Collection

Accelerating MCMC via Parallel Predictive Prefetching