Source author record

Jonathan Mace

Jonathan Mace appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Cryptography and Security

Catalog footprint

What is connected

3works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

ACT now: Aggregate Comparison of Traces for Incident Localization

Incidents in production systems are common and downtime is expensive. Applying an appropriate mitigating action quickly, such as changing a specific firewall rule, reverting a change, or diverting traffic to a different availability zone, saves money. Incident localization is time-consuming since a single failure can have many effects, extending far from the site of failure. Knowing how different system events relate to each other is necessary to quickly identify \emph{where} to mitigate. Our approach, Aggregate Comparison of Traces (ACT), localizes incidents by comparing sets of traces (which capture events and their relationships for individual requests) sampled from the most recent steady-state operation and during an incident. In our quantitative experiments, we show that ACT is able to effectively localize more than 99% of incidents.

preprint2022arXiv

Groundhog: Efficient Request Isolation in FaaS

Security is a core responsibility for Function-as-a-Service (FaaS) providers. The prevailing approach has each function execute in its own container to isolate concurrent executions of different functions. However, successive invocations of the same function commonly reuse the runtime state of a previous invocation in order to avoid container cold-start delays when invoking a function. Although efficient, this container reuse has security implications for functions that are invoked on behalf of differently privileged users or administrative domains: bugs in a function's implementation, third-party library, or the language runtime may leak private data from one invocation of the function to subsequent invocations of the same function. Groundhog isolates sequential invocations of a function by efficiently reverting to a clean state, free from any private data, after each invocation. The system exploits two properties of typical FaaS platforms: each container executes at most one function at a time and legitimate functions do not retain state across invocations. This enables Groundhog to efficiently snapshot and restore function state between invocations in a manner that is independent of the programming language/runtime and does not require any changes to existing functions, libraries, language runtimes, or OS kernels. We describe the design of Groundhog and its implementation in OpenWhisk, a popular production-grade open-source FaaS framework. On three existing benchmark suites, Groundhog isolates sequential invocations with modest overhead on end-to-end latency (median: 1.5%, 95p: 7%) and throughput (median: 2.5%, 95p: 49.6%), relative to an insecure baseline that reuses the container and runtime state.

preprint2022arXiv

The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems

Today's distributed tracing frameworks are ill-equipped to troubleshoot rare edge-case requests. The crux of the problem is a trade-off between specificity and overhead. On the one hand, frameworks can indiscriminately select requests to trace when they enter the system (head sampling), but this is unlikely to capture a relevant edge-case trace because the framework cannot know which requests will be problematic until after-the-fact. On the other hand, frameworks can trace everything and later keep only the interesting edge-case traces (tail sampling), but this has high overheads on the traced application and enormous data ingestion costs. In this paper we circumvent this trade-off for any edge-case with symptoms that can be programmatically detected, such as high tail latency, errors, and bottlenecked queues. We propose a lightweight and always-on distributed tracing system, Hindsight, which implements a retroactive sampling abstraction: instead of eagerly ingesting and processing traces, Hindsight lazily retrieves trace data only after symptoms of a problem are detected. Hindsight is analogous to a car dash-cam that, upon detecting a sudden jolt in momentum, persists the last hour of footage. Developers using Hindsight receive the exact edge-case traces they desire without undue overhead or dependence on luck. Our evaluation shows that Hindsight scales to millions of requests per second, adds nanosecond-level overhead to generate trace data, handles GB/s of data per node, transparently integrates with existing distributed tracing systems, and successfully persists full, detailed traces in real-world use cases when edge-case problems are detected.

Jonathan Mace

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

ACT now: Aggregate Comparison of Traces for Incident Localization

Groundhog: Efficient Request Isolation in FaaS

The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems