Source author record

Ymir Vigfusson

Ymir Vigfusson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language; Distributed, Parallel, and Cluster Computing; Networking and Internet Architecture; Social and Information Networks

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint · 2022 · arXiv

The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems

Today's distributed tracing frameworks are ill-equipped to troubleshoot rare edge-case requests. The crux of the problem is a trade-off between specificity and overhead. On the one hand, frameworks can indiscriminately select requests to trace when they enter the system (head sampling), but this is unlikely to capture a relevant edge-case trace because the framework cannot know which requests will be problematic until after the fact. On the other hand, frameworks can trace everything and later keep only the interesting edge-case traces (tail sampling), but this has high overheads on the traced application and enormous data ingestion costs. In this paper, we circumvent this trade-off for any edge-case with symptoms that can be programmatically detected, such as high tail latency, errors, and bottlenecked queues. We propose a lightweight and always-on distributed tracing system, Hindsight, which implements a retroactive sampling abstraction: instead of eagerly ingesting and processing traces, Hindsight lazily retrieves trace data only after symptoms of a problem are detected. Hindsight is analogous to a car dash-cam that, upon detecting a sudden jolt in momentum, persists the last hour of footage. Developers using Hindsight receive the exact edge-case traces they desire without undue overhead or dependence on luck. Our evaluation shows that Hindsight scales to millions of requests per second, adds nanosecond-level overhead to generate trace data, handles GB/s of data per node, transparently integrates with existing distributed tracing systems, and successfully persists full, detailed traces in real-world use cases when edge-case problems are detected.
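The retroactive sampling idea can be illustrated with a small single-process sketch. It assumes that each node keeps its most recent trace events in a fixed-capacity in-memory ring buffer and persists a trace's events only when a symptom fires. Here the symptom is a hypothetical latency threshold. The names `RetroactiveTraceBuffer`, `on_request_finished`, and `threshold_ms` are hypothetical. This is not Hindsight's implementation or API, which the abstract describes as distributed and integrated with existing tracing systems.

```python
from collections import deque


class RetroactiveTraceBuffer:
    """Hypothetical per-node buffer that keeps only the most recent trace events."""

    def __init__(self, capacity):
        # A deque with maxlen acts as a ring buffer: the oldest events are evicted automatically.
        self.events = deque(maxlen=capacity)
        self.persisted = {}  # trace_id -> events saved after a symptom was detected

    def record(self, trace_id, event):
        # Always-on, cheap path: append in memory, no ingestion or processing.
        self.events.append((trace_id, event))

    def trigger(self, trace_id):
        # Lazy path: collect whatever is still buffered for this trace and persist it.
        kept = [event for tid, event in self.events if tid == trace_id]
        self.persisted[trace_id] = kept
        return kept


def on_request_finished(buf, trace_id, latency_ms, threshold_ms=500):
    """Hypothetical symptom detector: only slow requests trigger retroactive collection."""
    if latency_ms > threshold_ms:
        return buf.trigger(trace_id)
    return None


if __name__ == "__main__":
    buf = RetroactiveTraceBuffer(capacity=4)
    for tid, ev in [("a", "start"), ("b", "start"), ("a", "db query"), ("b", "end"), ("a", "end")]:
        buf.record(tid, ev)
    on_request_finished(buf, "b", latency_ms=20)             # fast request: nothing persisted
    print(on_request_finished(buf, "a", latency_ms=950))     # ['db query', 'end']
```

In the demo, the fifth event evicts the oldest one, so trace `a` loses its `start` event. This shows why buffer capacity bounds how far back "the footage" reaches. Fast requests cost only the in-memory append and are never ingested.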

Preprint · 2020 · arXiv

Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification

Cyberbullying is a pervasive problem in online communities. To identify cyberbullying cases in large-scale social networks, content moderators depend on machine learning classifiers for automatic cyberbullying detection. However, existing models remain unfit for real-world applications, largely due to a shortage of publicly available training data and a lack of standard criteria for assigning ground truth labels. In this study, we address the need for reliable data using an original annotation framework. Inspired by social sciences research into bullying behavior, we characterize the nuanced problem of cyberbullying using five explicit factors to represent its social and linguistic aspects. We model this behavior using social network and language-based features, which improve classifier performance. These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon.

Preprint · 2014 · arXiv

Wireless Scheduling Algorithms in Complex Environments

Efficient spectrum use in wireless sensor networks through spatial reuse requires effective models of packet reception at the physical layer in the presence of interference. Despite recent progress in analytic and simulation research into worst-case behavior from interference effects, these efforts generally assume geometric path loss and isotropic transmission, assumptions which have not been borne out in experiments. Our paper aims to provide a methodology for grounding theoretical results on wireless interference in experimental reality. We develop a new framework for wireless algorithms in which distance-based path loss is replaced by an arbitrary gain matrix, typically obtained by measurements of received signal strength (RSS). Gain matrices allow for the modeling of complex environments, e.g., with obstacles and walls. We experimentally evaluate the framework in two indoor testbeds with 20 and 60 motes, and confirm superior predictive performance in packet reception rate for a gain matrix model over a geometric distance-based model. At the heart of our approach is a new parameter $\zeta$ called metricity, which indicates how close the gain matrix is to a distance metric, effectively measuring the complexity of the environment. A powerful theoretical feature of this parameter is that all known SINR scheduling algorithms that work in general metric spaces carry over to arbitrary gain matrices and achieve performance guarantees in terms of $\zeta$ equivalent to those previously obtained in terms of the path loss constant. Our experiments confirm the sensitivity of $\zeta$ to the nature of the environment. Finally, we show analytically and empirically how multiple channels can be leveraged to improve metricity and thereby performance. We believe our contributions will facilitate experimental validation for recent advances in algorithms for physical wireless interference models.
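The abstract does not state how metricity is computed. The sketch below rests on an assumed definition. It takes the reciprocal of the gain as a "decay" f(x, y) = 1/G(x, y), and defines $\zeta$ as the smallest value for which $f^{1/\zeta}$ satisfies the triangle inequality on every triple of nodes. Under that assumption, pure geometric path loss $f = d^\alpha$ gives $\zeta \le \alpha$, which fits the abstract's point that guarantees stated for the path loss constant carry over to $\zeta$. The helpers `satisfies_triangle` and `metricity` are hypothetical. They use a brute-force bisection and are not the paper's procedure.

```python
from itertools import permutations


def satisfies_triangle(decay, zeta, eps=1e-12):
    """True if decay**(1/zeta) obeys the triangle inequality on every ordered triple of distinct nodes."""
    t = 1.0 / zeta
    n = len(decay)
    for x, y, z in permutations(range(n), 3):
        if decay[x][y] ** t > decay[x][z] ** t + decay[z][y] ** t + eps:
            return False
    return True


def metricity(gain, iters=60):
    """Hypothetical metricity estimate for a gain matrix (diagonal entries are ignored)."""
    n = len(gain)
    # Assumption: decay is the reciprocal of the measured gain.
    decay = [[1.0 / gain[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    # Feasibility is monotone in zeta, so grow an upper bound, then bisect.
    lo, hi = 0.0, 1.0
    while not satisfies_triangle(decay, hi):
        lo, hi = hi, hi * 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if satisfies_triangle(decay, mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Three collinear nodes with idealized path loss exponent alpha = 3: gain = 1 / d**3.
    positions, alpha = [0.0, 1.0, 2.0], 3
    gain = [[0.0 if i == j else 1.0 / abs(positions[i] - positions[j]) ** alpha
             for j in range(3)] for i in range(3)]
    print(round(metricity(gain), 6))  # 3.0
```

The bisection is valid because, if the inequality holds for some $\zeta$, it also holds for every larger $\zeta$. The search is cubic in the number of nodes per check, which is acceptable for testbeds of tens of motes but would need a smarter method for large matrices.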