Source author record

Roger Wattenhofer

Roger Wattenhofer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

52 works

20 topics

4 close collaborators

preprint 2026 arXiv

Benchmarking Positional Encodings for GNNs and Graph Transformers

Positional Encodings (PEs) are essential for injecting structural information into Graph Neural Networks (GNNs), particularly Graph Transformers, yet their empirical impact remains insufficiently understood. We introduce a unified benchmarking framework that decouples PEs from architectural choices, enabling a fair comparison across 8 GNN and Transformer models, 9 PEs, and 10 synthetic and real-world datasets. Across more than 500 model-PE-dataset configurations, we find that commonly used expressiveness proxies, including Weisfeiler-Lehman distinguishability, do not reliably predict downstream performance. In particular, highly expressive PEs frequently fail to improve, and can even degrade performance on real-world tasks. At the same time, we identify several simple and previously overlooked model-PE combinations that match or outperform recent state-of-the-art methods. Our results demonstrate the strong task-dependence of PEs and underscore the need for empirical validation beyond theoretical expressiveness. To support reproducible research, we release an open-source benchmarking framework for evaluating PEs for graph learning tasks.

preprint 2026 arXiv

Broadcast in Almost Mixing Time

We study the problem of broadcasting multiple messages in the CONGEST model. In this problem, a dedicated source node $s$ possesses a set $M$ of messages with every message of size $O(\log n)$ where $n$ is the total number of nodes. The objective is to ensure that every node in the network learns all messages in $M$. The execution of an algorithm progresses in rounds, and we focus on optimizing the round complexity of broadcasting multiple messages. Our primary contribution is a randomized algorithm for networks with expander topology, which are widely used in practice for building scalable and robust distributed systems. The algorithm succeeds with high probability and achieves a round complexity that is optimal up to a factor of the network's mixing time and polylogarithmic terms. It leverages a multi-COBRA primitive, which uses multiple branching random walks running in parallel. To the best of our knowledge, this approach has not been applied in distributed algorithms before. A crucial aspect of our method is the use of these branching random walks to construct an optimal (up to a polylogarithmic factor) tree packing of a random graph, which is then used for efficient broadcasting. This result is of independent interest. We also prove the problem to be NP-hard in a centralized setting and provide insights into why straightforward lower bounds for general graphs, namely graph diameter and $\frac{|M|}{\textit{minCut}}$, cannot be tight.

preprint 2026 arXiv

From Message-Passing to Linearized Graph Sequence Models

Message-passing based approaches form the default backbone of most learning architectures on graph-structured data. However, the rapid progress of modern deep learning architectures in other domains, particularly sequence modeling, raises the question of how graph learning can benefit from these advances. We introduce Linearized Graph Sequence Models, a framework that recasts message-passing graph computation from the perspective of sequence modeling to simplify architectural choices. Our approach systematically separates the computational processing depth from the information propagation depth, allowing core graph architectural decisions to be treated as sequence modeling choices. Specifically, we analyze, both empirically and theoretically, what sequence properties make methods effective for learning and preserving the graph inductive bias. In particular, we validate our findings, demonstrating improved performance on long-range information tasks in graphs. Our findings provide a principled way to integrate modern sequence modeling advances into message-passing based graph learning. Beyond this, our work demonstrates how the separation of processing and information depth can recast central architectural questions as input modeling choices.

preprint 2026 arXiv

N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation

Improving the inference efficiency of autoregressive transformers typically means reducing FLOPs per token, usually through approximations that degrade model quality. We introduce N-vium, a mixture-of-exits transformer that partially parallelizes computation across depth on standard hardware, increasing effective FLOPs per second rather than minimizing compute per token. N-vium attaches prediction heads at multiple depths and defines the next-token distribution as a learned mixture over these exits, with token-adaptive routing. This formulation strictly generalizes the standard transformer, which is recovered exactly when routing assigns zero mass to all intermediate heads. Sampling from the mixture is exact, and complete KV caches are recovered by deferring the upper-layer computation and batching it with later tokens. We pretrain N-vium at scales up to 1.5B parameters. Our largest model reaches 57.9% wall-clock speedup over a parameter- and data-matched standard transformer at no perplexity cost.
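
The mixture formulation in this abstract can be illustrated with a small sketch. It is not the N-vium implementation: the function names, the three-token vocabulary, the fixed routing vector and all probabilities below are hypothetical, and the two-stage sampler is simply the textbook way to draw exactly from a mixture (pick a component, then sample from it).

```python
import random

def mixture_distribution(routing, exit_dists):
    """Combine per-exit next-token distributions into one mixture.

    routing:    K non-negative weights summing to 1, one per exit head
    exit_dists: K distributions, each a list of V token probabilities
    """
    vocab = len(exit_dists[0])
    return [
        sum(w * dist[t] for w, dist in zip(routing, exit_dists))
        for t in range(vocab)
    ]

def sample_token(routing, exit_dists, rng):
    """Exact two-stage sampling: choose an exit, then a token from that exit."""
    k = rng.choices(range(len(routing)), weights=routing)[0]
    token = rng.choices(range(len(exit_dists[k])), weights=exit_dists[k])[0]
    return k, token

# Hypothetical toy values: two intermediate exits plus the final head.
exit_dists = [
    [0.7, 0.2, 0.1],  # shallow exit
    [0.5, 0.3, 0.2],  # middle exit
    [0.1, 0.6, 0.3],  # final head (full depth)
]
routing = [0.5, 0.25, 0.25]

mixed = mixture_distribution(routing, exit_dists)

# Zero routing mass on the intermediate heads leaves only the final head,
# which is the sense in which a standard transformer is a special case.
final_only = mixture_distribution([0.0, 0.0, 1.0], exit_dists)

exit_index, token = sample_token(routing, exit_dists, random.Random(0))
```

In this sketch the routing weights are a fixed vector for one position; the abstract describes token-adaptive routing, so a real model would produce a different weight vector per token.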

preprint 2026 arXiv

WorldSpeech: A Multilingual Speech Corpus from Around the World

Automatic speech recognition (ASR) performs well for high-resource languages with abundant paired audio-transcript data, but its accuracy degrades sharply for most languages due to limited publicly available aligned data. To this end, we introduce WorldSpeech, a 24 kHz multilingual speech corpus comprising 65k hours of aligned audio-transcript data across 76 languages, collected from diverse public sources including parliamentary proceedings, international broadcasts, and public-domain audiobooks. For 37 languages, WorldSpeech provides more than 200 hours of aligned speech, with 28 exceeding 500 hours and 24 surpassing 1k hours. Fine-tuning existing ASR models on WorldSpeech results in an average relative Word-Error-Rate reduction of 63.5% across 11 typologically diverse languages.

preprint 2024 arXiv

Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence

In Federated Reinforcement Learning (FRL), agents aim to collaboratively learn a common task, while each agent is acting in its local environment without exchanging raw trajectories. Existing approaches for FRL either (a) do not provide any fault-tolerance guarantees (against misbehaving agents), or (b) rely on a trusted central agent (a single point of failure) for aggregating updates. We provide the first decentralized Byzantine fault-tolerant FRL method. Towards this end, we first propose a new centralized Byzantine fault-tolerant policy gradient (PG) algorithm that improves over existing methods by relying only on assumptions standard for non-fault-tolerant PG. Then, as our main contribution, we show how a combination of robust aggregation and Byzantine-resilient agreement methods can be leveraged in order to eliminate the need for a trusted central entity. Since our results represent the first sample complexity analysis for Byzantine fault-tolerant decentralized federated non-convex optimization, our technical contributions may be of independent interest. Finally, we corroborate our theoretical results experimentally for common RL environments, demonstrating the speed-up of decentralized federations w.r.t. the number of participating agents and resilience against various Byzantine attacks.

preprint 2024 arXiv

Provably Powerful Graph Neural Networks for Directed Multigraphs

This paper analyses a set of simple adaptations that transform standard message-passing Graph Neural Networks (GNN) into provably powerful directed multigraph neural networks. The adaptations include multigraph port numbering, ego IDs, and reverse message passing. We prove that the combination of these theoretically enables the detection of any directed subgraph pattern. To validate the effectiveness of our proposed adaptations in practice, we conduct experiments on synthetic subgraph detection tasks, which demonstrate outstanding performance with almost perfect results. Moreover, we apply our proposed adaptations to two financial crime analysis tasks. We observe dramatic improvements in detecting money laundering transactions, improving the minority-class F1 score of a standard message-passing GNN by up to 30%, and closely matching or outperforming tree-based and GNN baselines. Similarly impressive results are observed on a real-world phishing detection dataset, boosting three standard GNNs' F1 scores by around 15% and outperforming all baselines.

preprint 2022 arXiv

A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications

The collection of eye gaze information provides a window into many critical aspects of human cognition, health and behaviour. Additionally, many neuroscientific studies complement the behavioural information gained from eye tracking with the high temporal resolution and neurophysiological markers provided by electroencephalography (EEG). One of the essential eye-tracking software processing steps is the segmentation of the continuous data stream into events relevant to eye-tracking applications, such as saccades, fixations, and blinks. Here, we introduce DETRtime, a novel framework for time-series segmentation that creates ocular event detectors that do not require an additionally recorded eye-tracking modality and rely solely on EEG data. Our end-to-end deep learning-based framework brings recent advances in Computer Vision to the forefront of the time-series segmentation of EEG data. DETRtime achieves state-of-the-art performance in ocular event detection across diverse eye-tracking experiment paradigms. In addition, we provide evidence that our model generalizes well to the task of EEG sleep stage segmentation.

preprint 2022 arXiv

A Theoretical Comparison of Graph Neural Network Extensions

We study and compare different Graph Neural Network extensions that increase the expressive power of GNNs beyond the Weisfeiler-Leman test. We focus on (i) GNNs based on higher order WL methods, (ii) GNNs that preprocess small substructures in the graph, (iii) GNNs that preprocess the graph up to a small radius, and (iv) GNNs that slightly perturb the graph to compute an embedding. We begin by presenting a simple improvement for this last extension that strictly increases the expressive power of this GNN variant. Then, as our main result, we compare the expressiveness of these extensions to each other through a series of example constructions that can be distinguished by one of the extensions, but not by another one. We also show negative examples that are particularly challenging for each of the extensions, and we prove several claims about the ability of these extensions to count cliques and cycles in the graph.
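
For readers unfamiliar with the baseline these extensions go beyond, the following is a minimal sketch of the standard 1-dimensional Weisfeiler-Leman colour refinement test. It is textbook material rather than anything taken from the paper; the function names and the example graphs are illustrative assumptions.

```python
from collections import Counter

def refine(adjacency, colors):
    """One 1-WL round: pair each node's colour with its neighbours' colour multiset."""
    return {
        v: (colors[v], tuple(sorted(colors[u] for u in adjacency[v])))
        for v in adjacency
    }

def wl_distinguishes(g1, g2):
    """Return True if 1-WL colour refinement tells the two graphs apart."""
    c1 = {v: 0 for v in g1}  # every node starts with the same colour
    c2 = {v: 0 for v in g2}
    for _ in range(max(len(g1), len(g2))):
        s1, s2 = refine(g1, c1), refine(g2, c2)
        # Shared palette so a colour id means the same thing in both graphs.
        signatures = sorted(set(s1.values()) | set(s2.values()))
        palette = {sig: i for i, sig in enumerate(signatures)}
        c1 = {v: palette[s] for v, s in s1.items()}
        c2 = {v: palette[s] for v, s in s2.items()}
        if Counter(c1.values()) != Counter(c2.values()):
            return True
    return False

# A 6-cycle and two disjoint triangles: non-isomorphic, but both 2-regular.
six_cycle = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

# A path and a star on four nodes have different degree sequences.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

cycle_vs_triangles = wl_distinguishes(six_cycle, two_triangles)
path_vs_star = wl_distinguishes(path4, star4)
```

The 6-cycle and the pair of triangles are a classic pair that this test cannot separate, since every node in both graphs sees the same neighbourhood colours in every round; pairs like this are what motivate more expressive GNN variants.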

preprint 2022 arXiv

An Empirical Study of Market Inefficiencies in Uniswap and SushiSwap

Decentralized exchanges are revolutionizing finance. With their ever-growing popularity, a natural question arises: how efficient are these new markets? We find that nearly 30% of analyzed trades are executed at an unfavorable rate. Additionally, we observe that, especially during the DeFi summer in 2020, price inaccuracies across the market plagued DEXes. Uniswap and SushiSwap, however, quickly adapt to their increased volumes. We see an increase in market efficiency with time during the observation period. Nonetheless, the DEXes still struggle to track the reference market when cryptocurrency prices are highly volatile. During such periods of high volatility, we observe the market becoming less efficient, manifested by an increased prevalence of cyclic arbitrage opportunities.

preprint 2022 arXiv

Analyzing Voting Power in Decentralized Governance: Who controls DAOs?

We empirically study the state of three prominent DAO governance systems on the Ethereum blockchain: Compound, Uniswap and ENS. In particular, we examine how the voting power is distributed in these systems. Using a comprehensive dataset of all governance token holders, delegates, proposals and votes, we analyze who holds the voting rights and how they are used to influence governance decisions.

preprint 2022 arXiv

Asynchronous Neural Networks for Learning in Graphs

This paper studies asynchronous message passing (AMP), a new paradigm for applying neural network based learning to graphs. Existing graph neural networks use the synchronous distributed computing model and aggregate their neighbors in each round, which causes problems such as oversmoothing and limits their expressiveness. On the other hand, AMP is based on the asynchronous model, where nodes react to messages of their neighbors individually. We prove that (i) AMP can simulate synchronous GNNs and that (ii) AMP can theoretically distinguish any pair of graphs. We experimentally validate AMP's expressiveness. Further, we show that AMP might be better suited to propagate messages over large distances in graphs and performs well on several graph classification benchmarks.

preprint 2022 arXiv

Better Incentives for Proof-of-Work

This work proposes a novel proof-of-work blockchain incentive scheme such that, barring exogenous motivations, following the protocol is guaranteed to be the optimal strategy for miners. Our blockchain takes the form of a directed acyclic graph, resulting in improvements with respect to throughput and speed. More importantly, for our blockchain to function, it is not expected that the miners conform to some presupposed protocol in the interest of the system's operability. Instead, our system works if miners act selfishly, trying to get the maximum possible rewards, with no consideration for the overall health of the blockchain.

preprint 2022 arXiv

Cyclic Arbitrage in Decentralized Exchanges

Decentralized Exchanges (DEXes) enable users to create markets for exchanging any pair of cryptocurrencies. The direct exchange rate of two tokens may not match the cross-exchange rate in the market, and such price discrepancies open up arbitrage possibilities with trading through different cryptocurrencies cyclically. In this paper, we conduct a systematic investigation on cyclic arbitrages in DEXes. We propose a theoretical framework for studying cyclic arbitrage. With our framework, we analyze the profitability conditions and optimal trading strategies of cyclic transactions. We further examine exploitable arbitrage opportunities and the market size of cyclic arbitrages with transaction-level data of Uniswap V2. We find that traders have executed 292,606 cyclic arbitrages over eleven months and exploited more than 138 million USD in revenue. However, the revenue of the most profitable unexploited opportunity is persistently higher than 1 ETH (4,000 USD), which indicates that DEX markets may not be efficient enough. By analyzing how traders implement cyclic arbitrages, we find that traders can utilize smart contracts to issue atomic transactions and the atomic implementations could mitigate users' financial loss in cyclic arbitrage from the price impact.
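
The idea of a cyclic arbitrage can be made concrete with a small sketch. It assumes constant-product pools with a 0.3% fee charged on the input amount, as in Uniswap V2; the function names, the pool reserves and the profitability check are illustrative and are not taken from the paper's framework.

```python
FEE = 0.003  # assumed: 0.3% fee on the input amount, as in Uniswap V2

def swap_output(amount_in, reserve_in, reserve_out, fee=FEE):
    """Tokens received from a constant-product pool for amount_in."""
    effective_in = amount_in * (1 - fee)
    return effective_in * reserve_out / (reserve_in + effective_in)

def cycle_output(amount_in, pools, fee=FEE):
    """Route amount_in through a list of (reserve_in, reserve_out) pools in order."""
    amount = amount_in
    for reserve_in, reserve_out in pools:
        amount = swap_output(amount, reserve_in, reserve_out, fee)
    return amount

def is_profitable_cycle(pools, fee=FEE):
    """A sufficiently small trade gains iff the fee-adjusted marginal rates multiply to more than 1."""
    product = 1.0
    for reserve_in, reserve_out in pools:
        product *= (1 - fee) * reserve_out / reserve_in
    return product > 1.0

# Hypothetical reserves for a cycle A -> B -> C -> A.
mispriced = [(100.0, 200.0), (200.0, 100.0), (100.0, 120.0)]
balanced = [(100.0, 200.0), (200.0, 100.0), (100.0, 100.0)]

gain_on_mispriced = cycle_output(1.0, mispriced) - 1.0
gain_on_balanced = cycle_output(1.0, balanced) - 1.0
```

With the mispriced reserves, one unit of token A returns more than one unit after the full cycle; with balanced reserves the three fees make the same cycle a loss.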

preprint 2022 arXiv

Deterministic Graph-Walking Program Mining

Owing to their versatility, graph structures admit representations of intricate relationships between the separate entities comprising the data. We formalise the notion of connection between two vertex sets in terms of edge and vertex features by introducing graph-walking programs. We give two algorithms for mining of deterministic graph-walking programs that yield programs in the order of increasing length. These programs characterise linear long-distance relationships between the given two vertex sets in the context of the whole graph.

preprint 2022 arXiv

DT+GNN: A Fully Explainable Graph Neural Network using Decision Trees

We propose the fully explainable Decision Tree Graph Neural Network (DT+GNN) architecture. In contrast to existing black-box GNNs and post-hoc explanation methods, the reasoning of DT+GNN can be inspected at every step. To achieve this, we first construct a differentiable GNN layer, which uses a categorical state space for nodes and messages. This allows us to convert the trained MLPs in the GNN into decision trees. These trees are pruned using our newly proposed method to ensure they are small and easy to interpret. We can also use the decision trees to compute traditional explanations. We demonstrate on both real-world datasets and synthetic GNN explainability benchmarks that this architecture works as well as traditional GNNs. Furthermore, we leverage the explainability of DT+GNNs to find interesting insights into many of these datasets, with some surprising results. We also provide an interactive web tool to inspect DT+GNN's decision making.

preprint 2022 arXiv

Eliminating Sandwich Attacks with the Help of Game Theory

Predatory trading bots lurking in Ethereum's mempool present invisible taxation of traders on automated market makers (AMMs). AMM traders specify a slippage tolerance to indicate the maximum price movement they are willing to accept. This way, traders avoid automatic transaction failure in case of small price movements before their trade request executes. However, while a too-small slippage tolerance may lead to trade failures, a too-large slippage tolerance allows predatory trading bots to profit from sandwich attacks. These bots can extract the difference between the slippage tolerance and the actual price movement as profit. In this work, we introduce the sandwich game to analyze sandwich attacks analytically from both the attacker and victim perspectives. Moreover, we provide a simple and highly effective algorithm that traders can use to set the slippage tolerance. We unveil that most broadcasted transactions can avoid sandwich attacks while simultaneously only experiencing a low risk of transaction failure. Thereby, we demonstrate that a constant auto-slippage cannot adjust to varying trade sizes and pool characteristics. Our algorithm outperforms the constant auto-slippage suggested by the biggest AMM, Uniswap, in all performed tests. Specifically, our algorithm repeatedly demonstrates a cost reduction exceeding a factor of 100.
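
The trade-off around slippage tolerance can be sketched as follows. This is a simplified illustration, not the paper's sandwich game or its tolerance-setting algorithm: it assumes a single constant-product pool with a 0.3% input fee, and the function names and reserve figures are hypothetical.

```python
def swap_output(amount_in, reserve_in, reserve_out, fee=0.003):
    """Tokens received from a constant-product pool (assumed 0.3% input fee)."""
    effective_in = amount_in * (1 - fee)
    return effective_in * reserve_out / (reserve_in + effective_in)

def min_output(amount_in, reserve_in, reserve_out, tolerance):
    """Smallest output the trader accepts, given the quoted output and a tolerance."""
    return swap_output(amount_in, reserve_in, reserve_out) * (1 - tolerance)

def executes_after_front_run(victim_in, front_run_in, reserve_in, reserve_out, tolerance):
    """Does the victim's trade still clear its floor after an attacker trades first?"""
    floor = min_output(victim_in, reserve_in, reserve_out, tolerance)
    bought = swap_output(front_run_in, reserve_in, reserve_out)
    # The front-run adds its full input to one reserve and removes `bought` from the other.
    actual = swap_output(victim_in, reserve_in + front_run_in, reserve_out - bought)
    return actual >= floor

# Hypothetical pool of 1000/1000, victim swaps 10, attacker front-runs with 20.
loose = executes_after_front_run(10, 20, 1000, 1000, tolerance=0.05)
tight = executes_after_front_run(10, 20, 1000, 1000, tolerance=0.01)
```

With the loose tolerance the victim's trade still executes at the worsened price, which is the room a sandwich attacker exploits; with the tight tolerance the same front-run makes the trade fail instead.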

preprint 2022 arXiv

Online Matching with Convex Delay Costs

We investigate the problem of Min-cost Perfect Matching with Delays (MPMD) in which requests are pairwise matched in an online fashion with the objective to minimize the sum of space cost and time cost. Though linear-MPMD (i.e., time cost is linear in delay) has been thoroughly studied in the literature, it does not model well the impatient requests that are common in practice. Thus, we propose convex-MPMD where time cost functions are convex, capturing the situation where time cost increases faster and faster. Since the existing algorithms for linear-MPMD are no longer competitive, we devise a new deterministic algorithm for convex-MPMD problems. For a large class of convex time cost functions, our algorithm achieves a competitive ratio of $O(k)$ on any $k$-point uniform metric space, or $O(kΔ)$ when the metric space has aspect ratio $Δ$. Moreover, we prove lower bounds for the competitive ratio of deterministic and randomized algorithms, indicating that our deterministic algorithm is optimal. This optimality uncovers a substantial difference between convex-MPMD and linear-MPMD, since linear-MPMD allows a deterministic algorithm with constant competitive ratio on any uniform metric space.

preprint 2022 arXiv

Self-Supervised Contrastive Learning with Adversarial Perturbations for Defending Word Substitution-based Attacks

In this paper, we present an approach to improve the robustness of BERT language models against word substitution-based adversarial attacks by leveraging adversarial perturbations for self-supervised contrastive learning. We create a word-level adversarial attack generating hard positives on-the-fly as adversarial examples during contrastive learning. In contrast to previous works, our method improves model robustness without using any labeled data. Experimental results show that our method improves robustness of BERT against four different word substitution-based adversarial attacks, and combining our method with adversarial training gives higher robustness than adversarial training alone. As our method improves the robustness of BERT purely with unlabeled data, it opens up the possibility of using large text datasets to train robust language models against word substitution-based adversarial attacks.

preprint 2022 arXiv

SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators

We approach the graph generation problem from a spectral perspective by first generating the dominant parts of the graph Laplacian spectrum and then building a graph matching these eigenvalues and eigenvectors. Spectral conditioning allows for direct modeling of the global and local graph structure and helps to overcome the expressivity and mode collapse issues of one-shot graph generators. Our novel GAN, called SPECTRE, enables the one-shot generation of much larger graphs than previously possible with one-shot models. SPECTRE outperforms state-of-the-art deep autoregressive generators in terms of modeling fidelity, while also avoiding expensive sequential generation and dependence on node ordering. As a case in point, in sizable synthetic and real-world graphs SPECTRE achieves a 4-to-170 fold improvement over the best competitor that does not overfit and is 23-to-30 times faster than autoregressive generators.

preprint 2022 arXiv

Understanding the Relationship Between Core Constraints and Core-Selecting Payment Rules in Combinatorial Auctions

Combinatorial auctions (CAs) allow bidders to express complex preferences for bundles of goods being auctioned. However, the behavior of bidders under different payment rules is often unclear. In this paper, we aim to understand how core constraints interact with different core-selecting payment rules. In particular, we examine the natural and desirable non-decreasing property of payment rules, which states that bidders cannot decrease their payments by increasing their bids. Previous work showed that, in general, the widely used VCG-nearest payment rule violates the non-decreasing property in single-minded CAs. We prove that under a single effective core constraint, the VCG-nearest payment rule is non-decreasing. In order to determine in which auctions single effective core constraints occur, we introduce a conflict graph representation of single-minded CAs and find sufficient conditions for the single effective core constraint in CAs. Finally, we study the consequences on the behavior of the bidders and show that no over-bidding exists in any Nash equilibrium for non-decreasing core-selecting payment rules.

preprint 2022 arXiv

Voting in Two-Crossing Elections

We introduce two-crossing elections as a generalization of single-crossing elections, showing a number of new results. First, we show that two-crossing elections can be recognized in polynomial time, by reduction to the well-studied consecutive ones problem. We also conjecture that recognizing $k$-crossing elections is NP-complete in general, providing evidence by relating to a problem similar to consecutive ones proven to be hard in the literature. Single-crossing elections exhibit a transitive majority relation, from which many important results follow. On the other hand, we show that the classical Debord-McGarvey theorem can still be proven for two-crossing elections, implying that any weighted majority tournament is inducible by a two-crossing election. This shows that many voting rules are NP-hard under two-crossing elections, including Kemeny and Slater. This is in contrast to the single-crossing case and outlines an important complexity boundary between single- and two-crossing. Subsequently, we show that for two-crossing elections the Young scores of all candidates can be computed in polynomial time, by formulating a totally unimodular linear program. Finally, we consider the Chamberlin-Courant rule with arbitrary disutilities and show that a winning committee can be computed in polynomial time, using an approach based on dynamic programming.

preprint 2021 arXiv

Byzantine Agreement with Unknown Participants and Failures

A set of mutually distrusting participants that want to agree on a common opinion must solve an instance of a Byzantine agreement problem. These problems have been extensively studied in the literature. However, most of the existing solutions assume that the participants are aware of $n$ -- the total number of participants in the system -- and $f$ -- an upper bound on the number of Byzantine participants. In this paper, we show that most of the fundamental agreement problems can be solved without affecting resiliency even if the participants do not know the values of (possibly changing) $n$ and $f$. Specifically, we consider a synchronous system where the participants have unique but not necessarily consecutive identifiers, and give Byzantine agreement algorithms for reliable broadcast, approximate agreement, rotor-coordinator, early terminating consensus and total ordering in static and dynamic systems, all with the optimal resiliency of $n > 3f$. Moreover, we show that synchrony is necessary, as agreement with probabilistic termination is impossible in a semi-synchronous or asynchronous system if the participants are unaware of $n$ and $f$.

preprint 2021 arXiv

Telling BERT's full story: from Local Attention to Global Aggregation

We take a deep look into the behavior of self-attention heads in the transformer architecture. In light of recent work discouraging the use of attention distributions for explaining a model's behavior, we show that attention distributions can nevertheless provide insights into the local behavior of attention heads. This way, we propose a distinction between local patterns revealed by attention and global patterns that refer back to the input, and analyze BERT from both angles. We use gradient attribution to analyze how the output of an attention head depends on the input tokens, effectively extending the local attention-based analysis to account for the mixing of information throughout the transformer layers. We find that there is a significant discrepancy between attention and attribution distributions, caused by the mixing of context inside the model. We quantify this discrepancy and observe that, interestingly, there are some patterns that persist across all layers despite the mixing.

preprint 2021 arXiv

Towards Robust Graph Contrastive Learning

We study the problem of adversarially robust self-supervised learning on graphs. In the contrastive learning framework, we introduce a new method that increases the adversarial robustness of the learned representations through i) adversarial transformations and ii) transformations that not only remove but also insert edges. We evaluate the learned representations in a preliminary set of experiments, obtaining promising results. We believe this work takes an important step towards incorporating robustness as a viable auxiliary task in graph contrastive learning.

preprint 2020 arXiv

A General Stabilization Bound for Influence Propagation in Graphs

We study the stabilization time of a wide class of processes on graphs, in which each node can only switch its state if it is motivated to do so by at least a $\frac{1+λ}{2}$ fraction of its neighbors, for some $0 < λ < 1$. Two examples of such processes are well-studied dynamically changing colorings in graphs: in majority processes, nodes switch to the most frequent color in their neighborhood, while in minority processes, nodes switch to the least frequent color in their neighborhood. We describe a non-elementary function $f(λ)$, and we show that in the sequential model, the worst-case stabilization time of these processes can completely be characterized by $f(λ)$. More precisely, we prove that for any $ε > 0$, $O(n^{1+f(λ)+ε})$ is an upper bound on the stabilization time of any proportional majority/minority process, and we also show that there are graph constructions where stabilization indeed takes $Ω(n^{1+f(λ)-ε})$ steps.
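
A small simulation may help make the process concrete. The sketch below covers only a two-colour proportional majority process in the sequential model; the function names, the example path graph and the scheduler (always flip the lowest-numbered switchable node) are assumptions for illustration, whereas the paper analyses the worst case over all schedules.

```python
def switchable(node, colors, adjacency, lam):
    """A node may switch if at least a (1+lam)/2 fraction of its neighbours has the other colour."""
    neighbours = adjacency[node]
    opposing = sum(1 for u in neighbours if colors[u] != colors[node])
    return len(neighbours) > 0 and opposing >= (1 + lam) / 2 * len(neighbours)

def stabilize(colors, adjacency, lam, max_steps=10_000):
    """Sequential majority process: flip one switchable node per step until none remains.

    Returns the number of flips performed and the final colouring.
    """
    colors = dict(colors)
    for step in range(max_steps):
        candidates = [v for v in sorted(adjacency) if switchable(v, colors, adjacency, lam)]
        if not candidates:
            return step, colors
        v = candidates[0]  # arbitrary scheduler, chosen here only for determinism
        colors[v] = 1 - colors[v]
    raise RuntimeError("did not stabilize within max_steps")

# Hypothetical example: a path 0-1-2-3 with alternating colours.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
start = {0: 0, 1: 1, 2: 0, 3: 1}

steps, final = stabilize(start, path, lam=0.5)
```

Each flip in a majority process with $λ > 0$ strictly reduces the number of edges whose endpoints disagree, so the loop terminates after at most as many flips as there are edges.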

preprint 2020 arXiv

ABC: Proof-of-Stake without Consensus

We introduce a new permissionless blockchain architecture called ABC. ABC is completely asynchronous, and relies on neither randomness nor proof-of-work. ABC can be parallelized, and transactions have finality within one round trip of communication. However, ABC satisfies only a relaxed form of consensus by introducing a weaker termination property. Without full consensus, ABC cannot support certain applications; in particular, ABC cannot support general smart contracts. However, many important applications do not need general smart contracts, and ABC is a better solution for these applications. In particular, ABC can implement the functionality of a cryptocurrency like Bitcoin, replacing Bitcoin's energy-hungry proof-of-work with a proof-of-stake validation.

preprint 2020 arXiv

Asynchronous Byzantine Agreement in Incomplete Networks [Technical Report]

The Byzantine agreement problem is considered to be a core problem in distributed systems. For example, Byzantine agreement is needed to build a blockchain, a totally ordered log of records. Blockchains are asynchronous distributed systems, fault-tolerant against Byzantine nodes. In the literature, the asynchronous Byzantine agreement problem is studied in a fully connected network model where every node can directly send messages to every other node. This assumption is questionable in many real-world environments. In reality, nodes might need to communicate by means of an incomplete network, and Byzantine nodes might not forward messages. Furthermore, Byzantine nodes might not behave correctly and, for example, corrupt messages. Therefore, in order to truly understand Byzantine agreement, we need both ingredients: asynchrony and incomplete communication networks. In this paper, we study the asynchronous Byzantine agreement problem in incomplete networks. A classic result by Danny Dolev proved that in a distributed system with $n$ nodes in the presence of $f$ Byzantine nodes, the vertex connectivity of the system communication graph should be at least $2f+1$. While Dolev's result was for synchronous deterministic systems, we demonstrate that the same bound also holds for asynchronous randomized systems. We show that the bound is tight by presenting a randomized algorithm, and a matching lower bound.

preprint 2020 arXiv

Brick: Asynchronous Payment Channels

Off-chain protocols (channels) are a promising solution to the scalability and privacy challenges of blockchain payments. Current proposals, however, require synchrony assumptions to preserve the safety of a channel, leaking to an adversary the exact amount of time needed to control the network for a successful attack. In this paper, we introduce Brick, the first payment channel that remains secure under network asynchrony and concurrently provides correct incentives. The core idea is to incorporate the conflict resolution process within the channel by introducing a rational committee of external parties, called Wardens. Hence, if a party wants to close a channel unilaterally, it can only get the committee's approval for the last valid state. Brick provides sub-second latency because it does not employ heavy-weight consensus. Instead, Brick uses consistent broadcast to announce updates and close the channel, a light-weight abstraction that is powerful enough to preserve safety and liveness for any rational parties. Furthermore, we consider permissioned blockchains, where the additional property of auditability might be desired for regulatory purposes. We introduce Brick+, an off-chain construction that provides auditability on top of Brick without conflicting with its privacy guarantees. We formally define the properties our payment channel construction should fulfill, and prove that both Brick and Brick+ satisfy them. We also design incentives for Brick such that honest and rational behavior aligns. Finally, we provide a reference implementation of the smart contracts in Solidity.

preprint2020arXiv

Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation

Large pre-trained language models are capable of generating realistic text. However, controlling these models so that the generated text satisfies lexical constraints, i.e., contains specific words, is a challenging problem. Given that state-of-the-art language models are too large to be trained from scratch in a manageable time, it is desirable to control these models without re-training them. Methods capable of doing this are called plug-and-play. Recent plug-and-play methods have been successful in constraining small bidirectional language models as well as forward models in tasks with a restricted search space, e.g., machine translation. However, controlling large transformer-based models to meet lexical constraints without re-training them remains a challenge. In this work, we propose Directed Beam Search (DBS), a plug-and-play method for lexically constrained language generation. Our method can be applied to any language model, is easy to implement and can be used for general language generation. In our experiments we use DBS to control GPT-2. We demonstrate its performance on keyword-to-phrase generation, and we obtain results comparable to those of a state-of-the-art non-plug-and-play model for lexically constrained story generation.

preprint2020arXiv

Medley2K: A Dataset of Medley Transitions

The automatic generation of medleys, i.e., musical pieces formed by different songs concatenated via smooth transitions, is not well studied in the current literature. To facilitate research on this topic, we make available a dataset called Medley2K that consists of 2,000 medleys and 7,712 labeled transitions. Our dataset features a rich variety of song transitions across different music genres. We provide a detailed description of this dataset and validate it by training a state-of-the-art generative model in the task of generating transitions between songs.

preprint2020arXiv

Network-Aware Strategies in Financial Systems

We study the incentives of banks in a financial network, where the network consists of debt contracts and credit default swaps (CDSs) between banks. One of the most important questions in such a system is the problem of deciding which of the banks are in default, and how much of their liabilities these banks can pay. We study the payoff and preferences of the banks in the different solutions to this problem. We also introduce a more refined model which allows assigning priorities to payment obligations; this provides a more expressive and realistic model of real-life financial systems, while it always ensures the existence of a solution. The main focus of the paper is an analysis of the actions that a single bank can execute in a financial system in order to influence the outcome to its advantage. We show that removing an incoming debt, or donating funds to another bank can result in a single new solution that is strictly more favorable to the acting bank. We also show that increasing the bank's external funds or modifying the priorities of outgoing payments cannot introduce a more favorable new solution into the system, but may allow the bank to remove some unfavorable solutions, or to increase its recovery rate. Finally, we show how the actions of two banks in a simple financial system can result in classical game theoretic situations like the prisoner's dilemma or the dollar auction, demonstrating the wide expressive capability of the financial system model.

preprint2020arXiv

Normalized Attention Without Probability Cage

Attention architectures are widely used; they recently gained renewed popularity with Transformers yielding a streak of state-of-the-art results. Yet, the geometrical implications of softmax-attention remain largely unexplored. In this work we highlight the limitations of constraining attention weights to the probability simplex and the resulting convex hull of value vectors. We show that, at initialization, Transformers are biased towards token isolation in a way that depends on sequence length, and we contrast Transformers with simple max- and sum-pooling, two strong baselines that are rarely reported. We propose to replace the softmax in self-attention with normalization, yielding a generally applicable architecture that is robust to hyperparameters and data bias. We support our insights with empirical results from more than 25,000 trained models. All results and implementations are made available.

preprint2020arXiv

On Identifiability in Transformers

In this paper we delve deep into the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we show that input tokens retain their identity to a large degree across the model. We also find evidence suggesting that identity information is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to better understand and further investigate Transformer models.

preprint2020arXiv

On the Hardness of Red-Blue Pebble Games

Red-blue pebble games model the computation cost of a two-level memory hierarchy. We present various hardness results in different red-blue pebbling variants, with a focus on the oneshot model. We first study the relationship between previously introduced red-blue pebble models (base, oneshot, nodel). We also analyze a new variant (compcost) to obtain a more realistic model of computation. We then prove that red-blue pebbling is NP-hard in all of these model variants. Furthermore, we show that in the oneshot model, a $δ$-approximation algorithm for $δ<2$ is only possible if the unique games conjecture is false. Finally, we show that greedy algorithms are not good candidates for approximation, since they can return significantly worse solutions than the optimum.

preprint2020arXiv

Online Graph Exploration on a Restricted Graph Class: Optimal Solutions for Tadpole Graphs

We study the problem of online graph exploration on undirected graphs, where a searcher has to visit every vertex and return to the origin. Once a new vertex is visited, the searcher learns of all neighboring vertices and the connecting edge weights. The goal of such an exploration is to minimize its total cost, where each edge traversal incurs a cost of the corresponding edge weight. We investigate the problem on tadpole graphs (also known as dragons, kites), which consist of a cycle with an attached path. Miyazaki et al. (The online graph exploration problem on restricted graphs, IEICE Transactions 92-D (9), 2009) showed that every online algorithm on these graphs must have a competitive ratio of 2-epsilon, but did not provide upper bounds for non-unit edge weights. We show via amortized analysis that a greedy approach yields a matching competitive ratio of 2 on tadpole graphs, for arbitrary non-negative edge weights.

preprint2016arXiv

CLEX: Yet Another Supercomputer Architecture?

We propose the CLEX supercomputer topology and routing scheme. We prove that CLEX can utilize a constant fraction of the total bandwidth for point-to-point communication, at delays proportional to the sum of the number of intermediate hops and the maximum physical distance between any two nodes. Moreover, applying an asymmetric bandwidth assignment to the links, all-to-all communication can be realized $(1+o(1))$-optimally both with regard to bandwidth and delays. This is achieved at node degrees of $n^{\varepsilon}$, for an arbitrarily small constant $\varepsilon\in (0,1]$. In contrast, these results are impossible in any network featuring constant or polylogarithmic node degrees. Through simulation, we assess the benefits of an implementation of the proposed communication strategy. Our results indicate that, for a million processors, CLEX can increase bandwidth utilization and reduce average routing path length by factors of at least $10$ and $5$, respectively, in comparison to a torus network. Furthermore, the CLEX communication scheme features several other properties, such as deadlock-freedom, inherent fault-tolerance, and canonical partition into smaller subsystems.

preprint2016arXiv

Distributed Local Multi-Aggregation and Centrality Approximation

We study local aggregation and graph analysis in distributed environments using the message passing model. We provide a flexible framework, where each of the nodes in a set $S$--which is a subset of all nodes in the network--can perform a large range of common aggregation functions in its $k$-neighborhood. We study this problem in the CONGEST model, where in each synchronous round, every node can transmit a different (but short) message to each of its neighbors. While the $k$-neighborhoods of nodes in $S$ might overlap and aggregation could cause congestion in this model, we present an algorithm that needs time $O(|S|+k)$ even when each of the nodes in $S$ performs a different aggregation on its $k$-neighborhood. The framework is not restricted to aggregation-trees such that it can be used for more advanced graph analysis. We demonstrate this by providing efficient approximations of centrality measures and approximation of minimum routing cost trees.

preprint2016arXiv

Local Computation: Lower and Upper Bounds

The question of what can be computed, and how efficiently, is at the core of computer science. Not surprisingly, in distributed systems and networking research, an equally fundamental question is what can be computed in a \emph{distributed} fashion. More precisely, if nodes of a network must base their decision on information in their local neighborhood only, how well can they compute or approximate a global (optimization) problem? In this paper we give the first poly-logarithmic lower bound on such local computation for (optimization) problems including minimum vertex cover, minimum (connected) dominating set, maximum matching, maximal independent set, and maximal matching. In addition we present a new distributed algorithm for solving general covering and packing linear programs. For some problems this algorithm is tight with the lower bounds, for others it is a distributed approximation scheme. Together, our lower and upper bounds establish the local computability and approximability of a large class of problems, characterizing how much local information is required to solve these tasks.

preprint2016arXiv

Online Matching: Haste makes Waste!

This paper studies a new online problem, referred to as \emph{min-cost perfect matching with delays (MPMD)}, defined over a finite metric space (i.e., a complete graph with positive edge weights obeying the triangle inequality) $\mathcal{M}$ that is known to the algorithm in advance. Requests arrive in a continuous time online fashion at the points of $\mathcal{M}$ and should be served by matching them to each other. The algorithm is allowed to delay its request matching commitments, but this does not come for free: the total cost of the algorithm is the sum of metric distances between matched requests \emph{plus} the sum of times each request waited since it arrived until it was matched. A randomized online MPMD algorithm is presented whose competitive ratio is $O (\log^{2} n + \log Δ)$, where $n$ is the number of points in $\mathcal{M}$ and $Δ$ is its aspect ratio. The analysis is based on a machinery developed in the context of a new stochastic process that can be viewed as two interleaved Poisson processes; surprisingly, this new process captures precisely the behavior of our algorithm. A related problem in which the algorithm is allowed to clear any unmatched request at a fixed penalty is also addressed. It is suggested that the MPMD problem is merely the tip of the iceberg for a general framework of online problems with delayed service that captures many more natural problems.
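The cost model described in this abstract (distance between matched requests plus the time each request waited) can be illustrated with a minimal sketch. It only evaluates the cost of a given set of matches and is not the paper's randomized algorithm. The function name `mpmd_cost`, the tuple layout of a match, and the example line metric are assumptions made for illustration.

```python
def mpmd_cost(matches, dist):
    """Total MPMD cost of a set of matched request pairs.

    matches: iterable of (p, arrival_p, q, arrival_q, match_time) tuples
             (hypothetical layout chosen for this sketch).
    dist:    function returning the metric distance between two points.
    """
    space_cost = 0.0
    delay_cost = 0.0
    for p, arrival_p, q, arrival_q, match_time in matches:
        # A pair can only be matched once both requests have arrived.
        if match_time < max(arrival_p, arrival_q):
            raise ValueError("match time precedes an arrival time")
        space_cost += dist(p, q)
        # Each request pays for the time between its arrival and its match.
        delay_cost += (match_time - arrival_p) + (match_time - arrival_q)
    return space_cost + delay_cost


def line_metric(a, b):
    """Example metric: points on the real line (an assumption for the demo)."""
    return abs(a - b)


if __name__ == "__main__":
    # One request at point 0 (time 0.0) and one at point 3 (time 1.0),
    # matched at time 2.0: distance 3 plus waiting times 2 and 1.
    print(mpmd_cost([(0, 0.0, 3, 1.0, 2.0)], line_metric))
```

Delaying a match increases the second term while possibly allowing a closer partner to arrive, which is the trade-off the abstract refers to.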

preprint2016arXiv

Opening the Frey/Osborne Black Box: Which Tasks of a Job are Susceptible to Computerization?

In their seminal paper, Frey and Osborne quantified the automation of jobs by assigning each job in the O*NET database a probability of being automated. In this paper, we refine their results in the following way: Every O*NET job consists of a set of tasks, and these tasks can be related. We use a linear program to assign probabilities to tasks, such that related tasks have a similar probability and the tasks can explain the computerization probability of a job. Analyzing jobs on the level of tasks helps in comprehending the results, as experts as well as laymen can more easily criticize and refine what parts of a job are susceptible to computerization.

preprint2014arXiv

Bitcoin Meets Strong Consistency

The Bitcoin system only provides eventual consistency. For everyday life, the time to confirm a Bitcoin transaction is prohibitively slow. In this paper we propose a new system, built on the Bitcoin blockchain, which enables strong consistency. Our system, PeerCensus, acts as a certification authority, manages peer identities in a peer-to-peer network, and ultimately enhances Bitcoin and similar systems with strong consistency. Our extensive analysis shows that PeerCensus is in a secure state with high probability. We also show how Discoin, a Bitcoin variant that decouples block creation and transaction confirmation, can be built on top of PeerCensus, enabling real-time payments. Unlike Bitcoin, once transactions in Discoin are committed, they stay committed.

preprint2014arXiv

Bitcoin Transaction Malleability and MtGox

In Bitcoin, transaction malleability describes the fact that the signatures that prove the ownership of bitcoins being transferred in a transaction do not provide any integrity guarantee for the signatures themselves. This allows an attacker to mount a malleability attack in which it intercepts, modifies, and rebroadcasts a transaction, causing the transaction issuer to believe that the original transaction was not confirmed. In February 2014 MtGox, once the largest Bitcoin exchange, closed and filed for bankruptcy claiming that attackers used malleability attacks to drain its accounts. In this work we use traces of the Bitcoin network for over a year preceding the filing to show that, while the problem is real, there was no widespread use of malleability attacks before the closure of MtGox.

preprint2014arXiv

Distributed Approximation of Minimum Routing Cost Trees

We study the NP-hard problem of approximating a Minimum Routing Cost Spanning Tree in the message passing model with limited bandwidth (CONGEST model). In this problem one tries to find a spanning tree of a graph $G$ over $n$ nodes that minimizes the sum of distances between all pairs of nodes. In the considered model every node can transmit a different (but short) message to each of its neighbors in each synchronous round. We provide a randomized $(2+ε)$-approximation with runtime $O(D+\frac{\log n}{ε})$ for unweighted graphs. Here, $D$ is the diameter of $G$. This improves over both the (expected) approximation factor $O(\log n)$ and the runtime $O(D\log^2 n)$ of the best previously known algorithm. Due to stating our results in a very general way, we also derive an (optimal) runtime of $O(D)$ when considering $O(\log n)$-approximations as done by the best previously known algorithm. In addition we derive a deterministic $2$-approximation.
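The objective in this abstract, the sum of distances between all pairs of nodes in a spanning tree, can be made concrete with a small centralized sketch. It evaluates the routing cost of a given unweighted tree and is not the distributed approximation algorithm from the paper. The function name `routing_cost`, the node numbering 0..n-1, and the choice to count each unordered pair once are assumptions made for illustration.

```python
from collections import deque


def routing_cost(n, tree_edges):
    """Sum of hop distances over all unordered node pairs of a tree.

    n:          number of nodes, labelled 0..n-1 (assumed labelling).
    tree_edges: list of (u, v) pairs forming a spanning tree on those nodes.
    """
    adjacency = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    total = 0
    for source in range(n):
        # Breadth-first search gives hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    # Every unordered pair was counted once from each endpoint.
    return total // 2


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]
    star = [(0, 1), (0, 2), (0, 3)]
    print(routing_cost(4, path), routing_cost(4, star))
```

On four nodes the star has a lower routing cost than the path, which shows why the choice of spanning tree matters for this objective.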

preprint2013arXiv

Ants: Mobile Finite State Machines

Consider the Ants Nearby Treasure Search (ANTS) problem introduced by Feinerman, Korman, Lotker, and Sereni (PODC 2012), where $n$ mobile agents, initially placed at the origin of an infinite grid, collaboratively search for an adversarially hidden treasure. In this paper, the model of Feinerman et al. is adapted such that the agents are controlled by a (randomized) finite state machine: they possess a constant-size memory and are able to communicate with each other through constant-size messages. Despite the restriction to constant-size memory, we show that their collaborative performance remains the same by presenting a distributed algorithm that matches a lower bound established by Feinerman et al. on the run-time of any ANTS algorithm.

preprint2013arXiv

The Power of Non-Uniform Wireless Power

We study a fundamental measure for wireless interference in the SINR model known as (weighted) inductive independence. This measure characterizes the effectiveness of using oblivious power --- when the power used by a transmitter only depends on the distance to the receiver --- as a mechanism for improving wireless capacity. We prove optimal bounds for inductive independence, implying a number of algorithmic applications. An algorithm is provided that achieves --- due to existing lower bounds --- capacity that is asymptotically best possible using oblivious power assignments. Improved approximation algorithms are provided for a number of problems for oblivious power and for power control, including distributed scheduling, connectivity, secondary spectrum auctions, and dynamic packet scheduling.

preprint2012arXiv

Algorithms for Wireless Capacity

In this paper we address two basic questions in wireless communication: First, how long does it take to schedule an arbitrary set of communication requests? Second, given a set of communication requests, how many of them can be scheduled concurrently? Our results are derived in an interference model with geometric path loss and consist of efficient algorithms that find a constant approximation for the second problem and a logarithmic approximation for the first problem. In addition, we analyze some important properties of the interference model and show that it is robust to various factors that can influence the signal attenuation. More specifically, we prove that as long as such influences on the signal attenuation are constant, they affect the capacity only by a constant factor.

preprint2012arXiv

Stone Age Distributed Computing

The traditional models of distributed computing focus mainly on networks of computer-like devices that can exchange large messages with their neighbors and perform arbitrary local computations. Recently, there is a trend to apply distributed computing methods to networks of sub-microprocessor devices, e.g., biological cellular networks or networks of nano-devices. However, the suitability of the traditional distributed computing models to these types of networks is questionable: do tiny bio/nano nodes "compute" and/or "communicate" essentially the same as a computer? In this paper, we introduce a new model that depicts a network of randomized finite state machines operating in an asynchronous environment. Although the computation and communication capabilities of each individual device in the new model are, by design, much weaker than those of a computer, we show that some of the most important and extensively studied distributed computing problems can still be solved efficiently.

preprint2012arXiv

The Price of Matching Selfish Vertices

We analyze the setting of minimum-cost perfect matchings with selfish vertices through the price of anarchy (PoA) and price of stability (PoS) lens. The underlying solution concept used for this analysis is the Gale-Shapley stable matching notion, where the preferences are determined so that each player (vertex) wishes to minimize the cost of her own matching edge.

preprint2011arXiv

Distributed Verification and Hardness of Distributed Approximation

We study the {\em verification} problem in distributed networks, stated as follows. Let $H$ be a subgraph of a network $G$ where each vertex of $G$ knows which edges incident on it are in $H$. We would like to verify whether $H$ has some properties, e.g., if it is a tree or if it is connected. We would like to perform this verification in a decentralized fashion via a distributed algorithm. The time complexity of verification is measured as the number of rounds of distributed communication. In this paper we initiate a systematic study of distributed verification, and give almost tight lower bounds on the running time of distributed verification algorithms for many fundamental problems such as connectivity, spanning connected subgraph, and $s-t$ cut verification. We then show applications of these results in deriving strong unconditional time lower bounds on the {\em hardness of distributed approximation} for many classical optimization problems including minimum spanning tree, shortest paths, and minimum cut. Many of these results are the first non-trivial lower bounds for both exact and approximate distributed computation and they resolve previous open questions. Moreover, our unconditional lower bound of approximating minimum spanning tree (MST) subsumes and improves upon the previous hardness of approximation bound of Elkin [STOC 2004] as well as the lower bound for (exact) MST computation of Peleg and Rubinovich [FOCS 1999]. Our result implies that there can be no distributed approximation algorithm for MST that is significantly faster than the current exact algorithm, for {\em any} approximation factor. Our lower bound proofs show an interesting connection between communication complexity and distributed computing which turns out to be useful in establishing the time complexity of exact and approximate distributed computation of many problems.

preprint2011arXiv

On the Windfall and Price of Friendship: Inoculation Strategies on Social Networks

This article investigates selfish behavior in games where players are embedded in a social context. A framework is presented which allows us to measure the Windfall of Friendship, i.e., how much players benefit (compared to purely selfish environments) if they care about the welfare of their friends in the social network graph. As a case study, a virus inoculation game is examined. We analyze the corresponding Nash equilibria and show that the Windfall of Friendship can never be negative. However, we find that if the valuation of a friend is independent of the total number of friends, the social welfare may not increase monotonically with the extent to which players care for each other; intriguingly, in the corresponding scenario where the relative importance of a friend declines, the Windfall is monotonic again. This article also studies convergence of best-response sequences. It turns out that in social networks, convergence times are typically higher and hence constitute a price of friendship. While such phenomena may be known on an anecdotal level, our framework allows us to quantify these effects analytically. Our formal insights on the worst case equilibria are complemented by simulations shedding light onto the structure of other equilibria.

preprint2011arXiv

Tight Bounds for Parallel Randomized Load Balancing

We explore the fundamental limits of distributed balls-into-bins algorithms. We present an adaptive symmetric algorithm that achieves a bin load of two in log* n+O(1) communication rounds using O(n) messages in total. Larger bin loads can be traded in for smaller time complexities. We prove a matching lower bound of (1-o(1))log* n on the time complexity of symmetric algorithms that guarantee small bin loads at an asymptotically optimal message complexity of O(n). For each assumption of the lower bound, we provide an algorithm violating it, in turn achieving a constant maximum bin load in constant time. As an application, we consider the following problem. Given a fully connected graph of n nodes, where each node needs to send and receive up to n messages, and in each round each node may send one message over each link, deliver all messages as quickly as possible to their destinations. We give a simple and robust algorithm of time complexity O(log* n) for this task and provide a generalization to the case where all nodes initially hold arbitrary sets of messages. A less practical algorithm terminates within asymptotically optimal O(1) rounds. All these bounds hold with high probability.
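The bounds in this abstract are stated in terms of log* n, the iterated logarithm. As a minimal sketch, assuming base-2 logarithms and the convention that log* counts how often the logarithm must be applied before the value drops to at most 1, it can be computed as follows. The function name `log_star` is chosen here for illustration.

```python
import math


def log_star(n):
    """Iterated base-2 logarithm: number of times log2 is applied
    until the value is at most 1 (convention assumed for this sketch)."""
    if n <= 0:
        raise ValueError("n must be positive")
    count = 0
    value = float(n)
    while value > 1.0:
        value = math.log2(value)
        count += 1
    return count


if __name__ == "__main__":
    for n in (2, 4, 16, 65536):
        print(n, log_star(n))
```

The function grows extremely slowly: it equals 4 for n = 65536 and 5 for n = 2^65536, so a round complexity of log* n + O(1) is close to constant for any practical network size.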

52 published item(s)

Benchmarking Positional Encodings for GNNs and Graph Transformers

Broadcast in Almost Mixing Time

From Message-Passing to Linearized Graph Sequence Models

N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation

WorldSpeech: A Multilingual Speech Corpus from Around the World

Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence

Provably Powerful Graph Neural Networks for Directed Multigraphs

A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications

A Theoretical Comparison of Graph Neural Network Extensions

An Empirical Study of Market Inefficiencies in Uniswap and SushiSwap

Analyzing Voting Power in Decentralized Governance: Who controls DAOs?

Asynchronous Neural Networks for Learning in Graphs

Better Incentives for Proof-of-Work

Cyclic Arbitrage in Decentralized Exchanges

Deterministic Graph-Walking Program Mining

DT+GNN: A Fully Explainable Graph Neural Network using Decision Trees

Eliminating Sandwich Attacks with the Help of Game Theory

Online Matching with Convex Delay Costs

Self-Supervised Contrastive Learning with Adversarial Perturbations for Defending Word Substitution-based Attacks

SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators

Understanding the Relationship Between Core Constraints and Core-Selecting Payment Rules in Combinatorial Auctions

Voting in Two-Crossing Elections

Byzantine Agreement with Unknown Participants and Failures

Telling BERT's full story: from Local Attention to Global Aggregation

Towards Robust Graph Contrastive Learning

A General Stabilization Bound for Influence Propagation in Graphs

ABC: Proof-of-Stake without Consensus

Asynchronous Byzantine Agreement in Incomplete Networks [Technical Report]

Brick: Asynchronous Payment Channels

Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation

Medley2K: A Dataset of Medley Transitions

Network-Aware Strategies in Financial Systems

Normalized Attention Without Probability Cage

On Identifiability in Transformers

On the Hardness of Red-Blue Pebble Games

Online Graph Exploration on a Restricted Graph Class: Optimal Solutions for Tadpole Graphs

CLEX: Yet Another Supercomputer Architecture?

Distributed Local Multi-Aggregation and Centrality Approximation

Local Computation: Lower and Upper Bounds

Online Matching: Haste makes Waste!

Opening the Frey/Osborne Black Box: Which Tasks of a Job are Susceptible to Computerization?

Bitcoin Meets Strong Consistency

Bitcoin Transaction Malleability and MtGox

Distributed Approximation of Minimum Routing Cost Trees

Ants: Mobile Finite State Machines

The Power of Non-Uniform Wireless Power

Algorithms for Wireless Capacity

Stone Age Distributed Computing

The Price of Matching Selfish Vertices

Distributed Verification and Hardness of Distributed Approximation

On the Windfall and Price of Friendship: Inoculation Strategies on Social Networks

Tight Bounds for Parallel Randomized Load Balancing