Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
36works
0followers
18topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

36 published item(s)

preprint2026arXiv

Benchmarking Positional Encodings for GNNs and Graph Transformers

Positional Encodings (PEs) are essential for injecting structural information into Graph Neural Networks (GNNs), particularly Graph Transformers, yet their empirical impact remains insufficiently understood. We introduce a unified benchmarking framework that decouples PEs from architectural choices, enabling a fair comparison across 8 GNN and Transformer models, 9 PEs, and 10 synthetic and real-world datasets. Across more than 500 model-PE-dataset configurations, we find that commonly used expressiveness proxies, including Weisfeiler-Lehman distinguishability, do not reliably predict downstream performance. In particular, highly expressive PEs frequently fail to improve, and can even degrade performance on real-world tasks. At the same time, we identify several simple and previously overlooked model-PE combinations that match or outperform recent state-of-the-art methods. Our results demonstrate the strong task-dependence of PEs and underscore the need for empirical validation beyond theoretical expressiveness. To support reproducible research, we release an open-source benchmarking framework for evaluating PEs for graph learning tasks.

preprint2026arXiv

Broadcast in Almost Mixing Time

We study the problem of broadcasting multiple messages in the CONGEST model. In this problem, a dedicated source node $s$ possesses a set $M$ of messages with every message of size $O(\log n)$ where $n$ is the total number of nodes. The objective is to ensure that every node in the network learns all messages in $M$. The execution of an algorithm progresses in rounds, and we focus on optimizing the round complexity of broadcasting multiple messages. Our primary contribution is a randomized algorithm for networks with expander topology, which are widely used in practice for building scalable and robust distributed systems. The algorithm succeeds with high probability and achieves a round complexity that is optimal up to a factor of the network's mixing time and polylogarithmic terms. It leverages a multi-COBRA primitive, which uses multiple branching random walks running in parallel. To the best of our knowledge, this approach has not been applied in distributed algorithms before. A crucial aspect of our method is the use of these branching random walks to construct an optimal (up to a polylogarithmic factor) tree packing of a random graph, which is then used for efficient broadcasting. This result is of independent interest. We also prove the problem to be NP-hard in a centralized setting and provide insights into why straightforward lower bounds for general graphs, namely graph diameter and $\frac{|M|}{\textit{minCut}}$, cannot be tight.

preprint2026arXiv

From Message-Passing to Linearized Graph Sequence Models

Message-passing based approaches form the default backbone of most learning architectures on graph-structured data. However, the rapid progress of modern deep learning architectures in other domains, particularly sequence modeling, raises the question of how graph learning can benefit from these advances. We introduce Linearized Graph Sequence Models, a framework that recasts message-passing graph computation from the perspective of sequence modeling to simplify architectural choices. Our approach systematically separates the computational processing depth from the information propagation depth, allowing core graph architectural decisions to be treated as sequence modeling choices. Specifically, we analyze, both empirically and theoretically, what sequence properties make methods effective for learning and preserving the graph inductive bias. In particular, we validate our findings, demonstrating improved performance on long-range information tasks in graphs. Our findings provide a principled way to integrate modern sequence modeling advances into message-passing based graph learning. Beyond this, our work demonstrates how the separation of processing and information depth can recast central architectural questions as input modeling choices.

preprint2026arXiv

N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation

Improving the inference efficiency of autoregressive transformers typically means reducing FLOPs per token, usually through approximations that degrade model quality. We introduce N-vium, a mixture-of-exits transformer that partially parallelizes computation across depth on standard hardware, increasing effective FLOPs per second rather than minimizing compute per token. N-vium attaches prediction heads at multiple depths and defines the next-token distribution as a learned mixture over these exits, with token-adaptive routing. This formulation strictly generalizes the standard transformer, which is recovered exactly when routing assigns zero mass to all intermediate heads. Sampling from the mixture is exact, and complete KV caches are recovered by deferring the upper-layer computation and batching it with later tokens. We pretrain N-vium at scales up to 1.5B parameters. Our largest model reaches 57.9% wall-clock speedup over a parameter- and data-matched standard transformer at no perplexity cost.

preprint2026arXiv

WorldSpeech: A Multilingual Speech Corpus from Around the World

Automatic speech recognition (ASR) performs well for high-resource languages with abundant paired audio-transcript data, but its accuracy degrades sharply for most languages due to limited publicly available aligned data. To this end, we introduce WorldSpeech, a 24 kHz multilingual speech corpus comprising 65k hours of aligned audio-transcript data across 76 languages, collected from diverse public sources including parliamentary proceedings, international broadcasts, and public-domain audiobooks. For 37 languages, WorldSpeech provides more than 200 hours of aligned speech, with 28 exceeding 500 hours and 24 surpassing 1k hours. Fine-tuning existing ASR models on WorldSpeech results in an average relative Word-Error-Rate reduction of 63.5% across 11 typologically diverse languages.

preprint2024arXiv

Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast Convergence

In Federated Reinforcement Learning (FRL), agents aim to collaboratively learn a common task, while each agent is acting in its local environment without exchanging raw trajectories. Existing approaches for FRL either (a) do not provide any fault-tolerance guarantees (against misbehaving agents), or (b) rely on a trusted central agent (a single point of failure) for aggregating updates. We provide the first decentralized Byzantine fault-tolerant FRL method. Towards this end, we first propose a new centralized Byzantine fault-tolerant policy gradient (PG) algorithm that improves over existing methods by relying only on assumptions standard for non-fault-tolerant PG. Then, as our main contribution, we show how a combination of robust aggregation and Byzantine-resilient agreement methods can be leveraged in order to eliminate the need for a trusted central entity. Since our results represent the first sample complexity analysis for Byzantine fault-tolerant decentralized federated non-convex optimization, our technical contributions may be of independent interest. Finally, we corroborate our theoretical results experimentally for common RL environments, demonstrating the speed-up of decentralized federations w.r.t. the number of participating agents and resilience against various Byzantine attacks.

preprint2024arXiv

Provably Powerful Graph Neural Networks for Directed Multigraphs

This paper analyses a set of simple adaptations that transform standard message-passing Graph Neural Networks (GNN) into provably powerful directed multigraph neural networks. The adaptations include multigraph port numbering, ego IDs, and reverse message passing. We prove that the combination of these theoretically enables the detection of any directed subgraph pattern. To validate the effectiveness of our proposed adaptations in practice, we conduct experiments on synthetic subgraph detection tasks, which demonstrate outstanding performance with almost perfect results. Moreover, we apply our proposed adaptations to two financial crime analysis tasks. We observe dramatic improvements in detecting money laundering transactions, improving the minority-class F1 score of a standard message-passing GNN by up to 30%, and closely matching or outperforming tree-based and GNN baselines. Similarly impressive results are observed on a real-world phishing detection dataset, boosting three standard GNNs' F1 scores by around 15% and outperforming all baselines.

preprint2022arXiv

A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications

The collection of eye gaze information provides a window into many critical aspects of human cognition, health and behaviour. Additionally, many neuroscientific studies complement the behavioural information gained from eye tracking with the high temporal resolution and neurophysiological markers provided by electroencephalography (EEG). One of the essential eye-tracking software processing steps is the segmentation of the continuous data stream into events relevant to eye-tracking applications, such as saccades, fixations, and blinks. Here, we introduce DETRtime, a novel framework for time-series segmentation that creates ocular event detectors that do not require additionally recorded eye-tracking modality and rely solely on EEG data. Our end-to-end deep learning-based framework brings recent advances in Computer Vision to the forefront of the times series segmentation of EEG data. DETRtime achieves state-of-the-art performance in ocular event detection across diverse eye-tracking experiment paradigms. In addition to that, we provide evidence that our model generalizes well in the task of EEG sleep stage segmentation.

preprint2022arXiv

A Theoretical Comparison of Graph Neural Network Extensions

We study and compare different Graph Neural Network extensions that increase the expressive power of GNNs beyond the Weisfeiler-Leman test. We focus on (i) GNNs based on higher order WL methods, (ii) GNNs that preprocess small substructures in the graph, (iii) GNNs that preprocess the graph up to a small radius, and (iv) GNNs that slightly perturb the graph to compute an embedding. We begin by presenting a simple improvement for this last extension that strictly increases the expressive power of this GNN variant. Then, as our main result, we compare the expressiveness of these extensions to each other through a series of example constructions that can be distinguished by one of the extensions, but not by another one. We also show negative examples that are particularly challenging for each of the extensions, and we prove several claims about the ability of these extensions to count cliques and cycles in the graph.

preprint2022arXiv

An Empirical Study of Market Inefficiencies in Uniswap and SushiSwap

Decentralized exchanges are revolutionizing finance. With their ever-growing increase in popularity, a natural question that begs to be asked is: how efficient are these new markets? We find that nearly 30% of analyzed trades are executed at an unfavorable rate. Additionally, we observe that, especially during the DeFi summer in 2020, price inaccuracies across the market plagued DEXes. Uniswap and SushiSwap, however, quickly adapt to their increased volumes. We see an increase in market efficiency with time during the observation period. Nonetheless, the DEXes still struggle to track the reference market when cryptocurrency prices are highly volatile. During such periods of high volatility, we observe the market becoming less efficient - manifested by an increased prevalence in cyclic arbitrage opportunities.

preprint2022arXiv

Analyzing Voting Power in Decentralized Governance: Who controls DAOs?

We empirically study the state of three prominent DAO governance systems on the Ethereum blockchain: Compound, Uniswap and ENS. In particular, we examine how the voting power is distributed in these systems. Using a comprehensive dataset of all governance token holders, delegates, proposals and votes, we analyze who holds the voting rights and how they are used to influence governance decisions.

preprint2022arXiv

Asynchronous Neural Networks for Learning in Graphs

This paper studies asynchronous message passing (AMP), a new paradigm for applying neural network based learning to graphs. Existing graph neural networks use the synchronous distributed computing model and aggregate their neighbors in each round, which causes problems such as oversmoothing and limits their expressiveness. On the other hand, AMP is based on the asynchronous model, where nodes react to messages of their neighbors individually. We prove that (i) AMP can simulate synchronous GNNs and that (ii) AMP can theoretically distinguish any pair of graphs. We experimentally validate AMP's expressiveness. Further, we show that AMP might be better suited to propagate messages over large distances in graphs and performs well on several graph classification benchmarks.

preprint2022arXiv

Better Incentives for Proof-of-Work

This work proposes a novel proof-of-work blockchain incentive scheme such that, barring exogenous motivations, following the protocol is guaranteed to be the optimal strategy for miners. Our blockchain takes the form of a directed acyclic graph, resulting in improvements with respect to throughput and speed. More importantly, for our blockchain to function, it is not expected that the miners conform to some presupposed protocol in the interest of the system's operability. Instead, our system works if miners act selfishly, trying to get the maximum possible rewards, with no consideration for the overall health of the blockchain.

preprint2022arXiv

Cyclic Arbitrage in Decentralized Exchanges

Decentralized Exchanges (DEXes) enable users to create markets for exchanging any pair of cryptocurrencies. The direct exchange rate of two tokens may not match the cross-exchange rate in the market, and such price discrepancies open up arbitrage possibilities with trading through different cryptocurrencies cyclically. In this paper, we conduct a systematic investigation on cyclic arbitrages in DEXes. We propose a theoretical framework for studying cyclic arbitrage. With our framework, we analyze the profitability conditions and optimal trading strategies of cyclic transactions. We further examine exploitable arbitrage opportunities and the market size of cyclic arbitrages with transaction-level data of Uniswap V2. We find that traders have executed 292,606 cyclic arbitrages over eleven months and exploited more than 138 million USD in revenue. However, the revenue of the most profitable unexploited opportunity is persistently higher than 1 ETH (4,000 USD), which indicates that DEX markets may not be efficient enough. By analyzing how traders implement cyclic arbitrages, we find that traders can utilize smart contracts to issue atomic transactions and the atomic implementations could mitigate users' financial loss in cyclic arbitrage from the price impact.

preprint2022arXiv

Deterministic Graph-Walking Program Mining

Owing to their versatility, graph structures admit representations of intricate relationships between the separate entities comprising the data. We formalise the notion of connection between two vertex sets in terms of edge and vertex features by introducing graph-walking programs. We give two algorithms for mining of deterministic graph-walking programs that yield programs in the order of increasing length. These programs characterise linear long-distance relationships between the given two vertex sets in the context of the whole graph.

preprint2022arXiv

DT+GNN: A Fully Explainable Graph Neural Network using Decision Trees

We propose the fully explainable Decision Tree Graph Neural Network (DT+GNN) architecture. In contrast to existing black-box GNNs and post-hoc explanation methods, the reasoning of DT+GNN can be inspected at every step. To achieve this, we first construct a differentiable GNN layer, which uses a categorical state space for nodes and messages. This allows us to convert the trained MLPs in the GNN into decision trees. These trees are pruned using our newly proposed method to ensure they are small and easy to interpret. We can also use the decision trees to compute traditional explanations. We demonstrate on both real-world datasets and synthetic GNN explainability benchmarks that this architecture works as well as traditional GNNs. Furthermore, we leverage the explainability of DT+GNNs to find interesting insights into many of these datasets, with some surprising results. We also provide an interactive web tool to inspect DT+GNN's decision making.

preprint2022arXiv

Eliminating Sandwich Attacks with the Help of Game Theory

Predatory trading bots lurking in Ethereum's mempool present invisible taxation of traders on automated market makers (AMMs). AMM traders specify a slippage tolerance to indicate the maximum price movement they are willing to accept. This way, traders avoid automatic transaction failure in case of small price movements before their trade request executes. However, while a too-small slippage tolerance may lead to trade failures, a too-large slippage tolerance allows predatory trading bots to profit from sandwich attacks. These bots can extract the difference between the slippage tolerance and the actual price movement as profit. In this work, we introduce the sandwich game to analyze sandwich attacks analytically from both the attacker and victim perspectives. Moreover, we provide a simple and highly effective algorithm that traders can use to set the slippage tolerance. We unveil that most broadcasted transactions can avoid sandwich attacks while simultaneously only experiencing a low risk of transaction failure. Thereby, we demonstrate that a constant auto-slippage cannot adjust to varying trade sizes and pool characteristics. Our algorithm outperforms the constant auto-slippage suggested by the biggest AMM, Uniswap, in all performed tests. Specifically, our algorithm repeatedly demonstrates a cost reduction exceeding a factor of 100.

preprint2022arXiv

Online Matching with Convex Delay Costs

We investigate the problem of Min-cost Perfect Matching with Delays (MPMD) in which requests are pairwise matched in an online fashion with the objective to minimize the sum of space cost and time cost. Though linear-MPMD (i.e., time cost is linear in delay) has been thoroughly studied in the literature, it does not well model impatient requests that are common in practice. Thus, we propose convex-MPMD where time cost functions are convex, capturing the situation where time cost increases faster and faster. Since the existing algorithms for linear-MPMD are not competitive any more, we devise a new deterministic algorithm for convex-MPMD problems. For a large class of convex time cost functions, our algorithm achieves a competitive ratio of $O(k)$ on any $k$-point uniform metric space, or $O(kΔ)$ when the metric space has aspect ratio $Δ$. Moreover, we prove lower bounds for the competitive ratio of deterministic and randomized algorithms, indicating that our deterministic algorithm is optimal. This optimality uncover a substantial difference between convex-MPMD and linear-MPMD, since linear-MPMD allows a deterministic algorithm with constant competitive ratio on any uniform metric space.

preprint2022arXiv

Self-Supervised Contrastive Learning with Adversarial Perturbations for Defending Word Substitution-based Attacks

In this paper, we present an approach to improve the robustness of BERT language models against word substitution-based adversarial attacks by leveraging adversarial perturbations for self-supervised contrastive learning. We create a word-level adversarial attack generating hard positives on-the-fly as adversarial examples during contrastive learning. In contrast to previous works, our method improves model robustness without using any labeled data. Experimental results show that our method improves robustness of BERT against four different word substitution-based adversarial attacks, and combining our method with adversarial training gives higher robustness than adversarial training alone. As our method improves the robustness of BERT purely with unlabeled data, it opens up the possibility of using large text datasets to train robust language models against word substitution-based adversarial attacks.

preprint2022arXiv

SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators

We approach the graph generation problem from a spectral perspective by first generating the dominant parts of the graph Laplacian spectrum and then building a graph matching these eigenvalues and eigenvectors. Spectral conditioning allows for direct modeling of the global and local graph structure and helps to overcome the expressivity and mode collapse issues of one-shot graph generators. Our novel GAN, called SPECTRE, enables the one-shot generation of much larger graphs than previously possible with one-shot models. SPECTRE outperforms state-of-the-art deep autoregressive generators in terms of modeling fidelity, while also avoiding expensive sequential generation and dependence on node ordering. A case in point, in sizable synthetic and real-world graphs SPECTRE achieves a 4-to-170 fold improvement over the best competitor that does not overfit and is 23-to-30 times faster than autoregressive generators.

preprint2022arXiv

Understanding the Relationship Between Core Constraints and Core-Selecting Payment Rules in Combinatorial Auctions

Combinatorial auctions (CAs) allow bidders to express complex preferences for bundles of goods being auctioned. However, the behavior of bidders under different payment rules is often unclear. In this paper, we aim to understand how core constraints interact with different core-selecting payment rules. In particular, we examine the natural and desirable non-decreasing property of payment rules, which states that bidders cannot decrease their payments by increasing their bids. Previous work showed that, in general, the widely used VCG-nearest payment rule violates the non-decreasing property in single-minded CAs. We prove that under a single effective core constraint, the VCG-nearest payment rule is non-decreasing. In order to determine in which auctions single effective core constraints occur, we introduce a conflict graph representation of single-minded CAs and find sufficient conditions for the single effective core constraint in CAs. Finally, we study the consequences on the behavior of the bidders and show that no over-bidding exists in any Nash equilibrium for non-decreasing core-selecting payment rules.

preprint2022arXiv

Voting in Two-Crossing Elections

We introduce two-crossing elections as a generalization of single-crossing elections, showing a number of new results. First, we show that two-crossing elections can be recognized in polynomial time, by reduction to the well-studied consecutive ones problem. We also conjecture that recognizing $k$-crossing elections is NP-complete in general, providing evidence by relating to a problem similar to consecutive ones proven to be hard in the literature. Single-crossing elections exhibit a transitive majority relation, from which many important results follow. On the other hand, we show that the classical Debord-McGarvey theorem can still be proven two-crossing, implying that any weighted majority tournament is inducible by a two-crossing election. This shows that many voting rules are NP-hard under two-crossing elections, including Kemeny and Slater. This is in contrast to the single-crossing case and outlines an important complexity boundary between single- and two-crossing. Subsequently, we show that for two-crossing elections the Young scores of all candidates can be computed in polynomial time, by formulating a totally unimodular linear program. Finally, we consider the Chamberlin-Courant rule with arbitrary disutilities and show that a winning committee can be computed in polynomial time, using an approach based on dynamic programming.

preprint2021arXiv

Byzantine Agreement with Unknown Participants and Failures

A set of mutually distrusting participants that want to agree on a common opinion must solve an instance of a Byzantine agreement problem. These problems have been extensively studied in the literature. However, most of the existing solutions assume that the participants are aware of $n$ -- the total number of participants in the system -- and $f$ -- an upper bound on the number of Byzantine participants. In this paper, we show that most of the fundamental agreement problems can be solved without affecting resiliency even if the participants do not know the values of (possibly changing) $n$ and $f$. Specifically, we consider a synchronous system where the participants have unique but not necessarily consecutive identifiers, and give Byzantine agreement algorithms for reliable broadcast, approximate agreement, rotor-coordinator, early terminating consensus and total ordering in static and dynamic systems, all with the optimal resiliency of $n> 3f$. Moreover, we show that synchrony is necessary as an agreement with probabilistic termination is impossible in a semi-synchronous or asynchronous system if the participants are unaware of $n$ and $f$.

preprint2021arXiv

Telling BERT's full story: from Local Attention to Global Aggregation

We take a deep look into the behavior of self-attention heads in the transformer architecture. In light of recent work discouraging the use of attention distributions for explaining a model's behavior, we show that attention distributions can nevertheless provide insights into the local behavior of attention heads. This way, we propose a distinction between local patterns revealed by attention and global patterns that refer back to the input, and analyze BERT from both angles. We use gradient attribution to analyze how the output of an attention attention head depends on the input tokens, effectively extending the local attention-based analysis to account for the mixing of information throughout the transformer layers. We find that there is a significant discrepancy between attention and attribution distributions, caused by the mixing of context inside the model. We quantify this discrepancy and observe that interestingly, there are some patterns that persist across all layers despite the mixing.

preprint2021arXiv

Towards Robust Graph Contrastive Learning

We study the problem of adversarially robust self-supervised learning on graphs. In the contrastive learning framework, we introduce a new method that increases the adversarial robustness of the learned representations through i) adversarial transformations and ii) transformations that not only remove but also insert edges. We evaluate the learned representations in a preliminary set of experiments, obtaining promising results. We believe this work takes an important step towards incorporating robustness as a viable auxiliary task in graph contrastive learning.

preprint2020arXiv

A General Stabilization Bound for Influence Propagation in Graphs

We study the stabilization time of a wide class of processes on graphs, in which each node can only switch its state if it is motivated to do so by at least a $\frac{1+λ}{2}$ fraction of its neighbors, for some $0 < λ< 1$. Two examples of such processes are well-studied dynamically changing colorings in graphs: in majority processes, nodes switch to the most frequent color in their neighborhood, while in minority processes, nodes switch to the least frequent color in their neighborhood. We describe a non-elementary function $f(λ)$, and we show that in the sequential model, the worst-case stabilization time of these processes can completely be characterized by $f(λ)$. More precisely, we prove that for any $ε>0$, $O(n^{1+f(λ)+ε})$ is an upper bound on the stabilization time of any proportional majority/minority process, and we also show that there are graph constructions where stabilization indeed takes $Ω(n^{1+f(λ)-ε})$ steps.

preprint2020arXiv

ABC: Proof-of-Stake without Consensus

We introduce a new permissionless blockchain architecture called ABC. ABC is completely asynchronous, and does rely on neither randomness nor proof-of-work. ABC can be parallelized, and transactions have finality within one round trip of communication. However, ABC satisfies only a relaxed form of consensus by introducing a weaker termination property. Without full consensus, ABC cannot support certain applications, in particular ABC cannot support general smart contracts. However, many important applications do not need general smart contracts, and ABC is a better solution for these applications. In particular, ABC can implement the functionality of a cryptocurrency like Bitcoin, replacing Bitcoin&#39;s energy-hungry proof-of-work with a proof-of-stake validation.

preprint2020arXiv

Asynchronous Byzantine Agreement in Incomplete Networks [Technical Report]

The Byzantine agreement problem is considered to be a core problem in distributed systems. For example, Byzantine agreement is needed to build a blockchain, a totally ordered log of records. Blockchains are asynchronous distributed systems, fault-tolerant against Byzantine nodes. In the literature, the asynchronous byzantine agreement problem is studied in a fully connected network model where every node can directly send messages to every other node. This assumption is questionable in many real-world environments. In the reality, nodes might need to communicate by means of an incomplete network, and Byzantine nodes might not forward messages. Furthermore, Byzantine nodes might not behave correctly and, for example, corrupt messages. Therefore, in order to truly understand Byzantine Agreement, we need both ingredients: asynchrony and incomplete communication networks. In this paper, we study the asynchronous Byzantine agreement problem in incomplete networks. A classic result by Danny Dolev proved that in a distributed system with n nodes in the presence of f Byzantine nodes, the vertex connectivity of the system communication graph should be at least (2f+1). While Dolev&#39;s result was for synchronous deterministic systems, we demonstrate that the same bound also holds for asynchronous randomized systems. We show that the bound is tight by presenting a randomized algorithm, and a matching lower bound.

preprint2020arXiv

Brick: Asynchronous Payment Channels

Off-chain protocols (channels) are a promising solution to the scalability and privacy challenges of blockchain payments. Current proposals, however, require synchrony assumptions to preserve the safety of a channel, leaking to an adversary the exact amount of time needed to control the network for a successful attack. In this paper, we introduce Brick, the first payment channel that remains secure under network asynchrony and concurrently provides correct incentives. The core idea is to incorporate the conflict resolution process within the channel by introducing a rational committee of external parties, called Wardens. Hence, if a party wants to close a channel unilaterally, it can only get the committee&#39;s approval for the last valid state. Brick provides sub-second latency because it does not employ heavy-weight consensus. Instead, Brick uses consistent broadcast to announce updates and close the channel, a light-weight abstraction that is powerful enough to preserve safety and liveness to any rational parties. Furthermore, we consider permissioned blockchains, where the additional property of auditability might be desired for regulatory purposes. We introduce Brick+, an off-chain construction that provides auditability on top of Brick without conflicting with its privacy guarantees. We formally define the properties our payment channel construction should fulfill, and prove that both Brick and Brick+ satisfy them. We also design incentives for Brick such that honest and rational behavior aligns. Finally, we provide a reference implementation of the smart contracts in Solidity.

preprint2020arXiv

Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation

Large pre-trained language models are capable of generating realistic text. However, controlling these models so that the generated text satisfies lexical constraints, i.e., contains specific words, is a challenging problem. Given that state-of-the-art language models are too large to be trained from scratch in a manageable time, it is desirable to control these models without re-training them. Methods capable of doing this are called plug-and-play. Recent plug-and-play methods have been successful in constraining small bidirectional language models as well as forward models in tasks with a restricted search space, e.g., machine translation. However, controlling large transformer-based models to meet lexical constraints without re-training them remains a challenge. In this work, we propose Directed Beam Search (DBS), a plug-and-play method for lexically constrained language generation. Our method can be applied to any language model, is easy to implement and can be used for general language generation. In our experiments we use DBS to control GPT-2. We demonstrate its performance on keyword-to-phrase generation and we obtain comparable results as a state-of-the-art non-plug-and-play model for lexically constrained story generation.

preprint2020arXiv

Medley2K: A Dataset of Medley Transitions

The automatic generation of medleys, i.e., musical pieces formed by different songs concatenated via smooth transitions, is not well studied in the current literature. To facilitate research on this topic, we make available a dataset called Medley2K that consists of 2,000 medleys and 7,712 labeled transitions. Our dataset features a rich variety of song transitions across different music genres. We provide a detailed description of this dataset and validate it by training a state-of-the-art generative model in the task of generating transitions between songs.

preprint2020arXiv

Network-Aware Strategies in Financial Systems

We study the incentives of banks in a financial network, where the network consists of debt contracts and credit default swaps (CDSs) between banks. One of the most important questions in such a system is the problem of deciding which of the banks are in default, and how much of their liabilities these banks can pay. We study the payoff and preferences of the banks in the different solutions to this problem. We also introduce a more refined model which allows assigning priorities to payment obligations; this provides a more expressive and realistic model of real-life financial systems, while it always ensures the existence of a solution. The main focus of the paper is an analysis of the actions that a single bank can execute in a financial system in order to influence the outcome to its advantage. We show that removing an incoming debt, or donating funds to another bank can result in a single new solution that is strictly more favorable to the acting bank. We also show that increasing the bank&#39;s external funds or modifying the priorities of outgoing payments cannot introduce a more favorable new solution into the system, but may allow the bank to remove some unfavorable solutions, or to increase its recovery rate. Finally, we show how the actions of two banks in a simple financial system can result in classical game theoretic situations like the prisoner&#39;s dilemma or the dollar auction, demonstrating the wide expressive capability of the financial system model.

preprint2020arXiv

Normalized Attention Without Probability Cage

Attention architectures are widely used; they recently gained renewed popularity with Transformers yielding a streak of state of the art results. Yet, the geometrical implications of softmax-attention remain largely unexplored. In this work we highlight the limitations of constraining attention weights to the probability simplex and the resulting convex hull of value vectors. We show that Transformers are sequence length dependent biased towards token isolation at initialization and contrast Transformers to simple max- and sum-pooling - two strong baselines rarely reported. We propose to replace the softmax in self-attention with normalization, yielding a hyperparameter and data-bias robust, generally applicable architecture. We support our insights with empirical results from more than 25,000 trained models. All results and implementations are made available.

preprint2020arXiv

On Identifiability in Transformers

In this paper we delve deep in the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we show that input tokens retain to a large degree their identity across the model. We also find evidence suggesting that identity information is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to better understand and further investigate Transformer models.

preprint2020arXiv

On the Hardness of Red-Blue Pebble Games

Red-blue pebble games model the computation cost of a two-level memory hierarchy. We present various hardness results in different red-blue pebbling variants, with a focus on the oneshot model. We first study the relationship between previously introduced red-blue pebble models (base, oneshot, nodel). We also analyze a new variant (compcost) to obtain a more realistic model of computation. We then prove that red-blue pebbling is NP-hard in all of these model variants. Furthermore, we show that in the oneshot model, a $δ$-approximation algorithm for $δ<2$ is only possible if the unique games conjecture is false. Finally, we show that greedy algorithms are not good candidates for approximation, since they can return significantly worse solutions than the optimum.

preprint2020arXiv

Online Graph Exploration on a Restricted Graph Class: Optimal Solutions for Tadpole Graphs

We study the problem of online graph exploration on undirected graphs, where a searcher has to visit every vertex and return to the origin. Once a new vertex is visited, the searcher learns of all neighboring vertices and the connecting edge weights. The goal such an exploration is to minimize its total cost, where each edge traversal incurs a cost of the corresponding edge weight. We investigate the problem on tadpole graphs (also known as dragons, kites), which consist of a cycle with an attached path. Miyazaki et al. (The online graph exploration problem on restricted graphs, IEICE Transactions 92-D (9), 2009) showed that every online algorithm on these graphs must have a competitive ratio of 2-epsilon, but did not provide upper bounds for non-unit edge weights. We show via amortized analysis that a greedy approach yields a matching competitive ratio of 2 on tadpole graphs, for arbitrary non-negative edge weights.