Source author record

Pierre Sutra

Pierre Sutra appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

7 works

1 topic

4 close collaborators

preprint, 2026, arXiv

Revisiting Lower Bounds for Two-Step Consensus

A seminal result by Lamport shows that at least $\max\{2e+f+1,2f+1\}$ processes are required to implement partially synchronous consensus that tolerates $f$ process failures and can furthermore decide in two message delays under $e$ failures. This lower bound is matched by the classical Fast Paxos protocol. However, more recent practical protocols, such as Egalitarian Paxos, provide two-step decisions with fewer processes, seemingly contradicting the lower bound. We show that this discrepancy arises because the classical bound requires two-step decisions under a wide range of scenarios, not all of which are relevant in practice. We propose a more pragmatic condition for which we establish tight bounds on the number of processes required. Interestingly, these bounds depend on whether consensus is implemented as an atomic object or a decision task. For consensus as an object, $\max\{2e+f-1,2f+1\}$ processes are necessary and sufficient for two-step decisions, while for a task the tight bound is $\max\{2e+f, 2f+1\}$.
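To make the three bounds in this abstract concrete, the following minimal sketch evaluates them for given $e$ and $f$. The function names are hypothetical and chosen for illustration; only the formulas come from the abstract.

```python
def classical_bound(e: int, f: int) -> int:
    """Lamport's classical bound: max{2e + f + 1, 2f + 1}."""
    return max(2 * e + f + 1, 2 * f + 1)


def object_bound(e: int, f: int) -> int:
    """Bound stated for consensus as an atomic object: max{2e + f - 1, 2f + 1}."""
    return max(2 * e + f - 1, 2 * f + 1)


def task_bound(e: int, f: int) -> int:
    """Bound stated for consensus as a decision task: max{2e + f, 2f + 1}."""
    return max(2 * e + f, 2 * f + 1)


if __name__ == "__main__":
    # Illustrative parameters only: tolerate f = 2 failures,
    # decide in two message delays under e = 2 failures.
    e, f = 2, 2
    print(classical_bound(e, f), object_bound(e, f), task_bound(e, f))
```

For $e = f = 2$, the formulas give 7, 5 and 6 processes respectively, which shows how the pragmatic condition lowers the requirement relative to the classical bound.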

preprint, 2022, arXiv

The Weakest Failure Detector for Genuine Atomic Multicast (Extended Version)

Atomic broadcast is a group communication primitive to order messages across a set of distributed processes. Atomic multicast is its natural generalization where each message $m$ is addressed to $dst(m)$, a subset of the processes called its destination group. A solution to atomic multicast is genuine when a process takes steps only if a message is addressed to it. Genuine solutions are the ones used in practice because they have better performance. Let $G$ be all the destination groups and $F$ be the cyclic families in it, that is, the subsets of $G$ whose intersection graph is Hamiltonian. This paper establishes that the weakest failure detector to solve genuine atomic multicast is $\mu=(\wedge_{g,h \in G}~\Sigma_{g \cap h}) \wedge (\wedge_{g \in G}~\Omega_g) \wedge \gamma$, where (i) $\Sigma_P$ and $\Omega_P$ are the quorum and leader failure detectors restricted to the processes in $P$, and (ii) $\gamma$ is a new failure detector that informs the processes in a cyclic family $f \in F$ when $f$ is faulty. We also study two classical variations of atomic multicast. The first variation requires that message delivery follows the real-time order. In this case, $\mu$ must be strengthened with $1^{g \cap h}$, the indicator failure detector that informs each process in $g \cup h$ when $g \cap h$ is faulty. The second variation requires a message to be delivered when the destination group runs in isolation. We prove that its weakest failure detector is at least $\mu \wedge (\wedge_{g, h \in G}~\Omega_{g \cap h})$. This value is attained when $F=\varnothing$.
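The notion of a cyclic family can be illustrated with a small brute-force check. This is a sketch under stated assumptions, not the paper's construction: it assumes that two groups are adjacent in the intersection graph exactly when they share at least one process, and that a Hamiltonian graph needs at least three vertices. The function name `is_cyclic_family` is hypothetical.

```python
from itertools import permutations


def is_cyclic_family(groups) -> bool:
    """Return True if the intersection graph of `groups` has a Hamiltonian cycle.

    Assumptions (not taken from the abstract): groups are adjacent when they
    share a process, and fewer than three groups never form a cycle.
    Brute force over orderings, so only suitable for small families.
    """
    family = [frozenset(g) for g in groups]
    n = len(family)
    if n < 3:
        return False
    first, rest = family[0], family[1:]
    # Fix the first group to avoid re-checking rotations of the same cycle.
    for order in permutations(rest):
        cycle = (first, *order)
        # Every consecutive pair, including last-to-first, must intersect.
        if all(cycle[i] & cycle[(i + 1) % n] for i in range(n)):
            return True
    return False


if __name__ == "__main__":
    print(is_cyclic_family([{1, 2}, {2, 3}, {3, 1}]))  # groups form a ring
    print(is_cyclic_family([{1, 2}, {2, 3}, {3, 4}]))  # groups form a chain
```

Under these assumptions, the ring of three pairwise-intersecting groups is a cyclic family, while the chain is not, because its two end groups share no process.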

preprint, 2020, arXiv

Leaderless State-Machine Replication: Specification, Properties, Limits (Extended Version)

Modern Internet services commonly replicate critical data across several geographical locations using state-machine replication (SMR). Due to their reliance on a leader replica, classical SMR protocols offer limited scalability and availability in this setting. To solve this problem, recent protocols instead follow a leaderless approach, in which each replica is able to make progress using a quorum of its peers. In this paper, we study this emerging class of SMR protocols and state some of their limits. We first propose a framework that captures the essence of leaderless state-machine replication (Leaderless SMR). Then, we introduce a set of desirable properties for these protocols: (R)eliability, (O)ptimal (L)atency and (L)oad Balancing. We show that protocols matching all of the ROLL properties are subject to a trade-off between performance and reliability. We also establish a lower bound on the message delay to execute a command in protocols optimal for the ROLL properties. This lower bound explains the persistent chaining effect observed in experimental results.

preprint, 2020, arXiv

State-Machine Replication for Planet-Scale Systems (Extended Version)

Online applications now routinely replicate their data at multiple sites around the world. In this paper we present Atlas, the first state-machine replication protocol tailored for such planet-scale systems. Atlas does not rely on a distinguished leader, so clients enjoy the same quality of service independently of their geographical locations. Furthermore, client-perceived latency improves as we add sites closer to clients. To achieve this, Atlas minimizes the size of its quorums using an observation that concurrent data center failures are rare. It also processes a high percentage of accesses in a single round trip, even when these conflict. We experimentally demonstrate that Atlas consistently outperforms state-of-the-art protocols in planet-scale scenarios. In particular, Atlas is up to two times faster than Flexible Paxos with identical failure assumptions, and more than doubles the performance of Egalitarian Paxos in the YCSB benchmark.

preprint, 2015, arXiv

Anonymous Obstruction-free $(n,k)$-Set Agreement with $n-k+1$ Atomic Read/Write Registers

The $k$-set agreement problem is a generalization of the consensus problem. Namely, assuming each process proposes a value, each non-faulty process has to decide a value such that each decided value was proposed, and no more than $k$ different values are decided. This is a hard problem in the sense that it cannot be solved in asynchronous systems as soon as $k$ or more processes may crash. One way to circumvent this impossibility consists in weakening its termination property, requiring that a process terminates (decides) only if it executes alone during a long enough period. This is the well-known obstruction-freedom progress condition. Considering a system of $n$ anonymous asynchronous processes, which communicate through atomic read/write registers only, and where any number of processes may crash, this paper addresses and solves the challenging open problem of designing an obstruction-free $k$-set agreement algorithm with $(n-k+1)$ atomic registers only. From a shared memory cost point of view, this algorithm is the best algorithm known so far, thereby establishing a new upper bound on the number of registers needed to solve the problem (its gain is $(n-k)$ with respect to the previous upper bound). The algorithm is then extended to address the repeated version of $(n,k)$-set agreement. As it is optimal in the number of atomic read/write registers, this algorithm closes the gap on previously established lower/upper bounds for both the anonymous and non-anonymous versions of the repeated $(n,k)$-set agreement problem. Finally, for $1 \leq x \leq k < n$, a generalization suited to $x$-obstruction-freedom is also described, which requires $(n-k+x)$ atomic registers only.
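The register counts quoted in this abstract can be tabulated with a short sketch. The function names are hypothetical; the formulas $(n-k+1)$ and $(n-k+x)$ and the constraint $1 \leq x \leq k < n$ come from the abstract, and the previous upper bound is only inferred from the stated gain of $(n-k)$.

```python
def registers_needed(n: int, k: int, x: int = 1) -> int:
    """Registers for x-obstruction-free (n, k)-set agreement: n - k + x.

    With the default x = 1 this is the obstruction-free count n - k + 1.
    """
    if not (1 <= x <= k < n):
        raise ValueError("requires 1 <= x <= k < n")
    return n - k + x


def previous_upper_bound(n: int, k: int) -> int:
    """Previous upper bound, inferred from the stated gain of n - k.

    This is an inference from the abstract, not a figure it states directly.
    """
    return registers_needed(n, k) + (n - k)


if __name__ == "__main__":
    # Illustrative parameters only: n = 5 processes, k = 2.
    print(registers_needed(5, 2))       # obstruction-free case
    print(registers_needed(5, 2, x=2))  # 2-obstruction-free case
    print(previous_upper_bound(5, 2))
```

For $n = 5$ and $k = 2$, the obstruction-free algorithm needs 4 registers, the 2-obstruction-free generalization needs 5, and the inferred previous upper bound is 7.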

preprint, 2014, arXiv

A Practical Distributed Universal Construction with Unknown Participants

Modern distributed systems employ atomic read-modify-write primitives to coordinate concurrent operations. Such primitives are typically built on top of a central server, or rely on an agreement protocol. Both approaches provide a universal construction, that is, a general mechanism to construct atomic and responsive objects. These two techniques are however known to be inherently costly. As a consequence, they may result in bottlenecks in applications using them for coordination. In this paper, we investigate another direction to implement a universal construction. Our idea is to delegate the implementation of the universal construction to the clients, and solely implement a distributed shared atomic memory on the server side. The construction we propose is obstruction-free. It can be implemented in a purely asynchronous manner, and it does not assume knowledge of the participants. It is built on top of grafarius and racing objects, two novel shared abstractions that we introduce in detail. To assess the benefits of our approach, we present a prototype implementation on top of the Cassandra data store, and compare it empirically to the Zookeeper coordination service.

preprint, 2013, arXiv

Non-Monotonic Snapshot Isolation

Many distributed applications require transactions. However, transactional protocols that require strong synchronization are costly in large scale environments. Two properties help with scalability of a transactional system: genuine partial replication (GPR), which leverages the intrinsic parallelism of a workload, and snapshot isolation (SI), which decreases the need for synchronization. We show that, under standard assumptions (data store accesses are not known in advance, and transactions may access arbitrary objects in the data store), it is impossible to have both SI and GPR. To circumvent this impossibility, we propose a weaker consistency criterion, called Non-monotonic Snapshot Isolation (NMSI). NMSI retains the most important properties of SI, i.e., read-only transactions always commit, and two write-conflicting updates do not both commit. We present a GPR protocol that ensures NMSI, and has lower message cost (i.e., it contacts fewer replicas and/or commits faster) than previous approaches.