Researcher profile

Paulo Sérgio Almeida

Paulo Sérgio Almeida contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Space-Optimal, Computation-Optimal, Topology-Agnostic, Throughput-Scalable Causal Delivery through Hybrid Buffering

Message delivery respecting causal ordering (causal delivery) is one of the most classic and widely useful abstraction for inter-process communication in a distributed system. Most approaches tag messages with causality information and buffer them at the receiver until they can be safely delivered. Except for specific approaches that exploit communication topology, therefore not generally applicable, they incur a metadata overhead which is prohibitive for a large number of processes. Much less used are the approaches that enforce causal order by buffering messages at the sender, until it is safe to release them to the network, as the classic algorithm has too many drawbacks. In this paper, first we discuss the limitations of sender-only buffering approaches and introduce the Sender Permission to Send (SPS) enforcement strategy, showing that SPS + FIFO implies Causal. We analyze a recent sender-buffering algorithm, Cykas, which follows SPS + FIFO, albeit very conservatively, pointing out throughput scalability and liveness issues. Then, we introduce a novel SPS + FIFO based algorithm, which adopts a new hybrid approach: enforcing causality by combining sender-buffering to enforce SPS and receiver-buffering to enforce FIFO. The algorithm overcomes limitations of sender-only buffering, and achieves effectively constant metadata size per message. By a careful choice of data-structures, the algorithm is also computationally-optimal, with amortized effectively constant processing overhead. As far as we know, there is no other topology-agnostic causal delivery algorithm with these properties.

preprint2020arXiv

Age-Partitioned Bloom Filters

Bloom filters (BF) are widely used for approximate membership queries over a set of elements. BF variants allow removals, sets of unbounded size or querying a sliding window over an unbounded stream. However, for this last case the best current approaches are dictionary based (e.g., based on Cuckoo Filters or TinyTable), and it may seem that BF-based approaches will never be competitive to dictionary-based ones. In this paper we present Age-Partitioned Bloom Filters, a BF-based approach for duplicate detection in sliding windows that not only is competitive in time-complexity, but has better space usage than current dictionary-based approaches (e.g., SWAMP), at the cost of some moderate slack. APBFs retain the BF simplicity, unlike dictionary-based approaches, important for hardware-based implementations, and can integrate known improvements such as double hashing or blocking. We present an Age-Partitioned Blocked Bloom Filter variant which can operate with 2-3 cache-line accesses per insertion and around 2-4 per query, even for high accuracy filters.

preprint2013arXiv

Scalable Eventually Consistent Counters over Unreliable Networks

Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network "likes". Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios. This paper defines Eventually Consistent Distributed Counters (ECDC) and presents an implementation of the concept, Handoff Counters, that is scalable and works over unreliable networks. By giving up the sequencer aspect of classic distributed counters, ECDC implementations can be made AP in the CAP design space, while retaining the essence of counting. Handoff Counters are the first CRDT (Conflict-free Replicated Data Type) based mechanism that overcomes the identity explosion problem in naive CRDTs, such as G-Counters (where state size is linear in the number of independent actors that ever incremented the counter), by managing identities towards avoiding global propagation and garbage collecting temporary entries. The approach used in Handoff Counters is not restricted to counters, being more generally applicable to other data types with associative and commutative operations.

preprint2012arXiv

Spectra: Robust Estimation of Distribution Functions in Networks

Distributed aggregation allows the derivation of a given global aggregate property from many individual local values in nodes of an interconnected network system. Simple aggregates such as minima/maxima, counts, sums and averages have been thoroughly studied in the past and are important tools for distributed algorithms and network coordination. Nonetheless, this kind of aggregates may not be comprehensive enough to characterize biased data distributions or when in presence of outliers, making the case for richer estimates of the values on the network. This work presents Spectra, a distributed algorithm for the estimation of distribution functions over large scale networks. The estimate is available at all nodes and the technique depicts important properties, namely: robust when exposed to high levels of message loss, fast convergence speed and fine precision in the estimate. It can also dynamically cope with changes of the sampled local property, not requiring algorithm restarts, and is highly resilient to node churn. The proposed approach is experimentally evaluated and contrasted to a competing state of the art distribution aggregation technique.

preprint2011arXiv

A Survey of Distributed Data Aggregation Algorithms

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, that can then be used to direct the execution of other applications. The resulting values result from the distributed computation of functions like COUNT, SUM and AVERAGE. Some application examples can found to determine the network size, total storage capacity, average load, majorities and many others. In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, message and time complexity. Due to the considerable amount and variety of aggregation algorithms, it can be difficult and time consuming to determine which techniques will be more appropriate to use in specific settings, justifying the existence of a survey to aid in this task. This work reviews the state of the art on distributed data aggregation algorithms, providing three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides some guidelines toward the selection and use of the most relevant techniques, summarizing their principal characteristics.

preprint2011arXiv

Fast Distributed Computation of Distances in Networks

This paper presents a distributed algorithm to simultaneously compute the diameter, radius and node eccentricity in all nodes of a synchronous network. Such topological information may be useful as input to configure other algorithms. Previous approaches have been modular, progressing in sequential phases using building blocks such as BFS tree construction, thus incurring longer executions than strictly required. We present an algorithm that, by timely propagation of available estimations, achieves a faster convergence to the correct values. We show local criteria for detecting convergence in each node. The algorithm avoids the creation of BFS trees and simply manipulates sets of node ids and hop counts. For the worst scenario of variable start times, each node i with eccentricity ecc(i) can compute: the node eccentricity in diam(G)+ecc(i)+2 rounds; the diameter in 2*diam(G)+ecc(i)+2 rounds; and the radius in diam(G)+ecc(i)+2*radius(G) rounds.

preprint2010arXiv

Dependability in Aggregation by Averaging

Aggregation is an important building block of modern distributed applications, allowing the determination of meaningful properties (e.g. network size, total storage capacity, average load, majorities, etc.) that are used to direct the execution of the system. However, the majority of the existing aggregation algorithms exhibit relevant dependability issues, when prospecting their use in real application environments. In this paper, we reveal some dependability issues of aggregation algorithms based on iterative averaging techniques, giving some directions to solve them. This class of algorithms is considered robust (when compared to common tree-based approaches), being independent from the used routing topology and providing an aggregation result at all nodes. However, their robustness is strongly challenged and their correctness often compromised, when changing the assumptions of their working environment to more realistic ones. The correctness of this class of algorithms relies on the maintenance of a fundamental invariant, commonly designated as "mass conservation". We will argue that this main invariant is often broken in practical settings, and that additional mechanisms and modifications are required to maintain it, incurring in some degradation of the algorithms performance. In particular, we discuss the behavior of three representative algorithms Push-Sum Protocol, Push-Pull Gossip protocol and Distributed Random Grouping under asynchronous and faulty (with message loss and node crashes) environments. More specifically, we propose and evaluate two new versions of the Push-Pull Gossip protocol, which solve its message interleaving problem (evidenced even in a synchronous operation mode).