Source author record

Rodrigo Rodrigues

Rodrigo Rodrigues appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Databases Performance Programming Languages

Catalog footprint

What is connected

5works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Alea-BFT: Practical Asynchronous Byzantine Fault Tolerance

Traditional Byzantine Fault Tolerance (BFT) state machine replication protocols assume a partial synchrony model, leading to a design where a leader replica drives the protocol and is replaced after a timeout. Recently, we witnessed a surge of asynchronous BFT protocols that use randomization to remove the assumptions of bounds on message delivery times, making them more resilient to adverse network conditions. However, these protocols still fall short of being practical across a broad range of scenarios due to their cubic communication costs, use of expensive primitives, and overall protocol complexity. In this paper, we present Alea-BFT, the first asynchronous BFT protocol to achieve quadratic communication complexity, allowing it to scale to large networks. Alea-BFT brings the key design insight from classical protocols of concentrating part of the work on a single designated replica, and incorporates this principle in a two stage pipelined design, with an efficient broadcast led by the designated replica followed by an inexpensive binary agreement. We evaluated our prototype implementation across 10 sites in 4 continents, and our results show significant scalability gains from the proposed design.

preprint2022arXiv

Efficient and Eventually Consistent Collective Operations

Collective operations are common features of parallel programming models that are frequently used in High-Performance (HPC) and machine/ deep learning (ML/ DL) applications. In strong scaling scenarios, collective operations can negatively impact the overall application performance: with the increase in core count, the load per rank decreases, while the time spent in collective operations increases logarithmically. In this article, we propose a design for eventually consistent collectives suitable for ML/ DL computations by reducing communication in Broadcast and Reduce, as well as by exploring the Stale Synchronous Parallel (SSP) synchronization model for the Allreduce collective. Moreover, we also enrich the GASPI ecosystem with frequently used classic/ consistent collective operations -- such as Allreduce for large messages and AlltoAll used in an HPC code. Our implementations show promising preliminary results with significant improvements, especially for Allreduce and AlltoAll, compared to the vendor-provided MPI alternatives.

preprint2020arXiv

Aion: Better Late than Never in Event-Time Streams

Processing data streams in near real-time is an increasingly important task. In the case of event-timestamped data, the stream processing system must promptly handle late events that arrive after the corresponding window has been processed. To enable this late processing, the window state must be maintained for a long period of time. However, current systems maintain this state in memory, which either imposes a maximum period of tolerated lateness, or causes the system to degrade performance or even crash when the system memory runs out. In this paper, we propose AION, a comprehensive solution for handling late events in an efficient manner, implemented on top of Flink. In designing AION, we go beyond a naive solution that transfers state between memory and persistent storage on demand. In particular, we introduce a proactive caching scheme, where we leverage the semantics of stream processing to anticipate the need for bringing data to memory. Furthermore, we propose a predictive cleanup scheme to permanently discard window state based on the likelihood of receiving more late events, to prevent storage consumption from growing without bounds. Our evaluation shows that AION is capable of maintaining sustainable levels of memory utilization while still preserving high throughput, low latency, and low staleness.

preprint2020arXiv

Lazy State Determination: More concurrency for contending linearizable transactions

The concurrency control algorithms in transactional systems limits concurrency to provide strong semantics, which leads to poor performance under high contention. As a consequence, many transactional systems eschew strong semantics to achieve acceptable performance. We show that by leveraging semantic information associated with the transactional programs to increase concurrency, it is possible to significantly improve performance while maintaining linearizability. To this end, we introduce the lazy state determination API to easily expose the semantics of application transactions to the database, and propose new optimistic and pessimistic concurrency control algorithms that leverage this information to safely increase concurrency in the presence of contention. Our evaluation shows that our approach can achieve up to 5x more throughput with 1.5c less latency than standard techniques in the popular TPC-C benchmark.

preprint2015arXiv

Extending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants

Geo-replicated databases often operate under the principle of eventual consistency to offer high-availability with low latency on a simple key/value store abstraction. Recently, some have adopted commutative data types to provide seamless reconciliation for special purpose data types, such as counters. Despite this, the inability to enforce numeric invariants across all replicas still remains a key shortcoming of relying on the limited guarantees of eventual consistency storage. We present a new replicated data type, called bounded counter, which adds support for numeric invariants to eventually consistent geo-replicated databases. We describe how this can be implemented on top of existing cloud stores without modifying them, using Riak as an example. Our approach adapts ideas from escrow transactions to devise a solution that is decentralized, fault-tolerant and fast. Our evaluation shows much lower latency and better scalability than the traditional approach of using strong consistency to enforce numeric invariants, thus alleviating the tension between consistency and availability.