Researcher profile

Hengfeng Wei

Hengfeng Wei contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

MET: Model Checking-Driven Explorative Testing of CRDT Designs and Implementations

Internet-scale distributed systems often replicate data at multiple geographic locations to provide low latency and high availability. The Conflict-free Replicated Data Type (CRDT) is a framework that provides a principled approach to maintaining eventual consistency among data replicas. CRDTs have been notoriously difficult to design and implement correctly. Subtle deep bugs lie in the complex and tedious handling of all possible cases of conflicting data updates. We argue that the CRDT design should be formally specified and model-checked to uncover deep bugs. The implementation further needs to be systematically tested. On the one hand, the testing needs to inherit the exhaustive nature of the model checking and ensures the coverage of testing. On the other hand, the testing is expected to find coding errors which cannot be detected by design level verification. Towards the challenges above, we propose the Model Checking-driven Explorative Testing (MET) framework. At the design level, MET uses TLA+ to specify and model check CRDT designs. At the implementation level, MET conducts model checking-driven explorative testing, in the sense that the test cases are automatically generated from the model checking traces. The system execution is controlled to proceed deterministically, following the model checking trace. The explorative testing systematically controls and permutes all nondeterministic message reorderings. We apply MET in our practical development of CRDTs. The bugs in both designs and implementations of CRDTs are found. As for bugs which can be found by traditional testing techniques, MET greatly reduces the cost of fixing the bugs. Moreover, MET can find subtle deep bugs which cannot be found by existing techniques at a reasonable cost. We further discuss how MET provides us with sufficient confidence in the correctness of our CRDT designs and implementations.

preprint2022arXiv

Remove-Win: a Design Framework for Conflict-free Replicated Data Types

Distributed storage systems employ replication to improve performance and reliability. To provide low latency data access, replicas are often required to accept updates without coordination with each other, and the updates are then propagated asynchronously. This brings the critical challenge of conflict resolution among concurrent updates. Conflict-free Replicated Data Type (CRDT) is a principled approach to addressing this challenge. However, existing CRDT designs are tricky, and hard to be generalized to other data types. A design framework is in great need to guide the systematic design of new CRDTs. To address this challenge, we propose RWF -- the Remove-Win design Framework for CRDTs. RWF leverages the simple but powerful remove-win strategy to resolve conflicting updates, and provides generic design for a variety of data container types. Two exemplar implementations following RWF are given over the Redis data type store, which demonstrate the effectiveness of RWF. Performance measurements of our implementations further show the efficiency of CRDT designs following RWF.

preprint2022arXiv

Verifying Transactional Consistency of MongoDB

MongoDB is a popular general-purpose, document-oriented, distributed NoSQL database. It supports transactions in three different deployments: single-document transactions utilizing the WiredTiger storage engine in a standalone node, multi-document transactions in a replica set which consists of a primary node and several secondary nodes, and distributed transactions in a sharded cluster which is a group of multiple replica sets, among which data is sharded. A natural and fundamental question about MongoDB transactions is: What transactional consistency guarantee do MongoDB Transactions in each deployment provide? However, it lacks both concise pseudocode of MongoDB transactions in each deployment and formal specification of the consistency guarantees which MongoDB claimed to provide. In this work, we formally specify and verify the transactional consistency protocols of MongoDB. Specifically, we provide a concise pseudocode for the transactional consistency protocols in each MongoDB deployment, namely WIREDTIGER, REPLICASET, and SHARDEDCLUSTER, based on the official documents and source code. We then prove that WIREDTIGER, REPLICASET, and SHARDEDCLUSTER satisfy different variants of snapshot isolation, namely Strong-SI, Realtime-SI, and Session-SI, respectively. We also propose and evaluate efficient white-box checking algorithms for MongoDB transaction protocols against their consistency guarantees, effectively circumventing the NP-hard obstacle in theory.

preprint2022arXiv

ViSearch: Weak Consistency Measurement for Replicated Data Types

Large-scale replicated data type stores often resort to eventual consistency to guarantee low latency and high availability. It is widely accepted that programming over eventually consistent data stores is challenging, since arbitrary divergence among replicas is allowed. Moreover, pragmatic protocols actually achieve consistency guarantees stronger than eventual consistency, which can be and need to be utilized to facilitate the reasoning of and programming over replicated data types. Toward the challenges above, we propose the ViSearch framework for precise measurement of eventual consistency semantics. ViSearch employs the visibility-arbitration specification methodology in concurrent programming, which extends the linearizability-based specification methodology with a dynamic visibility relation among operations, in addition to the standard dynamic happen-before and linearization relations. The consistency measurement using ViSearch is NP-hard in general. To enable practical and efficient consistency measurement in replicated data type stores, the ViSearch framework refactors the existing brute-force checking algorithm to a generic algorithm skeleton, which further enables efficient pruning of the search space and effective parallelization. We employ the ViSearch framework for consistency measurement in two replicated data type stores Riak and CRDT-Redis. The experimental evaluation shows the usefulness and cost-effectiveness of consistency measurement based on the ViSearch framework in realistic scenarios.

preprint2020arXiv

Fine-grained Analysis on Fast Implementations of Distributed Multi-writer Atomic Registers

Distributed multi-writer atomic registers are at the heart of a large number of distributed algorithms. While enjoying the benefits of atomicity, researchers further explore fast implementations of atomic reigsters which are optimal in terms of data access latency. Though it is proved that multi-writer atomic register implementations are impossible when both read and write are required to be fast, it is still open whether implementations are impossible when only write or read is required to be fast. This work proves the impossibility of fast write implementations based on a series of chain arguments among indistiguishable executions. We also show the necessary and sufficient condition for fast read implementations by extending the results in the single-writer case. This work concludes a series of studies on fast implementations of distributed atomic registers.