Researcher profile

Naama Ben-David

Naama Ben-David contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Lock-Free Locks Revisited

This paper presents a new and practical approach to lock-free locks based on helping, which allows the user to write code using fine-grained locks, but run it in a lock-free manner. Although lock-free locks have been suggested in the past, they are widely viewed as impractical, have some key limitations, and, as far as we know, have never been implemented. The paper presents some key techniques that make lock-free locks practical and more general. The most important technique is an approach to idempotence -- i.e. making code that runs multiple times appear as if it ran once. The idea is based on using a shared log among processes running the same protected code. Importantly, the approach can be library based, requiring very little if any change to standard code -- code just needs to use the idempotent versions of memory operations (load, store, LL/SC, allocation, free). We have implemented a C++ library called Flock based on the ideas. Flock allows lock-based data structures to run in either lock-free or blocking (traditional locks) mode. We implemented a variety of tree and list-based data structures with Flock and compare the performance of the lock-free and blocking modes under a variety of workloads. The lock-free mode is almost as fast as blocking mode under almost all workloads, and significantly faster when threads are oversubscribed (more threads than processors). We also compare with several existing lock-based and lock-free alternatives.

preprint2022arXiv

Survey of Persistent Memory Correctness Conditions

The study of concurrent persistent programs has seen a surge of activity in recent years due to the introduction of non-volatile random access memories (NVRAM), yielding many models and correctness notions that are difficult to compare. In this paper, we survey existing correctness properties for this setting, placing them into the same context and comparing them. We present a hierarchy of these persistence properties based on the generality of the histories they deem correct, and show how this hierarchy shifts based on different model assumptions.

preprint2021arXiv

The Impact of RDMA on Agreement

Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or another. With Byzantine failures, we give an algorithm that only requires $n \geq 2f_P + 1$ processes (where $f_P$ is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires $n \geq f_P + 1$ processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions.

preprint2020arXiv

Constant-Time Snapshots with Applications to Concurrent Data Structures

We present an approach for efficiently taking snapshots of the state of a collection of CAS objects. Taking a snapshot allows later operations to read the value that each CAS object had at the time the snapshot was taken. Taking a snapshot requires a constant number of steps and returns a handle to the snapshot. Reading a snapshotted value of an individual CAS object using this handle is wait-free, taking time proportional to the number of successful CASes on the object since the snapshot was taken. Our fast, flexible snapshots yield simple, efficient implementations of atomic multi-point queries on concurrent data structures built from CAS objects. For example, in a search tree where child pointers are updated using CAS, once a snapshot is taken, one can atomically search for ranges of keys, find the first key that matches some criteria, or check if a collection of keys are all present, simply by running a standard sequential algorithm on a snapshot of the tree. To evaluate the performance of our approach, we apply it to two search trees, one balanced and one not. Experiments show that the overhead of supporting snapshots is low across a variety of workloads. Moreover, in almost all cases, range queries on the trees built from our snapshots perform as well as or better than state-of-the-art concurrent data structures that support atomic range queries.

preprint2020arXiv

Delay-Free Concurrency on Faulty Persistent Memory

Non-volatile memory (NVM) promises persistent main memory that remains correct despite loss of power. This has sparked a line of research into algorithms that can recover from a system crash. Since caches are expected to remain volatile, concurrent data structures and algorithms must be redesigned to guarantee that they are left in a consistent state after a system crash, and that the execution can be continued upon recovery. However, the prospect of redesigning every concurrent data structure or algorithm before it can be used in NVM architectures is daunting. In this paper, we present a construction that takes any concurrent program with reads, writes and CASs to shared memory and makes it persistent, i.e., can be continued after one or more processes fault and have to restart. Importantly the converted algorithm has constant computational delay (preserves instruction counts on each process within a constant factor), as well as constant recovery delay (a process can recover from a fault in a constant number of instructions). We show this first for a simple transformation, and then present optimizations to make it more practical, allowing for a tradeoff for better constant factors in computational delay, for sometimes increased recovery delay. We also provide an optimized transformation that works for any normalized lock-free data structure, thus allowing more efficient constructions for a large class of concurrent algorithms. We experimentally evaluate our transformations by applying them to a queue.