Source author record

Wojciech Golab

Wojciech Golab appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Distributed, Parallel, and Cluster Computing; Databases

Catalog footprint

What is connected

9 works

2 topics

4 close collaborators

Preprint · 2016 · arXiv

OptCon: An Adaptable SLA-Aware Consistency Tuning Framework for Quorum-based Stores

Users of distributed datastores that employ quorum-based replication are burdened with the choice of a suitable client-centric consistency setting for each storage operation. This choice is difficult to reason about, as it requires weighing the tradeoff between latency and staleness, i.e., how stale (old) the result is. The latency and staleness for a given operation depend on the client-centric consistency setting applied, as well as dynamic parameters such as the current workload and network condition. We present OptCon, a novel machine learning-based predictive framework that can automate the choice of client-centric consistency setting under user-specified latency and staleness thresholds given in the service level agreement (SLA). Under a given SLA, OptCon predicts a client-centric consistency setting that is matching, i.e., it is weak enough to satisfy the latency threshold, while being strong enough to satisfy the staleness threshold. While manually tuned consistency settings remain fixed unless explicitly reconfigured, OptCon tunes consistency settings on a per-operation basis with respect to changing workload and network state. Using decision tree learning, OptCon yields 0.14 cross validation error in predicting matching consistency settings under latency and staleness thresholds given in the SLA. We demonstrate experimentally that OptCon is at least as effective as any manually chosen consistency settings in adapting to the SLA thresholds for different use cases. We also demonstrate that OptCon adapts to variations in workload, whereas a given manually chosen fixed consistency setting satisfies the SLA only for a characteristic workload.

Preprint · 2016 · arXiv

OptEx: A Deadline-Aware Cost Optimization Model for Spark

We present OptEx, a closed-form model of job execution on Apache Spark, a popular parallel processing engine. To the best of our knowledge, OptEx is the first work that analytically models job completion time on Spark. The model can be used to estimate the completion time of a given Spark job on a cloud, with respect to the size of the input dataset, the number of iterations, and the number of nodes comprising the underlying cluster. Experimental results demonstrate that OptEx yields a mean relative error of 6% in estimating the job completion time. Furthermore, the model can be applied to estimate the cost-optimal cluster composition for running a given Spark job on a cloud under a completion deadline specified in the SLO (i.e., Service Level Objective). We show experimentally that OptEx is able to correctly estimate the cost-optimal cluster composition for running a given Spark job under an SLO deadline with an accuracy of 98%.
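
As a rough illustration of deadline-constrained cost selection, the sketch below picks the cheapest cluster size whose estimated completion time meets a deadline. It is not the OptEx model: the abstract does not give the closed form, so `estimate_completion_time`, its coefficients, the flat per-node hourly price, and the reduction of "cluster composition" to a single node count are all hypothetical placeholders.

```python
def estimate_completion_time(nodes, serial_s=60.0, parallel_s=3600.0,
                             per_node_overhead_s=2.0):
    """Hypothetical stand-in estimator (NOT the OptEx closed form).

    Assumes a fixed serial part, a perfectly divisible parallel part,
    and a coordination overhead that grows linearly with cluster size.
    """
    return serial_s + parallel_s / nodes + per_node_overhead_s * nodes


def cheapest_cluster(deadline_s, max_nodes, price_per_node_hour,
                     estimator=estimate_completion_time):
    """Return (nodes, cost, time_s) for the cheapest size meeting the deadline.

    Returns None when no size in 1..max_nodes meets the deadline.
    """
    best = None
    for nodes in range(1, max_nodes + 1):
        time_s = estimator(nodes)
        if time_s > deadline_s:
            continue  # this size violates the SLO deadline
        cost = nodes * price_per_node_hour * time_s / 3600.0
        if best is None or cost < best[1]:
            best = (nodes, cost, time_s)
    return best


if __name__ == "__main__":
    print(cheapest_cluster(deadline_s=600.0, max_nodes=50,
                           price_per_node_hour=1.0))
```

Any estimator with the same signature can be passed in through `estimator`, so a real completion-time model could replace the placeholder without changing the search.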

Preprint · 2015 · arXiv

Continuous Partial Quorums for Consistency-Latency Tuning in Distributed NoSQL Storage Systems

NoSQL storage systems are used extensively by web applications and provide an attractive alternative to conventional databases when the need for scalability outweighs the need for transactions. Several of these systems provide quorum-based replication and present the application developer with a choice of multiple client-side "consistency levels" that determine the number of replicas accessed by reads and writes, which in turn affects both latency and the consistency observed by the client application. Since using a fixed combination of read and write consistency levels for a given application provides only a limited number of discrete options, we investigate techniques that allow more fine-grained tuning of the consistency-latency trade-off, as may be required to support consistency-based service level agreements (SLAs). We propose a novel technique called continuous partial quorums (CPQ) that assigns the consistency level on a per-operation basis by choosing randomly between two options, such as eventual and strong consistency, with a tunable probability. We evaluate our technique experimentally using Apache Cassandra and demonstrate that it outperforms an alternative tuning technique that delays operations artificially.
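
The per-operation random choice described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `"strong"` and `"eventual"` labels, and the parameter `p_strong` are hypothetical, and no real datastore client is involved.

```python
import random


def choose_consistency_level(p_strong, rng=random):
    """Pick a consistency level for one operation.

    Returns "strong" with probability p_strong and "eventual" otherwise.
    p_strong is the tunable probability (hypothetical parameter name).
    """
    if not 0.0 <= p_strong <= 1.0:
        raise ValueError("p_strong must be in [0, 1]")
    return "strong" if rng.random() < p_strong else "eventual"


def simulate(p_strong, n_ops, seed=0):
    """Count how many of n_ops operations receive each level."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    counts = {"strong": 0, "eventual": 0}
    for _ in range(n_ops):
        counts[choose_consistency_level(p_strong, rng)] += 1
    return counts


if __name__ == "__main__":
    # Sweeping p_strong from 0 to 1 moves the mix of operations
    # continuously between the two fixed settings.
    for p in (0.0, 0.25, 0.5, 1.0):
        print(p, simulate(p, 10_000))
```

In a real client, the returned label would be mapped to whatever consistency levels the store exposes for that read or write; that mapping is left out here because it depends on the client library.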

Preprint · 2013 · arXiv

Deconstructing Queue-Based Mutual Exclusion

We formulate a modular approach to the design and analysis of a particular class of mutual exclusion algorithms for shared memory multiprocessor systems. Specifically, we consider algorithms that organize waiting processes into a queue. Such algorithms can achieve O(1) remote memory reference (RMR) complexity, which minimizes (asymptotically) the amount of traffic through the processor-memory interconnect. We first describe a generic mutual exclusion algorithm that relies on a linearizable implementation of a particular queue-like data structure that we call MutexQueue. Next, we show two implementations of MutexQueue using O(1) RMRs per operation based on synchronization primitives commonly available in multiprocessors. These implementations follow closely the queuing code embedded in previously published mutual exclusion algorithms. We provide rigorous correctness proofs and RMR complexity analyses of the algorithms we present.

Preprint · 2013 · arXiv

On the k-Atomicity-Verification Problem

Modern Internet-scale storage systems often provide weak consistency in exchange for better performance and resilience. An important weak consistency property is k-atomicity, which bounds the staleness of values returned by read operations. The k-atomicity-verification problem (or k-AV for short) is the problem of deciding whether a given history of operations is k-atomic. The 1-AV problem is equivalent to verifying atomicity/linearizability, a well-known and solved problem. However, for k > 2, no polynomial-time k-AV algorithm is known. This paper makes the following contributions towards solving the k-AV problem. First, we present a simple 2-AV algorithm called LBT, which is likely to be efficient (quasilinear) for histories that arise in practice, although it is less efficient (quadratic) in the worst case. Second, we present a more involved 2-AV algorithm called FZF, which runs efficiently (quasilinear) even in the worst case. To our knowledge, these are the first algorithms that solve the 2-AV problem fully. Third, we show that the weighted k-AV problem, a natural extension of the k-AV problem, is NP-complete.

Preprint · 2012 · arXiv

Minuet: A Scalable Distributed Multiversion B-Tree

Data management systems have traditionally been designed to support either long-running analytics queries or short-lived transactions, but an increasing number of applications need both. For example, online games, socio-mobile apps, and e-commerce sites need to not only maintain operational state, but also analyze that data quickly to make predictions and recommendations that improve user experience. In this paper, we present Minuet, a distributed, main-memory B-tree that supports both transactions and copy-on-write snapshots for in-situ analytics. Minuet uses main-memory storage to enable low-latency transactional operations as well as analytics queries without compromising transaction performance. In addition to supporting read-only analytics queries on snapshots, Minuet supports writable clones, so that users can create branching versions of the data. This feature can be quite useful, e.g. to support complex "what-if" analysis or to facilitate wide-area replication. Our experiments show that Minuet outperforms a commercial main-memory database in many ways. It scales to hundreds of cores and TBs of memory, and can process hundreds of thousands of B-tree operations per second while executing long-running scans.

Preprint · 2012 · arXiv

Toward a Principled Framework for Benchmarking Consistency

Large-scale key-value storage systems sacrifice consistency in the interest of dependability (i.e., partition tolerance and availability), as well as performance (i.e., latency). Such systems provide eventual consistency, which, to this point, has been difficult to quantify in real systems. Given the many implementations and deployments of eventually-consistent systems (e.g., NoSQL systems), attempts have been made to measure this consistency empirically, but they suffer from important drawbacks. For example, state-of-the-art consistency benchmarks exercise the system only in restricted ways and disrupt the workload, which limits their accuracy. In this paper, we take the position that a consistency benchmark should paint a comprehensive picture of the relationship between the storage system under consideration, the workload, the pattern of failures, and the consistency observed by clients. To illustrate our point, we first survey prior efforts to quantify eventual consistency. We then present a benchmarking technique that overcomes the shortcomings of existing techniques to measure the consistency observed by clients as they execute the workload under consideration. This method is versatile and minimally disruptive to the system under test. As a proof of concept, we demonstrate this tool on Cassandra.

Preprint · 2011 · arXiv

A Complexity Separation Between the Cache-Coherent and Distributed Shared Memory Models

We consider asynchronous multiprocessor systems where processes communicate by accessing shared memory. Exchange of information among processes in such a multiprocessor necessitates costly memory accesses called remote memory references (RMRs), which generate communication on the interconnect joining processors and main memory. In this paper we compare two popular shared memory architecture models, namely the cache-coherent (CC) and distributed shared memory (DSM) models, in terms of their power for solving synchronization problems efficiently with respect to RMRs. The particular problem we consider entails one process sending a "signal" to a subset of other processes. We show that a variant of this problem can be solved very efficiently with respect to RMRs in the CC model, but not so in the DSM model, even when we consider amortized RMR complexity. To our knowledge, this is the first separation in terms of amortized RMR complexity between the CC and DSM models. It is also the first separation in terms of RMR complexity (for asynchronous systems) that does not rely in any way on wait-freedom, the requirement that a process makes progress in a bounded number of its own steps.