Source author record

Bruhathi Sundarmurthy

Bruhathi Sundarmurthy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Databases

Catalog footprint

What is connected

2 works

1 topic

4 close collaborators


Preprint · 2020 · arXiv

Providing Insights for Queries affected by Failures and Stragglers

Interactive response times are a crucial requirement for users analyzing large amounts of data. Such analytical queries are typically run in a distributed setting, with data sharded across thousands of nodes for high throughput. However, providing real-time analytics remains a major challenge: with data distributed across thousands of nodes, the probability that some of the required nodes are unavailable or very slow during query execution is very high, and this unavailability may result in slow execution or even failures. The sheer volume of data and users increases resource contention, which exacerbates stragglers and node failures during execution. In this paper, we propose a novel solution to alleviate the straggler/failure problem that exploits existing efficient partitioning properties of the data, particularly co-hash partitioned data, and provides approximate answers along with confidence bounds to queries affected by failed or straggler nodes. We consider aggregate queries that involve joins, group bys, having clauses and a subclass of nested subqueries. Finally, we validate our approach through extensive experiments on the TPC-H dataset.
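To make the idea of "approximate answers along with confidence bounds" concrete, here is a minimal sketch of one textbook way to do it, not the paper's method. It assumes that co-hash partitioning lets every node compute a partition-local partial SUM (joins stay local to a partition), and that the partitions that failed or straggled are missing at random, so the surviving partitions can be treated as a simple random sample without replacement. The function name `estimate_total`, its parameters, and the normal-approximation factor `z=1.96` (roughly 95% confidence) are all hypothetical choices for illustration.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def estimate_total(partials: List[float], total_partitions: int,
                   z: float = 1.96) -> Tuple[float, float, float]:
    """Estimate a global SUM from the partitions that responded.

    Hypothetical sketch: `partials` are partition-local partial sums from the
    surviving nodes; `total_partitions` is N, the number of partitions the
    query should have read. Assumes missing partitions are missing at random
    and uses the simple-random-sampling (without replacement) estimator.
    Returns (estimate, lower bound, upper bound).
    """
    n, N = len(partials), total_partitions
    if n == 0:
        raise ValueError("no partitions responded")
    if n == N:
        # Every partition answered: the result is exact, no interval needed.
        exact = float(sum(partials))
        return exact, exact, exact
    if n < 2:
        raise ValueError("need at least two partitions to estimate variance")

    est = N * mean(partials)                 # scale the mean partial up to N partitions
    fpc = 1 - n / N                          # finite population correction
    var = N * N * fpc * variance(partials) / n
    half_width = z * math.sqrt(var)          # normal-approximation interval
    return est, est - half_width, est + half_width
```

For example, if four of five partitions return partial sums `[10, 12, 11, 13]`, the estimate is `5 * 11.5 = 57.5`, with an interval around it; if all partitions respond, the exact sum is returned with a zero-width interval.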

Preprint · 2020 · arXiv

To pipeline or not to pipeline, that is the question

In designing query processing primitives, a crucial design choice is the method for data transfer between two operators in a query plan. As we were considering this critical design mechanism for an in-memory database system that we are building, we quickly realized that (surprisingly) there isn't a clear definition of this concept. Papers are full of ad hoc uses of terms like pipelining and blocking, but as these terms are not crisply defined, it is hard to fully understand the results attributed to these concepts. To address this limitation, we introduce a clear terminology for how to think about data transfer between operators in a query pipeline. We show that there isn't a clear definition of pipelining and blocking, and that there is instead a full spectrum of techniques based on a simple concept called unit-of-transfer. Next, we develop an analytical model for inter-operator communication and highlight the key parameters that impact performance (for in-memory database settings). Armed with this model, we then apply it to the system we are designing and highlight the insights we gathered from this exercise. We find that the gap between pipelined and non-pipelined query execution, with respect to key factors such as performance and memory footprint, is quite narrow, and thus system designers should likely rethink the notion of pipelining vs. blocking for in-memory database systems.
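The abstract frames pipelining and blocking as two ends of one spectrum controlled by the unit-of-transfer. The sketch below illustrates that framing only; it is not the paper's analytical model. The operators `scan`, `project`, and `run` are hypothetical. A unit of 1 behaves like tuple-at-a-time pipelining, a unit equal to the input size behaves like a fully materialized (blocking) hand-off, and values in between resemble batch or vectorized transfer. The largest batch held between operators is used as a crude proxy for inter-operator memory, and the number of transfers as a proxy for per-hand-off overhead.

```python
from typing import Iterator, List, Tuple


def scan(rows: List[int], unit: int) -> Iterator[List[int]]:
    # Producer: emits the input in chunks of `unit` rows (the unit of transfer).
    if unit < 1:
        raise ValueError("unit must be at least 1")
    for start in range(0, len(rows), unit):
        yield rows[start:start + unit]


def project(batches: Iterator[List[int]]) -> Iterator[List[int]]:
    # Intermediate operator: transforms one batch at a time and passes it on.
    for batch in batches:
        yield [2 * r for r in batch]


def run(rows: List[int], unit: int) -> Tuple[int, int, int]:
    """Return (SUM result, largest batch held between operators, number of transfers)."""
    total, peak, transfers = 0, 0, 0
    for batch in project(scan(rows, unit)):
        peak = max(peak, len(batch))
        transfers += 1
        total += sum(batch)
    return total, peak, transfers
```

With `rows = list(range(100))`, every unit size yields the same answer (9900); only the buffering and hand-off counts change: unit 1 gives 100 transfers of one row, unit 32 gives 4 transfers with at most 32 rows buffered, and unit 100 gives a single transfer of all 100 rows.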
