Researcher profile

Bruhathi Sundarmurthy

Bruhathi Sundarmurthy contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
1topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2020arXiv

Providing Insights for Queries affected by Failures and Stragglers

Interactive time responses are a crucial requirement for users analyzing large amounts of data. Such analytical queries are typically run in a distributed setting, with data being sharded across thousands of nodes for high throughput. However, providing real-time analytics is still a very big challenge; with data distributed across thousands of nodes, the probability that some of the required nodes are unavailable or very slow during query execution is very high and unavailability may result in slow execution or even failures. The sheer magnitude of data and users increase resource contention and this exacerbates the phenomenon of stragglers and node failures during execution. In this paper, we propose a novel solution to alleviate the straggler/failure problem that exploits existing efficient partitioning properties of the data, particularly, co-hash partitioned data, and provides approximate answers along with confidence bounds to queries affected by failed/straggler nodes. We consider aggregate queries that involve joins, group bys, having clauses and a subclass of nested subqueries. Finally, we validate our approach through extensive experiments on the TPC-H dataset.

preprint2020arXiv

To pipeline or not to pipeline, that is the question

In designing query processing primitives, a crucial design choice is the method for data transfer between two operators in a query plan. As we were considering this critical design mechanism for an in-memory database system that we are building, we quickly realized that (surprisingly) there isn't a clear definition of this concept. Papers are full or ad hoc use of terms like pipelining and blocking, but as these terms are not crisply defined, it is hard to fully understand the results attributed to these concepts. To address this limitation, we introduce a clear terminology for how to think about data transfer between operators in a query pipeline. We show that there isn't a clear definition of pipelining and blocking, and that there is a full spectrum of techniques based on a simple concept called unit-of-transfer. Next, we develop an analytical model for inter-operator communication, and highlight the key parameters that impact performance (for in-memory database settings). Armed with this model, we then apply it to the system we are designing and highlight the insights we gathered from this exercise. We find that the gap between pipelining and non-pipelining query execution, w.r.t. key factors such as performance and memory footprint is quite narrow, and thus system designers should likely rethink the notion of pipelining vs. blocking for in-memory database systems.