Researcher profile

Manas Joglekar

Manas Joglekar contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
8works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2016arXiv

Interactive Data Exploration with Smart Drill-Down

We present {\em smart drill-down}, an operator for interactively exploring a relational table to discover and summarize "interesting" groups of tuples. Each group of tuples is described by a {\em rule}. For instance, the rule $(a, b, \star, 1000)$ tells us that there are a thousand tuples with value $a$ in the first column and $b$ in the second column (and any value in the third column). Smart drill-down presents an analyst with a list of rules that together describe interesting aspects of the table. The analyst can tailor the definition of interesting, and can interactively apply smart drill-down on an existing rule to explore that part of the table. We demonstrate that the underlying optimization problems are {\sc NP-Hard}, and describe an algorithm for finding the approximately optimal list of rules to display when the user uses a smart drill-down, and a dynamic sampling scheme for efficiently interacting with large tables. Finally, we perform experiments on real datasets on our experimental prototype to demonstrate the usefulness of smart drill-down and study the performance of our algorithms.

preprint2016arXiv

SLiMFast: Guaranteed Results for Data Fusion and Source Reliability

We focus on data fusion, i.e., the problem of unifying conflicting data from data sources into a single representation by estimating the source accuracies. We propose SLiMFast, a framework that expresses data fusion as a statistical learning problem over discriminative probabilistic models, which in many cases correspond to logistic regression. In contrast to previous approaches that use complex generative models, discriminative models make fewer distributional assumptions over data sources and allow us to obtain rigorous theoretical guarantees. Furthermore, we show how SLiMFast enables incorporating domain knowledge into data fusion, yielding accuracy improvements of up to 50\% over state-of-the-art baselines. Building upon our theoretical results, we design an optimizer that obviates the need for users to manually select an algorithm for learning SLiMFast's parameters. We validate our optimizer on multiple real-world datasets and show that it can accurately predict the learning algorithm that yields the best data fusion results.

preprint2015arXiv

Aggregations over Generalized Hypertree Decompositions

We study a class of aggregate-join queries with multiple aggregation operators evaluated over annotated relations. We show that straightforward extensions of standard multiway join algorithms and generalized hypertree decompositions (GHDs) provide best-known runtime guarantees. In contrast, prior work uses bespoke algorithms and data structures and does not match these guarantees. Our extensions to the standard techniques are a pair of simple tests that (1) determine if two orderings of aggregation operators are equivalent and (2) determine if a GHD is compatible with a given ordering. These tests provide a means to find an optimal GHD that, when provided to standard join algorithms, will correctly answer a given aggregate-join query. The second class of our contributions is a pair of complete characterizations of (1) the set of orderings equivalent to a given ordering and (2) the set of GHDs compatible with some equivalent ordering. We show by example that previous approaches are incomplete. The key technical consequence of our characterizations is a decomposition of a compatible GHD into a set of (smaller) {\em unconstrained} GHDs, i.e. into a set of GHDs of sub-queries without aggregations. Since this decomposition is comprised of unconstrained GHDs, we are able to connect to the wide literature on GHDs for join query processing, thereby obtaining improved runtime bounds, MapReduce variants, and an efficient method to find approximately optimal GHDs.

preprint2014arXiv

Average Case Analysis of the Classical Algorithm for Markov Decision Processes with Büchi Objectives

We consider Markov decision processes (MDPs) with $ω$-regular specifications given as parity objectives. We consider the problem of computing the set of almost-sure winning vertices from where the objective can be ensured with probability 1. The algorithms for the computation of the almost-sure winning set for parity objectives iteratively use the solutions for the almost-sure winning set for Büchi objectives (a special case of parity objectives). We study for the first time the average case complexity of the classical algorithm for computing almost-sure winning vertices for MDPs with Büchi objectives. Our contributions are as follows: First, we show that for MDPs with constant out-degree the expected number of iterations is at most logarithmic and the average case running time is linear (as compared to the worst case linear number of iterations and quadratic time complexity). Second, we show that for general MDPs the expected number of iterations is constant and the average case running time is linear (again as compared to the worst case linear number of iterations and quadratic time complexity). Finally we also show that given all graphs are equally likely, the probability that the classical algorithm requires more than constant number of iterations is exponentially small.

preprint2014arXiv

Comprehensive and Reliable Crowd Assessment Algorithms

Evaluating workers is a critical aspect of any crowdsourcing system. In this paper, we devise techniques for evaluating workers by finding confidence intervals on their error rates. Unlike prior work, we focus on "conciseness"---that is, giving as tight a confidence interval as possible. Conciseness is of utmost importance because it allows us to be sure that we have the best guarantee possible on worker error rate. Also unlike prior work, we provide techniques that work under very general scenarios, such as when not all workers have attempted every task (a fairly common scenario in practice), when tasks have non-boolean responses, and when workers have different biases for positive and negative tasks. We demonstrate conciseness as well as accuracy of our confidence intervals by testing them on a variety of conditions and multiple real-world datasets.

preprint2014arXiv

Evaluating the Crowd with Confidence

Worker quality control is a crucial aspect of crowdsourcing systems; typically occupying a large fraction of the time and money invested on crowdsourcing. In this work, we devise techniques to generate confidence intervals for worker error rate estimates, thereby enabling a better evaluation of worker quality. We show that our techniques generate correct confidence intervals on a range of real-world datasets, and demonstrate wide applicability by using them to evict poorly performing workers, and provide confidence intervals on the accuracy of the answers.

preprint2014arXiv

Exploiting Correlations for Expensive Predicate Evaluation

User Defined Function(UDFs) are used increasingly to augment query languages with extra, application dependent functionality. Selection queries involving UDF predicates tend to be expensive, either in terms of monetary cost or latency. In this paper, we study ways to efficiently evaluate selection queries with UDF predicates. We provide a family of techniques for processing queries at low cost while satisfying user-specified precision and recall constraints. Our techniques are applicable to a variety of scenarios including when selection probabilities of tuples are available beforehand, when this information is available but noisy, or when no such prior information is available. We also generalize our techniques to more complex queries. Finally, we test our techniques on real datasets, and show that they achieve significant savings in cost of up to $80\%$, while incurring only a small reduction in accuracy.

preprint2014arXiv

Symbolic Algorithms for Qualitative Analysis of Markov Decision Processes with Büchi Objectives

We consider Markov decision processes (MDPs) with ω-regular specifications given as parity objectives. We consider the problem of computing the set of almost-sure winning states from where the objective can be ensured with probability 1. The algorithms for the computation of the almost-sure winning set for parity objectives iteratively use the solutions for the almost-sure winning set for Büchi objectives (a special case of parity objectives). Our contributions are as follows: First, we present the first subquadratic symbolic algorithm to compute the almost-sure winning set for MDPs with Büchi objectives; our algorithm takes O(n \sqrt{m}) symbolic steps as compared to the previous known algorithm that takes O(n^2) symbolic steps, where $n$ is the number of states and $m$ is the number of edges of the MDP. In practice MDPs have constant out-degree, and then our symbolic algorithm takes O(n \sqrt{n}) symbolic steps, as compared to the previous known $O(n^2)$ symbolic steps algorithm. Second, we present a new algorithm, namely win-lose algorithm, with the following two properties: (a) the algorithm iteratively computes subsets of the almost-sure winning set and its complement, as compared to all previous algorithms that discover the almost-sure winning set upon termination; and (b) requires O(n \sqrt{K}) symbolic steps, where K is the maximal number of edges of strongly connected components (scc's) of the MDP. The win-lose algorithm requires symbolic computation of scc's. Third, we improve the algorithm for symbolic scc computation; the previous known algorithm takes linear symbolic steps, and our new algorithm improves the constants associated with the linear number of steps. In the worst case the previous known algorithm takes 5n symbolic steps, whereas our new algorithm takes 4n symbolic steps.