Source author record

Yuval Moskovitch

Yuval Moskovitch appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Databases

Catalog footprint

What is connected

5works

1topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

On Optimizing the Trade-off between Privacy and Utility in Data Provenance

Organizations that collect and analyze data may wish or be mandated by regulation to justify and explain their analysis results. At the same time, the logic that they have followed to analyze the data, i.e., their queries, may be proprietary and confidential. Data provenance, a record of the transformations that data underwent, was extensively studied as means of explanations. In contrast, only a few works have studied the tension between disclosing provenance and hiding the underlying query. This tension is the focus of the present paper, where we formalize and explore for the first time the tradeoff between the utility of presenting provenance information and the breach of privacy it poses with respect to the underlying query. Intuitively, our formalization is based on the notion of provenance abstraction, where the representation of some tuples in the provenance expressions is abstracted in a way that makes multiple tuples indistinguishable. The privacy of a chosen abstraction is then measured based on how many queries match the obfuscated provenance, in the same vein as k-anonymity. The utility is measured based on the entropy of the abstraction, intuitively how much information is lost with respect to the actual tuples participating in the provenance. Our formalization yields a novel optimization problem of choosing the best abstraction in terms of this tradeoff. We show that the problem is intractable in general, but design greedy heuristics that exploit the provenance structure towards a practically efficient exploration of the search space. We experimentally prove the effectiveness of our solution using the TPC-H benchmark and the IMDB dataset.

preprint2020arXiv

COBRA: Compression via Abstraction of Provenance for Hypothetical Reasoning

Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Recent work has proposed to leverage ideas from data provenance tracking towards supporting efficient hypothetical reasoning: instead of a costly re-execution of the underlying application, one may assign values to a pre-computed provenance expression. A prime challenge in leveraging this approach for large-scale data and complex applications lies in the size of the provenance. To this end, we present a framework that allows to reduce provenance size. Our approach is based on reducing the provenance granularity using abstraction. We propose a demonstration of COBRA, a system that allows examine the effect of the provenance compression on the anticipated analysis results. We will demonstrate the usefulness of COBRA in the context of business data analysis.

preprint2020arXiv

Equivalence-Invariant Algebraic Provenance for Hyperplane Update Queries

The algebraic approach for provenance tracking, originating in the semiring model of Green et. al, has proven useful as an abstract way of handling metadata. Commutative Semirings were shown to be the "correct" algebraic structure for Union of Conjunctive Queries, in the sense that its use allows provenance to be invariant under certain expected query equivalence axioms. In this paper we present the first (to our knowledge) algebraic provenance model, for a fragment of update queries, that is invariant under set equivalence. The fragment that we focus on is that of hyperplane queries, previously studied in multiple lines of work. Our algebraic provenance structure and corresponding provenance-aware semantics are based on the sound and complete axiomatization of Karabeg and Vianu. We demonstrate that our construction can guide the design of concrete provenance model instances for different applications. We further study the efficient generation and storage of provenance for hyperplane update queries. We show that a naive algorithm can lead to an exponentially large provenance expression, but remedy this by presenting a normal form which we show may be efficiently computed alongside query evaluation. We experimentally study the performance of our solution and demonstrate its scalability and usefulness, and in particular the effectiveness of our normal form representation.

preprint2020arXiv

Hypothetical Reasoning via Provenance Abstraction

Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Previous work has shown that fine-grained data provenance can help make such an analysis more efficient: instead of a costly re-execution of the underlying application, hypothetical scenarios are applied to a pre-computed provenance expression. However, storing provenance for complex queries and large-scale data leads to a significant overhead, which is often a barrier to the incorporation of provenance-based solutions. To this end, we present a framework that allows to reduce provenance size. Our approach is based on reducing the provenance granularity using user defined abstraction trees over the provenance variables; the granularity is based on the anticipated hypothetical scenarios. We formalize the tradeoff between provenance size and supported granularity of the hypothetical reasoning, and study the complexity of the resulting optimization problem, provide efficient algorithms for tractable cases and heuristics for others. We experimentally study the performance of our solution for various queries and abstraction trees. Our study shows that the algorithms generally lead to substantial speedup of hypothetical reasoning, with a reasonable loss of accuracy.

preprint2020arXiv

Towards Inferring Queries from Simple and Partial Provenance Examples

The field of query-by-example aims at inferring queries from output examples given by non-expert users, by finding the underlying logic that binds the examples. However, for a very small set of examples, it is difficult to correctly infer such logic. To bridge this gap, previous work suggested attaching explanations to each output example, modeled as provenance, allowing users to explain the reason behind their choice of example. In this paper, we explore the problem of inferring queries from a few output examples and intuitive explanations. We propose a two step framework: (1) convert the explanations into (partial) provenance and (2) infer a query that generates the output examples using a novel algorithm that employs a graph based approach. This framework is suitable for non-experts as it does not require the specification of the provenance in its entirety or an understanding of its structure. We show promising initial experimental results of our approach.

Yuval Moskovitch

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

On Optimizing the Trade-off between Privacy and Utility in Data Provenance

COBRA: Compression via Abstraction of Provenance for Hypothetical Reasoning

Equivalence-Invariant Algebraic Provenance for Hyperplane Update Queries

Hypothetical Reasoning via Provenance Abstraction

Towards Inferring Queries from Simple and Partial Provenance Examples