Researcher profile

Mostafa Milani

Mostafa Milani contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
1topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2020arXiv

Diversifying Anonymized Data with Diversity Constraints

Recently introduced privacy legislation has aimed to restrict and control the amount of personal data published by companies and shared to third parties. Much of this real data is not only sensitive requiring anonymization, but also contains characteristic details from a variety of individuals. This diversity is desirable in many applications ranging from Web search to drug and product development. Unfortunately, data anonymization techniques have largely ignored diversity in its published result. This inadvertently propagates underlying bias in subsequent data analysis. We study the problem of finding a diverse anonymized data instance where diversity is measured via a set of diversity constraints. We formalize diversity constraints and study their foundations such as implication and satisfiability. We show that determining the existence of a diverse, anonymized instance can be done in PTIME, and we present a clustering-based algorithm. We conduct extensive experiments using real and synthetic data showing the effectiveness of our techniques, and improvement over existing baselines. Our work aligns with recent trends towards responsible data science by coupling diversity with privacy-preserving data publishing.

preprint2020arXiv

Facilitating SQL Query Composition and Analysis

Formulating efficient SQL queries requires several cycles of tuning and execution, particularly for inexperienced users. We examine methods that can accelerate and improve this interaction by providing insights about SQL queries prior to execution. We achieve this by predicting properties such as the query answer size, its run-time, and error class. Unlike existing approaches, our approach does not rely on any statistics from the database instance or query execution plans. This is particularly important in settings with limited access to the database instance. Our approach is based on using data-driven machine learning techniques that rely on large query workloads to model SQL queries and their properties. We evaluate the utility of neural network models and traditional machine learning models. We use two real-world query workloads: the Sloan Digital Sky Survey (SDSS) and the SQLShare query workload. Empirical results show that the neural network models are more accurate in predicting the query error class, achieving a higher F-measure on classes with fewer samples as well as performing better on other problems such as run-time and answer size prediction. These results are encouraging and confirm that SQL query workloads and data-driven machine learning methods can be leveraged to facilitate query composition and analysis.

preprint2020arXiv

Privacy-Aware Data Cleaning-as-a-Service (Extended Version)

Data cleaning is a pervasive problem for organizations as they try to reap value from their data. Recent advances in networking and cloud computing technology have fueled a new computing paradigm called Database-as-a-Service, where data management tasks are outsourced to large service providers. In this paper, we consider a Data Cleaning-as-a-Service model that allows a client to interact with a data cleaning provider who hosts curated, and sensitive data. We present PACAS: a Privacy-Aware data Cleaning-As-a-Service model that facilitates interaction between the parties with client query requests for data, and a service provider using a data pricing scheme that computes prices according to data sensitivity. We propose new extensions to the model to define generalized data repairs that obfuscate sensitive data to allow data sharing between the client and service provider. We present a new semantic distance measure to quantify the utility of such repairs, and we re-define the notion of consistency in the presence of generalized values. The PACAS model uses (X,Y,L)-anonymity that extends existing data publishing techniques to consider the semantics in the data while protecting sensitive values. Our evaluation over real data show that PACAS safeguards semantically related sensitive values, and provides lower repair errors compared to existing privacy-aware cleaning techniques.