Researcher profile

Nadia Fawaz

Nadia Fawaz contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
13 works
0 followers
9 topics
4 close collaborators

Published work

13 published items

preprint · 2015 · arXiv

Sequential Relevance Maximization with Binary Feedback

Motivated by online settings where users can provide explicit feedback about the relevance of products that are sequentially presented to them, we look at the recommendation process as a problem of dynamically optimizing this relevance feedback. Such an algorithm optimizes the fine tradeoff between presenting the products that are most likely to be relevant, and learning the preferences of the user so that more relevant recommendations can be made in the future. We assume a standard predictive model inspired by collaborative filtering, in which a user is sampled from a distribution over a set of possible types. For every product category, each type has an associated relevance feedback that is assumed to be binary: the category is either relevant or irrelevant. Assuming that the user stays for each additional recommendation opportunity with probability $β$ independent of the past, the problem is to find a policy that maximizes the expected number of recommendations that are deemed relevant in a session. We analyze this problem and prove key structural properties of the optimal policy. Based on these properties, we first present an algorithm that strikes a balance between recursion and dynamic programming to compute this policy. We further propose and analyze two heuristic policies: a 'farsighted' greedy policy that attains at least a $1-β$ factor of the optimal payoff, and a naive greedy policy that attains at least a $\frac{1-β}{1+β}$ factor of the optimal payoff in the worst case. Extensive simulations show that these heuristics are very close to optimal in practice.

preprint · 2014 · arXiv

From the Information Bottleneck to the Privacy Funnel

We focus on the privacy-utility trade-off encountered by users who wish to disclose to an analyst some information that is correlated with their private data, in the hope of receiving some utility. We rely on a general privacy statistical inference framework, under which data is transformed before it is disclosed, according to a probabilistic privacy mapping. We show that when the log-loss is introduced in this framework in both the privacy metric and the distortion metric, the privacy leakage and the utility constraint can be reduced to the mutual information between private data and disclosed data, and between non-private data and disclosed data respectively. We justify the relevance and generality of the privacy metric under the log-loss by proving that the inference threat under any bounded cost function can be upper-bounded by an explicit function of the mutual information between private data and disclosed data. We then show that the privacy-utility trade-off under the log-loss can be cast as the non-convex Privacy Funnel optimization, and we leverage its connection to the Information Bottleneck to provide a greedy algorithm that is locally optimal. We evaluate its performance on the US census dataset.

preprint · 2014 · arXiv

Guess Who Rated This Movie: Identifying Users Through Subspace Clustering

It is often the case that, within an online recommender system, multiple users share a common account. Can such shared accounts be identified solely on the basis of the user-provided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such user identification is feasible, it opens the way to possible improvements in personalized recommendations, but also raises privacy concerns. We develop a model for composite accounts based on unions of linear subspaces, and use subspace clustering for carrying out the identification task. We show that a significant fraction of such accounts is identifiable in a reliable manner, and illustrate potential uses for personalized recommendation.

preprint · 2014 · arXiv

Privacy Tradeoffs in Predictive Analytics

Online services routinely mine user data to predict user preferences, make recommendations, and place targeted ads. Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data. Can a privacy-conscious user benefit from personalization while simultaneously protecting her private attributes? We study this question in the context of a rating prediction service based on matrix factorization. We construct a protocol of interactions between the service and users that has remarkable optimality properties: it is privacy-preserving, in that no inference algorithm can succeed in inferring a user's private attribute with a probability better than random guessing; it has maximal accuracy, in that no other privacy-preserving protocol improves rating prediction; and, finally, it involves a minimal disclosure, as the prediction accuracy strictly decreases when the service reveals less information. We extensively evaluate our protocol using several rating datasets, demonstrating that it successfully blocks the inference of gender, age and political affiliation, while incurring less than 5% decrease in the accuracy of rating prediction.

preprint · 2013 · arXiv

Nearly Optimal Private Convolution

We study computing the convolution of a private input $x$ with a public input $h$, while satisfying the guarantees of $(ε, δ)$-differential privacy. Convolution is a fundamental operation, intimately related to Fourier transforms. In our setting, the private input may represent a time series of sensitive events or a histogram of a database of confidential personal information. Convolution then captures important primitives including linear filtering, which is an essential tool in time series analysis, and aggregation queries on projections of the data. We give a nearly optimal algorithm for computing convolutions while satisfying $(ε, δ)$-differential privacy. Surprisingly, we follow the simple strategy of adding independent Laplacian noise to each Fourier coefficient and bounding the privacy loss using the composition theorem of Dwork, Rothblum, and Vadhan. We derive a closed form expression for the optimal noise to add to each Fourier coefficient using convex programming duality. Our algorithm is very efficient: it is essentially no more computationally expensive than a Fast Fourier Transform. To prove near optimality, we use the recent discrepancy lower bounds of Muthukrishnan and Nikolov and derive a spectral lower bound using a characterization of discrepancy in terms of determinants.

preprint · 2012 · arXiv

Guess Who Rated This Movie: Identifying Users Through Subspace Clustering

It is often the case that, within an online recommender system, multiple users share a common account. Can such shared accounts be identified solely on the basis of the user-provided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such user identification is feasible, it opens the way to possible improvements in personalized recommendations, but also raises privacy concerns. We develop a model for composite accounts based on unions of linear subspaces, and use subspace clustering for carrying out the identification task. We show that a significant fraction of such accounts is identifiable in a reliable manner, and illustrate potential uses for personalized recommendation.

preprint · 2012 · arXiv

Identifying Users From Their Rating Patterns

This paper reports on our analysis of the 2011 CAMRa Challenge dataset (Track 2) for context-aware movie recommendation systems. The training dataset comprises 4,536,891 ratings provided by 171,670 users on 23,974 movies, as well as the household groupings of a subset of the users. The test dataset comprises 5,450 ratings for which the user label is missing, but the household label is provided. The challenge required identifying the user labels for the ratings in the test set. Our main finding is that temporal information (time labels of the ratings) is significantly more useful for achieving this objective than the user preferences (the actual ratings). Using a model that leverages this fact, we are able to identify users within a known household with an accuracy of approximately 96% (i.e. misclassification rate around 4%).

preprint · 2012 · arXiv

Privacy Against Statistical Inference

We propose a general statistical inference framework to capture the privacy threat incurred by a user that releases data to a passive but curious adversary, given utility constraints. We show that applying this general framework to the setting where the adversary uses the self-information cost function naturally leads to a non-asymptotic information-theoretic approach for characterizing the best achievable privacy subject to utility constraints. Based on these results we introduce two privacy metrics, namely average information leakage and maximum information leakage. We prove that under both metrics the resulting design problem of finding the optimal mapping from the user's data to a privacy-preserving output can be cast as a modified rate-distortion problem which, in turn, can be formulated as a convex program. Finally, we compare our framework with differential privacy.

preprint · 2012 · arXiv

Privacy Auctions for Recommender Systems

We study a market for private data in which a data analyst publicly releases a statistic over a database of private information. Individuals that own the data incur a cost for their loss of privacy proportional to the differential privacy guarantee given by the analyst at the time of the release. The analyst incentivizes individuals by compensating them, giving rise to a privacy auction. Motivated by recommender systems, the statistic we consider is a linear predictor function with publicly known weights. The statistic can be viewed as a prediction of the unknown data of a new individual, based on the data of individuals in the database. We formalize the trade-off between privacy and accuracy in this setting, and show that a simple class of estimates achieves an order-optimal trade-off. It thus suffices to focus on auction mechanisms that output such estimates. We use this observation to design a truthful, individually rational, proportional-purchase mechanism under a fixed budget constraint. We show that our mechanism is 5-approximate in terms of accuracy compared to the optimal mechanism, and that no truthful mechanism can achieve a $2-\varepsilon$ approximation, for any $\varepsilon > 0$.

preprint · 2012 · arXiv

Private Decayed Sum Estimation under Continual Observation

In monitoring applications, recent data is more important than distant data. How does this affect privacy of data analysis? We study a general class of data analyses - computing predicate sums - with privacy. Formally, we study the problem of estimating predicate sums privately, for sliding windows (and other well-known decay models of data, i.e. exponential and polynomial decay). We extend the recently proposed continual privacy model of Dwork et al. We present algorithms for decayed sum which are $\varepsilon$-differentially private, and are accurate. For window and exponential decay sums, our algorithms are accurate up to additive $1/\varepsilon$ and polylog terms in the range of the computed function; for polynomial decay sums, which are technically more challenging because partial solutions do not compose easily, our algorithms incur additional relative error. Further, we show lower bounds, tight within polylog factors and tight with respect to the dependence on the probability of error.

preprint · 2012 · arXiv

Reducibility of joint relay positioning and flow optimization problem

This paper shows how to reduce the otherwise hard joint relay positioning and flow optimization problem into a sequence of two simpler decoupled problems. We consider a class of wireless multicast hypergraphs mainly characterized by their hyperarc rate functions, which are increasing and convex in power, and decreasing in distance between the transmit node and the farthest end node of the hyperarc. The set-up consists of a single multicast flow session involving a source, multiple destinations and a relay that can be positioned freely. The first problem formulates the relay positioning problem in a purely geometric sense, and once the optimal relay position is obtained the second problem addresses the flow optimization. Furthermore, we present simple and efficient algorithms to solve these problems.

preprint · 2011 · arXiv

On the geometry of wireless network multicast in 2-D

We provide a geometric solution to the problem of optimal relay positioning to maximize the multicast rate for low-SNR networks. The networks we consider consist of a single source, multiple receivers, and a relay as the only intermediate and locatable node. We construct the network hypergraph of the system nodes from the underlying information theoretic model of the low-SNR regime that operates using superposition coding and FDMA in conjunction (which we call the "achievable hypergraph model"). We make the following contributions. 1) We show that the problem of optimal relay positioning maximizing the multicast rate can be completely decoupled from the flow optimization by noticing and exploiting geometric properties of multicast flow. 2) All the flow maximizing the multicast rate is sent over at most two paths, in succession. The relay position is dependent only on one path (out of the two), irrespective of the number of receiver nodes in the system. Subsequently, we propose simple and efficient geometric algorithms to compute the optimal relay position. 3) Finally, we show that in our model at the optimal relay position, the difference between the maximized multicast rate and the cut-set bound is minimum. We solve the problem for all (Ps,Pr) pairs of source and relay transmit powers and for path loss exponents α greater than 2.

preprint · 2011 · arXiv

Optimal relay location and power allocation for low SNR broadcast relay channels

We consider the broadcast relay channel (BRC), where a single source transmits to multiple destinations with the help of a relay, in the limit of a large bandwidth. We address the problem of optimal relay positioning and power allocations at source and relay, to maximize the multicast rate from source to all destinations. To solve such a network planning problem, we develop a three-faceted approach based on an underlying information theoretic model, computational geometric aspects, and network optimization tools. Firstly, assuming superposition coding and frequency division between the source and the relay, the information theoretic framework yields a hypergraph model of the wideband BRC, which captures the dependency of achievable rate-tuples on the network topology. As the relay position varies, so does the set of hyperarcs constituting the hypergraph, which gives the optimization problem its combinatorial nature. We show that the convex hull C of all nodes in the 2-D plane can be divided into disjoint regions corresponding to distinct hyperarc sets. These sets are obtained by superimposing all k-th order Voronoi tessellations of C. We propose an easy and efficient algorithm to compute all hyperarc sets, and prove they are polynomially bounded. Using the switched hypergraph approach, we model the original problem as a continuous yet non-convex network optimization program. Ultimately, using the techniques of geometric programming and $p$-norm surrogate approximation, we derive a good convex approximation. We provide a detailed characterization of the problem for collinearly located destinations, and then give a generalization for arbitrarily located destinations. Finally, we show strong gains for the optimal relay positioning compared to seemingly interesting positions.