Source author record

Nadia Fawaz

Nadia Fawaz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Information Theory; math.IT; Machine Learning; Cryptography and Security; Data Structures and Algorithms; Information Retrieval; Networking and Internet Architecture; Artificial Intelligence; Computer Science and Game Theory

Catalog footprint

17 works

9 topics

4 close collaborators

preprint, 2015, arXiv

Sequential Relevance Maximization with Binary Feedback

Motivated by online settings where users can provide explicit feedback about the relevance of products that are sequentially presented to them, we look at the recommendation process as a problem of dynamically optimizing this relevance feedback. Such an algorithm optimizes the fine tradeoff between presenting the products that are most likely to be relevant, and learning the preferences of the user so that more relevant recommendations can be made in the future. We assume a standard predictive model inspired by collaborative filtering, in which a user is sampled from a distribution over a set of possible types. For every product category, each type has an associated relevance feedback that is assumed to be binary: the category is either relevant or irrelevant. Assuming that the user stays for each additional recommendation opportunity with probability $β$ independent of the past, the problem is to find a policy that maximizes the expected number of recommendations that are deemed relevant in a session. We analyze this problem and prove key structural properties of the optimal policy. Based on these properties, we first present an algorithm that strikes a balance between recursion and dynamic programming to compute this policy. We further propose and analyze two heuristic policies: a 'farsighted' greedy policy that attains at least a $1-β$ factor of the optimal payoff, and a naive greedy policy that attains at least a $\frac{1-β}{1+β}$ factor of the optimal payoff in the worst case. Extensive simulations show that these heuristics are very close to optimal in practice.

preprint, 2014, arXiv

From the Information Bottleneck to the Privacy Funnel

We focus on the privacy-utility trade-off encountered by users who wish to disclose some information to an analyst, that is correlated with their private data, in the hope of receiving some utility. We rely on a general privacy statistical inference framework, under which data is transformed before it is disclosed, according to a probabilistic privacy mapping. We show that when the log-loss is introduced in this framework in both the privacy metric and the distortion metric, the privacy leakage and the utility constraint can be reduced to the mutual information between private data and disclosed data, and between non-private data and disclosed data respectively. We justify the relevance and generality of the privacy metric under the log-loss by proving that the inference threat under any bounded cost function can be upper-bounded by an explicit function of the mutual information between private data and disclosed data. We then show that the privacy-utility tradeoff under the log-loss can be cast as the non-convex Privacy Funnel optimization, and we leverage its connection to the Information Bottleneck, to provide a greedy algorithm that is locally optimal. We evaluate its performance on the US census dataset.
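The two mutual-information quantities named in this abstract can be computed directly for small discrete alphabets. The sketch below is only an illustration: it assumes a Markov chain in which the disclosed data $Y$ is generated from the non-private data $X$ alone, and the names `mutual_information` and `leakage_and_utility` are hypothetical. It evaluates a given privacy mapping; it does not implement the greedy algorithm from the paper.

```python
import numpy as np


def mutual_information(joint) -> float:
    """Mutual information I(A;B) in bits for a joint pmf given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # marginal of A, shape (|A|, 1)
    pb = joint.sum(axis=0, keepdims=True)   # marginal of B, shape (1, |B|)
    mask = joint > 0                        # zero-probability cells contribute 0
    product = pa @ pb
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))


def leakage_and_utility(p_sx, p_y_given_x):
    """Return (I(S;Y), I(X;Y)) for a privacy mapping p(y|x).

    p_sx        -- joint pmf of private data S and non-private data X, shape (|S|, |X|)
    p_y_given_x -- row-stochastic mapping, shape (|X|, |Y|)

    Assumption: Y depends on S only through X.
    """
    p_sx = np.asarray(p_sx, dtype=float)
    p_y_given_x = np.asarray(p_y_given_x, dtype=float)
    p_sy = p_sx @ p_y_given_x                       # sum over x of p(s,x) p(y|x)
    p_xy = p_sx.sum(axis=0)[:, None] * p_y_given_x  # p(x) p(y|x)
    return mutual_information(p_sy), mutual_information(p_xy)


if __name__ == "__main__":
    p_sx = [[0.5, 0.0], [0.0, 0.5]]                 # S and X perfectly correlated
    print(leakage_and_utility(p_sx, [[1, 0], [0, 1]]))  # disclose X unchanged
    print(leakage_and_utility(p_sx, [[1, 0], [1, 0]]))  # disclose a constant
```

The two calls show the extremes of the tradeoff: releasing $X$ unchanged keeps all of its information but leaks everything about $S$, while releasing a constant leaks nothing and is useless.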

preprint, 2014, arXiv

Guess Who Rated This Movie: Identifying Users Through Subspace Clustering

It is often the case that, within an online recommender system, multiple users share a common account. Can such shared accounts be identified solely on the basis of the user-provided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such user identification is feasible, it opens the way to possible improvements in personalized recommendations, but also raises privacy concerns. We develop a model for composite accounts based on unions of linear subspaces, and use subspace clustering for carrying out the identification task. We show that a significant fraction of such accounts is identifiable in a reliable manner, and illustrate potential uses for personalized recommendation.

preprint, 2014, arXiv

Managing your Private and Public Data: Bringing down Inference Attacks against your Privacy

We propose a practical methodology to protect a user's private data, when he wishes to publicly release data that is correlated with his private data, in the hope of getting some utility. Our approach relies on a general statistical inference framework that captures the privacy threat under inference attacks, given utility constraints. Under this framework, data is distorted before it is released, according to a privacy-preserving probabilistic mapping. This mapping is obtained by solving a convex optimization problem, which minimizes information leakage under a distortion constraint. We address practical challenges encountered when applying this theoretical framework to real-world data. On the one hand, the design of optimal privacy-preserving mechanisms requires knowledge of the prior distribution linking private data and data to be released, which is often unavailable in practice. On the other hand, the optimization may become intractable and face scalability issues when data assumes values in large alphabets, or is high-dimensional. Our work makes three major contributions. First, we provide bounds on the impact of a mismatched prior on the privacy-utility tradeoff. Second, we show how to reduce the optimization size by introducing a quantization step, and how to generate privacy mappings under quantization. Third, we evaluate our method on three datasets, including a new dataset that we collected, showing correlations between political convictions and TV viewing habits. We demonstrate that good privacy properties can be achieved with limited distortion so as not to undermine the original purpose of the publicly released data, e.g. recommendations.

preprint, 2014, arXiv

Privacy Tradeoffs in Predictive Analytics

Online services routinely mine user data to predict user preferences, make recommendations, and place targeted ads. Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data. Can a privacy-conscious user benefit from personalization while simultaneously protecting her private attributes? We study this question in the context of a rating prediction service based on matrix factorization. We construct a protocol of interactions between the service and users that has remarkable optimality properties: it is privacy-preserving, in that no inference algorithm can succeed in inferring a user's private attribute with a probability better than random guessing; it has maximal accuracy, in that no other privacy-preserving protocol improves rating prediction; and, finally, it involves a minimal disclosure, as the prediction accuracy strictly decreases when the service reveals less information. We extensively evaluate our protocol using several rating datasets, demonstrating that it successfully blocks the inference of gender, age and political affiliation, while incurring less than 5% decrease in the accuracy of rating prediction.

preprint, 2013, arXiv

Nearly Optimal Private Convolution

We study computing the convolution of a private input $x$ with a public input $h$, while satisfying the guarantees of $(ε, δ)$-differential privacy. Convolution is a fundamental operation, intimately related to Fourier transforms. In our setting, the private input may represent a time series of sensitive events or a histogram of a database of confidential personal information. Convolution then captures important primitives including linear filtering, which is an essential tool in time series analysis, and aggregation queries on projections of the data. We give a nearly optimal algorithm for computing convolutions while satisfying $(ε, δ)$-differential privacy. Surprisingly, we follow the simple strategy of adding independent Laplacian noise to each Fourier coefficient and bounding the privacy loss using the composition theorem of Dwork, Rothblum, and Vadhan. We derive a closed-form expression for the optimal noise to add to each Fourier coefficient using convex programming duality. Our algorithm is very efficient: it is essentially no more computationally expensive than a Fast Fourier Transform. To prove near optimality, we use the recent discrepancy lower bounds of Muthukrishnan and Nikolov and derive a spectral lower bound using a characterization of discrepancy in terms of determinants.
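The general shape of the strategy described in this abstract, perturbing Fourier coefficients with Laplace noise and transforming back, can be sketched in a few lines of NumPy. This is only a structural illustration under loud assumptions: the single uniform `noise_scale` is a placeholder rather than the per-coefficient allocation derived in the paper, the point at which noise is injected and the use of the real part of the result are choices made for this example, and nothing here is calibrated to any $(ε, δ)$ guarantee.

```python
import numpy as np


def noisy_circular_convolution(x, h, noise_scale, rng=None):
    """Circular convolution of x with h, with Laplace noise added in the Fourier domain.

    x           -- private input sequence (length n)
    h           -- public input sequence (zero-padded or truncated to length n)
    noise_scale -- Laplace scale applied uniformly to every coefficient;
                   a placeholder, NOT a privacy-calibrated value
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    n = len(x)
    rng = np.random.default_rng() if rng is None else rng

    x_hat = np.fft.fft(x)
    h_hat = np.fft.fft(h, n)

    if noise_scale > 0:
        # Independent Laplace noise on the real and imaginary parts
        # of each Fourier coefficient of the private input.
        noise = rng.laplace(0.0, noise_scale, n) + 1j * rng.laplace(0.0, noise_scale, n)
        x_hat = x_hat + noise

    # Convolution theorem: pointwise product in frequency, then invert.
    return np.real(np.fft.ifft(x_hat * h_hat))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    h = [1.0, 1.0, 0.0, 0.0]
    print(noisy_circular_convolution(x, h, noise_scale=0.0))   # exact result
    print(noisy_circular_convolution(x, h, noise_scale=0.5,
                                     rng=np.random.default_rng(0)))
```

The cost is dominated by the three FFT calls, which matches the abstract's remark that the approach is about as expensive as a Fast Fourier Transform.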

preprint, 2012, arXiv

Guess Who Rated This Movie: Identifying Users Through Subspace Clustering

It is often the case that, within an online recommender system, multiple users share a common account. Can such shared accounts be identified solely on the basis of the user-provided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such user identification is feasible, it opens the way to possible improvements in personalized recommendations, but also raises privacy concerns. We develop a model for composite accounts based on unions of linear subspaces, and use subspace clustering for carrying out the identification task. We show that a significant fraction of such accounts is identifiable in a reliable manner, and illustrate potential uses for personalized recommendation.

preprint, 2012, arXiv

Identifying Users From Their Rating Patterns

This paper reports on our analysis of the 2011 CAMRa Challenge dataset (Track 2) for context-aware movie recommendation systems. The training dataset comprises 4,536,891 ratings provided by 171,670 users on 23,974 movies, as well as the household groupings of a subset of the users. The test dataset comprises 5,450 ratings for which the user label is missing, but the household label is provided. The challenge was to identify the user labels for the ratings in the test set. Our main finding is that temporal information (time labels of the ratings) is significantly more useful for achieving this objective than the user preferences (the actual ratings). Using a model that leverages this fact, we are able to identify users within a known household with an accuracy of approximately 96% (i.e. a misclassification rate of around 4%).

preprint, 2012, arXiv

Privacy Against Statistical Inference

We propose a general statistical inference framework to capture the privacy threat incurred by a user that releases data to a passive but curious adversary, given utility constraints. We show that applying this general framework to the setting where the adversary uses the self-information cost function naturally leads to a non-asymptotic information-theoretic approach for characterizing the best achievable privacy subject to utility constraints. Based on these results we introduce two privacy metrics, namely average information leakage and maximum information leakage. We prove that under both metrics the resulting design problem of finding the optimal mapping from the user's data to a privacy-preserving output can be cast as a modified rate-distortion problem which, in turn, can be formulated as a convex program. Finally, we compare our framework with differential privacy.

preprint, 2012, arXiv

Privacy Auctions for Recommender Systems

We study a market for private data in which a data analyst publicly releases a statistic over a database of private information. Individuals that own the data incur a cost for their loss of privacy proportional to the differential privacy guarantee given by the analyst at the time of the release. The analyst incentivizes individuals by compensating them, giving rise to a privacy auction. Motivated by recommender systems, the statistic we consider is a linear predictor function with publicly known weights. The statistic can be viewed as a prediction of the unknown data of a new individual, based on the data of individuals in the database. We formalize the trade-off between privacy and accuracy in this setting, and show that a simple class of estimates achieves an order-optimal trade-off. It thus suffices to focus on auction mechanisms that output such estimates. We use this observation to design a truthful, individually rational, proportional-purchase mechanism under a fixed budget constraint. We show that our mechanism is 5-approximate in terms of accuracy compared to the optimal mechanism, and that no truthful mechanism can achieve a $2-\varepsilon$ approximation, for any $\varepsilon > 0$.

preprint, 2012, arXiv

Private Decayed Sum Estimation under Continual Observation

In monitoring applications, recent data is more important than distant data. How does this affect the privacy of data analysis? We study a general class of data analyses, computing predicate sums, with privacy. Formally, we study the problem of estimating predicate sums privately, for sliding windows (and other well-known decay models of data, i.e. exponential and polynomial decay). We extend the recently proposed continual privacy model of Dwork et al. We present algorithms for decayed sums that are $\varepsilon$-differentially private and accurate. For window and exponential decay sums, our algorithms are accurate up to additive $1/\varepsilon$ and polylog terms in the range of the computed function; for polynomial decay sums, which are technically more challenging because partial solutions do not compose easily, our algorithms incur additional relative error. Further, we show lower bounds, tight within polylog factors and tight with respect to the dependence on the probability of error.
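For readers unfamiliar with the three decay models named in this abstract, the sketch below computes the exact (non-private) decayed sums over a stream of 0/1 predicate values. It illustrates only the quantities being estimated, not the private algorithms from the paper; the function names, the parameter names `w`, `alpha` and `c`, and the specific weight formulas are assumptions based on the usual textbook definitions of these decay models.

```python
from typing import Sequence


def window_sum(bits: Sequence[int], w: int) -> int:
    """Sliding-window sum: count of ones among the last w items."""
    start = max(0, len(bits) - w)
    return sum(bits[start:])


def exponential_decay_sum(bits: Sequence[int], alpha: float) -> float:
    """Exponential decay: an item of age a (0 = newest) has weight alpha**a."""
    total = 0.0
    for b in bits:
        total = alpha * total + b   # every older item is discounted once more
    return total


def polynomial_decay_sum(bits: Sequence[int], c: float) -> float:
    """Polynomial decay: the item i steps from the end has weight 1 / i**c (newest: i = 1)."""
    t = len(bits)
    return sum(b / (t - i) ** c for i, b in enumerate(bits))


if __name__ == "__main__":
    stream = [1, 0, 1, 1]
    print(window_sum(stream, w=2))
    print(exponential_decay_sum(stream, alpha=0.5))
    print(polynomial_decay_sum(stream, c=1.0))
```

The exponential sum can be maintained with a one-line recurrence, whereas the polynomial sum has no such recurrence, which is consistent with the abstract's remark that partial solutions for polynomial decay do not compose easily.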

preprint, 2012, arXiv

Reducibility of joint relay positioning and flow optimization problem

This paper shows how to reduce the otherwise hard joint relay positioning and flow optimization problem to a sequence of two simpler decoupled problems. We consider a class of wireless multicast hypergraphs mainly characterized by their hyperarc rate functions, which are increasing and convex in power, and decreasing in the distance between the transmit node and the farthest end node of the hyperarc. The set-up consists of a single multicast flow session involving a source, multiple destinations and a relay that can be positioned freely. The first problem formulates relay positioning in a purely geometric sense, and once the optimal relay position is obtained, the second problem addresses the flow optimization. Furthermore, we present simple and efficient algorithms to solve these problems.

preprint, 2011, arXiv

On the geometry of wireless network multicast in 2-D

We provide a geometric solution to the problem of optimal relay positioning to maximize the multicast rate for low-SNR networks. The networks we consider consist of a single source, multiple receivers, and a relay as the only intermediate node whose location can be chosen. We construct the network hypergraph of the system nodes from the underlying information-theoretic model of the low-SNR regime, which operates using superposition coding and FDMA in conjunction (which we call the "achievable hypergraph model"). We make the following contributions. 1) We show that the problem of optimal relay positioning maximizing the multicast rate can be completely decoupled from the flow optimization by noticing and exploiting geometric properties of multicast flow. 2) All the flow maximizing the multicast rate is sent over at most two paths, in succession. The relay position depends on only one of the two paths, irrespective of the number of receiver nodes in the system. Subsequently, we propose simple and efficient geometric algorithms to compute the optimal relay position. 3) Finally, we show that in our model, at the optimal relay position, the difference between the maximized multicast rate and the cut-set bound is minimized. We solve the problem for all pairs $(P_s, P_r)$ of source and relay transmit powers and for path loss exponent $α$ greater than 2.

preprint, 2011, arXiv

Optimal relay location and power allocation for low SNR broadcast relay channels

We consider the broadcast relay channel (BRC), where a single source transmits to multiple destinations with the help of a relay, in the limit of a large bandwidth. We address the problem of optimal relay positioning and power allocations at source and relay, to maximize the multicast rate from source to all destinations. To solve such a network planning problem, we develop a three-faceted approach based on an underlying information-theoretic model, computational geometric aspects, and network optimization tools. Firstly, assuming superposition coding and frequency division between the source and the relay, the information-theoretic framework yields a hypergraph model of the wideband BRC, which captures the dependency of achievable rate-tuples on the network topology. As the relay position varies, so does the set of hyperarcs constituting the hypergraph, which gives the optimization problem its combinatorial nature. We show that the convex hull C of all nodes in the 2-D plane can be divided into disjoint regions corresponding to distinct hyperarc sets. These sets are obtained by superimposing all k-th order Voronoi tessellations of C. We propose an easy and efficient algorithm to compute all hyperarc sets, and prove they are polynomially bounded. Using the switched hypergraph approach, we model the original problem as a continuous yet non-convex network optimization program. Ultimately, drawing on the techniques of geometric programming and $p$-norm surrogate approximation, we derive a good convex approximation. We provide a detailed characterization of the problem for collinearly located destinations, and then give a generalization for arbitrarily located destinations. Finally, we show strong gains for the optimal relay positioning compared to seemingly interesting positions.

preprint, 2010, arXiv

On the Non-Coherent Wideband Multipath Fading Relay Channel

We investigate the multipath fading relay channel in the limit of a large bandwidth, and in the non-coherent setting, where the channel state is unknown to all terminals, including the relay and the destination. We propose a hypergraph model of the wideband multipath fading relay channel, and show that its min-cut is achieved by a non-coherent peaky frequency binning scheme. The so-obtained lower bound on the capacity of the wideband multipath fading relay channel turns out to coincide with the block-Markov lower bound on the capacity of the wideband frequency-division Gaussian (FD-AWGN) relay channel. In certain cases, this achievable rate also meets the cut-set upper-bound, and thus reaches the capacity of the non-coherent wideband multipath fading relay channel.

preprint, 2009, arXiv

Asymptotic Capacity and Optimal Precoding in MIMO Multi-Hop Relay Networks

A multi-hop relaying system is analyzed where data sent by a multi-antenna source is relayed by successive multi-antenna relays until it reaches a multi-antenna destination. Assuming correlated fading at each hop, each relay receives a faded version of the signal from the previous level, performs linear precoding and retransmits it to the next level. Using free probability theory and assuming that the noise power at the relaying levels, but not at the destination, is negligible, the closed-form expression of the asymptotic instantaneous end-to-end mutual information is derived as the number of antennas at all levels grows large. The so-obtained deterministic expression is independent of the channel realizations and depends only on channel statistics. Moreover, it also serves as the asymptotic value of the average end-to-end mutual information. The optimal singular vectors of the precoding matrices that maximize the average mutual information with a finite number of antennas at all levels are also provided. It turns out that the optimal precoding singular vectors are aligned with the eigenvectors of the channel correlation matrices. Thus they can be determined using only the known channel statistics. As the optimal precoding singular vectors are independent of the system size, they are also optimal in the asymptotic regime.

preprint, 2008, arXiv

Asymptotic Capacity and Optimal Precoding Strategy of Multi-Level Precode & Forward in Correlated Channels

We analyze a multi-level MIMO relaying system where a multiple-antenna transmitter sends data to a multiple-antenna receiver through several relay levels, also equipped with multiple antennas. Assuming correlated fading in each hop, each relay receives a faded version of the signal transmitted by the previous level, performs precoding on the received signal and retransmits it to the next level. Using free probability theory and assuming that the noise power at the relay levels, but not at the receiver, is negligible, a closed-form expression of the end-to-end asymptotic instantaneous mutual information is derived as the number of antennas in all levels grows large at the same rate. This asymptotic expression is shown to be independent of the channel realizations, to depend only on the channel statistics, and to also serve as the asymptotic value of the end-to-end average mutual information. We also provide the optimal singular vectors of the precoding matrices that maximize the asymptotic mutual information: the optimal transmit directions represented by the singular vectors of the precoding matrices are aligned with the eigenvectors of the channel correlation matrices. They can therefore be determined using only the known statistics of the channel matrices and do not depend on a particular channel realization.
