Source author record

Giulia Fanti

Giulia Fanti appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Machine Learning Networking and Internet Architecture Distributed, Parallel, and Cluster Computing Social and Information Networks Artificial Intelligence Computer Science and Game Theory Data Structures and Algorithms eess.SY Information Theory math.IT math.OC math.PR Multiagent Systems Systems and Control

Catalog footprint

What is connected

16works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Smooth Partial Lotteries for Stable Randomized Selection

Competitive selection processes, from scientific funding to admissions and hiring, use evaluations to score candidates, and eventually choose a subset of them based on those scores. Recently, many organizations have adopted partial lotteries, which randomize selection based on evaluation scores. However, existing lottery designs are inherently unstable, as a small change to a single candidate's score can cause large shifts in their selection probabilities. This instability undermines a key goal of lotteries: reducing the influence of fine-grained score distinctions near the decision boundary. We propose smoothness as a design principle for partial lotteries, formalizing it as a Lipschitz condition on the mapping from review scores over candidates to selection probabilities. We introduce the Clipped Linear Lottery, a simple mechanism in which selection probabilities scale linearly with estimated quality between an upper threshold, above which we always accept, and a lower threshold, below which we always reject. We prove that the Clipped Linear Lottery's worst-case regret matches a lower bound for any smooth selection rule up to a factor of $(1 - k/n)$, where $k/n$ is the acceptance rate. We compare smooth selection to other stability notions like Individual Fairness and Differential Privacy, showing that the Clipped Linear Lottery achieves a better smoothness-regret tradeoff than alternatives. Experiments on real peer review data from ICLR 2025, NeurIPS 2024, and the Swiss National Science Foundation demonstrate that existing lottery designs are highly unstable in practice even under perturbations to a single score. Our experiments also confirm the tightness of our theoretical analysis and show that our proposed Clipped Linear Lottery achieves a better smoothness-utility tradeoff than alternatives in practice.

preprint2022arXiv

Distance-Aware Private Set Intersection

Private set intersection (PSI) allows two mutually untrusting parties to compute an intersection of their sets, without revealing information about items that are not in the intersection. This work introduces a PSI variant called distance-aware PSI (DA-PSI) for sets whose elements lie in a metric space. DA-PSI returns pairs of items that are within a specified distance threshold of each other. This paper puts forward DA-PSI constructions for two metric spaces: (i) Minkowski distance of order 1 over the set of integers (i.e., for integers $a$ and $b$, their distance is $|a-b|$); and (ii) Hamming distance over the set of binary strings of length $\ell$. In the Minkowski DA-PSI protocol, the communication complexity scales logarithmically in the distance threshold and linearly in the set size. In the Hamming DA-PSI protocol, the communication volume scales quadratically in the distance threshold and is independent of the dimensionality of string length $\ell$. Experimental results with real applications confirm that DA-PSI provides more effective matching at lower cost than naive solutions.

preprint2022arXiv

Locally Differentially Private Sparse Vector Aggregation

Vector mean estimation is a central primitive in federated analytics. In vector mean estimation, each user $i \in [n]$ holds a real-valued vector $v_i\in [-1, 1]^d$, and a server wants to estimate the mean of all $n$ vectors. Not only so, we would like to protect each individual user's privacy. In this paper, we consider the $k$-sparse version of the vector mean estimation problem, that is, suppose that each user's vector has at most $k$ non-zero coordinates in its $d$-dimensional vector, and moreover, $k \ll d$. In practice, since the universe size $d$ can be very large (e.g., the space of all possible URLs), we would like the per-user communication to be succinct, i.e., independent of or (poly-)logarithmic in the universe size. In this paper, we are the first to show matching upper- and lower-bounds for the $k$-sparse vector mean estimation problem under local differential privacy. Specifically, we construct new mechanisms that achieve asymptotically optimal error as well as succinct communication, either under user-level-LDP or event-level-LDP. We implement our algorithms and evaluate them on synthetic as well as real-world datasets. Our experiments show that we can often achieve one or two orders of magnitude reduction in error in comparison with prior works under typical choices of parameters, while incurring insignificant communication cost.

preprint2022arXiv

On the Privacy Properties of GAN-generated Samples

The privacy implications of generative adversarial networks (GANs) are a topic of great interest, leading to several recent algorithms for training GANs with privacy guarantees. By drawing connections to the generalization properties of GANs, we prove that under some assumptions, GAN-generated samples inherently satisfy some (weak) privacy guarantees. First, we show that if a GAN is trained on m samples and used to generate n samples, the generated samples are (epsilon, delta)-differentially-private for (epsilon, delta) pairs where delta scales as O(n/m). We show that under some special conditions, this upper bound is tight. Next, we study the robustness of GAN-generated samples to membership inference attacks. We model membership inference as a hypothesis test in which the adversary must determine whether a given sample was drawn from the training dataset or from the underlying data distribution. We show that this adversary can achieve an area under the ROC curve that scales no better than O(m^{-1/4}).

preprint2022arXiv

RareGAN: Generating Samples for Rare Classes

We study the problem of learning generative adversarial networks (GANs) for a rare class of an unlabeled dataset subject to a labeling budget. This problem is motivated from practical applications in domains including security (e.g., synthesizing packets for DNS amplification attacks), systems and networking (e.g., synthesizing workloads that trigger high resource usage), and machine learning (e.g., generating images from a rare class). Existing approaches are unsuitable, either requiring fully-labeled datasets or sacrificing the fidelity of the rare class for that of the common classes. We propose RareGAN, a novel synthesis of three key ideas: (1) extending conditional GANs to use labelled and unlabelled data for better generalization; (2) an active learning approach that requests the most useful labels; and (3) a weighted loss function to favor learning the rare class. We show that RareGAN achieves a better fidelity-diversity tradeoff on the rare class than prior work across different applications, budgets, rare class fractions, GAN losses, and architectures.

preprint2021arXiv

Efficient Algorithms for Federated Saddle Point Optimization

We consider strongly convex-concave minimax problems in the federated setting, where the communication constraint is the main bottleneck. When clients are arbitrarily heterogeneous, a simple Minibatch Mirror-prox achieves the best performance. As the clients become more homogeneous, using multiple local gradient updates at the clients significantly improves upon Minibatch Mirror-prox by communicating less frequently. Our goal is to design an algorithm that can harness the benefit of similarity in the clients while recovering the Minibatch Mirror-prox performance under arbitrary heterogeneity (up to log factors). We give the first federated minimax optimization algorithm that achieves this goal. The main idea is to combine (i) SCAFFOLD (an algorithm that performs variance reduction across clients for convex optimization) to erase the worst-case dependency on heterogeneity and (ii) Catalyst (a framework for acceleration based on modifying the objective) to accelerate convergence without amplifying client drift. We prove that this algorithm achieves our goal, and include experiments to validate the theory.

preprint2021arXiv

Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Limited data access is a longstanding barrier to data-driven research and development in the networked systems community. In this work, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge. As a specific target, our focus in this paper is on time series datasets with metadata (e.g., packet loss rate measurements with corresponding ISPs). We identify key challenges of existing GAN approaches for such workloads with respect to fidelity (e.g., long-term dependencies, complex multidimensional relationships, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity). To improve fidelity, we design a custom workflow called DoppelGANger (DG) and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DG achieves up to 43% better fidelity than baseline models. Although we do not resolve the privacy problem in this work, we identify fundamental challenges with both classical notions of privacy and recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges. By shedding light on the promise and challenges, we hope our work can rekindle the conversation on workflows for data sharing.

preprint2020arXiv

High Throughput Cryptocurrency Routing in Payment Channel Networks

Despite growing adoption of cryptocurrencies, making fast payments at scale remains a challenge. Payment channel networks (PCNs) such as the Lightning Network have emerged as a viable scaling solution. However, completing payments on PCNs is challenging: payments must be routed on paths with sufficient funds. As payments flow over a single channel (link) in the same direction, the channel eventually becomes depleted and cannot support further payments in that direction; hence, naive routing schemes like shortest-path routing can deplete key payment channels and paralyze the system. Today's PCNs also route payments atomically, worsening the problem. In this paper, we present Spider, a routing solution that "packetizes" transactions and uses a multi-path transport protocol to achieve high-throughput routing in PCNs. Packetization allows Spider to complete even large transactions on low-capacity payment channels over time, while the multi-path congestion control protocol ensures balanced utilization of channels and fairness across flows. Extensive simulations comparing Spider with state-of-the-art approaches shows that Spider requires less than 25% of the funds to successfully route over 95% of transactions on balanced traffic demands, and offloads 4x more transactions onto the PCN on imbalanced demands.

preprint2020arXiv

InfoGAN-CR and ModelCentrality: Self-supervised Model Training and Selection for Disentangling GANs

Disentangled generative models map a latent code vector to a target space, while enforcing that a subset of the learned latent codes are interpretable and associated with distinct properties of the target distribution. Recent advances have been dominated by Variational AutoEncoder (VAE)-based methods, while training disentangled generative adversarial networks (GANs) remains challenging. In this work, we show that the dominant challenges facing disentangled GANs can be mitigated through the use of self-supervision. We make two main contributions: first, we design a novel approach for training disentangled GANs with self-supervision. We propose contrastive regularizer, which is inspired by a natural notion of disentanglement: latent traversal. This achieves higher disentanglement scores than state-of-the-art VAE- and GAN-based approaches. Second, we propose an unsupervised model selection scheme called ModelCentrality, which uses generated synthetic samples to compute the medoid (multi-dimensional generalization of median) of a collection of models. The current common practice of hyper-parameter tuning requires using ground-truths samples, each labelled with known perfect disentangled latent codes. As real datasets are not equipped with such labels, we propose an unsupervised model selection scheme and show that it finds a model close to the best one, for both VAEs and GANs. Combining contrastive regularization with ModelCentrality, we improve upon the state-of-the-art disentanglement scores significantly, without accessing the supervised data.

preprint2020arXiv

SquirRL: Automating Attack Analysis on Blockchain Incentive Mechanisms with Deep Reinforcement Learning

Incentive mechanisms are central to the functionality of permissionless blockchains: they incentivize participants to run and secure the underlying consensus protocol. Designing incentive-compatible incentive mechanisms is notoriously challenging, however. As a result, most public blockchains today use incentive mechanisms whose security properties are poorly understood and largely untested. In this work, we propose SquirRL, a framework for using deep reinforcement learning to analyze attacks on blockchain incentive mechanisms. We demonstrate SquirRL's power by first recovering known attacks: (1) the optimal selfish mining attack in Bitcoin [52], and (2) the Nash equilibrium in block withholding attacks [16]. We also use SquirRL to obtain several novel empirical results. First, we discover a counterintuitive flaw in the widely used rushing adversary model when applied to multi-agent Markov games with incomplete information. Second, we demonstrate that the optimal selfish mining strategy identified in [52] is actually not a Nash equilibrium in the multi-agent selfish mining setting. In fact, our results suggest (but do not prove) that when more than two competing agents engage in selfish mining, there is no profitable Nash equilibrium. This is consistent with the lack of observed selfish mining in the wild. Third, we find a novel attack on a simplified version of Ethereum's finalization mechanism, Casper the Friendly Finality Gadget (FFG) that allows a strategic agent to amplify her rewards by up to 30%. Notably, [10] show that honest voting is a Nash equilibrium in Casper FFG: our attack shows that when Casper FFG is composed with selfish mining, this is no longer the case. Altogether, our experiments demonstrate SquirRL's flexibility and promise as a framework for studying attack settings that have thus far eluded theoretical and empirical understanding.

preprint2019arXiv

Communication cost of consensus for nodes with limited memory

Motivated by applications in blockchains and sensor networks, we consider a model of $n$ nodes trying to reach consensus on their majority bit. Each node $i$ is assigned a bit at time zero, and is a finite automaton with $m$ bits of memory (i.e., $2^m$ states) and a Poisson clock. When the clock of $i$ rings, $i$ can choose to communicate, and is then matched to a uniformly chosen node $j$. The nodes $j$ and $i$ may update their states based on the state of the other node. Previous work has focused on minimizing the time to consensus and the probability of error, while our goal is minimizing the number of communications. We show that when $m>3 \log\log\log(n)$, consensus can be reached at linear communication cost, but this is impossible if $m<\log\log\log(n)$. We also study a synchronous variant of the model, where our upper and lower bounds on $m$ for achieving linear communication cost are $2\log\log\log(n)$ and $\log\log\log(n)$, respectively. A key step is to distinguish when nodes can become aware of knowing the majority bit and stop communicating. We show that this is impossible if their memory is too low.

preprint2016arXiv

Hiding the Rumor Source

Anonymous social media platforms like Secret, Yik Yak, and Whisper have emerged as important tools for sharing ideas without the fear of judgment. Such anonymous platforms are also important in nations under authoritarian rule, where freedom of expression and the personal safety of message authors may depend on anonymity. Whether for fear of judgment or retribution, it is sometimes crucial to hide the identities of users who post sensitive messages. In this paper, we consider a global adversary who wishes to identify the author of a message; it observes either a snapshot of the spread of a message at a certain time, sampled timestamp metadata, or both. Recent advances in rumor source detection show that existing messaging protocols are vulnerable against such an adversary. We introduce a novel messaging protocol, which we call adaptive diffusion, and show that under the snapshot adversarial model, adaptive diffusion spreads content fast and achieves perfect obfuscation of the source when the underlying contact network is an infinite regular tree. That is, all users with the message are nearly equally likely to have been the origin of the message. When the contact network is an irregular tree, we characterize the probability of maximum likelihood detection by proving a concentration result over Galton-Watson trees. Experiments on a sampled Facebook network demonstrate that adaptive diffusion effectively hides the location of the source even when the graph is finite, irregular and has cycles.

preprint2016arXiv

Rangzen: Anonymously Getting the Word Out in a Blackout

In recent years governments have shown themselves willing to impose blackouts to shut off key communication infrastructure during times of civil strife, and to surveil citizen communications whenever possible. However, it is exactly during such strife that citizens need reliable and anonymous communications the most. In this paper, we present Rangzen, a system for anonymous broadcast messaging during network blackouts. Rangzen is distinctive in both aim and design. Our aim is to provide an anonymous, one-to-many messaging layer that requires only users' smartphones and can withstand network-level attacks. Our design is a delay-tolerant mesh network which deprioritizes adversarial messages by means of a social graph while preserving user anonymity. We built a complete implementation that runs on Android smartphones, present benchmarks of its performance and battery usage, and present simulation results suggesting Rangzen's efficacy at scale.

preprint2015arXiv

Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries

Techniques based on randomized response enable the collection of potentially sensitive data from clients in a privacy-preserving manner with strong local differential privacy guarantees. One of the latest such technologies, RAPPOR, allows the marginal frequencies of an arbitrary set of strings to be estimated via privacy-preserving crowdsourcing. However, this original estimation process requires a known set of possible strings; in practice, this dictionary can often be extremely large and sometimes completely unknown. In this paper, we propose a novel decoding algorithm for the RAPPOR mechanism that enables the estimation of "unknown unknowns," i.e., strings we do not even know we should be estimating. To enable learning without explicit knowledge of the dictionary, we develop methodology for estimating the joint distribution of two or more variables collected with RAPPOR. This is a critical step towards understanding relationships between multiple variables collected in a privacy-preserving manner.

preprint2015arXiv

Spy vs. Spy: Rumor Source Obfuscation

Anonymous messaging platforms, such as Secret and Whisper, have emerged as important social media for sharing one's thoughts without the fear of being judged by friends, family, or the public. Further, such anonymous platforms are crucial in nations with authoritarian governments; the right to free expression and sometimes the personal safety of the author of the message depend on anonymity. Whether for fear of judgment or personal endangerment, it is crucial to keep anonymous the identity of the user who initially posted a sensitive message. In this paper, we consider an adversary who observes a snapshot of the spread of a message at a certain time. Recent advances in rumor source detection shows that the existing messaging protocols are vulnerable against such an adversary. We introduce a novel messaging protocol, which we call adaptive diffusion, and show that it spreads the messages fast and achieves a perfect obfuscation of the source when the underlying contact network is an infinite regular tree: all users with the message are nearly equally likely to have been the origin of the message. Experiments on a sampled Facebook network show that it effectively hides the location of the source even when the graph is finite, irregular and has cycles. We further consider a stronger adversarial model where a subset of colluding users track the reception of messages. We show that the adaptive diffusion provides a strong protection of the anonymity of the source even under this scenario.

preprint2012arXiv

Semi-Definite Programming Relaxation for Non-Line-of-Sight Localization

We consider the problem of estimating the locations of a set of points in a k-dimensional euclidean space given a subset of the pairwise distance measurements between the points. We focus on the case when some fraction of these measurements can be arbitrarily corrupted by large additive noise. Given that the problem is highly non-convex, we propose a simple semidefinite programming relaxation that can be efficiently solved using standard algorithms. We define a notion of non-contractibility and show that the relaxation gives the exact point locations when the underlying graph is non-contractible. The performance of the algorithm is evaluated on an experimental data set obtained from a network of 44 nodes in an indoor environment and is shown to be robust to non-line-of-sight errors.

Giulia Fanti

What is connected

Connect this record

See the researcher in context

Building this map preview

16 published item(s)

Smooth Partial Lotteries for Stable Randomized Selection

Distance-Aware Private Set Intersection

Locally Differentially Private Sparse Vector Aggregation

On the Privacy Properties of GAN-generated Samples

RareGAN: Generating Samples for Rare Classes

Efficient Algorithms for Federated Saddle Point Optimization

Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

High Throughput Cryptocurrency Routing in Payment Channel Networks

InfoGAN-CR and ModelCentrality: Self-supervised Model Training and Selection for Disentangling GANs

SquirRL: Automating Attack Analysis on Blockchain Incentive Mechanisms with Deep Reinforcement Learning

Communication cost of consensus for nodes with limited memory

Hiding the Rumor Source

Rangzen: Anonymously Getting the Word Out in a Blackout

Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries

Spy vs. Spy: Rumor Source Obfuscation

Semi-Definite Programming Relaxation for Non-Line-of-Sight Localization