Source author record

Anne-Marie Kermarrec

Anne-Marie Kermarrec appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Cryptography and Security Machine Learning Artificial Intelligence Computational Complexity Data Structures and Algorithms Discrete Mathematics Information Retrieval Information Theory math.CO math.IT Social and Information Networks

Catalog footprint

What is connected

11works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

The Evaluation Game: Beyond Static LLM Benchmarking

As jailbreaks, adversarially crafted inputs that bypass safety constraints, continue to be discovered in Large Language Models, practitioners increasingly rely on fine-tuning as a defensive strategy. Yet the theoretical foundations underlying this robustness fine-tuning remain underexplored. We introduce a game-theoretic framework in which the interaction between an evaluator (auditing the model for jailbreaks) and a trainer is formalized as a two-player game. A key feature of our approach is the use of group actions, a mathematical structure that captures symmetries and transformations, to formally represent data augmentation. The simplest non-trivial instance is the circle with cyclic translation groups, where we exhibit various regimes depending on the trainer's generalization range. Below a critical threshold, the evaluator maintains a constant miss ratio for linearly many rounds, whereas other settings can yield very different behaviors. We further provide empirical evidence supporting locality-dependence of the model: for the three model families we tested (Llama, Qwen and Mistral), we have significant evidence that fine-tuning on adversarial prompts induces only local generalization, with refusal rates on test examples highly correlated with the distance to the fine-tuning prompts. Our framework recasts the central object of adversarial evaluation: a benchmark is not a static set of prompts but an orbit under the evaluator's group action, and audit protocols that ignore trainer-side adaptation cannot distinguish a genuine fix from a memorized patch.

preprint2026arXiv

Your Neighbors Know: Leveraging Local Neighborhoods for Backdoor Detection in Decentralized Learning

Decentralized learning (DL) is an emerging machine learning paradigm where nodes collaboratively train models without a central server. However, the collaborative nature of DL makes it vulnerable to backdoor attacks, where a model is taught to behave normally on standard inputs while executing hidden, malicious actions when encountering data with specific triggers. Backdoor attacks in DL remain understudied and existing defenses often overlook DL constraints. We introduce Argus, a novel backdoor detection framework native to DL that requires neither a central coordinator nor prior knowledge of the trigger. In Argus, honest nodes locally analyze received model updates to identify potential backdoor triggers. Nodes then collectively share their triggers with their neighbors and use a structural similarity metric to separate true backdoors from false alarms induced by data heterogeneity. A key insight is that false positive triggers exhibit inconsistencies across participants while true positive ones show consistent patterns. Model updates that fail this collaborative test are rejected, and persistently malicious senders are eventually evicted. We provide the first theoretical convergence guarantees for a DL-specific backdoor detection mechanism, showing that filtering out suspicious model updates with high probability preserves a convergence rate comparable to standard DL. We implement and evaluate Argus on three standard datasets and against three state-of-the-art baselines. Across settings, Argus reduces attack success rates by up to 90 points compared to no defense, while preserving model utility within 5 percentage points of an omniscient oracle. Furthermore, the effectiveness of Argus compared to baselines improves as data heterogeneity increases.

preprint2022arXiv

TEE-based decentralized recommender systems: The raw data sharing redemption

Recommenders are central in many applications today. The most effective recommendation schemes, such as those based on collaborative filtering (CF), exploit similarities between user profiles to make recommendations, but potentially expose private data. Federated learning and decentralized learning systems address this by letting the data stay on user's machines to preserve privacy: each user performs the training on local data and only the model parameters are shared. However, sharing the model parameters across the network may still yield privacy breaches. In this paper, we present REX, the first enclave-based decentralized CF recommender. REX exploits Trusted execution environments (TEE), such as Intel software guard extensions (SGX), that provide shielded environments within the processor to improve convergence while preserving privacy. Firstly, REX enables raw data sharing, which ultimately speeds up convergence and reduces the network load. Secondly, REX fully preserves privacy. We analyze the impact of raw data sharing in both deep neural network (DNN) and matrix factorization (MF) recommenders and showcase the benefits of trusted environments in a full-fledged implementation of REX. Our experimental results demonstrate that through raw data sharing, REX significantly decreases the training time by 18.3x and the network load by 2 orders of magnitude over standard decentralized approaches that share only parameters, while fully protecting privacy by leveraging trustworthy hardware enclaves with very little overhead.

preprint2016arXiv

Bounds on the Voter Model in Dynamic Networks

In the voter model, each node of a graph has an opinion, and in every round each node chooses independently a random neighbour and adopts its opinion. We are interested in the consensus time, which is the first point in time where all nodes have the same opinion. We consider dynamic graphs in which the edges are rewired in every round (by an adversary) giving rise to the graph sequence $G_1, G_2, \dots $, where we assume that $G_i$ has conductance at least $ϕ_i$. We assume that the degrees of nodes don't change over time as one can show that the consensus time can become super-exponential otherwise. In the case of a sequence of $d$-regular graphs, we obtain asymptotically tight results. Even for some static graphs, such as the cycle, our results improve the state of the art. Here we show that the expected number of rounds until all nodes have the same opinion is bounded by $O(m/(d_{min} \cdot ϕ))$, for any graph with $m$ edges, conductance $ϕ$, and degrees at least $d_{min}$. In addition, we consider a biased dynamic voter model, where each opinion $i$ is associated with a probability $P_i$, and when a node chooses a neighbour with that opinion, it adopts opinion $i$ with probability $P_i$ (otherwise the node keeps its current opinion). We show for any regular dynamic graph, that if there is an $ε>0$ difference between the highest and second highest opinion probabilities, and at least $Ω(\log n)$ nodes have initially the opinion with the highest probability, then all nodes adopt w.h.p. that opinion. We obtain a bound on the convergences time, which becomes $O(\log n/ϕ)$ for static graphs.

preprint2015arXiv

Beyond One Third Byzantine Failures

The Byzantine agreement problem requires a set of $n$ processes to agree on a value sent by a transmitter, despite a subset of $b$ processes behaving in an arbitrary, i.e. Byzantine, manner and sending corrupted messages to all processes in the system. It is well known that the problem has a solution in a (an eventually) synchronous message passing distributed system iff the number of processes in the Byzantine subset is less than one third of the total number of processes, i.e. iff $n > 3b+1$. The rest of the processes are expected to be correct: they should never deviate from the algorithm assigned to them and send corrupted messages. But what if they still do? We show in this paper that it is possible to solve Byzantine agreement even if, beyond the $ b$ ($< n/3 $) Byzantine processes, some of the other processes also send corrupted messages, as long as they do not send them to all. More specifically, we generalize the classical Byzantine model and consider that Byzantine failures might be partial. In each communication step, some of the processes might send corrupted messages to a subset of the processes. This subset of processes - to which corrupted messages might be sent - could change over time. We compute the exact number of processes that can commit such faults, besides those that commit classical Byzantine failures, while still solving Byzantine agreement. We present a corresponding Byzantine agreement algorithm and prove its optimality by giving resilience and complexity bounds.

preprint2015arXiv

Heterogeneous Differential Privacy

The massive collection of personal data by personalization systems has rendered the preservation of privacy of individuals more and more difficult. Most of the proposed approaches to preserve privacy in personalization systems usually address this issue uniformly across users, thus ignoring the fact that users have different privacy attitudes and expectations (even among their own personal data). In this paper, we propose to account for this non-uniformity of privacy expectations by introducing the concept of heterogeneous differential privacy. This notion captures both the variation of privacy expectations among users as well as across different pieces of information related to the same user. We also describe an explicit mechanism achieving heterogeneous differential privacy, which is a modification of the Laplacian mechanism by Dwork, McSherry, Nissim, and Smith. In a nutshell, this mechanism achieves heterogeneous differential privacy by manipulating the sensitivity of the function using a linear transformation on the input domain. Finally, we evaluate on real datasets the impact of the proposed mechanism with respect to a semantic clustering task. The results of our experiments demonstrate that heterogeneous differential privacy can account for different privacy attitudes while sustaining a good level of utility as measured by the recall for the semantic clustering task.

preprint2014arXiv

Signed graph embedding: when everybody can sit closer to friends than enemies

Signed graphs are graphs with signed edges. They are commonly used to represent positive and negative relationships in social networks. While balance theory and clusterizable graphs deal with signed graphs to represent social interactions, recent empirical studies have proved that they fail to reflect some current practices in real social networks. In this paper we address the issue of drawing signed graphs and capturing such social interactions. We relax the previous assumptions to define a drawing as a model in which every vertex has to be placed closer to its neighbors connected via a positive edge than its neighbors connected via a negative edge in the resulting space. Based on this definition, we address the problem of deciding whether a given signed graph has a drawing in a given $\ell$-dimensional Euclidean space. We present forbidden patterns for signed graphs that admit the introduced definition of drawing in the Euclidean plane and line. We then focus on the $1$-dimensional case, where we provide a polynomial time algorithm that decides if a given complete signed graph has a drawing, and constructs it when applicable.

preprint2013arXiv

On Dynamic Distributed Computing

This paper shows for the first time that distributed computing can be both reliable and efficient in an environment that is both highly dynamic and hostile. More specifically, we show how to maintain clusters of size $O(\log N)$, each containing more than two thirds of honest nodes with high probability, within a system whose size can vary \textit{polynomially} with respect to its initial size. Furthermore, the communication cost induced by each node arrival or departure is polylogarithmic with respect to $N$, the maximal size of the system. Our clustering can be achieved despite the presence of a Byzantine adversary controlling a fraction $\bad \leq \{1}{3}-ε$ of the nodes, for some fixed constant $ε> 0$, independent of $N$. So far, such a clustering could only be performed for systems who size can vary constantly and it was not clear whether that was at all possible for polynomial variances.

preprint2013arXiv

Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes

Erasure correcting codes are widely used to ensure data persistence in distributed storage systems. This paper addresses the simultaneous repair of multiple failures in such codes. We go beyond existing work (i.e., regenerating codes by Dimakis et al.) by describing (i) coordinated regenerating codes (also known as cooperative regenerating codes) which support the simultaneous repair of multiple devices, and (ii) adaptive regenerating codes which allow adapting the parameters at each repair. Similarly to regenerating codes by Dimakis et al., these codes achieve the optimal tradeoff between storage and the repair bandwidth. Based on these extended regenerating codes, we study the impact of lazy repairs applied to regenerating codes and conclude that lazy repairs cannot reduce the costs in term of network bandwidth but allow reducing the disk-related costs (disk bandwidth and disk I/O).

preprint2012arXiv

Clustered Network Coding for Maintenance in Practical Storage Systems

Classical erasure codes, e.g. Reed-Solomon codes, have been acknowledged as an efficient alternative to plain replication to reduce the storage overhead in reliable distributed storage systems. Yet, such codes experience high overhead during the maintenance process. In this paper we propose a novel erasure-coded framework especially tailored for networked storage systems. Our approach relies on the use of random codes coupled with a clustered placement strategy, enabling the maintenance of a failed machine at the granularity of multiple files. Our repair protocol leverages network coding techniques to reduce by half the amount of data transferred during maintenance, as several files can be repaired simultaneously. This approach, as formally proven and demonstrated by our evaluation on a public experimental testbed, enables to dramatically decrease the bandwidth overhead during the maintenance process, as well as the time to repair a failure. In addition, the implementation is made as simple as possible, aiming at a deployment into practical systems.

preprint2011arXiv

Scalable and Secure Aggregation in Distributed Networks

We consider the problem of computing an aggregation function in a \emph{secure} and \emph{scalable} way. Whereas previous distributed solutions with similar security guarantees have a communication cost of $O(n^3)$, we present a distributed protocol that requires only a communication complexity of $O(n\log^3 n)$, which we prove is near-optimal. Our protocol ensures perfect security against a computationally-bounded adversary, tolerates $(1/2-ε)n$ malicious nodes for any constant $1/2 > ε> 0$ (not depending on $n$), and outputs the exact value of the aggregated function with high probability.