Source author record

Laurent Massoulié

Laurent Massoulié appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · math.PR · Social and Information Networks · math.OC · Computer Science and Game Theory · Distributed, Parallel, and Cluster Computing · Networking and Internet Architecture · physics.soc-ph · Data Structures and Algorithms · Discrete Mathematics · Information Theory · math.CO · math.DS · math.IT · math.ST · Multiagent Systems · Statistics Theory

Catalog footprint

What is connected

19 works

17 topics

4 close collaborators

preprint · 2026 · arXiv

Learning with Shallow Neural Networks on Cluster-Structured Features

The success of deep learning in high-dimensional settings is often attributed to the presence of low-dimensional structure in real-world data. While standard theoretical models typically assume that this structure lies in the target function, projecting unstructured inputs onto a low-dimensional subspace, data such as images, text or genomic sequences exhibit strong spatial correlations within the input space itself. In this paper, we propose a tractable model to study how these correlations affect the sample complexity of learning with gradient descent on shallow neural networks. Specifically, we consider targets that depend on a small number of latent Boolean variables, and input features grouped into clusters and correlated with the latent variables. Under an identifiability assumption, we show that for a layerwise gradient-descent variant, the sample complexity scales with the number of hidden variables and, when the signal-to-noise ratio is sufficiently high, is independent of the input dimension, up to logarithmic terms. We empirically test our theoretical findings on both synthetic and real data.
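To make the data model above concrete, here is a minimal Python sketch of a generator with cluster-structured inputs. Everything in it is an assumption for illustration, not the paper's exact model: the function name `sample_cluster_data`, the Gaussian noise, the signal level `snr`, and the choice of the product (parity) of the latent signs as the target.

```python
import math
import random

def sample_cluster_data(num_samples, num_latent, cluster_size, snr, seed=0):
    """Hypothetical generator for cluster-structured inputs.

    Latent signs z in {-1, +1}^num_latent; the input has num_latent clusters of
    cluster_size features, and every feature of cluster c equals snr * z[c] plus
    independent N(0, 1) noise. The label is prod(z), an illustrative target that
    depends only on the latent Boolean variables.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(num_samples):
        z = [rng.choice((-1, 1)) for _ in range(num_latent)]
        # features are laid out cluster by cluster: cluster c occupies
        # indices c * cluster_size .. (c + 1) * cluster_size - 1
        x = [snr * z_c + rng.gauss(0.0, 1.0) for z_c in z for _ in range(cluster_size)]
        xs.append(x)
        ys.append(math.prod(z))
    return xs, ys
```

Increasing `cluster_size` raises the input dimension `num_latent * cluster_size` while the number of hidden variables stays fixed, which is the regime the stated sample-complexity result is about.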

preprint · 2026 · arXiv

The value of random zero-sum games

We study the value of a two-player zero-sum game on a random matrix $M\in \mathbb{R}^{n\times m}$, defined by $v(M) = \min_{x\in\Delta_n}\max_{y\in \Delta_m}x^T M y$. In the setting where $n=m$ and $M$ has i.i.d. standard Gaussian entries, we prove that the standard deviation of $v(M)$ is of order $\frac{1}{n}$. This confirms an experimental conjecture dating back to the 1980s. We also investigate the case where $M$ is a rectangular Gaussian matrix with $m = n+\lambda\sqrt{n}$, showing that the expected value of the game is of order $\frac{\lambda}{n}$, as well as the case where $M$ is a random orthogonal matrix. Our techniques are based on probabilistic arguments and convex geometry. We argue that the study of random games could shed new light on various problems in theoretical computer science.
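For small matrices, $v(M)$ can be bracketed numerically. The sketch below is one standard approach, not the authors' technique: both players run the multiplicative-weights (Hedge) algorithm, and their averaged strategies give a certified lower and upper bound on $v(M)$ whose gap shrinks like $O(\sqrt{\log n / T})$ after $T$ rounds. The function name `game_value_bounds` and the step-size choice are assumptions.

```python
import math

def game_value_bounds(M, iters=20000):
    """Bracket v(M) = min_x max_y x^T M y with multiplicative weights (Hedge).

    The row player (minimizer) and column player (maximizer) both run Hedge;
    their time-averaged strategies x_bar, y_bar certify
        min_x x^T M y_bar  <=  v(M)  <=  max_y x_bar^T M y.
    """
    n, m = len(M), len(M[0])
    lo_entry = min(min(row) for row in M)
    hi_entry = max(max(row) for row in M)
    spread = hi_entry - lo_entry
    if spread == 0:  # constant matrix: every strategy pair has the same payoff
        return lo_entry, lo_entry
    eta_row = math.sqrt(8 * math.log(n) / iters) / spread
    eta_col = math.sqrt(8 * math.log(m) / iters) / spread
    loss = [0.0] * n  # cumulative loss of each row (row player minimizes)
    gain = [0.0] * m  # cumulative gain of each column (column player maximizes)
    x_bar, y_bar = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Hedge weights, shifted by the best cumulative score for numerical stability
        best_loss, best_gain = min(loss), max(gain)
        wx = [math.exp(-eta_row * (l - best_loss)) for l in loss]
        wy = [math.exp(eta_col * (g - best_gain)) for g in gain]
        sx, sy = sum(wx), sum(wy)
        x = [w / sx for w in wx]
        y = [w / sy for w in wy]
        # both strategies are fixed before either player observes this round
        for i in range(n):
            loss[i] += sum(M[i][j] * y[j] for j in range(m))
            x_bar[i] += x[i] / iters
        for j in range(m):
            gain[j] += sum(x[i] * M[i][j] for i in range(n))
            y_bar[j] += y[j] / iters
    upper = max(sum(x_bar[i] * M[i][j] for i in range(n)) for j in range(m))
    lower = min(sum(M[i][j] * y_bar[j] for j in range(m)) for i in range(n))
    return lower, upper
```

For a Gaussian matrix one could pass `[[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]` with small `n`. Because the bound gap decays only like $1/\sqrt{T}$, resolving fluctuations of order $1/n$ for large $n$ would call for an exact linear-programming solver instead.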

preprint · 2022 · arXiv

Accelerating Abelian Random Walks with Hyperbolic Dynamics

Given integers $d \geq 2, n \geq 1$, we consider affine random walks on tori $(\mathbb{Z} / n \mathbb{Z})^{d}$ defined as $X_{t+1} = A X_{t} + B_{t} \mod n$, where $A \in \mathrm{GL}_{d}(\mathbb{Z})$ is an invertible matrix with integer entries and $(B_{t})_{t \geq 0}$ is a sequence of i.i.d. random increments on $\mathbb{Z}^{d}$. We show that when $A$ has no eigenvalues of modulus $1$, this random walk mixes in $O(\log n \log \log n)$ steps as $n \rightarrow \infty$, and actually mixes in only $O(\log n)$ steps for almost all $n$. These results generalize those on the so-called Chung-Diaconis-Graham process, which corresponds to the case $d=1$. Our proof is based on the initial arguments of Chung, Diaconis and Graham, and relies extensively on the properties of the dynamical system $x \mapsto A^{\top} x$ on the continuous torus $\mathbb{R}^{d} / \mathbb{Z}^{d}$. Having no eigenvalue of modulus one makes this dynamical system a hyperbolic toral automorphism, a typical example of a chaotic system known to have a rich behaviour. As such, our proof sheds new light on the speed-up gained by applying a deterministic map to a Markov chain.
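For small $n$ and $d = 2$, the mixing behaviour can be inspected exactly by propagating the distribution of $X_t$ step by step. This is a hedged sketch: the name `affine_walk_tv`, the example matrix `A = [[2, 1], [1, 1]]` (determinant 1, eigenvalues $(3\pm\sqrt{5})/2$, so none of modulus 1) and the increment law are choices made for illustration.

```python
from collections import defaultdict

def affine_walk_tv(A, increments, n, steps, start=(0, 0)):
    """Exact total-variation distance to the uniform law of
    X_{t+1} = A X_t + B_t mod n on (Z/nZ)^2, for t = 0..steps.

    A: 2x2 integer matrix (list of rows); increments: list of ((b1, b2), prob)
    describing the law of each i.i.d. increment B_t.
    """
    size = n * n
    dist = {start: 1.0}
    tvs = []
    for t in range(steps + 1):
        # states missing from `dist` have probability 0, so each contributes 1/size
        seen = sum(abs(p - 1.0 / size) for p in dist.values())
        tvs.append(0.5 * (seen + (size - len(dist)) / size))
        if t == steps:
            break
        nxt = defaultdict(float)
        for (x1, x2), p in dist.items():
            # deterministic hyperbolic step A x, then the random increment
            y1 = (A[0][0] * x1 + A[0][1] * x2) % n
            y2 = (A[1][0] * x1 + A[1][1] * x2) % n
            for (b1, b2), q in increments:
                nxt[((y1 + b1) % n, (y2 + b2) % n)] += p * q
        dist = dict(nxt)
    return tvs
```

With increments `[((0, 0), 1/3), ((1, 0), 1/3), ((-1, 0), 1/3)]` the returned distances are non-increasing, since total variation to a stationary law never increases and the uniform law is stationary here. With `A` equal to the identity, the same increments never move the second coordinate, so the walk cannot mix at all; the hyperbolic map is what spreads the randomness across both coordinates.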

preprint · 2022 · arXiv

Non-backtracking spectra of weighted inhomogeneous random graphs

We study a model of random graphs where each edge is drawn independently (but not necessarily identically distributed) from the others, and then assigned a random weight. When the mean degree of such a graph is low, it is known that the spectrum of the adjacency matrix $A$ deviates significantly from that of its expected value $\mathbb E A$. In contrast, we show that over a wide range of parameters the top eigenvalues of the non-backtracking matrix $B$ -- a matrix whose powers count the non-backtracking walks between two edges -- are close to those of $\mathbb E A$, and all other eigenvalues are confined in a bulk with known radius. We also obtain a precise characterization of the scalar product between the eigenvectors of $B$ and their deterministic counterparts derived from the model parameters. This result has many applications, in domains ranging from (noisy) matrix completion to community detection, as well as matrix perturbation theory. In particular, we establish as a corollary that a result known as the Baik-Ben Arous-Péché phase transition, previously established only for rotationally invariant random matrices, holds more generally for matrices $A$ as above under a mild concentration hypothesis.

preprint · 2022 · arXiv

Partial Recovery in the Graph Alignment Problem

In this paper, we consider the graph alignment problem, which is the problem of recovering, given two graphs, a one-to-one mapping between nodes that maximizes edge overlap. This problem can be viewed as a noisy version of the well-known graph isomorphism problem and appears in many applications, including social network deanonymization and cellular biology. Our focus here is on partial recovery, i.e., we look for a one-to-one mapping which is correct on a fraction of the nodes of the graph rather than on all of them, and we assume that the two input graphs to the problem are correlated Erdős-Rényi graphs of parameters $(n,q,s)$. Our main contribution is then to give necessary and sufficient conditions on $(n,q,s)$ under which partial recovery is possible with high probability as the number of nodes $n$ goes to infinity. In particular, we show that it is possible to achieve partial recovery in the $nqs=\Theta(1)$ regime under certain additional assumptions.
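A small Python sketch of the setting may help. It assumes one common construction of correlated Erdős-Rényi graphs, which the abstract does not spell out: draw a parent graph $G(n, q)$, keep each parent edge independently with probability $s$ in each of two copies, and relabel the second copy by a hidden uniform permutation. The function names are hypothetical.

```python
import random

def correlated_er_pair(n, q, s, seed=0):
    """Sample two correlated Erdős-Rényi graphs with parameters (n, q, s).

    Assumed construction: parent graph G(n, q); each copy keeps every parent
    edge independently with probability s; the second copy's nodes are then
    relabelled by a uniformly random permutation `perm` (the hidden alignment).
    """
    rng = random.Random(seed)
    parent = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < q]
    g1 = {frozenset(e) for e in parent if rng.random() < s}
    perm = list(range(n))
    rng.shuffle(perm)
    g2 = {frozenset((perm[u], perm[v])) for (u, v) in parent if rng.random() < s}
    return g1, g2, perm

def edge_overlap(g1, g2, mapping):
    """Number of edges {u, v} of g1 whose image {mapping[u], mapping[v]} is an edge of g2."""
    return sum(1 for e in g1 if frozenset(mapping[u] for u in e) in g2)
```

With `s = 1` the two graphs are exact relabelings of each other, so the planted permutation `perm` preserves every edge; partial recovery asks for a mapping that agrees with `perm` on a positive fraction of nodes when `s < 1`.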

preprint · 2022 · arXiv

Sample Optimality and All-for-all Strategies in Personalized Federated and Collaborative Learning

In personalized Federated Learning, each member of a potentially large set of agents aims to train a model minimizing its loss function averaged over its local data distribution. We study this problem through the lens of stochastic optimization. Specifically, we introduce information-theoretic lower bounds on the number of samples required from all agents to approximately minimize the generalization error of a fixed agent. We then provide strategies matching these lower bounds, in the all-for-one and all-for-all settings where, respectively, one agent or all agents wish to minimize their own local function. Our strategies are based on a gradient filtering approach: given prior knowledge of some notion of distance or discrepancy between local data distributions or functions, a given agent filters and aggregates stochastic gradients received from other agents in order to achieve an optimal bias-variance trade-off.
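The bias-variance trade-off behind gradient filtering can be illustrated on a toy problem. In this sketch, which is not the paper's strategy, every agent has a scalar quadratic loss centred at a hypothetical `centers[j]`, a known `discrepancy` matrix stands in for the prior knowledge mentioned above, and the agent keeps only peers within a hypothetical `radius`.

```python
import random

def filtered_sgd(i, centers, discrepancy, radius, steps=2000, lr=0.01, noise=1.0, seed=0):
    """Agent i runs SGD on the assumed toy loss f_i(w) = (w - centers[i])^2 / 2,
    averaging stochastic gradients from every agent j with discrepancy[i][j] <= radius.
    Returns the final iterate and the list of selected peers (including i itself
    when discrepancy[i][i] == 0).
    """
    rng = random.Random(seed)
    peers = [j for j in range(len(centers)) if discrepancy[i][j] <= radius]
    w = 0.0
    for _ in range(steps):
        # each selected agent j reports a noisy gradient of its own local loss at w
        grads = [(w - centers[j]) + rng.gauss(0.0, noise) for j in peers]
        w -= lr * sum(grads) / len(grads)
    return w, peers
```

With noise, a larger `radius` averages more gradients (lower variance) but pulls the iterate toward the peers' centres (higher bias); with `noise=0.0` the iterate converges to the mean centre of the selected peers.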

preprint · 2021 · arXiv

Asynchrony and Acceleration in Gossip Algorithms

This paper considers the minimization of a sum of smooth and strongly convex functions dispatched over the nodes of a communication network. Previous works on the subject either focus on synchronous algorithms, which can be heavily slowed down by a few slow nodes (the straggler problem), or consider a model of asynchronous operation (Boyd et al., 2006) in which adjacent nodes communicate at the instants of Poisson point processes. We have two main contributions. 1) We propose CACDM (a Continuously Accelerated Coordinate Dual Method), and for the Poisson model of asynchronous operation, we prove that CACDM converges to optimality at an accelerated convergence rate in the sense of Nesterov and Stich, 2017. In contrast, previously proposed asynchronous algorithms have not been proven to achieve such an accelerated rate. While CACDM is based on discrete updates, the proof of its convergence crucially depends on a continuous-time analysis. 2) We introduce a new communication scheme based on Loss-Networks, which is programmable in a fully asynchronous and decentralized way, unlike the Poisson model of asynchronous operation, which does not capture essential aspects of asynchrony such as non-instantaneous communications and computations. Under this Loss-Network model of asynchrony, we establish for CDM (a Coordinate Dual Method) a rate of convergence in terms of the eigengap of the Laplacian of the graph weighted by local effective delays. We believe this eigengap to be a fundamental bottleneck for convergence rates of asynchronous optimization. Finally, we verify empirically that CACDM enjoys an accelerated convergence rate in the Loss-Network model of asynchrony.
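The Poisson model of asynchronous operation referred to above can be simulated in a few lines. The sketch below shows plain pairwise gossip averaging under that model, not CACDM or CDM: each edge carries an independent Poisson clock, and when it rings the two endpoints replace their values by their average. The function name and parameters are illustrative assumptions.

```python
import random

def poisson_gossip(values, edges, horizon, rate=1.0, seed=0):
    """Pairwise gossip averaging where each edge fires at the times of an
    independent Poisson process of intensity `rate`. `edges` must be non-empty.
    This is plain averaging, not CACDM.
    """
    rng = random.Random(seed)
    x = list(values)
    t = 0.0
    total_rate = rate * len(edges)
    while True:
        # superposition of the edge clocks: exponential gaps, uniformly chosen edge
        t += rng.expovariate(total_rate)
        if t > horizon:
            return x
        u, v = rng.choice(edges)
        x[u] = x[v] = (x[u] + x[v]) / 2
```

Because every update preserves the sum of the values, the iterates converge to the initial average on a connected graph as the horizon grows; the speed of that convergence is governed by the graph's spectral gap.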

preprint · 2020 · arXiv

Dual-Free Stochastic Decentralized Optimization with Variance Reduction

We consider the problem of training machine learning models on distributed data in a decentralized way. For finite-sum problems, fast single-machine algorithms for large datasets rely on stochastic updates combined with variance reduction. Yet, existing decentralized stochastic algorithms either do not obtain the full speedup allowed by stochastic updates, or require oracles that are more expensive than regular gradients. In this work, we introduce a Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only requires computing stochastic gradients of the local functions, and is computationally as fast as a standard stochastic variance-reduced algorithm run on a $1/n$ fraction of the dataset, where $n$ is the number of nodes. To derive DVR, we use Bregman coordinate descent on a well-chosen dual problem, and obtain a dual-free algorithm using a specific Bregman divergence. We give an accelerated version of DVR based on the Catalyst framework, and illustrate its effectiveness with simulations on real data.

preprint · 2015 · arXiv

Clustering and Inference From Pairwise Comparisons

Given a set of pairwise comparisons, the classical ranking problem computes a single ranking that best represents the preferences of all users. In this paper, we study the problem of inferring individual preferences, arising in the context of making personalized recommendations. In particular, we assume that there are $n$ users of $r$ types; users of the same type provide similar pairwise comparisons for $m$ items according to the Bradley-Terry model. We propose an efficient algorithm that accurately estimates the individual preferences for almost all users, if there are $r \max \{m, n\}\log m \log^2 n$ pairwise comparisons per type, which is near optimal in sample complexity when $r$ only grows logarithmically with $m$ or $n$. Our algorithm has three steps: first, for each user, compute the net-win vector, which is a projection of its $\binom{m}{2}$-dimensional vector of pairwise comparisons onto an $m$-dimensional linear subspace; second, cluster the users based on the net-win vectors; third, estimate a single preference for each cluster separately. The net-win vectors are much less noisy than the high-dimensional vectors of pairwise comparisons, and clustering is more accurate after the projection, as confirmed by numerical experiments. Moreover, we show that, when a cluster is only approximately correct, the maximum likelihood estimate for the Bradley-Terry model is still close to the true preference.
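The first step of the algorithm, and the Bradley-Terry model it assumes, can be sketched as follows. We take the net-win of an item to be its number of wins minus its number of losses in one user's comparisons, which is our reading of the projection described above; the function names and the log-score parameterization of Bradley-Terry are assumptions.

```python
import math
import random
from itertools import combinations

def bt_comparisons(scores, num_pairs, seed=0):
    """Sample (winner, loser) pairs under the Bradley-Terry model:
    item i beats item j with probability exp(s_i) / (exp(s_i) + exp(s_j))."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(scores)), 2))
    out = []
    for _ in range(num_pairs):
        i, j = rng.choice(pairs)
        p_i = 1.0 / (1.0 + math.exp(scores[j] - scores[i]))
        out.append((i, j) if rng.random() < p_i else (j, i))
    return out

def net_win(comparisons, m):
    """Net-win vector: for each of the m items, (#wins - #losses) in the comparisons."""
    v = [0] * m
    for winner, loser in comparisons:
        v[winner] += 1
        v[loser] -= 1
    return v
```

Each comparison adds +1 to the winner and -1 to the loser, so a net-win vector always sums to zero and lives in $m$ dimensions instead of $\binom{m}{2}$.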

preprint · 2015 · arXiv

Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs

A non-backtracking walk on a graph is a directed path such that no edge is the inverse of its preceding edge. The non-backtracking matrix of a graph is indexed by its directed edges and can be used to count non-backtracking walks of a given length. It has been used recently in the context of community detection and has appeared previously in connection with the Ihara zeta function and in some generalizations of Ramanujan graphs. In this work, we study the largest eigenvalues of the non-backtracking matrix of the Erdős-Rényi random graph and of the Stochastic Block Model in the regime where the number of edges is proportional to the number of vertices. Our results confirm the "spectral redemption" conjecture that community detection can be performed on the basis of the leading eigenvectors above the feasibility threshold.
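The definition above translates directly into code. This small Python sketch builds the non-backtracking matrix of an undirected simple graph and counts non-backtracking walks through its powers; the function names are our own.

```python
def non_backtracking_matrix(edges):
    """Non-backtracking matrix of an undirected simple graph given as (u, v) pairs.

    Rows/columns are indexed by directed edges; entry ((u, v), (x, y)) is 1 when
    the walk u -> v -> y continues (v == x) without reversing (y != u).
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: k for k, e in enumerate(directed)}
    B = [[0] * len(directed) for _ in directed]
    for (u, v) in directed:
        for (x, y) in directed:
            if v == x and y != u:
                B[index[(u, v)]][index[(x, y)]] = 1
    return directed, B

def count_nb_walks(B, length):
    """Number of non-backtracking walks made of `length` + 1 directed edges,
    i.e. the sum of the entries of B**length, via repeated vector-matrix products."""
    row = [1] * len(B)
    for _ in range(length):
        row = [sum(row[a] * B[a][b] for a in range(len(B))) for b in range(len(B))]
    return sum(row)
```

For a $d$-regular graph every row of the matrix sums to $d-1$, so $d-1$ is its Perron eigenvalue; on the complete graph $K_4$ the sketch counts $12 \cdot 2^k$ walks made of $k+1$ directed edges.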

preprint · 2015 · arXiv

Reconstruction in the Labeled Stochastic Block Model

The labeled stochastic block model is a random graph model representing networks with community structure and interactions of multiple types. In its simplest form, it consists of two communities of approximately equal size, and the edges are drawn and labeled at random with probability depending on whether their two endpoints belong to the same community or not. It has been conjectured in [Heimlicher12] that correlated reconstruction (i.e., identification of a partition correlated with the true partition into the underlying communities) would be feasible if and only if a model parameter exceeds a threshold. We prove one half of this conjecture, i.e., reconstruction is impossible when below the threshold. In the positive direction, we introduce a weighted graph to exploit the label information. With a suitable choice of weight function, we show that when the parameter exceeds the threshold by a specific constant, reconstruction is achieved by (1) minimum bisection, (2) a semidefinite relaxation of minimum bisection, and (3) a spectral method combined with removal of edges incident to vertices of high degree. Furthermore, we show that hypothesis testing between the labeled stochastic block model and the labeled Erdős-Rényi random graph model exhibits a phase transition at the conjectured reconstruction threshold.

preprint · 2015 · arXiv

Self-Organizing Flows in Social Networks

Social networks offer users new means of accessing information, essentially relying on "social filtering", i.e., propagation and filtering of information by social contacts. The sheer amount of data flowing in these networks, combined with the limited budget of attention of each user, makes it difficult to ensure that social filtering brings relevant content to the interested users. Our motivation in this paper is to measure to what extent self-organization of the social network results in efficient social filtering. To this end, we introduce flow games, a simple abstraction that models network formation under selfish user dynamics, featuring user-specific interests and budget of attention. In the context of homogeneous user interests, we show that selfish dynamics converge to a stable network structure (namely a pure Nash equilibrium) with close-to-optimal information dissemination. In contrast, we show that for the more realistic case of heterogeneous interests, convergence, if it occurs, may lead to information dissemination that can be arbitrarily inefficient, as captured by an unbounded "price of anarchy". Nevertheless, the situation differs when users' interests exhibit a particular structure, captured by a metric space with low doubling dimension. In that case, natural autonomous dynamics converge to a stable configuration. Moreover, users obtain all the information of interest to them in the corresponding dissemination, provided their budget of attention is logarithmic in the size of their interest set.

preprint · 2014 · arXiv

Adaptive Replication in Distributed Content Delivery Networks

We address the problem of content replication in large distributed content delivery networks, composed of a data center assisted by many small servers with limited capabilities and located at the edge of the network. The objective is to optimize the placement of contents on the servers so as to offload the data center as much as possible. We model the system constituted by the small servers as a loss network, each loss corresponding to a request to the data center. Based on the system's behavior in the regime of large systems and large storage, we obtain an asymptotic formula for the optimal replication of contents and propose adaptive schemes related to those encountered in cache networks but reacting here to loss events, as well as faster algorithms that generate virtual events at a higher rate while keeping the same target replication. We show through simulations that our adaptive schemes significantly outperform standard replication strategies in terms of both loss rates and adaptation speed.

preprint · 2014 · arXiv

Edge Label Inference in Generalized Stochastic Block Models: from Spectral Theory to Impossibility Results

The classical setting of community detection consists of networks exhibiting a clustered structure. To model real systems more accurately, we consider a class of networks (i) whose edges may carry labels and (ii) which may lack a clustered structure. Specifically, we assume that nodes possess latent attributes drawn from a general compact space and edges between two nodes are randomly generated and labeled according to some unknown distribution as a function of their latent attributes. Our goal is then to infer the edge label distributions from a partially observed network. We propose a computationally efficient spectral algorithm and show that it allows for asymptotically correct inference when the average node degree is as low as logarithmic in the total number of nodes. Conversely, if the average node degree is below a specific constant threshold, we show that no algorithm can achieve better inference than guessing without using the observations. As a byproduct of our analysis, we show that our model provides a general procedure to construct random graph models with a spectrum asymptotic to a pre-specified eigenvalue distribution such as a power-law distribution.

preprint · 2014 · arXiv

The Price of Privacy in Untrusted Recommendation Engines

The recent increase in online privacy concerns prompts the following question: can a recommender system be accurate if users do not entrust it with their private data? To answer this, we study the problem of learning item-clusters under local differential privacy, a powerful, formal notion of data privacy. We develop bounds on the sample complexity of learning item-clusters from privatized user inputs. Significantly, our results identify a sample-complexity separation between learning in an information-rich and an information-scarce regime, thereby highlighting the interaction between privacy and the amount of information (ratings) available to each user. In the information-rich regime, where each user rates at least a constant fraction of items, a spectral clustering approach is shown to achieve a sample-complexity lower bound derived from a simple information-theoretic argument based on Fano's inequality. However, the information-scarce regime, where each user rates only a vanishing fraction of items, is found to require a fundamentally different approach both for lower bounds and algorithms. To this end, we develop new techniques for bounding mutual information under a notion of channel-mismatch, and also propose a new algorithm, MaxSense, and show that it achieves optimal sample complexity in this setting. The techniques we develop for bounding mutual information may be of broader interest. To illustrate this, we show their applicability to $(i)$ learning based on 1-bit sketches, and $(ii)$ adaptive learning, where queries can be adapted based on answers to past queries.
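Local differential privacy means each user perturbs their data before it leaves their hands. The textbook example is randomized response, sketched below for a single bit; it is a standard mechanism shown for illustration, not the MaxSense algorithm or the privatization channel analysed in the paper, and the function names are hypothetical.

```python
import math
import random

def randomized_response(bit, epsilon, rng):
    """Report a private bit under epsilon-local differential privacy:
    keep it with probability e^eps / (1 + e^eps), otherwise flip it."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep else 1 - bit

def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from randomized-response reports."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = keep * p + (1 - keep) * (1 - p); solve this for p
    return (observed - (1.0 - keep)) / (2.0 * keep - 1.0)
```

Since the probability of any report under the two possible true bits differs by a factor of at most $e^{\varepsilon}$, the mechanism is $\varepsilon$-locally differentially private. The price is that the estimator's noise is inflated by the factor `1 / (2 * keep - 1)`, a simple instance of the privacy versus sample-complexity tension studied here.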

preprint · 2013 · arXiv

Distributed User Profiling via Spectral Methods

User profiling is a useful primitive for constructing personalised services, such as content recommendation. In the present paper we investigate the feasibility of user profiling in a distributed setting, with no central authority and only local information exchanges between users. We compute a profile vector for each user (i.e., a low-dimensional vector that characterises her taste) via spectral transformation of observed user-produced ratings for items. Our two main contributions are as follows: i) We consider a low-rank probabilistic model of user taste. More specifically, we consider that users and items are partitioned in a constant number of classes, such that users and items within the same class are statistically identical. We prove that without prior knowledge of the compositions of the classes, based solely on a few randomly observed ratings (namely $O(N\log N)$ such ratings for $N$ users), we can predict user preference with high probability for unrated items by running a local vote among users with similar profile vectors. In addition, we provide empirical evaluations characterising the way in which spectral profiling performance depends on the dimension of the profile space. Such evaluations are performed on a data set of real user ratings provided by Netflix. ii) We develop distributed algorithms which provably achieve an embedding of users into a low-dimensional space, based on spectral transformation. These involve simple message passing among users, and provably converge to the desired embedding. Our method essentially relies on a novel combination of gossiping and the algorithm proposed by Oja and Karhunen.
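In the spirit of the Oja-Karhunen algorithm mentioned above, Oja's rule is a stochastic update that drives a vector toward the top principal direction of the data. The sketch below is a centralized, single-machine version for illustration only; it does not reproduce the paper's combination with gossiping, and the function name, step size and epoch count are assumptions.

```python
import math
import random

def oja_top_direction(samples, lr=0.01, epochs=50, seed=0):
    """Estimate the top principal direction of zero-mean samples with Oja's rule:
    w <- w + lr * y * (x - y * w), where y = <w, x>."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # random unit starting vector
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            x = samples[k]
            y = sum(wi * xi for wi, xi in zip(w, x))
            # Hebbian term y * x plus a decay term -y^2 * w that keeps ||w|| near 1
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w
```

In a distributed variant, the averages implicit in this update would have to be computed by message passing among users rather than by a single process holding all samples.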

preprint · 2012 · arXiv

Best-effort networks: modeling and performance analysis via large networks asymptotics

In this paper we introduce a class of Markov models, termed best-effort networks, designed to capture performance indices such as mean transfer times in data networks with best-effort service. We introduce the so-called min bandwidth sharing policy as a conservative approximation to the classical max-min policy. We establish necessary and sufficient ergodicity conditions for best-effort networks under the min policy. We then resort to the mean-field technique of statistical physics to analyze network performance, deriving fixed-point equations for the stationary distribution of large symmetrical best-effort networks. A specific instance of such networks is the star-shaped network, which constitutes a plausible model of a network with an overprovisioned backbone. Numerical and analytical study of the equations allows us to state a number of qualitative conclusions on the impact of traffic parameters (link loads) and topology parameters (route lengths) on mean document transfer time.
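The classical max-min policy, which the min policy approximates, can be computed by progressive filling. This is a hedged sketch of max-min fairness only; the min policy itself is not defined in this abstract and is not reproduced. The data layout (`capacities` keyed by link, `routes` mapping each flow to the set of links it crosses) is an assumption, and every link named in a route must appear in `capacities`.

```python
def max_min_fair(capacities, routes):
    """Max-min fair rates by progressive filling.

    capacities: {link: capacity}; routes: {flow: set of links it traverses}.
    All unfrozen flows grow at the same rate; when a link saturates, every flow
    crossing it is frozen at its current rate.
    """
    rate = {f: 0.0 for f in routes}
    remaining = dict(capacities)
    active = set(routes)
    while active:
        # number of still-growing flows on each link
        load = {l: sum(1 for f in active if l in routes[f]) for l in remaining}
        increments = [remaining[l] / load[l] for l in remaining if load[l] > 0]
        if not increments:  # remaining flows cross no link, so nothing limits them
            break
        delta = min(increments)
        for f in active:
            rate[f] += delta
        for l in remaining:
            remaining[l] -= delta * load[l]
        saturated = {l for l in remaining if load[l] > 0 and remaining[l] <= 1e-12}
        active = {f for f in active if not (routes[f] & saturated)}
    return rate
```

For a link A of capacity 1 shared by flows f1 and f2 and a link B of capacity 2 shared by f2 and f3, the sketch returns rates 0.5, 0.5 and 1.5: f1 and f2 are limited by link A, and f3 takes the capacity that f2 leaves on link B.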

preprint · 2012 · arXiv

Community Detection in the Labelled Stochastic Block Model

We consider the problem of community detection from observed interactions between individuals, in the context where multiple types of interaction are possible. We use labelled stochastic block models to represent the observed data, where labels correspond to interaction types. Focusing on a two-community scenario, we conjecture a threshold for the problem of reconstructing the hidden communities in a way that is correlated with the true partition. To substantiate the conjecture, we prove that the given threshold correctly identifies a transition in the behaviour of belief propagation from insensitive to sensitive. We further prove that the same threshold corresponds to the transition in a related inference problem on a tree model from infeasible to feasible. Finally, numerical results using belief propagation for community detection give further support to the conjecture.

preprint · 2012 · arXiv

Convergence of multivariate belief propagation, with applications to cuckoo hashing and load balancing

This paper is motivated by two applications, namely i) generalizations of cuckoo hashing, a computationally simple approach to assigning keys to objects, and ii) load balancing in content distribution networks, where one is interested in determining the impact of content replication on performance. These two problems admit a common abstraction: in both scenarios, performance is characterized by the maximum weight of a generalization of a matching in a bipartite graph, featuring node and edge capacities. Our main result is a law of large numbers characterizing the asymptotic maximum weight matching in the limit of large bipartite random graphs, when the graphs admit a local weak limit that is a tree. This result specializes to the two application scenarios, yielding new results in both contexts. In contrast with previous results, the key novelty is the ability to handle edge capacities with arbitrary integer values. An analysis of belief propagation algorithms (BP) with multivariate belief vectors underlies the proof. In particular, we show convergence of the corresponding BP by exploiting monotonicity of the belief vectors with respect to the so-called upshifted likelihood ratio stochastic order. This auxiliary result can be of independent interest, providing a new set of structural conditions which ensure convergence of BP.
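Cuckoo hashing, the first application above, assigns each key to one of its candidate buckets, which is a matching in a bipartite key-bucket graph. Below is a sketch of the standard two-table version with unit bucket capacities; the generalizations with larger capacities analysed in the paper are not implemented, and the class name, the salted use of Python's built-in `hash` and the eviction limit are assumptions.

```python
import random

class CuckooTable:
    """Standard two-choice cuckoo hashing: each key may live in one bucket of
    table 0 or one bucket of table 1; inserting into an occupied bucket evicts the
    resident key, which then moves to its bucket in the other table."""

    def __init__(self, size, seed=0, max_kicks=100):
        rng = random.Random(seed)
        self.size = size
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))
        self.tables = [[None] * size, [None] * size]
        self.max_kicks = max_kicks

    def _slot(self, which, key):
        return hash((self.salts[which], key)) % self.size

    def lookup(self, key):
        return any(self.tables[w][self._slot(w, key)] == key for w in (0, 1))

    def insert(self, key):
        """Return None on success, or the key left without a bucket after too many evictions."""
        if self.lookup(key):
            return None
        which = 0
        for _ in range(self.max_kicks):
            slot = self._slot(which, key)
            # place `key` and pick up whatever was there (possibly None)
            key, self.tables[which][slot] = self.tables[which][slot], key
            if key is None:
                return None
            which = 1 - which  # evicted key tries its bucket in the other table
        return key
```

A returned key signals that the eviction chain was too long; a complete implementation would then rebuild the tables with fresh salts.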
