Researcher profile

Laurent Massoulié

Laurent Massoulié contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
9 topics
4 close collaborators

Published work

8 published items

preprint, 2026, arXiv

Learning with Shallow Neural Networks on Cluster-Structured Features

The success of deep learning in high-dimensional settings is often attributed to the presence of low-dimensional structure in real-world data. While standard theoretical models typically assume that this structure lies in the target function, projecting unstructured inputs onto a low-dimensional subspace, data such as images, text or genomic sequences exhibit strong spatial correlations within the input space itself. In this paper, we propose a tractable model to study how these correlations affect the sample complexity of learning with gradient descent on shallow neural networks. Specifically, we consider targets that depend on a small number of latent Boolean variables, and input features grouped into clusters and correlated with the latent variables. Under an identifiability assumption, we show that for a layerwise gradient-descent variant, the sample complexity scales with the number of hidden variables and, when the signal-to-noise ratio is sufficiently high, is independent of the input dimension, up to logarithmic terms. We empirically test our theoretical findings on both synthetic and real data.

preprint, 2026, arXiv

The value of random zero-sum games

We study the value of a two-player zero-sum game on a random matrix $M\in \mathbb{R}^{n\times m}$, defined by $v(M) = \min_{x\in\Delta_n}\max_{y\in \Delta_m}x^T M y$. In the setting where $n=m$ and $M$ has i.i.d. standard Gaussian entries, we prove that the standard deviation of $v(M)$ is of order $\frac{1}{n}$. This confirms an experimental conjecture dating back to the 1980s. We also investigate the case where $M$ is a rectangular Gaussian matrix with $m = n+\lambda\sqrt{n}$, showing that the expected value of the game is of order $\frac{\lambda}{n}$, as well as the case where $M$ is a random orthogonal matrix. Our techniques are based on probabilistic arguments and convex geometry. We argue that the study of random games could shed new light on various problems in theoretical computer science.
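The definition of $v(M)$ can be made concrete with a small numerical sketch. This is not the paper's technique: it uses fictitious play, a classical iterative scheme, and the function name `game_value_bounds` and the iteration count are assumptions made for this example. The idea is that any mixed strategy of the minimizing row player yields an upper bound on $v(M)$, and any mixed strategy of the maximizing column player yields a lower bound, so the empirical strategies bracket the value.

```python
def game_value_bounds(M, iterations=2000):
    """Bracket v(M) = min_x max_y x^T M y with fictitious play.

    The row player (x) minimizes and the column player (y) maximizes.
    Returns (lower, upper) with lower <= v(M) <= upper.
    """
    n, m = len(M), len(M[0])
    row_payoff = [0.0] * n  # row_payoff[k] = sum over past column plays j of M[k][j]
    col_payoff = [0.0] * m  # col_payoff[k] = sum over past row plays i of M[i][k]
    i, j = 0, 0             # arbitrary initial pure strategies (assumption)
    for _ in range(iterations):
        for k in range(n):
            row_payoff[k] += M[k][j]
        for k in range(m):
            col_payoff[k] += M[i][k]
        # Each player best-responds to the opponent's empirical play so far.
        i = min(range(n), key=lambda k: row_payoff[k])
        j = max(range(m), key=lambda k: col_payoff[k])
    # Best response against the empirical column strategy gives a lower bound;
    # best response against the empirical row strategy gives an upper bound.
    lower = min(row_payoff) / iterations
    upper = max(col_payoff) / iterations
    return lower, upper


# Matching pennies: the value of this game is 0.
lower, upper = game_value_bounds([[1.0, -1.0], [-1.0, 1.0]])
print(lower, upper)
```

The two returned numbers always satisfy lower <= v(M) <= upper. How quickly the gap closes depends on the matrix, and for large random matrices a linear-programming solver would be the more practical choice.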

preprint, 2022, arXiv

Accelerating Abelian Random Walks with Hyperbolic Dynamics

Given integers $d \geq 2, n \geq 1$, we consider affine random walks on tori $(\mathbb{Z} / n \mathbb{Z})^{d}$ defined as $X_{t+1} = A X_{t} + B_{t} \mod n$, where $A \in \mathrm{GL}_{d}(\mathbb{Z})$ is an invertible matrix with integer entries and $(B_{t})_{t \geq 0}$ is a sequence of i.i.d. random increments on $\mathbb{Z}^{d}$. We show that when $A$ has no eigenvalues of modulus $1$, this random walk mixes in $O(\log n \log \log n)$ steps as $n \rightarrow \infty$, and in fact mixes in only $O(\log n)$ steps for almost all $n$. These results generalize those on the so-called Chung-Diaconis-Graham process, which corresponds to the case $d=1$. Our proof is based on the initial arguments of Chung, Diaconis and Graham, and relies extensively on the properties of the dynamical system $x \mapsto A^{\top} x$ on the continuous torus $\mathbb{R}^{d} / \mathbb{Z}^{d}$. Having no eigenvalue of modulus one makes this dynamical system a hyperbolic toral automorphism, a typical example of a chaotic system known to have a rich behaviour. As such, our proof sheds new light on the speed-up gained by applying a deterministic map to a Markov chain.
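A minimal simulation of the recursion $X_{t+1} = A X_{t} + B_{t} \mod n$ may help fix ideas. Everything specific below is an assumption chosen for illustration and is not taken from the paper: the matrix (the "cat map", which has determinant 1 and eigenvalues $(3 \pm \sqrt{5})/2$, neither of modulus 1), the increment set, the modulus 101, and the function name `affine_walk`.

```python
import random


def affine_walk(A, n, steps, increments, seed=0):
    """Simulate X_{t+1} = A X_t + B_t (mod n) on (Z/nZ)^d, starting at 0.

    `increments` is the finite support of B_t, sampled uniformly
    (an illustrative choice of increment law).
    """
    rng = random.Random(seed)
    d = len(A)
    x = [0] * d
    path = [tuple(x)]
    for _ in range(steps):
        b = rng.choice(increments)
        # Matrix-vector product plus increment, reduced coordinate-wise mod n.
        x = [(sum(A[r][c] * x[c] for c in range(d)) + b[r]) % n for r in range(d)]
        path.append(tuple(x))
    return path


CAT_MAP = [[2, 1], [1, 1]]              # hyperbolic: no eigenvalue of modulus 1
INCREMENTS = [(0, 0), (1, 0), (-1, 0)]  # hypothetical increment support
path = affine_walk(CAT_MAP, n=101, steps=50, increments=INCREMENTS)
print(path[-1])
```

Estimating a mixing time from such a simulation would require comparing the empirical law of $X_t$ with the uniform distribution over many independent runs, which this sketch does not attempt.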

preprint, 2022, arXiv

Non-backtracking spectra of weighted inhomogeneous random graphs

We study a model of random graphs where each edge is drawn independently (but not necessarily identically distributed) from the others, and then assigned a random weight. When the mean degree of such a graph is low, it is known that the spectrum of the adjacency matrix $A$ deviates significantly from that of its expected value $\mathbb E A$. In contrast, we show that over a wide range of parameters the top eigenvalues of the non-backtracking matrix $B$ -- a matrix whose powers count the non-backtracking walks between two edges -- are close to those of $\mathbb E A$, and all other eigenvalues are confined in a bulk with known radius. We also obtain a precise characterization of the scalar product between the eigenvectors of $B$ and their deterministic counterparts derived from the model parameters. This result has many applications, in domains ranging from (noisy) matrix completion to community detection, as well as matrix perturbation theory. In particular, we establish as a corollary that a result known as the Baik-Ben Arous-Péché phase transition, previously established only for rotationally invariant random matrices, holds more generally for matrices $A$ as above under a mild concentration hypothesis.
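To make the non-backtracking matrix $B$ concrete, here is a small sketch that builds it for an unweighted graph. The paper treats weighted graphs, so this is a simplification, and the function name `non_backtracking_matrix` and the row/column convention (row for the current directed edge, column for the next one) are assumptions made for this example. An entry is 1 when a walk can move from directed edge $u \to v$ to directed edge $v \to y$ with $y \neq u$.

```python
def non_backtracking_matrix(edges):
    """Build the non-backtracking matrix of an undirected, unweighted graph.

    Rows and columns are indexed by directed edges (u, v). The entry for
    ((u, v), (x, y)) is 1 when v == x and y != u, i.e. the walk continues
    from v without immediately returning to u.
    """
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    index = {e: k for k, e in enumerate(directed)}
    size = len(directed)
    B = [[0] * size for _ in range(size)]
    for (u, v) in directed:
        for (x, y) in directed:
            if v == x and y != u:
                B[index[(u, v)]][index[(x, y)]] = 1
    return directed, B


# Triangle on nodes 0, 1, 2: every directed edge has exactly one successor.
directed, B = non_backtracking_matrix([(0, 1), (1, 2), (2, 0)])
for e, row in zip(directed, B):
    print(e, row)
```

For an undirected graph with $|E|$ edges the matrix has size $2|E| \times 2|E|$, and the entries of its $k$-th power count non-backtracking walks of $k$ steps between directed edges.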

preprint, 2022, arXiv

Partial Recovery in the Graph Alignment Problem

In this paper, we consider the graph alignment problem, which is the problem of recovering, given two graphs, a one-to-one mapping between nodes that maximizes edge overlap. This problem can be viewed as a noisy version of the well-known graph isomorphism problem and appears in many applications, including social network deanonymization and cellular biology. Our focus here is on partial recovery, i.e., we look for a one-to-one mapping which is correct on a fraction of the nodes of the graph rather than on all of them, and we assume that the two input graphs to the problem are correlated Erdős-Rényi graphs of parameters $(n,q,s)$. Our main contribution is then to give necessary and sufficient conditions on $(n,q,s)$ under which partial recovery is possible with high probability as the number of nodes $n$ goes to infinity. In particular, we show that it is possible to achieve partial recovery in the $nqs=\Theta(1)$ regime under certain additional assumptions.
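The objective described above, a one-to-one mapping that maximizes edge overlap, can be illustrated on a toy instance. The sketch below is a brute-force search over all permutations and is only feasible for very small graphs; it says nothing about the statistical thresholds studied in the paper. The function names `edge_overlap` and `best_alignment` are assumptions made for this example.

```python
from itertools import permutations


def edge_overlap(edges_g, edges_h, mapping):
    """Count edges (u, v) of G whose image (mapping[u], mapping[v]) is an edge of H."""
    h = {frozenset(e) for e in edges_h}
    return sum(1 for u, v in edges_g if frozenset((mapping[u], mapping[v])) in h)


def best_alignment(n, edges_g, edges_h):
    """Exhaustively search all n! node mappings for the maximum edge overlap."""
    best = max(permutations(range(n)),
               key=lambda p: edge_overlap(edges_g, edges_h, p))
    return best, edge_overlap(edges_g, edges_h, best)


# G is a path 0-1-2-3; H is the same path relabeled by 0->2, 1->0, 2->3, 3->1.
G = [(0, 1), (1, 2), (2, 3)]
H = [(2, 0), (0, 3), (3, 1)]
mapping, overlap = best_alignment(4, G, H)
print(mapping, overlap)
```

In the noiseless case shown here the best overlap equals the number of edges. With correlated random graphs the maximizer is in general only partially correct, which is the partial recovery question the abstract addresses.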

preprint, 2022, arXiv

Sample Optimality and All-for-all Strategies in Personalized Federated and Collaborative Learning

In personalized Federated Learning, each member of a potentially large set of agents aims to train a model minimizing its loss function averaged over its local data distribution. We study this problem under the lens of stochastic optimization. Specifically, we introduce information-theoretic lower bounds on the number of samples required from all agents to approximately minimize the generalization error of a fixed agent. We then provide strategies matching these lower bounds, in the all-for-one and all-for-all settings where respectively one or all agents desire to minimize their own local function. Our strategies are based on a gradient filtering approach: provided prior knowledge on some notions of distances or discrepancies between local data distributions or functions, a given agent filters and aggregates stochastic gradients received from other agents, in order to achieve an optimal bias-variance trade-off.

preprint, 2021, arXiv

Asynchrony and Acceleration in Gossip Algorithms

This paper considers the minimization of a sum of smooth and strongly convex functions dispatched over the nodes of a communication network. Previous works on the subject either focus on synchronous algorithms, which can be heavily slowed down by a few slow nodes (the straggler problem), or consider a model of asynchronous operation (Boyd et al., 2006) in which adjacent nodes communicate at the instants of Poisson point processes. We have two main contributions. 1) We propose CACDM (a Continuously Accelerated Coordinate Dual Method), and for the Poisson model of asynchronous operation, we prove that CACDM converges to optimality at an accelerated rate in the sense of Nesterov and Stich (2017). In contrast, previously proposed asynchronous algorithms have not been proven to achieve such an accelerated rate. While CACDM is based on discrete updates, the proof of its convergence crucially depends on a continuous-time analysis. 2) We introduce a new communication scheme based on Loss-Networks that is programmable in a fully asynchronous and decentralized way, unlike the Poisson model of asynchronous operation, which does not capture essential aspects of asynchrony such as non-instantaneous communications and computations. Under this Loss-Network model of asynchrony, we establish for CDM (a Coordinate Dual Method) a rate of convergence in terms of the eigengap of the Laplacian of the graph weighted by local effective delays. We believe this eigengap to be a fundamental bottleneck for convergence rates of asynchronous optimization. Finally, we verify empirically that CACDM enjoys an accelerated convergence rate in the Loss-Network model of asynchrony.
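For readers unfamiliar with the Poisson model of asynchronous operation, the sketch below simulates classical randomized pairwise gossip averaging, the baseline setting associated with Boyd et al. (2006). It is not CACDM or CDM. It corresponds to the special case where node $i$ holds the quadratic $f_i(x) = (x - c_i)^2/2$, so that the minimizer of the sum is the average of the $c_i$. Drawing one uniformly random edge per tick assumes that all edges carry independent Poisson clocks of equal rate; the function name `pairwise_gossip` and the ring topology are likewise assumptions made for this example.

```python
import random


def pairwise_gossip(values, edges, ticks, seed=0):
    """Randomized pairwise gossip: at each tick one edge activates and its
    two endpoints replace their values by their average."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(ticks):
        u, v = rng.choice(edges)      # edge whose clock rings next
        avg = (x[u] + x[v]) / 2.0
        x[u] = x[v] = avg             # local averaging step
    return x


# Ring of 6 nodes holding the values 0, 1, ..., 5 (average 2.5).
ring = [(k, (k + 1) % 6) for k in range(6)]
start = [float(k) for k in range(6)]
final = pairwise_gossip(start, ring, ticks=2000)
print(final)
```

Each update preserves the sum of the values, so the only possible consensus point is the initial average. This plain scheme is not accelerated; obtaining an accelerated rate in the asynchronous setting is the contribution the abstract describes.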

preprint, 2020, arXiv

Dual-Free Stochastic Decentralized Optimization with Variance Reduction

We consider the problem of training machine learning models on distributed data in a decentralized way. For finite-sum problems, fast single-machine algorithms for large datasets rely on stochastic updates combined with variance reduction. Yet, existing decentralized stochastic algorithms either do not obtain the full speedup allowed by stochastic updates, or require oracles that are more expensive than regular gradients. In this work, we introduce a Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only requires computing stochastic gradients of the local functions, and is computationally as fast as a standard stochastic variance-reduced algorithm run on a $1/n$ fraction of the dataset, where $n$ is the number of nodes. To derive DVR, we use Bregman coordinate descent on a well-chosen dual problem, and obtain a dual-free algorithm using a specific Bregman divergence. We give an accelerated version of DVR based on the Catalyst framework, and illustrate its effectiveness with simulations on real data.