Source author record

Antoine Maillard

Antoine Maillard appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


cond-mat.dis-nn, math.PR, Machine Learning, cond-mat.stat-mech, Information Theory, math.IT, math.ST, Statistics Theory, Data Structures and Algorithms, Discrete Mathematics, eess.SP, gr-qc, hep-th, math.CO, math.MG, nlin.CD, physics.optics

Catalog footprint

9 works, 17 topics, 4 close collaborators


preprint, 2026, arXiv

Factual recall in linear associative memories: sharp asymptotics and mechanistic insights

Large language models demonstrate remarkable ability in factual recall, yet the fundamental limits of storing and retrieving input-output associations with neural networks remain unclear. We study these limits in a minimal setting: a linear associative memory that maps $p$ input embeddings in $\mathbb{R}^d$ to their corresponding $d$-dimensional targets via a single layer, requiring each mapped input to be well separated from all other targets. Unlike in supervised classification, this strict separation induces $p$ constraints per association and produces strong correlations between constraints that make a direct characterisation of the storage capacity difficult. Here, we provide a precise characterisation of this capacity in the following way. We first introduce a decoupled model in which each input has its own independent set of competing outputs, and provide numerical and analytical evidence that this decoupled model is equivalent to the original model in terms of storage capacity, spectra of the learnt weights, and storage mechanism. Using tools from statistical physics, we show that the decoupled model can store up to $p_c \log p_c / d^2 = 1 / 2$ associations, and generalise the computation of $p_c$ to linear two-layer architectures. Our analysis also gives mechanistic insight into how the optimal solution improves over a naïve Hebbian learning rule: rather than boosting input-output alignments with broad fluctuations, the optimal solution raises the correct scores just above the extreme-value threshold set by the competing outputs. These findings give a sharp statistical-physics characterisation of factual storage in linear networks and provide a baseline for understanding the memory capacity of more realistic neural architectures.
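To get a feel for the capacity relation $p_c \log p_c / d^2 = 1/2$ quoted in this abstract, the relation can be solved numerically for $p_c$ at a given embedding dimension $d$. The sketch below is illustrative only: the function name `critical_load` is hypothetical, the logarithm is assumed to be natural, and the abstract states an asymptotic result, so plugging in a finite $d$ gives a rough indication rather than a value claimed by the paper.

```python
import math


def critical_load(d: int, tol: float = 1e-9) -> float:
    """Solve p * log(p) = d**2 / 2 for p by bisection.

    Assumptions (not from the abstract): natural logarithm, and the
    asymptotic relation is applied as-is at finite d.
    """
    target = d * d / 2.0
    lo, hi = 1.0, 2.0
    # p * log(p) is increasing for p >= 1, so grow hi until it brackets the root.
    while hi * math.log(hi) < target:
        hi *= 2.0
    # Standard bisection on the bracket [lo, hi], with a relative tolerance.
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if mid * math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for d in (16, 64, 256):
        print(d, round(critical_load(d), 2))
```

Because of the logarithm, the solution grows slightly more slowly than $d^2$, which the printed values make visible when $d$ is increased.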

preprint, 2023, arXiv

Landscape Complexity for the Empirical Risk of Generalized Linear Models

We present a method to obtain the average and the typical value of the number of critical points of the empirical risk landscape for generalized linear estimation problems and variants. This represents a substantial extension of previous applications of the Kac-Rice method since it allows one to analyze the critical points of high-dimensional non-Gaussian random functions. Under a technical hypothesis, we obtain a rigorous explicit variational formula for the annealed complexity, which is the logarithm of the average number of critical points at fixed value of the empirical risk. This result is simplified, and extended, using the non-rigorous Kac-Rice replicated method from theoretical physics. In this way we find an explicit variational formula for the quenched complexity, which is generally different from its annealed counterpart, and allows one to obtain the number of critical points for typical instances up to exponential accuracy.

preprint, 2022, arXiv

A remark on Kashin's discrepancy argument and partial coloring in the Komlós conjecture

In this expository note, we discuss an early partial coloring result of B. Kashin [C. R. Acad. Bulgare Sci., 1985]. Although this result only implies Spencer's six standard deviations [Trans. Amer. Math. Soc., 1985] up to a $\log\log n$ factor, Kashin's argument gives a simple proof of the existence of a constant discrepancy partial coloring in the setup of the Komlós conjecture.

preprint, 2022, arXiv

Perturbative construction of mean-field equations in extensive-rank matrix factorization and denoising

Factorization of matrices where the rank of the two factors diverges linearly with their sizes has many applications in diverse areas such as unsupervised representation learning, dictionary learning or sparse coding. We consider a setting where the two factors are generated from known component-wise independent prior distributions, and the statistician observes a (possibly noisy) component-wise function of their matrix product. In the limit where the dimensions of the matrices tend to infinity, but their ratios remain fixed, we expect to be able to derive closed-form expressions for the optimal mean squared error on the estimation of the two factors. However, this remains a very involved mathematical and algorithmic problem. A related, but simpler, problem is extensive-rank matrix denoising, where one aims to reconstruct a matrix with extensive but usually small rank from noisy measurements. In this paper, we approach both these problems using high-temperature expansions at fixed order parameters. This allows us to clarify how previous attempts at solving these problems failed at finding an asymptotically exact solution. We provide a systematic way to derive the corrections to these existing approximations, taking into account the structure of correlations particular to the problem. Finally, we illustrate our approach in detail on the case of extensive-rank matrix denoising. We compare our results with known optimal rotationally-invariant estimators, and show how exact asymptotic calculations of the minimal error can be performed using extensive-rank matrix integrals.

preprint, 2022, arXiv

Phase Retrieval: From Computational Imaging to Machine Learning

Phase retrieval consists in the recovery of a complex-valued signal from intensity-only measurements. As it pervades a broad variety of applications, many researchers have striven to develop phase-retrieval algorithms. Classical approaches involve techniques as varied as generic gradient-descent routines or specialized spectral methods, to name a few. Yet, the phase-recovery problem remains a challenge to this day. Recently, however, advances in machine learning have revitalized the study of phase retrieval in two ways: significant theoretical advances have emerged from the analogy between phase retrieval and single-layer neural networks; practical breakthroughs have been obtained thanks to deep-learning regularization. In this tutorial, we review phase retrieval under a unifying framework that encompasses classical and machine-learning methods. We focus on three key elements: applications, overview of recent reconstruction algorithms, and the latest theoretical results.

preprint, 2020, arXiv

High-temperature Expansions and Message Passing Algorithms

Improved mean-field techniques are a central theme of statistical physics methods applied to inference and learning. We revisit here some of these methods using high-temperature expansions for disordered systems initiated by Plefka, Georges and Yedidia. We derive the Gibbs free entropy and the subsequent self-consistent equations for a generic class of statistical models with correlated matrices and show in particular that many classical approximation schemes, such as adaptive TAP, Expectation-Consistency, or the approximations behind the Vector Approximate Message Passing algorithm, all rely on the same assumptions, which are also at the heart of high-temperature expansions. We focus on the case of rotationally invariant random coupling matrices in the 'high-dimensional' limit in which the number of samples and the dimension are both large, but with a fixed ratio. This encapsulates many widely studied models, such as Restricted Boltzmann Machines or Generalized Linear Models with correlated data matrices. In this general setting, we show that all the approximation schemes described before are equivalent, and we conjecture that they are exact in the thermodynamic limit in the replica symmetric phases. We achieve this conclusion by resummation of the infinite perturbation series, which generalizes a seminal result of Parisi and Potters. A rigorous derivation of this conjecture is an interesting mathematical challenge. On the way to these conclusions, we uncover several diagrammatical results in connection with free probability and random matrix theory, that are interesting independently of the rest of our work.

preprint, 2020, arXiv

Phase retrieval in high dimensions: Statistical and computational phase transitions

We consider the phase retrieval problem of reconstructing an $n$-dimensional real or complex signal $\mathbf{X}^{\star}$ from $m$ (possibly noisy) observations $Y_\mu = \left| \sum_{i=1}^n \Phi_{\mu i} X^{\star}_i / \sqrt{n} \right|$, for a large class of correlated real and complex random sensing matrices $\mathbf{\Phi}$, in a high-dimensional setting where $m,n\to\infty$ while $\alpha = m/n = \Theta(1)$. First, we derive sharp asymptotics for the lowest possible estimation error achievable statistically and we unveil the existence of sharp phase transitions for the weak- and full-recovery thresholds as a function of the singular values of the matrix $\mathbf{\Phi}$. This is achieved by providing a rigorous proof of a result first obtained by the replica method from statistical mechanics. In particular, the information-theoretic transition to perfect recovery for full-rank matrices appears at $\alpha=1$ (real case) and $\alpha=2$ (complex case). Secondly, we analyze the performance of the best-known polynomial time algorithm for this problem, approximate message-passing, establishing the existence of a statistical-to-algorithmic gap depending, again, on the spectral properties of $\mathbf{\Phi}$. Our work provides an extensive classification of the statistical and algorithmic thresholds in high-dimensional phase retrieval for a broad class of random matrices.
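The noiseless observation model in this abstract, where each observation is the modulus of a normalized linear measurement of the signal, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' setup: it covers only the real case, draws the sensing matrix with i.i.d. standard Gaussian entries (one member of the broad class the abstract mentions), and the helper names `observe` and `sample_instance` are hypothetical.

```python
import math
import random


def observe(phi, x):
    """Return the moduli |sum_i phi[mu][i] * x[i] / sqrt(n)| for each row mu."""
    n = len(x)
    scale = math.sqrt(n)
    return [abs(sum(row[i] * x[i] for i in range(n)) / scale) for row in phi]


def sample_instance(n, alpha, seed=0):
    """Draw a real Gaussian sensing matrix and signal with m = round(alpha * n).

    Assumption: i.i.d. N(0, 1) entries for both the matrix and the signal.
    """
    rng = random.Random(seed)
    m = int(round(alpha * n))
    phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return phi, x_star, observe(phi, x_star)


if __name__ == "__main__":
    phi, x_star, y = sample_instance(n=20, alpha=1.5)
    # The sign of the signal is lost: x_star and -x_star give the same data.
    y_flipped = observe(phi, [-v for v in x_star])
    print(len(y), max(abs(a - b) for a, b in zip(y, y_flipped)))
```

The check at the end illustrates why recovery in the real case can only be defined up to a global sign: the observations are unchanged when the signal is negated.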

preprint, 2019, arXiv

The spiked matrix model with generative priors

Using a low-dimensional parametrization of signals is a generic and powerful way to enhance performance in signal processing and statistical inference. A very popular and widely explored type of dimensionality reduction is sparsity; another type is generative modelling of signal distributions. Generative models based on neural networks, such as GANs or variational auto-encoders, perform particularly well and are gaining in applicability. In this paper we study spiked matrix models, where a low-rank matrix is observed through a noisy channel. This problem with sparse structure of the spikes has attracted broad attention in the past literature. Here, we replace the sparsity assumption by generative modelling, and investigate the consequences on statistical and algorithmic properties. We analyze the Bayes-optimal performance under specific generative models for the spike. In contrast with the sparsity assumption, we do not observe regions of parameters where statistical performance is superior to the best known algorithmic performance. We show that in the analyzed cases the approximate message passing algorithm is able to reach optimal performance. We also design enhanced spectral algorithms and analyze their performance and thresholds using random matrix theory, showing their superiority to the classical principal component analysis. We complement our theoretical results by illustrating the performance of the spectral algorithms when the spikes come from real datasets.

preprint, 2015, arXiv

Islands of stability and recurrence times in AdS

We study the stability of anti-de Sitter (AdS) spacetime to spherically symmetric perturbations of a real scalar field in general relativity. Further, we work within the context of the "two time framework" (TTF) approximation, which describes the leading nonlinear effects for small amplitude perturbations, and is therefore suitable for studying the weakly turbulent instability of AdS, including both collapsing and non-collapsing solutions. We have previously identified a class of quasi-periodic (QP) solutions to the TTF equations, and in this work we analyze their stability. We show that there exist several families of QP solutions that are stable to linear order, and we argue that these solutions represent islands of stability in TTF. We extract the eigenmodes of small oscillations about QP solutions, and we use them to predict approximate recurrence times for generic non-collapsing initial data in the full (non-TTF) system. Alternatively, when sufficient energy is driven to high-frequency modes, as occurs for initial data far from a QP solution, the TTF description breaks down as an approximation to the full system. Depending on the higher order dynamics of the full system, this often signals an imminent collapse to a black hole.
