Source author record

Jean Barbier

Jean Barbier appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory, math.IT, cond-mat.dis-nn, Machine Learning, math-ph, math.MP, math.PR, cond-mat.stat-mech, Discrete Mathematics, math.ST, Statistics Theory

Catalog footprint

What is connected

22 works

11 topics

4 close collaborators


preprint · 2026 · arXiv

Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks

We study the information-theoretic limits of learning a one-hidden-layer teacher network with hierarchical features from noisy queries, in the context of knowledge transfer to a smaller student model. We work in the high-dimensional regime where the teacher width $k$ scales linearly with the input dimension $d$ -- a setting that captures large-but-finite-width networks and has only recently become analytically tractable. Using a heuristic leave-one-out decoupling argument, validated numerically throughout, we derive asymptotically sharp characterizations of the Bayes-optimal generalization error and individual feature overlaps via a system of closed fixed-point equations. These equations reveal that feature learnability is governed by a sequence of sharp phase transitions: as data grows, teacher features become recoverable sequentially, each through a discontinuous jump in overlap. This sequential acquisition underlies a precise notion of effective width $k_c$ -- the number of learnable features at a given data budget $n$ -- which unifies two distinct scaling regimes: a feature-learning regime in which the Bayes-optimal generalization error $\varepsilon^{\rm BO}$ scales as $n^{1/(2\beta)-1}$, and a refinement regime in which it scales as $n^{-1}$, where $\beta>1/2$ is the exponent of the power-law feature hierarchy. Both laws collapse to the single relation $\varepsilon^{\rm BO}=\Theta(k_c d/n)$. We further show empirically that a student trained with Adam near the effective width $k_c$ achieves these optimal scaling laws (up to a small algorithmic gap), and provide an information-theoretic account of the associated scaling in model size.
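As a consistency check on the two exponents quoted in this abstract, the unified relation can be inverted for the product $k_c d$. The lines below use only the scalings stated above and read them as proportionalities in $n$; that reading, and the conclusion about how $k_c d$ grows, are inferences for illustration and not statements taken from the paper.

```latex
\begin{align*}
\text{feature learning:}\quad
  & \varepsilon^{\rm BO} \propto n^{1/(2\beta)-1}
  \;\Longrightarrow\;
  k_c\, d \propto n\,\varepsilon^{\rm BO} \propto n^{1/(2\beta)}, \\
\text{refinement:}\quad
  & \varepsilon^{\rm BO} \propto n^{-1}
  \;\Longrightarrow\;
  k_c\, d \propto n\,\varepsilon^{\rm BO} \propto n^{0}.
\end{align*}
```

Since $\beta>1/2$ gives $1/(2\beta)<1$, this reading has $k_c d$ growing sublinearly in $n$ in the feature-learning regime and no longer growing with $n$ in the refinement regime.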

preprint · 2022 · arXiv

Sparse superposition codes under VAMP decoding with generic rotational invariant coding matrices

Sparse superposition codes were originally proposed as a capacity-achieving communication scheme over the Gaussian channel, whose coding matrices were made of i.i.d. Gaussian entries. We extend this coding scheme to more generic ensembles of rotational invariant coding matrices with arbitrary spectrum, which include the Gaussian ensemble as a special case. We further introduce and analyse a decoder based on vector approximate message-passing (VAMP). Our main findings, based on both a standard replica symmetric potential theory and state evolution analysis, are the superiority of certain structured ensembles of coding matrices (such as partial row-orthogonal) when compared to i.i.d. matrices, as well as a spectrum-independent upper bound on VAMP's threshold. Most importantly, we derive a simple "spectral criterion" for the scheme to be at the same time capacity-achieving while having the best possible algorithmic threshold, in the "large section size" asymptotic limit. Our results therefore provide practical design principles for the coding matrices in this promising communication scheme.

preprint · 2022 · arXiv

Sparse superposition codes with rotational invariant coding matrices for memoryless channels

We recently showed in [1] the superiority of certain ensembles of structured coding matrices (such as partial row-orthogonal) for sparse superposition codes when compared with purely random matrices with i.i.d. entries, both information-theoretically and under practical vector approximate message-passing decoding. Here we generalize this result to binary input channels under generalized vector approximate message-passing decoding [2]. We focus on specific binary output channels for concreteness but our analysis based on the replica symmetric method from statistical physics applies to any memoryless channel. We confirm that the "spectral criterion" introduced in [1], a coding-matrix design principle which allows the code to be capacity-achieving in the "large section size" asymptotic limit, extends to generic memoryless channels. Moreover, we also show that the vanishing error floor property [3] of this coding scheme is universal for arbitrary spectrum of the coding matrix.

preprint · 2022 · arXiv

Statistical limits of dictionary learning: random matrix theory and the spectral replica method

We consider increasingly complex models of matrix denoising and dictionary learning in the Bayes-optimal setting, in the challenging regime where the matrices to infer have a rank growing linearly with the system size. This is in contrast with most existing literature concerned with the low-rank (i.e., constant-rank) regime. We first consider a class of rotationally invariant matrix denoising problems whose mutual information and minimum mean-square error are computable using techniques from random matrix theory. Next, we analyze the more challenging models of dictionary learning. To do so we introduce a novel combination of the replica method from statistical mechanics together with random matrix theory, coined spectral replica method. This allows us to derive variational formulas for the mutual information between hidden representations and the noisy data of the dictionary learning problem, as well as for the overlaps quantifying the optimal reconstruction error. The proposed method reduces the number of degrees of freedom from $\Theta(N^2)$ matrix entries to $\Theta(N)$ eigenvalues (or singular values), and yields Coulomb gas representations of the mutual information which are reminiscent of matrix models in physics. The main ingredients are a combination of large deviation results for random matrices together with a new replica symmetric decoupling ansatz at the level of the probability distributions of eigenvalues (or singular values) of certain overlap matrices and the use of Harish-Chandra-Itzykson-Zuber spherical integrals.

preprint · 2022 · arXiv

Strong replica symmetry in high-dimensional optimal Bayesian inference

We consider generic optimal Bayesian inference, namely, models of signal reconstruction where the posterior distribution and all hyperparameters are known. Under a standard assumption on the concentration of the free energy, we show how replica symmetry in the strong sense of concentration of all multioverlaps can be established as a consequence of the Franz-de Sanctis identities; the identities themselves in the current setting are obtained via a novel perturbation coming from exponentially distributed "side-observations" of the signal. Concentration of multioverlaps means that asymptotically the posterior distribution has a particularly simple structure encoded by a random probability measure (or, in the case of binary signal, a non-random probability measure). We believe that such strong control of the model should be key in the study of inference problems with underlying sparse graphical structure (error correcting codes, block models, etc) and, in particular, in the rigorous derivation of replica symmetric formulas for the free energy and mutual information in this context.

preprint · 2022 · arXiv

The price of ignorance: how much does it cost to forget noise structure in low-rank matrix estimation?

We consider the problem of estimating a rank-1 signal corrupted by structured rotationally invariant noise, and address the following question: how well do inference algorithms perform when the noise statistics are unknown and hence Gaussian noise is assumed? While the matched Bayes-optimal setting with unstructured noise is well understood, the analysis of this mismatched problem is still in its early stages. In this paper, we make a step towards understanding the effect of a strong source of mismatch, namely the noise statistics. Our main technical contribution is the rigorous analysis of a Bayes estimator and of an approximate message passing (AMP) algorithm, both of which incorrectly assume a Gaussian setup. The first result exploits the theory of spherical integrals and of low-rank matrix perturbations; the idea behind the second one is to design and analyze an artificial AMP which, by taking advantage of the flexibility in the denoisers, is able to "correct" the mismatch. Armed with these sharp asymptotic characterizations, we unveil a rich and often unexpected phenomenology. For example, although AMP is in principle designed to efficiently compute the Bayes estimator, the former is outperformed by the latter in terms of mean-square error. We show that this performance gap is due to an incorrect estimation of the signal norm. In fact, when the SNR is large enough, the overlaps of the AMP and the Bayes estimator coincide, and they even match those of optimal estimators taking into account the structure of the noise.

preprint · 2020 · arXiv

Information-theoretic limits of a multiview low-rank symmetric spiked matrix model

We consider a generalization of an important class of high-dimensional inference problems, namely spiked symmetric matrix models, often used as probabilistic models for principal component analysis. Such paradigmatic models have recently attracted a lot of attention from a number of communities due to their phenomenological richness with statistical-to-computational gaps, while remaining tractable. We rigorously establish the information-theoretic limits through the proof of single-letter formulas for the mutual information and minimum mean-square error. On the technical side, we improve the recently introduced adaptive interpolation method, so that it can be used to study low-rank models (i.e., estimation problems of "tall matrices") in full generality, an important step towards the rigorous analysis of more complicated inference and learning models.

preprint · 2020 · arXiv

Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation

We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections. A few examples where this problem is relevant are compressed sensing, sparse superposition codes, and code division multiple access. There have been a number of works considering the mutual information for this problem using the replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-Toninelli type interpolation, that the replica formula yields an upper bound to the exact mutual information. Secondly, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis and the I-MMSE theorem. This yields a single-letter formula for the mutual information and the minimal mean-square error for random Gaussian linear estimation of all discrete bounded signals. In addition, we prove that the low-complexity approximate message-passing algorithm is optimal outside of the so-called hard phase, in the sense that it asymptotically reaches the minimal mean-square error. In this work spatial coupling is used primarily as a proof technique. However, our results also prove two important features of spatially coupled noisy linear random Gaussian estimation. First, there is no algorithmically hard phase. This means that for such systems approximate message-passing always reaches the minimal mean-square error. Secondly, in a proper limit the mutual information associated to such systems is the same as the one of uncoupled linear random Gaussian estimation.
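To make the approximate message-passing iteration mentioned in this abstract concrete, here is a minimal pure-Python sketch for a sparse signal observed through a random Gaussian matrix. It is not the Bayes-optimal algorithm analysed in the paper: it swaps in the soft-thresholding denoiser commonly used for compressed sensing. The function names, the threshold constant `alpha`, and the problem sizes are all illustrative assumptions.

```python
import math
import random


def soft_threshold(v, theta):
    """Soft-thresholding denoiser applied to one scalar."""
    if v > theta:
        return v - theta
    if v < -theta:
        return v + theta
    return 0.0


def amp_recover(A, y, n_iter=30, alpha=1.5):
    """AMP with a soft-threshold denoiser for y = A x.

    A is a list of M rows of length N. alpha is a hypothetical tuning
    constant: the threshold is alpha times the RMS of the residual.
    """
    M, N = len(A), len(A[0])
    x = [0.0] * N
    z = list(y)  # residual for the initial estimate x = 0
    for _ in range(n_iter):
        tau = math.sqrt(sum(r * r for r in z) / M)  # effective noise level
        theta = alpha * tau
        # Pseudo-data: current estimate plus back-projected residual A^T z.
        pseudo = [x[j] + sum(A[i][j] * z[i] for i in range(M)) for j in range(N)]
        x = [soft_threshold(v, theta) for v in pseudo]
        # Onsager term: (number of non-zero coordinates / M) times the old residual.
        onsager = sum(1 for v in x if v != 0.0) / M
        Ax = [sum(A[i][j] * x[j] for j in range(N)) for i in range(M)]
        z = [y[i] - Ax[i] + onsager * z[i] for i in range(M)]
    return x


def demo(N=300, M=150, k=15, seed=0):
    """Noise-free toy problem; returns the relative squared error."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(M)) for _ in range(N)] for _ in range(M)]
    x_true = [0.0] * N
    for j in rng.sample(range(N), k):
        x_true[j] = rng.gauss(0.0, 1.0)
    y = [sum(A[i][j] * x_true[j] for j in range(N)) for i in range(M)]
    x_hat = amp_recover(A, y)
    err = sum((a - b) ** 2 for a, b in zip(x_hat, x_true))
    return err / sum(b * b for b in x_true)


relative_error = demo()
```

The Onsager term is what separates this from plain iterative thresholding; the state evolution analysis named in the abstract tracks the effective noise level `tau` across iterations in the large-system limit.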

preprint · 2020 · arXiv

Overlap matrix concentration in optimal Bayesian inference

We consider models of Bayesian inference of signals with vectorial components of finite dimensionality. We show that, under a proper perturbation, these models are replica symmetric in the sense that the overlap matrix concentrates. The overlap matrix is the order parameter in these models and is directly related to error metrics such as minimum mean-square errors. Our proof is valid in the optimal Bayesian inference setting. This means that it relies on the assumption that the model and all its hyper-parameters are known so that the posterior distribution can be written exactly. Examples of important problems in high-dimensional inference and learning to which our results apply are low-rank tensor factorization, the committee machine neural network with a finite number of hidden neurons in the teacher-student scenario, or multi-layer versions of the generalized linear model.

preprint · 2020 · arXiv

The adaptive interpolation method for proving replica formulas. Applications to the Curie-Weiss and Wigner spike models

In this contribution we give a pedagogic introduction to the newly introduced adaptive interpolation method to prove in a simple and unified way replica formulas for Bayesian optimal inference problems. Many aspects of this method can already be explained at the level of the simple Curie-Weiss spin system. This provides a new method of solution for this model which does not appear to be known. We then generalize this analysis to a paradigmatic inference problem, namely rank-one matrix estimation, also referred to as the Wigner spike model in statistics. We give many pointers to the recent literature where the method has been successfully applied.

preprint · 2019 · arXiv

Concentration of multi-overlaps for random ferromagnetic spin models

We consider ferromagnetic spin models on dilute random graphs and prove that, with suitable one-body infinitesimal perturbations added to the Hamiltonian, the multi-overlaps concentrate for all temperatures, both with respect to the thermal Gibbs average and the quenched randomness. Results of this nature have been known only for the lowest order overlaps, at high temperature or on the Nishimori line. Here we treat all multi-overlaps by a non-trivial application of Griffiths-Kelly-Sherman correlation inequalities. Our results apply in particular to the pure and mixed p-spin ferromagnets on random dilute Erdős-Rényi hypergraphs. On physical grounds one expects that multi-overlap concentration directly implies the correctness of the cavity (or replica symmetric) formula for the pressure. The proof of this formula for the general p-spin ferromagnet on a random dilute hypergraph remains an open problem.

preprint · 2018 · arXiv

Entropy and mutual information in models of deep neural networks

We examine a class of deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are three-fold: (i) We show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that weight matrices are independent and orthogonally-invariant. (ii) We extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layer networks with Gaussian random weights, using the recently introduced adaptive interpolation method. (iii) We propose an experimental framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning. We study the behavior of entropies and mutual informations throughout learning and conclude that, in the proposed setting, the relationship between compression and generalization remains elusive.

preprint · 2017 · arXiv

Generalized Approximate Message-Passing Decoder for Universal Sparse Superposition Codes

Sparse superposition (SS) codes were originally proposed as a capacity-achieving communication scheme over the additive white Gaussian noise channel (AWGNC) [1]. Very recently, it was discovered that these codes are universal, in the sense that they achieve capacity over any memoryless channel under generalized approximate message-passing (GAMP) decoding [2], although this decoder has never been stated for SS codes. In this contribution we introduce the GAMP decoder for SS codes, we confirm empirically the universality of this communication scheme through its study on various channels and we provide the main analysis tools: state evolution and potential. We also compare the performance of GAMP with the Bayes-optimal MMSE decoder. We empirically illustrate that, despite the presence of a phase transition preventing GAMP from reaching the optimal performance, spatial coupling boosts the performance, which eventually tends to capacity in a proper limit. We also prove that, in contrast with the AWGNC case, SS codes for binary input channels have a vanishing error floor in the limit of large codewords. Moreover, the performance of Hadamard-based encoders is assessed for practical implementations.

preprint · 2016 · arXiv

Proof of Threshold Saturation for Spatially Coupled Sparse Superposition Codes

Recently, a new class of codes, called sparse superposition or sparse regression codes, has been proposed for communication over the AWGN channel. It has been proven that they achieve capacity using power allocation and various forms of iterative decoding. Empirical evidence has also strongly suggested that the codes achieve capacity when spatial coupling and approximate message passing decoding are used, without need of power allocation. In this note we prove that state evolution (which tracks message passing) indeed saturates the potential threshold of the underlying code ensemble, which approaches in a proper limit the optimal threshold. Our proof uses ideas developed in the theory of low-density parity-check codes and compressive sensing.

preprint · 2016 · arXiv

Threshold Saturation of Spatially Coupled Sparse Superposition Codes for All Memoryless Channels

We recently proved threshold saturation for spatially coupled sparse superposition codes on the additive white Gaussian noise channel. Here we generalize our analysis to a much broader setting. We show for any memoryless channel that spatial coupling allows generalized approximate message-passing (GAMP) decoding to reach the potential (or Bayes optimal) threshold of the code ensemble. Moreover, in the large input alphabet size limit: i) the GAMP algorithmic threshold of the underlying (or uncoupled) code ensemble is simply expressed as a Fisher information; ii) the potential threshold tends to Shannon's capacity. Although we focus on coding for the sake of coherence with our previous results, the framework and methods are very general and hold for a wide class of generalized estimation problems with random linear mixing.

preprint · 2015 · arXiv

Approximate message-passing with spatially coupled structured operators, with applications to compressed sensing and sparse superposition codes

We study the behavior of Approximate Message-Passing, a solver for linear sparse estimation problems such as compressed sensing, when the i.i.d. matrices (for which it has been specifically designed) are replaced by structured operators, such as Fourier and Hadamard ones. We show empirically that after proper randomization, the structure of the operators does not significantly affect the performance of the solver. Furthermore, for some specially designed spatially coupled operators, this allows a computationally fast and memory-efficient reconstruction in compressed sensing up to the information-theoretical limit. We also show how this approach can be applied to sparse superposition codes, allowing the Approximate Message-Passing decoder to perform at large rates for moderate block length.

preprint · 2015 · arXiv

Scampi: a robust approximate message-passing framework for compressive imaging

Reconstruction of images from noisy linear measurements is a core problem in image processing, for which convex optimization methods based on total variation (TV) minimization have been the long-standing state-of-the-art. We present an alternative probabilistic reconstruction procedure based on approximate message-passing, Scampi, which operates in the compressive regime, where the inverse imaging problem is underdetermined. While the proposed method is related to the recently proposed GrAMPA algorithm of Borgerding, Schniter, and Rangan, we further develop the probabilistic approach to compressive imaging by introducing an expectation-maximization learning of model parameters, making Scampi robust to model uncertainties. Additionally, our numerical experiments indicate that Scampi can provide reconstruction performance superior to both GrAMPA and convex approaches to TV reconstruction. Finally, through exhaustive best-case experiments, we show that in many cases the maximal performance of both Scampi and convex TV can be quite close, even though the approaches are a priori distinct. The theoretical reasons for this correspondence remain an open question. Nevertheless, the proposed algorithm remains more practical, as it requires far less parameter tuning to perform optimally.

preprint · 2015 · arXiv

Statistical physics and approximate message-passing algorithms for sparse linear estimation problems in signal processing and coding theory

This thesis concerns the application of statistical physics methods and inference to sparse linear estimation problems. The main tools are graphical models and the approximate message-passing algorithm, together with the cavity method. We will also use the replica method from the statistical physics of disordered systems, which allows one to associate to the studied problems a cost function referred to as the potential or free entropy in physics. It allows one to predict the different phases of typical complexity of the problem as a function of external parameters such as the noise level or the number of measurements one has about the signal: the inference can be typically easy, hard or impossible. We will see that the hard phase corresponds to a regime of coexistence of the actual solution together with another unwanted solution of the message-passing equations. In this phase, the unwanted solution is a metastable state which is not the true equilibrium solution. This phenomenon can be linked to supercooled water blocked in the liquid state below its freezing critical temperature. We will use a method that overcomes the metastability by mimicking the strategy adopted by nature itself for supercooled water: nucleation and spatial coupling. In supercooled water, a weak localized perturbation is enough to create a crystal nucleus that will propagate through the whole medium thanks to the physical couplings between nearby atoms. The same process will help the algorithm to find the signal, thanks to the introduction of a nucleus containing local information about the signal. It will then spread as a "reconstruction wave" similar to the crystal in the water. After an introduction to statistical inference and sparse linear estimation, we will introduce the necessary tools. Then we will move to applications of these notions to signal processing and coding theory problems.

preprint · 2014 · arXiv

Error correcting codes and spatial coupling

These are notes from the lecture of Rüdiger Urbanke given at the autumn school "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms", which took place in Les Houches, France from Monday September 30th, 2013, till Friday October 11th, 2013. The school was organized by Florent Krzakala from UPMC and ENS Paris, Federico Ricci-Tersenghi from La Sapienza Roma, Lenka Zdeborová from CEA Saclay and CNRS, and Riccardo Zecchina from Politecnico Torino. The first three sections cover the basics of polar codes and low-density parity-check codes. In the last three sections, we see how spatial coupling helps belief propagation decoding.

preprint · 2014 · arXiv

Replica Analysis and Approximate Message Passing Decoder for Superposition Codes

Superposition codes are efficient for the Additive White Gaussian Noise channel. We provide here a replica analysis of the performance of these codes for large signals. We also consider a Bayesian Approximate Message Passing decoder based on a belief-propagation approach, and discuss its performance using the density evolution technique. Our main findings are 1) for the sizes we can access, the message-passing decoder outperforms other decoders studied in the literature, 2) its performance is limited by a sharp phase transition and 3) while these codes reach capacity as $B$ (a crucial parameter in the code) increases, the performance of the message-passing decoder worsens as the phase transition goes to lower rates.
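The role of $B$ is easier to see with the encoding written out. The sketch below assumes the usual sparse superposition construction, in which the message is split into $L$ sections of size $B$ with exactly one non-zero entry per section and the codeword is a coding matrix applied to that vector. The names `L`, `M`, the helper functions, and the $1/L$ variance normalization are illustrative assumptions, not taken from the paper.

```python
import math
import random


def random_message(L, B, rng):
    """Pick one symbol in {0, ..., B-1} for each of the L sections."""
    return [rng.randrange(B) for _ in range(L)]


def to_section_sparse(symbols, B):
    """Map symbols to a length L*B vector with a single 1 per section."""
    x = [0.0] * (len(symbols) * B)
    for section, symbol in enumerate(symbols):
        x[section * B + symbol] = 1.0
    return x


def encode(A, x):
    """Codeword = coding matrix times the section-sparse message vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rate_bits(L, B, M):
    """Rate in bits per channel use: L*log2(B) message bits over M uses."""
    return L * math.log2(B) / M


rng = random.Random(0)
L, B, M = 8, 4, 16  # toy sizes, chosen only for illustration
symbols = random_message(L, B, rng)
x = to_section_sparse(symbols, B)
# i.i.d. Gaussian coding matrix; variance 1/L is an assumed normalization
# that gives each codeword entry unit average power.
A = [[rng.gauss(0.0, 1.0 / math.sqrt(L)) for _ in range(L * B)] for _ in range(M)]
codeword = encode(A, x)
rate = rate_bits(L, B, M)
```

At fixed rate and fixed number of channel uses `M`, a larger `B` means fewer, larger sections, which is the "large section size" limit referred to in the later abstracts on this page.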

preprint · 2013 · arXiv

Robust error correction for real-valued signals via message-passing decoding and spatial coupling

We revisit the error correction scheme of real-valued signals when the codeword is corrupted by gross errors on a fraction of entries and a small noise on all the entries. Combining the recent developments of approximate message passing and the spatially-coupled measurement matrix in compressed sensing we show that the error correction and its robustness towards noise can be enhanced considerably. We discuss the performance in the large signal limit using previous results on state evolution, as well as for finite size signals through numerical simulations. Even for relatively small sizes, the approach proposed here outperforms convex-relaxation-based decoders.

preprint · 2013 · arXiv

The hard-core model on random graphs revisited

We revisit the classical hard-core model, also known as the independent set problem and dual to the vertex cover problem, where one puts particles with a first-neighbor hard-core repulsion on the vertices of a random graph. Although the cases of random graphs with small and with very large average degree are each quite well understood, they yield qualitatively different results and our aim here is to reconcile these two cases. We revisit results that can be obtained using the (heuristic) cavity method and show that it provides a closed-form conjecture for the exact density of the densest packing on random regular graphs with degree K>=20, and that for K>16 the nature of the phase transition is the same as for large K. This also shows that the hard-core model is the simplest mean-field lattice model for structural glasses and jamming.
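The hard-core constraint and its duality with vertex cover can be checked directly on a small graph. The sketch below is a brute-force illustration on a toy cycle graph, which is an assumption made for readability; it has nothing to do with the cavity-method analysis of random regular graphs in the paper, and the search is exponential in the number of vertices.

```python
from itertools import combinations


def is_independent_set(edges, occupied):
    """True if no edge has both endpoints occupied (hard-core constraint)."""
    return all(not (u in occupied and v in occupied) for u, v in edges)


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in the cover."""
    return all(u in cover or v in cover for u, v in edges)


def densest_packing(n, edges):
    """Brute-force largest independent set; only usable on tiny graphs."""
    for size in range(n, -1, -1):
        for subset in combinations(range(n), size):
            if is_independent_set(edges, set(subset)):
                return set(subset)
    return set()


# Toy graph (hypothetical): a cycle on 6 vertices, i.e. a 2-regular graph.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]
packing = densest_packing(n, edges)
cover = set(range(n)) - packing  # complement of an independent set
density = len(packing) / n
```

The complement of any independent set is a vertex cover, so the densest packing corresponds to the smallest cover; this is the duality the abstract refers to.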
