Source author record

Florent Krzakala

Florent Krzakala appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

89 works

29 topics

4 close collaborators

preprint 2026 arXiv

Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula

The analytic characterization of the high-dimensional behavior of optimization for Generalized Linear Models (GLMs) with Gaussian data has been a central focus in statistics and probability in recent years. While convex cases, such as the LASSO, ridge regression, and logistic regression, have been extensively studied using a variety of techniques, the non-convex case remains far less understood despite its significance. A non-rigorous statistical physics framework has provided remarkable predictions for the behavior of high-dimensional optimization problems, but rigorously establishing their validity for non-convex problems has remained a fundamental challenge. In this work, we address this challenge by developing a systematic framework that rigorously proves replica-symmetric formulas for non-convex GLMs and precisely determines the conditions under which these formulas are valid. Remarkably, the rigorous replica-symmetric predictions align exactly with the conjectures made by physicists, and the so-called replicon condition. The originality of our approach lies in connecting two powerful theoretical tools: the Gaussian Min-Max Theorem, which we use to provide precise lower bounds, and Approximate Message Passing (AMP), which is shown to achieve these bounds algorithmically. We demonstrate the utility of this framework through significant applications: (i) proving the optimality of the Tukey loss over the more commonly used Huber loss under an $\varepsilon$-contaminated data model, (ii) establishing the optimality of negative regularization in high-dimensional non-convex regression, and (iii) characterizing the performance limits of linearized AMP algorithms. By rigorously validating statistical physics predictions in non-convex settings, we aim to open new pathways for analyzing increasingly complex optimization landscapes beyond the convex regime.

preprint 2026 arXiv

Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning

Understanding how deep neural networks learn useful internal representations from data remains a central open problem in the theory of deep learning. We introduce Neural Low-Degree Filtering (Neural LoFi), a stylized limit of gradient-based training in which hierarchical feature learning becomes an explicit iterative spectral procedure. In this limit, the dynamics at each layer decouple: given the current representation, the next layer selects directions with maximal accessible low-degree correlation to the label. This yields a tractable surrogate mechanism for deep learning, together with a natural kernel-space interpretation. Neural LoFi provides a mathematically explicit framework for studying multi-layer feature learning beyond the lazy regime. It predicts how representations are selected layer by layer, explains how the emergence of concepts arises at a given sample complexity, and gives a concrete mechanism by which depth progressively constructs new features from old ones through low-degree compositionality. We complement the theory with mechanistic experiments on fully connected and convolutional architectures, showing that Neural LoFi improves over lazy random-feature baselines, recovers meaningful structured filters, and predicts representations aligned with early gradient-descent feature discovery on real datasets.

preprint 2026 arXiv

Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model

We propose a simple mechanism by which scaling laws emerge from feature learning in multi-layer networks. We study a high-dimensional hierarchical target that is a globally high-degree function, but that can be represented by a combination of latent compositional features whose weights decrease as a power law. We show that a layer-wise spectral algorithm adapted to this compositional structure achieves improved scaling relative to shallow, non-adaptive methods, and recovers the latent directions sequentially: strong features become detectable at small sample sizes, while weaker features require more data. We prove sharp feature-wise recovery thresholds and show that aggregating these transitions yields an explicit power-law decay of the prediction error. Technically, the analysis relies on random matrix methods and a resolvent-based perturbation argument, which gives matching upper and lower bounds for individual eigenvector recovery beyond what standard gap-based perturbation bounds provide. Numerical experiments confirm the predicted sequential recovery, finite-size smoothing of the thresholds, and separation from non-hierarchical kernel baselines. Together, these results show how smooth scaling laws can emerge from a cascade of sharp feature-learning transitions.

preprint 2024 arXiv

A High Dimensional Statistical Model for Adversarial Training: Geometry and Trade-Offs

This work investigates adversarial training in the context of margin-based linear classifiers in the high-dimensional regime where the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $α= n / d$. We introduce a tractable mathematical model where the interplay between the data and adversarial attacker geometries can be studied, while capturing the core phenomenology observed in the adversarial robustness literature. Our main theoretical contribution is an exact asymptotic description of the sufficient statistics for the adversarial empirical risk minimiser, under generic convex and non-increasing losses for a Block Feature Model. Our results allow us to precisely characterise which directions in the data are associated with a higher generalisation/robustness trade-off, as defined by a robustness and a usefulness metric. We show that the presence of multiple different feature types is crucial to the high sample complexity performances of adversarial training. In particular, we unveil the existence of directions which can be defended without penalising accuracy. Finally, we show the advantage of defending non-robust features during training, identifying a uniform protection as an inherently effective defence mechanism.

preprint 2022 arXiv

Bayesian Inference with Nonlinear Generative Models: Comments on Secure Learning

Unlike the classical linear model, nonlinear generative models have been addressed sparsely in the literature of statistical learning. This work aims to bring attention to these models and their secrecy potential. To this end, we invoke the replica method to derive the asymptotic normalized cross entropy in an inverse probability problem whose generative model is described by a Gaussian random field with a generic covariance function. Our derivations further demonstrate the asymptotic statistical decoupling of the Bayesian estimator and specify the decoupled setting for a given nonlinear model. The replica solution shows that strictly nonlinear models exhibit an all-or-nothing phase transition: There exists a critical load at which the optimal Bayesian inference changes from perfect to uncorrelated learning. Based on this finding, we design a new secure coding scheme which achieves the secrecy capacity of the wiretap channel. This interesting result implies that strictly nonlinear generative models are perfectly secured without any secure coding. We justify this latter statement through the analysis of an illustrative model for perfectly secure and reliable inference.

preprint 2022 arXiv

Perturbative construction of mean-field equations in extensive-rank matrix factorization and denoising

Factorization of matrices where the rank of the two factors diverges linearly with their sizes has many applications in diverse areas such as unsupervised representation learning, dictionary learning or sparse coding. We consider a setting where the two factors are generated from known component-wise independent prior distributions, and the statistician observes a (possibly noisy) component-wise function of their matrix product. In the limit where the dimensions of the matrices tend to infinity, but their ratios remain fixed, we expect to be able to derive closed form expressions for the optimal mean squared error on the estimation of the two factors. However, this remains a very involved mathematical and algorithmic problem. A related, but simpler, problem is extensive-rank matrix denoising, where one aims to reconstruct a matrix with extensive but usually small rank from noisy measurements. In this paper, we approach both these problems using high-temperature expansions at fixed order parameters. This allows us to clarify how previous attempts at solving these problems failed at finding an asymptotically exact solution. We provide a systematic way to derive the corrections to these existing approximations, taking into account the structure of correlations particular to the problem. Finally, we illustrate our approach in detail on the case of extensive-rank matrix denoising. We compare our results with known optimal rotationally-invariant estimators, and show how exact asymptotic calculations of the minimal error can be performed using extensive-rank matrix integrals.

preprint 2022 arXiv

Secure Coding via Gaussian Random Fields

Inverse probability problems whose generative models are given by strictly nonlinear Gaussian random fields show the all-or-nothing behavior: There exists a critical rate at which Bayesian inference exhibits a phase transition. Below this rate, the optimal Bayesian estimator recovers the data perfectly, and above it the recovered data becomes uncorrelated. This study uses the replica method from the theory of spin glasses to show that this critical rate is the channel capacity. This interesting finding has a particular application to the problem of secure transmission: A strictly nonlinear Gaussian random field along with random binning can be used to securely encode a confidential message in a wiretap channel. Our large-system characterization demonstrates that this secure coding scheme asymptotically achieves the secrecy capacity of the Gaussian wiretap channel.

preprint 2021 arXiv

Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD can be extended to a continuous-time limit that we call stochastic gradient flow. In the full-batch limit, we recover the standard gradient flow. We apply dynamical mean-field theory from statistical physics to track the dynamics of the algorithm in the high-dimensional limit via a self-consistent stochastic process. We explore the performance of the algorithm as a function of the control parameters shedding light on how it navigates the loss landscape.

preprint 2021 arXiv

The Gaussian equivalence of generative models for learning with shallow neural networks

Understanding the impact of data structure on the computational tractability of learning is a key challenge for the theory of neural networks. Many theoretical works do not explicitly model training data, or assume that inputs are drawn component-wise independently from some simple probability distribution. Here, we go beyond this simple paradigm by studying the performance of neural networks trained on data drawn from pre-trained generative models. This is possible due to a Gaussian equivalence stating that the key metrics of interest, such as the training and test errors, can be fully captured by an appropriately chosen Gaussian model. We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence. First, we establish rigorous conditions for the Gaussian equivalence to hold in the case of single-layer generative models, as well as deterministic rates for convergence in distribution. Second, we leverage this equivalence to derive a closed set of equations describing the generalisation performance of two widely studied machine learning problems: two-layer neural networks trained using one-pass stochastic gradient descent, and full-batch pre-learned features or kernel methods. Finally, we perform experiments demonstrating how our theory applies to deep, pre-trained generative models. These results open a viable path to the theoretical study of machine learning models with realistic data.

preprint 2020 arXiv

Asymptotic errors for convex penalized linear regression beyond Gaussian matrices

We consider the problem of learning a coefficient vector $x_{0}$ in $\mathbb{R}^{N}$ from noisy linear observations $y=Fx_{0}+w$ in $\mathbb{R}^{M}$ in the high-dimensional limit $M,N\to\infty$ with $α=M/N$ fixed. We provide a rigorous derivation of an explicit formula -- first conjectured using heuristic methods from statistical physics -- for the asymptotic mean squared error obtained by penalized convex regression estimators such as the LASSO or the elastic net, for a class of very generic random matrices corresponding to rotationally invariant data matrices with arbitrary spectrum. The proof is based on a convergence analysis of an oracle version of vector approximate message-passing (oracle-VAMP) and on the properties of its state evolution equations. Our method leverages and highlights the link between vector approximate message-passing, Douglas-Rachford splitting and proximal descent algorithms, extending previous results obtained with i.i.d. matrices for a large class of problems. We illustrate our results on some concrete examples and show that even though they are asymptotic, our predictions agree remarkably well with numerics even for very moderate sizes.
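As a rough illustration of the estimation problem in this abstract, the sketch below solves a small LASSO instance by plain proximal gradient descent (ISTA). It is not the oracle-VAMP procedure analysed in the paper, and it assumes an i.i.d. Gaussian matrix rather than a general rotationally invariant one. The function names, the problem sizes, the noise level and the value of `lam` are all illustrative choices.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_objective(F, y, x, lam):
    """Value of 0.5 * ||y - F x||^2 + lam * ||x||_1."""
    fit = sum((sum(f * xj for f, xj in zip(row, x)) - yi) ** 2 for row, yi in zip(F, y))
    return 0.5 * fit + lam * sum(abs(xj) for xj in x)


def lasso_ista(F, y, lam, step, iters=500):
    """Minimise the LASSO objective by proximal gradient descent (ISTA)."""
    m, n = len(F), len(F[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual F x - y and gradient F^T (F x - y) of the quadratic term.
        residual = [sum(F[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(F[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft thresholding (the l1 proximal step).
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


rng = random.Random(0)
M, N = 40, 20  # illustrative sizes, so that alpha = M / N = 2
# Assumption: i.i.d. Gaussian entries with variance 1 / M.
F = [[rng.gauss(0.0, 1.0) / M ** 0.5 for _ in range(N)] for _ in range(M)]
x0 = [1.0 if j < 3 else 0.0 for j in range(N)]  # sparse ground truth
y = [sum(F[i][j] * x0[j] for j in range(N)) + 0.01 * rng.gauss(0.0, 1.0) for i in range(M)]
# 1 / ||F||_F^2 is a conservative step size: it never exceeds 1 / lambda_max(F^T F).
step = 1.0 / sum(f * f for row in F for f in row)
x_hat = lasso_ista(F, y, lam=0.01, step=step)
```

The step size is deliberately conservative so that the objective cannot increase between iterations; a tighter bound on the largest eigenvalue of $F^{T}F$ would converge faster.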

preprint 2020 arXiv

Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

Despite the widespread use of gradient-based algorithms for optimizing high-dimensional non-convex functions, understanding their ability to find good minima instead of being trapped in spurious ones remains to a large extent an open problem. Here we focus on gradient flow dynamics for phase retrieval from random measurements. When the ratio of the number of measurements over the input dimension is small, the dynamics remains trapped in spurious minima with large basins of attraction. We find analytically that above a critical ratio those critical points become unstable, developing a negative direction toward the signal. By numerical experiments we show that in this regime the gradient flow algorithm is not trapped; it drifts away from the spurious critical points along the unstable direction and succeeds in finding the global minimum. Using tools from statistical physics we characterize this phenomenon, which is related to a BBP-type transition in the Hessian of the spurious minima.

preprint 2020 arXiv

Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures

Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and architectures. Here, we challenge this perspective, and study the applicability of Direct Feedback Alignment to neural view synthesis, recommender systems, geometric learning, and natural language processing. In contrast with previous studies limited to computer vision tasks, our findings show that it successfully trains a large range of state-of-the-art deep learning architectures, with performance close to fine-tuned backpropagation. At variance with common beliefs, our work supports that challenging tasks can be tackled in the absence of weight transport.

preprint 2020 arXiv

Double Trouble in Double Descent : Bias and Variance(s) in the Lazy Regime

Deep neural networks can achieve remarkable generalization performances while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a "double descent" - a mark of the beneficial role of overparametrization. In this work, we develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks, by considering the problem of learning a high-dimensional function with random features regression. We obtain a precise asymptotic expression for the bias-variance decomposition of the test error, and show that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant. We disentangle the variances stemming from the sampling of the dataset, from the additive noise corrupting the labels, and from the initialization of the weights. Following up on Geiger et al. 2019, we first show that the latter two contributions are the crux of the double descent: they lead to the overfitting peak at the interpolation threshold and to the decay of the test error upon overparametrization. We then quantify how they are suppressed by ensemble averaging the outputs of K independently initialized estimators. When K is sent to infinity, the test error remains constant beyond the interpolation threshold. We further compare the effects of overparametrizing, ensembling and regularizing. Finally, we present numerical experiments on classic deep learning setups to show that our results hold qualitatively in realistic lazy learning scenarios.

preprint 2020 arXiv

Exact asymptotics for phase retrieval and compressed sensing with random generative priors

We consider the problem of compressed sensing and of (real-valued) phase retrieval with a random measurement matrix. We derive sharp asymptotics for the information-theoretically optimal performance and for the best known polynomial algorithm for an ensemble of generative priors consisting of fully connected deep neural networks with random weight matrices and arbitrary activations. We compare the performance to sparse separable priors and conclude that generative priors might be advantageous in terms of algorithmic performance. In particular, while sparsity does not allow one to perform compressive phase retrieval efficiently close to its information-theoretic limit, it is found that under the random generative prior compressed phase retrieval becomes tractable.

preprint 2020 arXiv

Generalisation error in learning with random features and the hidden manifold model

We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model. We consider the high-dimensional regime and, using the replica method from statistical physics, provide a closed-form expression for the asymptotic generalisation performance in these problems, valid in both the under- and over-parametrised regimes and for a broad choice of generalised linear model loss functions. In particular, we show how to obtain analytically the so-called double descent behaviour for logistic regression with a peak at the interpolation threshold, we illustrate the superiority of orthogonal against random Gaussian projections in learning with random features, and discuss the role played by correlations in the data generated by the hidden manifold model. Beyond the interest in these particular problems, the theoretical formalism introduced in this manuscript provides a path to further extensions to more complex tasks.

preprint 2020 arXiv

Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

We consider a commonly studied supervised classification of a synthetic dataset whose labels are generated by feeding a one-layer neural network with random iid inputs. We study the generalization performances of standard classifiers in the high-dimensional regime where $α=n/d$ is kept finite in the limit of a high dimension $d$ and number of samples $n$. Our contribution is three-fold: First, we prove a formula for the generalization error achieved by $\ell_2$ regularized classifiers that minimize a convex loss. This formula was first obtained by the heuristic replica method of statistical physics. Secondly, focussing on commonly used loss functions and optimizing the $\ell_2$ regularization strength, we observe that while ridge regression performance is poor, logistic and hinge regression are surprisingly able to approach the Bayes-optimal generalization error extremely closely. As $α\to \infty$ they lead to Bayes-optimal rates, a fact that does not follow from predictions of margin-based generalization error bounds. Third, we design an optimal loss and regularizer that provably leads to Bayes-optimal generalization error.

preprint 2020 arXiv

High-temperature Expansions and Message Passing Algorithms

Improved mean-field techniques are a central theme of statistical physics methods applied to inference and learning. We revisit here some of these methods using high-temperature expansions for disordered systems initiated by Plefka, Georges and Yedidia. We derive the Gibbs free entropy and the subsequent self-consistent equations for a generic class of statistical models with correlated matrices and show in particular that many classical approximation schemes, such as adaptive TAP, Expectation-Consistency, or the approximations behind the Vector Approximate Message Passing algorithm all rely on the same assumptions, that are also at the heart of high-temperature expansions. We focus on the case of rotationally invariant random coupling matrices in the "high-dimensional" limit in which the number of samples and the dimension are both large, but with a fixed ratio. This encapsulates many widely studied models, such as Restricted Boltzmann Machines or Generalized Linear Models with correlated data matrices. In this general setting, we show that all the approximation schemes described before are equivalent, and we conjecture that they are exact in the thermodynamic limit in the replica symmetric phases. We achieve this conclusion by resummation of the infinite perturbation series, which generalizes a seminal result of Parisi and Potters. A rigorous derivation of this conjecture is an interesting mathematical challenge. On the way to these conclusions, we uncover several diagrammatical results in connection with free probability and random matrix theory, that are interesting independently of the rest of our work.

preprint 2020 arXiv

Large-Scale Optical Reservoir Computing for Spatiotemporal Chaotic Systems Prediction

Reservoir computing is a relatively recent computational paradigm that originates from a recurrent neural network and is known for its wide range of implementations using different physical technologies. Large reservoirs are very hard to obtain in conventional computers, as both the computation complexity and memory usage grow quadratically. We propose an optical scheme performing reservoir computing over very large networks potentially being able to host several millions of fully connected photonic nodes thanks to its intrinsic properties of parallelism and scalability. Our experimental studies confirm that, in contrast to conventional computers, the computation time of our optical scheme is only linearly dependent on the number of photonic nodes of the network, which is due to electronic overheads, while the optical part of computation remains fully parallel and independent of the reservoir size. To demonstrate the scalability of our optical scheme, we perform for the first time predictions on large spatiotemporal chaotic datasets obtained from the Kuramoto-Sivashinsky equation using optical reservoirs with up to 50 000 optical nodes. Our results are extremely challenging for conventional von Neumann machines, and they significantly advance the state of the art of unconventional reservoir computing approaches, in general.

preprint 2020 arXiv

Light-in-the-loop: using a photonics co-processor for scalable training of neural networks

As neural networks grow larger and more complex and data-hungry, training costs are skyrocketing. Especially when lifelong learning is necessary, such as in recommender systems or self-driving cars, this might soon become unsustainable. In this study, we present the first optical co-processor able to accelerate the training phase of digitally-implemented neural networks. We rely on direct feedback alignment as an alternative to backpropagation, and perform the error projection step optically. Leveraging the optical random projections delivered by our co-processor, we demonstrate its use to train a neural network for handwritten digit recognition.

preprint 2020 arXiv

Marvels and Pitfalls of the Langevin Algorithm in Noisy High-dimensional Inference

Gradient-descent-based algorithms and their stochastic versions have widespread applications in machine learning and statistical inference. In this work we perform an analytic study of the performances of one of them, the Langevin algorithm, in the context of noisy high-dimensional inference. We employ the Langevin algorithm to sample the posterior probability measure for the spiked matrix-tensor model. The typical behaviour of this algorithm is described by a system of integro-differential equations that we call the Langevin state evolution, whose solution is compared with the one of the state evolution of approximate message passing (AMP). Our results show that, remarkably, the algorithmic threshold of the Langevin algorithm is sub-optimal with respect to the one given by AMP. We conjecture this phenomenon to be due to the residual glassiness present in that region of parameters. Finally we show how a landscape-annealing protocol, that uses the Langevin algorithm but violates the Bayes-optimality condition, can approach the performance of AMP.

preprint 2020 arXiv

Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation

We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections. A few examples where this problem is relevant are compressed sensing, sparse superposition codes, and code division multiple access. There have been a number of works considering the mutual information for this problem using the replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-Toninelli type interpolation, that the replica formula yields an upper bound to the exact mutual information. Secondly, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis and the I-MMSE theorem. This yields a single letter formula for the mutual information and the minimal-mean-square error for random Gaussian linear estimation of all discrete bounded signals. In addition, we prove that the low complexity approximate message-passing algorithm is optimal outside of the so-called hard phase, in the sense that it asymptotically reaches the minimal-mean-square error. In this work spatial coupling is used primarily as a proof technique. However our results also prove two important features of spatially coupled noisy linear random Gaussian estimation. First there is no algorithmically hard phase. This means that for such systems approximate message-passing always reaches the minimal-mean-square error. Secondly, in a proper limit the mutual information associated to such systems is the same as the one of uncoupled linear random Gaussian estimation.

preprint 2020 arXiv

Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models

In this work we analyse quantitatively the interplay between the loss landscape and performance of descent algorithms in a prototypical inference problem, the spiked matrix-tensor model. We study a loss function that is the negative log-likelihood of the model. We analyse the number of local minima at a fixed distance from the signal/spike with the Kac-Rice formula, and locate trivialization of the landscape at large signal-to-noise ratios. We evaluate in a closed form the performance of a gradient flow algorithm using integro-differential PDEs as developed in physics of disordered systems for the Langevin dynamics. We analyze the performance of an approximate message passing algorithm estimating the maximum likelihood configuration via its state evolution. We conclude by comparing the above results: while we observe a drastic slow down of the gradient flow dynamics even in the region where the landscape is trivial, both the analyzed algorithms are shown to perform well even in the part of the region of parameters where spurious local minima are present.

preprint 2020 arXiv

Phase retrieval in high dimensions: Statistical and computational phase transitions

We consider the phase retrieval problem of reconstructing an $n$-dimensional real or complex signal $\mathbf{X}^{\star}$ from $m$ (possibly noisy) observations $Y_μ = \left| \sum_{i=1}^n Φ_{μi} X^{\star}_i/\sqrt{n} \right|$, for a large class of correlated real and complex random sensing matrices $\mathbf{Φ}$, in a high-dimensional setting where $m,n\to\infty$ while $α= m/n=Θ(1)$. First, we derive sharp asymptotics for the lowest possible estimation error achievable statistically and we unveil the existence of sharp phase transitions for the weak- and full-recovery thresholds as a function of the singular values of the matrix $\mathbf{Φ}$. This is achieved by providing a rigorous proof of a result first obtained by the replica method from statistical mechanics. In particular, the information-theoretic transition to perfect recovery for full-rank matrices appears at $α=1$ (real case) and $α=2$ (complex case). Secondly, we analyze the performance of the best-known polynomial time algorithm for this problem -- approximate message-passing -- establishing the existence of a statistical-to-algorithmic gap depending, again, on the spectral properties of $\mathbf{Φ}$. Our work provides an extensive classification of the statistical and algorithmic thresholds in high-dimensional phase retrieval for a broad class of random matrices.
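The observation model in this abstract can be made concrete with a small simulation. The sketch below draws a real signal and its noiseless sign-less measurements at a chosen ratio $α = m/n$. It assumes i.i.d. standard Gaussian entries for the sensing matrix, which is only one member of the broader matrix class the abstract covers; the function name and the sizes are hypothetical.

```python
import math
import random


def phase_retrieval_observations(n, alpha, seed=0):
    """Draw a real signal and the measurements Y_mu = |sum_i Phi[mu][i] * X[i] / sqrt(n)|.

    Assumption: Phi has i.i.d. standard Gaussian entries and the
    observations are noiseless.
    """
    rng = random.Random(seed)
    m = int(alpha * n)  # number of observations, so that alpha = m / n
    x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
    phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [abs(sum(p * x for p, x in zip(row, x_star)) / math.sqrt(n)) for row in phi]
    return x_star, phi, y


x_star, phi, y = phase_retrieval_observations(n=50, alpha=2.0)
```

Because only the modulus is observed, the signals $\mathbf{X}^{\star}$ and $-\mathbf{X}^{\star}$ produce identical data in the real case, so recovery can only be expected up to a global sign.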

preprint 2020 arXiv

Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning

Statistical learning theory provides bounds on the generalization gap, using in particular the Vapnik-Chervonenkis dimension and the Rademacher complexity. An alternative approach, mainly studied in the statistical physics literature, is the study of generalization in simple synthetic-data models. Here we discuss the connections between these approaches and focus on the link between the Rademacher complexity in statistical learning and the theories of generalization for typical-case synthetic models from statistical physics, involving quantities known as Gardner capacity and ground state energy. We show that in these models the Rademacher complexity is closely related to the ground state energy computed by replica theories. Using this connection, one may reinterpret many results of the literature as rigorous Rademacher bounds in a variety of models in the high-dimensional statistics limit. Somewhat surprisingly, we also show that statistical learning theory provides predictions for the behavior of the ground-state energies in some full replica symmetry breaking models.

preprint2020arXiv

Reservoir Computing meets Recurrent Kernels and Structured Transforms

Reservoir Computing is a class of simple yet efficient Recurrent Neural Networks where internal weights are fixed at random and only a linear output layer is trained. In the large size limit, such random neural networks have a deep connection with kernel methods. Our contributions are threefold: a) We rigorously establish the recurrent kernel limit of Reservoir Computing and prove its convergence. b) We test our models on chaotic time series prediction, a classic but challenging benchmark in Reservoir Computing, and show how the Recurrent Kernel is competitive and computationally efficient when the number of data points remains moderate. c) When the number of samples is too large, we leverage the success of structured Random Features for kernel approximation by introducing Structured Reservoir Computing. The two proposed methods, Recurrent Kernel and Structured Reservoir Computing, turn out to be much faster and more memory-efficient than conventional Reservoir Computing.
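A minimal sketch of the basic Reservoir Computing recipe described above: the recurrent and input weights are drawn at random and frozen, and only a linear readout is fitted. Everything concrete here is an assumption made for illustration (reservoir size, `tanh` nonlinearity, spectral radius 0.9, ridge penalty, and a sine wave as a toy task instead of the chaotic series used in the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_steps, washout = 100, 1000, 100   # illustrative sizes, not from the paper

# Fixed random reservoir, rescaled so its spectral radius is 0.9.
w_res = rng.standard_normal((n_res, n_res))
w_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(w_res)))
w_in = 0.5 * rng.standard_normal(n_res)    # fixed random input weights

# Toy task: predict the next value of a sine wave.
signal = np.sin(0.2 * np.arange(n_steps + 1))

# Run the reservoir; states[k] is the state after reading signal[k].
states = np.zeros((n_steps, n_res))
x = np.zeros(n_res)
for k in range(n_steps):
    x = np.tanh(w_res @ x + w_in * signal[k])
    states[k] = x

# Train only the linear readout with ridge regression, skipping the washout.
features = states[washout:]
targets = signal[washout + 1 : n_steps + 1]
ridge = 1e-6
w_out = np.linalg.solve(
    features.T @ features + ridge * np.eye(n_res), features.T @ targets
)
train_mse = float(np.mean((features @ w_out - targets) ** 2))
```

The recurrent kernel and structured variants proposed in the paper replace the explicit dense matrix `w_res`; this sketch covers only the conventional baseline they are compared against.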

preprint2020arXiv

Who is Afraid of Big Bad Minima? Analysis of Gradient-Flow in a Spiked Matrix-Tensor Model

Gradient-based algorithms are effective for many machine learning tasks, but despite ample recent effort and some progress, it often remains unclear why they work in practice in optimising high-dimensional non-convex functions and why they find good minima instead of being trapped in spurious ones. Here we present a quantitative theory explaining this behaviour in a spiked matrix-tensor model. Our framework is based on the Kac-Rice analysis of stationary points and a closed-form analysis of gradient-flow originating from statistical physics. We show that there is a well defined region of parameters where the gradient-flow algorithm finds a good global minimum despite the presence of exponentially many spurious local minima. We show that this is achieved by surfing on saddles that have strong negative direction towards the global minima, a phenomenon that is connected to a BBP-type threshold in the Hessian describing the critical points of the landscapes.

preprint2019arXiv

Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup

Deep neural networks achieve stellar generalisation even when they have enough parameters to easily fit all their training data. We study this phenomenon by analysing the dynamics and the performance of over-parameterised two-layer neural networks in the teacher-student setup, where one network, the student, is trained on data generated by another network, called the teacher. We show how the dynamics of stochastic gradient descent (SGD) is captured by a set of differential equations and prove that this description is asymptotically exact in the limit of large inputs. Using this framework, we calculate the final generalisation error of student networks that have more parameters than their teachers. We find that the final generalisation error of the student increases with network size when training only the first layer, but stays constant or even decreases with size when training both layers. We show that these different behaviours have their root in the different solutions SGD finds for different activation functions. Our results indicate that achieving good generalisation in neural networks goes beyond the properties of SGD alone and depends on the interplay of at least the algorithm, the model architecture, and the data set.

preprint2019arXiv

Kernel computations from large-scale random features obtained by Optical Processing Units

Approximating kernel functions with random features (RFs) has been a successful application of random projections for nonparametric estimation. However, performing random projections presents computational challenges for large-scale problems. Recently, a new optical hardware device called the Optical Processing Unit (OPU) has been developed for fast and energy-efficient computation of large-scale RFs in the analog domain. More specifically, the OPU performs the multiplication of input vectors by a large random matrix with complex-valued i.i.d. Gaussian entries, followed by the application of an element-wise squared absolute value operation, this last nonlinearity being intrinsic to the sensing process. In this paper, we show that this operation results in a dot-product kernel that has connections to the polynomial kernel, and we extend this computation to arbitrary powers of the feature map. Experiments demonstrate that the OPU kernel and its RF approximation achieve competitive performance in applications using kernel ridge regression and transfer learning for image classification. Crucially, thanks to the use of the OPU, these results are obtained with time and energy savings.
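The feature map described above (complex Gaussian projection followed by a squared modulus) can be simulated numerically. The sketch below assumes real inputs and entries normalized so that $E|w|^2 = 1$. The closed form it compares against, $\|x\|^2\|y\|^2 + (x\cdot y)^2$, is what the standard Gaussian moment identity gives under that normalization; it is stated here as a derived assumption rather than quoted from the paper. The helper name `opu_features` is hypothetical.

```python
import numpy as np


def opu_features(x, w):
    """Simulated OPU-style random features: |w x|^2, element-wise."""
    return np.abs(w @ x) ** 2


rng = np.random.default_rng(0)
d, n_features = 5, 200_000

# Complex i.i.d. Gaussian matrix, normalized so that E|w_ij|^2 = 1 (assumption).
w = (
    rng.standard_normal((n_features, d))
    + 1j * rng.standard_normal((n_features, d))
) / np.sqrt(2)

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Monte Carlo estimate of the kernel as an average of feature products.
kernel_mc = float(np.mean(opu_features(x, w) * opu_features(y, w)))

# Degree-2 dot-product form obtained from the Gaussian moment identity.
kernel_closed = float((x @ x) * (y @ y) + (x @ y) ** 2)

rel_err = abs(kernel_mc - kernel_closed) / kernel_closed
```

The $(x\cdot y)^2$ term is the link to the polynomial kernel mentioned in the abstract; the agreement between `kernel_mc` and `kernel_closed` tightens as `n_features` grows.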

preprint2019arXiv

On the Universality of Noiseless Linear Estimation with Respect to the Measurement Matrix

In a noiseless linear estimation problem, one aims to reconstruct a vector x* from the knowledge of its linear projections y=Phi x*. There have been many theoretical works concentrating on the case where the matrix Phi is a random i.i.d. one, but a body of heuristic evidence suggests that many of these results are universal and extend well beyond this restricted case. Here we revisit this problem through the prism of message passing methods, and consider not only the universality of the l1 transition, as previously addressed, but also that of the optimal Bayesian reconstruction. We observed that the universality extends to the Bayes-optimal minimum mean-squared error (MMSE), and to a range of structured matrices.

preprint2019arXiv

The spiked matrix model with generative priors

Using a low-dimensional parametrization of signals is a generic and powerful way to enhance performance in signal processing and statistical inference. A very popular and widely explored type of dimensionality reduction is sparsity; another type is generative modelling of signal distributions. Generative models based on neural networks, such as GANs or variational auto-encoders, perform particularly well and are becoming increasingly applicable. In this paper we study spiked matrix models, where a low-rank matrix is observed through a noisy channel. This problem with sparse structure of the spikes has attracted broad attention in the past literature. Here, we replace the sparsity assumption by generative modelling, and investigate the consequences on statistical and algorithmic properties. We analyze the Bayes-optimal performance under specific generative models for the spike. In contrast with the sparsity assumption, we do not observe regions of parameters where statistical performance is superior to the best known algorithmic performance. We show that in the analyzed cases the approximate message passing algorithm is able to reach optimal performance. We also design enhanced spectral algorithms and analyze their performance and thresholds using random matrix theory, showing their superiority to the classical principal component analysis. We complement our theoretical results by illustrating the performance of the spectral algorithms when the spikes come from real datasets.

preprint2018arXiv

Entropy and mutual information in models of deep neural networks

We examine a class of deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are three-fold: (i) We show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that weight matrices are independent and orthogonally-invariant. (ii) We extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layer networks with Gaussian random weights, using the recently introduced adaptive interpolation method. (iii) We propose an experimental framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning. We study the behavior of entropies and mutual informations throughout learning and conclude that, in the proposed setting, the relationship between compression and generalization remains elusive.

preprint2018arXiv

Fundamental limits of detection in the spiked Wigner model

We study the fundamental limits of detecting the presence of an additive rank-one perturbation, or spike, to a Wigner matrix. When the spike comes from a prior that is i.i.d. across coordinates, we prove that the log-likelihood ratio of the spiked model against the non-spiked one is asymptotically normal below a certain reconstruction threshold which is not necessarily of a "spectral" nature, and that it is degenerate above. This establishes the maximal region of contiguity between the planted and null models. It is known that this threshold also marks a phase transition for estimating the spike: the latter task is possible above the threshold and impossible below. Therefore, both estimation and detection undergo the same transition in this random matrix model. We also provide further information about the performance of the optimal test. Our proofs are based on Gaussian interpolation methods and a rigorous incarnation of the cavity method, as devised by Guerra and Talagrand in their study of the Sherrington--Kirkpatrick spin-glass model.

preprint2017arXiv

Decoding from Pooled Data: Phase Transitions of Message Passing

We consider the problem of decoding a discrete signal of categorical variables from the observation of several histograms of pooled subsets of it. We present an Approximate Message Passing (AMP) algorithm for recovering the signal in the random dense setting where each observed histogram involves a random subset of entries of size proportional to n. We characterize the performance of the algorithm in the asymptotic regime where the number of observations $m$ tends to infinity proportionally to n, by deriving the corresponding State Evolution (SE) equations and studying their dynamics. We initiate the analysis of the multi-dimensional SE dynamics by proving their convergence to a fixed point, along with some further properties of the iterates. The analysis reveals sharp phase transition phenomena where the behavior of AMP changes from exact recovery to weak correlation with the signal as m/n crosses a threshold. We derive formulae for the threshold in some special cases and show that they accurately match experimental behavior.

preprint2017arXiv

Multi-Layer Generalized Linear Estimation

We consider the problem of reconstructing a signal from multi-layered (possibly) non-linear measurements. Using non-rigorous but standard methods from statistical physics we present the Multi-Layer Approximate Message Passing (ML-AMP) algorithm for computing marginal probabilities of the corresponding estimation problem and derive the associated state evolution equations to analyze its performance. We also give the expression of the asymptotic free energy and the minimal information-theoretically achievable reconstruction error. Finally, we present some applications of this measurement model for compressed sensing and perceptron learning with structured matrices/patterns, and for a simple model of estimation of latent variables in an auto-encoder.

preprint2017arXiv

Statistical and computational phase transitions in spiked tensor estimation

We consider tensor factorizations using a generative model and a Bayesian approach. We compute rigorously the mutual information, the Minimal Mean Squared Error (MMSE), and unveil information-theoretic phase transitions. In addition, we study the performance of Approximate Message Passing (AMP) and show that it achieves the MMSE for a large set of parameters, and that factorization is algorithmically "easy" in a much wider region than previously believed. There exists, however, a "hard" region where AMP fails to reach the MMSE and we conjecture that no polynomial algorithm will improve on AMP.

preprint2016arXiv

Clustering from Sparse Pairwise Measurements

We consider the problem of grouping items into clusters based on few random pairwise comparisons between the items. We introduce three closely related algorithms for this task: a belief propagation algorithm approximating the Bayes optimal solution, and two spectral algorithms based on the non-backtracking and Bethe Hessian operators. For the case of two symmetric clusters, we conjecture that these algorithms are asymptotically optimal in that they detect the clusters as soon as it is information theoretically possible to do so. We substantiate this claim for one of the spectral approaches we introduce.

preprint2016arXiv

Fast phase retrieval for high dimensions: A block-based approach

This paper addresses fundamental scaling issues that hinder phase retrieval (PR) in high dimensions. We show that, if the measurement matrix can be put into a generalized block-diagonal form, a large PR problem can be solved on separate blocks, at the cost of a few extra global measurements to merge the partial results. We illustrate this principle using two distinct PR methods, and discuss different design trade-offs. Experimental results indicate that this block-based PR framework can reduce computational cost and memory requirements by several orders of magnitude.

preprint2016arXiv

Intensity-only optical compressive imaging using a multiply scattering material and a double phase retrieval approach

In this paper, the problem of compressive imaging is addressed using natural randomization by means of a multiply scattering medium. To utilize the medium in this way, its corresponding transmission matrix must be estimated. To calibrate the imager, we use a digital micromirror device (DMD) as a simple, cheap, and high-resolution binary intensity modulator. We propose a phase retrieval algorithm which is well adapted to intensity-only measurements on the camera, and to the input binary intensity patterns, both for estimating the complex transmission matrix and for image reconstruction. We demonstrate promising experimental results for the proposed algorithm using the MNIST dataset of handwritten digits as example images.

preprint2016arXiv

Matrix Completion from Fewer Entries: Spectral Detectability and Rank Estimation

The completion of low rank matrices from few entries is a task with many practical applications. We consider here two aspects of this problem: detectability, i.e. the ability to estimate the rank $r$ reliably from the fewest possible random entries, and performance in achieving small reconstruction error. We propose a spectral algorithm for these two tasks called MaCBetH (for Matrix Completion with the Bethe Hessian). The rank is estimated as the number of negative eigenvalues of the Bethe Hessian matrix, and the corresponding eigenvectors are used as initial condition for the minimization of the discrepancy between the estimated matrix and the revealed entries. We analyze the performance in a random matrix setting using results from the statistical mechanics of the Hopfield neural network, and show in particular that MaCBetH efficiently detects the rank $r$ of a large $n\times m$ matrix from $C(r)r\sqrt{nm}$ entries, where $C(r)$ is a constant close to $1$. We also evaluate the corresponding root-mean-square error empirically and show that MaCBetH compares favorably to other existing approaches.

preprint2016arXiv

MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel

This paper considers probabilistic estimation of a low-rank matrix from non-linear element-wise measurements of its elements. We derive the corresponding approximate message passing (AMP) algorithm and its state evolution. Relying on non-rigorous but standard assumptions motivated by statistical physics, we characterize the minimum mean squared error (MMSE) achievable information theoretically and with the AMP algorithm. Unlike in related problems of linear estimation, in the present setting the MMSE depends on the output channel only through a single parameter: its Fisher information. We illustrate this striking finding by analysis of submatrix localization, and of detection of communities hidden in a dense stochastic block model. For this example we locate the computational and statistical boundaries that are not equal for rank larger than four.

preprint2016arXiv

Phase transitions and sample complexity in Bayes-optimal matrix factorization

We analyse the matrix factorization problem. Given a noisy measurement of a product of two matrices, the problem is to recover the original matrices. It arises in many applications such as dictionary learning, blind matrix calibration, sparse principal component analysis, blind source separation, low rank matrix completion, robust principal component analysis or factor analysis. It is also important in machine learning: unsupervised representation learning can often be studied through matrix factorization. We use the tools of statistical mechanics (the cavity and replica methods) to analyze the achievability and computational tractability of the inference problems in the setting of Bayes-optimal inference, which amounts to assuming that the two matrices have random independent elements generated from some known distribution, and this information is available to the inference algorithm. In this setting, we compute the minimal mean-squared-error achievable in principle in any computational time, and the error that can be achieved by an efficient approximate message passing algorithm. The computation is based on the asymptotic state-evolution analysis of the algorithm. The performance that our analysis predicts, both in terms of the achieved mean-squared-error, and in terms of sample complexity, is extremely promising and motivating for a further development of the algorithm.

preprint2015arXiv

Approximate Message Passing with Restricted Boltzmann Machine Priors

Approximate Message Passing (AMP) has been shown to be an excellent statistical approach to signal inference and compressed sensing problems. The AMP framework provides modularity in the choice of signal prior; here we propose a hierarchical form of the Gauss-Bernoulli prior which utilizes a Restricted Boltzmann Machine (RBM) trained on the signal support to push reconstruction performance beyond that of simple iid priors for signals whose support can be well represented by a trained binary RBM. We present and analyze two methods of RBM factorization and demonstrate how these affect signal reconstruction performance within our proposed algorithm. Finally, using the MNIST handwritten digit dataset, we show experimentally that using an RBM allows AMP to approach oracle-support performance.

preprint2015arXiv

Approximate message-passing with spatially coupled structured operators, with applications to compressed sensing and sparse superposition codes

We study the behavior of Approximate Message-Passing, a solver for linear sparse estimation problems such as compressed sensing, when the i.i.d. matrices (for which it has been specifically designed) are replaced by structured operators, such as Fourier and Hadamard ones. We show empirically that after proper randomization, the structure of the operators does not significantly affect the performance of the solver. Furthermore, for some specially designed spatially coupled operators, this allows a computationally fast and memory efficient reconstruction in compressed sensing up to the information-theoretical limit. We also show how this approach can be applied to sparse superposition codes, allowing the Approximate Message-Passing decoder to perform at large rates for moderate block length.

preprint2015arXiv

Phase recovery from a Bayesian point of view: the variational approach

In this paper, we consider the phase recovery problem, where a complex signal vector has to be estimated from the knowledge of the modulus of its linear projections, from a naive variational Bayesian point of view. In particular, we derive an iterative algorithm following the minimization of the Kullback-Leibler divergence under the mean-field assumption, and show on synthetic data with random projections that this approach leads to an efficient and robust procedure, with a good computational cost.

preprint2015arXiv

Phase Transitions in Sparse PCA

We study optimal estimation for sparse principal component analysis when the number of non-zero elements is small but on the same order as the dimension of the data. We employ the approximate message passing (AMP) algorithm and its state evolution to analyze the information-theoretically minimal mean-squared error and the one achieved by AMP in the limit of large sizes. For a special case of rank one and large enough density of non-zeros, Deshpande and Montanari [1] proved that AMP is asymptotically optimal. We show that both for low density and for large rank the problem undergoes a series of phase transitions suggesting existence of a region of parameters where estimation is information theoretically possible, but AMP (and presumably every other polynomial algorithm) fails. The analysis of the large rank limit is particularly instructive.

preprint2015arXiv

Random Projections through multiple optical scattering: Approximating kernels at the speed of light

Random projections have proven extremely useful in many signal processing and machine learning applications. However, they often require either to store a very large random matrix, or to use a different, structured matrix to reduce the computational and memory costs. Here, we overcome this difficulty by proposing an analog, optical device, that performs the random projections literally at the speed of light without having to store any matrix in memory. This is achieved using the physical properties of multiple coherent scattering of coherent light in random media. We use this device on a simple task of classification with a kernel machine, and we show that, on the MNIST database, the experimental results closely match the theoretical performance of the corresponding kernel. This framework can help make kernel methods practical for applications that have large training sets and/or require real-time prediction. We discuss possible extensions of the method in terms of a class of kernels, speed, memory consumption and different problems.

preprint2015arXiv

Reference-less measurement of the transmission matrix of a highly scattering material using a DMD and phase retrieval techniques

This paper investigates experimental means of measuring the transmission matrix (TM) of a highly scattering medium, with the simplest optical setup. Spatial light modulation is performed by a digital micromirror device (DMD), allowing high rates and high pixel counts but only binary amplitude modulation. We used intensity measurement only, thus avoiding the need for a reference beam. Therefore, the phase of the TM has to be estimated through signal processing techniques of phase retrieval. Here, we compare four different phase retrieval principles on noisy experimental data. We validate our estimations of the TM on three criteria: quality of prediction, distribution of singular values, and quality of focusing. Results indicate that Bayesian phase retrieval algorithms with variational approaches provide a good tradeoff between the computational complexity and the precision of the estimates.

preprint2015arXiv

Scampi: a robust approximate message-passing framework for compressive imaging

Reconstruction of images from noisy linear measurements is a core problem in image processing, for which convex optimization methods based on total variation (TV) minimization have been the long-standing state-of-the-art. We present an alternative probabilistic reconstruction procedure based on approximate message-passing, Scampi, which operates in the compressive regime, where the inverse imaging problem is underdetermined. While the proposed method is related to the recently proposed GrAMPA algorithm of Borgerding, Schniter, and Rangan, we further develop the probabilistic approach to compressive imaging by introducing an expectation-maximization learning of model parameters, making Scampi robust to model uncertainties. Additionally, our numerical experiments indicate that Scampi can provide reconstruction performance superior to both GrAMPA and convex approaches to TV reconstruction. Finally, through exhaustive best-case experiments, we show that in many cases the maximal performance of both Scampi and convex TV can be quite close, even though the approaches are a priori distinct. The theoretical reasons for this correspondence remain an open question. Nevertheless, the proposed algorithm remains more practical, as it requires far less parameter tuning to perform optimally.

preprint2015arXiv

Spectral Detection in the Censored Block Model

We consider the problem of partially recovering hidden binary variables from the observation of (few) censored edge weights, a problem with applications in community detection, correlation clustering and synchronization. We describe two spectral algorithms for this task based on the non-backtracking and the Bethe Hessian operators. These algorithms are shown to be asymptotically optimal for the partial recovery problem, in that they detect the hidden assignment as soon as it is information theoretically possible to do so.

preprint2015arXiv

Spectral Detection on Sparse Hypergraphs

We consider the problem of the assignment of nodes into communities from a set of hyperedges, where every hyperedge is a noisy observation of the community assignment of the adjacent nodes. We focus in particular on the sparse regime where the number of edges is of the same order as the number of vertices. We propose a spectral method based on a generalization of the non-backtracking Hashimoto matrix into hypergraphs. We analyze its performance on a planted generative model and compare it with other spectral methods and with Bayesian belief propagation (which was conjectured to be asymptotically optimal for this model). We conclude that the proposed spectral method detects communities whenever belief propagation does, while having the important advantages of being simpler, entirely nonparametric, and able to learn the rule according to which the hyperedges were generated without prior information.

preprint2015arXiv

Training Restricted Boltzmann Machines via the Thouless-Anderson-Palmer Free Energy

Restricted Boltzmann machines are undirected neural networks which have been shown to be effective in many applications, including serving as initializations for training deep multi-layer neural networks. One of the main reasons for their success is the existence of efficient and practical stochastic algorithms, such as contrastive divergence, for unsupervised training. We propose an alternative deterministic iterative procedure based on an improved mean field method from statistical physics known as the Thouless-Anderson-Palmer approach. We demonstrate that our algorithm provides performance equal to, and sometimes superior to, persistent contrastive divergence, while also providing a clear and easy to evaluate objective function. We believe that this strategy can be easily generalized to other models as well as to more accurate higher-order approximations, paving the way for systematic improvements in training Boltzmann machines with hidden units.

preprint2014arXiv

Adaptive Damping and Mean Removal for the Generalized Approximate Message Passing Algorithm

The generalized approximate message passing (GAMP) algorithm is an efficient method of MAP or approximate-MMSE estimation of $x$ observed from a noisy version of the transform coefficients $z = Ax$. In fact, for large zero-mean i.i.d. sub-Gaussian $A$, GAMP is characterized by a state evolution whose fixed points, when unique, are optimal. For generic $A$, however, GAMP may diverge. In this paper, we propose adaptive damping and mean-removal strategies that aim to prevent divergence. Numerical results demonstrate significantly enhanced robustness to non-zero-mean, rank-deficient, column-correlated, and ill-conditioned $A$.

preprint2014arXiv

Belief-Propagation Guided Monte-Carlo Sampling

A Monte-Carlo algorithm for discrete statistical models that combines the full power of the Belief Propagation algorithm with the advantages of a detailed-balanced heat bath approach is presented. A sub-tree inside the factor graph is first extracted randomly; Belief Propagation is then used as a perfect sampler to generate a configuration on the tree given the boundary conditions and the procedure is iterated. This approach is best adapted for locally tree-like graphs; it is therefore tested on the hard cases of spin-glass models for random graphs, demonstrating its state-of-the-art status in those cases.

preprint2014arXiv

On Convergence of Approximate Message Passing

Approximate message passing is an iterative algorithm for compressed sensing and related applications. A solid theory about the performance and convergence of the algorithm exists for measurement matrices having iid entries of zero mean. However, it was observed by several authors that for more general matrices the algorithm often encounters convergence problems. In this paper we identify the reason for the non-convergence for measurement matrices with iid entries and non-zero mean in the context of Bayes optimal inference. Finally we demonstrate numerically that when the iterative update is changed from parallel to sequential the convergence is restored.

preprint2014arXiv

Replica Analysis and Approximate Message Passing Decoder for Superposition Codes

Superposition codes are efficient for the Additive White Gaussian Noise channel. We provide here a replica analysis of the performance of these codes for large signals. We also consider a Bayesian Approximate Message Passing decoder based on a belief-propagation approach, and discuss its performance using the density evolution technique. Our main findings are 1) for the sizes we can access, the message-passing decoder outperforms other decoders studied in the literature 2) its performance is limited by a sharp phase transition and 3) while these codes reach capacity as $B$ (a crucial parameter in the code) increases, the performance of the message passing decoder worsens as the phase transition goes to lower rates.

preprint2014arXiv

Reweighted belief propagation and quiet planting for random K-SAT

We study the random K-satisfiability problem using a partition function where each solution is reweighted according to the number of variables that satisfy every clause. We apply belief propagation and the related cavity method to the reweighted partition function. This allows us to obtain several new results on the properties of the random K-satisfiability problem. In particular, the reweighting allows us to introduce a planted ensemble that generates instances that are, in some region of parameters, equivalent to random instances. We are hence able to generate at the same time a typical random SAT instance and one of its solutions. We study the relation between clustering and belief propagation fixed points and we give direct evidence for the existence of purely entropic (rather than energetic) barriers between clusters in some region of parameters in the random K-satisfiability problem. We exhibit, in some large planted instances, solutions with a non-trivial whitening core; such solutions were known to exist but were so far never found on very large instances. Finally, we discuss algorithmic hardness of such planted instances and we determine a region of parameters in which planting leads to satisfiable benchmarks that, to our knowledge, are the hardest known.

preprint2014arXiv

Sparse Estimation with the Swept Approximated Message-Passing Algorithm

Approximate Message Passing (AMP) has been shown to be a superior method for inference problems, such as the recovery of signals from sets of noisy, lower-dimensionality measurements, both in terms of reconstruction accuracy and in computational efficiency. However, AMP suffers from serious convergence issues in contexts that do not exactly match its assumptions. We propose a new approach to stabilizing AMP in these contexts by applying AMP updates to individual coefficients rather than in parallel. Our results show that this change to the AMP iteration can provide theoretically expected, but hitherto unobtainable, performance for problems on which the standard AMP iteration diverges. Additionally, we find that the computational cost of this swept coefficient update scheme is not unduly burdensome, allowing it to be applied efficiently to signals of large dimensionality.

preprint2014arXiv

Spectral Clustering of Graphs with the Bethe Hessian

Spectral clustering is a standard approach to label nodes on a graph by studying the (largest or lowest) eigenvalues of a symmetric real matrix such as the adjacency matrix or the Laplacian. Recently, it has been argued that using instead a more complicated, non-symmetric and higher-dimensional operator, related to the non-backtracking walk on the graph, leads to improved performance in detecting clusters, and even to optimal performance for the stochastic block model. Here, we propose to use instead a simpler object, a symmetric real matrix known as the Bethe Hessian operator, or deformed Laplacian. We show that this approach combines the performance of the non-backtracking operator, thus detecting clusters all the way down to the theoretical limit in the stochastic block model, with the computational, theoretical and memory advantages of real symmetric matrices.
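As a rough illustration of the object this abstract refers to, the sketch below builds the Bethe Hessian in the form it is commonly written, H(r) = (r^2 - 1) I - r A + D, with A the adjacency matrix and D the diagonal degree matrix, and counts its negative eigenvalues as an estimate of the number of clusters. The choice r = sqrt(mean degree), the toy graph made of two disjoint 5-cliques, and all function names are assumptions made for this example; they are not details stated in the abstract.

```python
import numpy as np

def bethe_hessian(adjacency, r=None):
    """Return H(r) = (r^2 - 1) I - r A + D for a symmetric 0/1 adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    if r is None:
        # Assumed default for this sketch: r = sqrt(mean degree).
        r = np.sqrt(degrees.mean())
    n = A.shape[0]
    return (r**2 - 1.0) * np.eye(n) - r * A + np.diag(degrees)

def count_negative_eigenvalues(H, tol=1e-9):
    """Count eigenvalues of the symmetric matrix H that lie below -tol."""
    return int(np.sum(np.linalg.eigvalsh(H) < -tol))

def two_cliques(size):
    """Hypothetical toy graph: two disjoint complete graphs on `size` nodes each."""
    block = np.ones((size, size)) - np.eye(size)
    zeros = np.zeros((size, size))
    return np.block([[block, zeros], [zeros, block]])

A = two_cliques(5)
H = bethe_hessian(A)
n_clusters = count_negative_eigenvalues(H)
print(n_clusters)
```

On this toy graph every node has degree 4, so r = 2 and H = 7I - 2A; each clique contributes one negative eigenvalue, which is why the count matches the two groups. Real sparse networks would call for a sparse eigensolver rather than a dense decomposition.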

preprint2014arXiv

Spectral density of the non-backtracking operator

The non-backtracking operator was recently shown to provide a significant improvement when used for spectral clustering of sparse networks. In this paper we analyze its spectral density on large random sparse graphs using a mapping to the correlation functions of a certain interacting quantum disordered system on the graph. On sparse, tree-like graphs, this can be solved efficiently by the cavity method and a belief propagation algorithm. We show that there exists a paramagnetic phase, leading to zero spectral density, that is stable outside a circle of radius $\sqrt{\rho}$, where $\rho$ is the leading eigenvalue of the non-backtracking operator. We observe a second-order phase transition at the edge of this circle, between a zero and a non-zero spectral density. The fact that this phase transition is absent in the spectral density of other matrices commonly used for spectral clustering provides a physical justification of the performance of the non-backtracking operator in spectral clustering.

preprint2014arXiv

Variational Free Energies for Compressed Sensing

We consider the variational free energy approach for compressed sensing. We first show that the naïve mean field approach performs remarkably well when coupled with a noise learning procedure. We also notice that it leads to the same equations as those used for iterative thresholding. We then discuss the Bethe free energy and how it corresponds to the fixed points of the approximate message passing algorithm. In both cases, we test numerically the direct optimization of the free energies as a converging sparse-estimation algorithm.

preprint2013arXiv

Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications

In this paper we extend our previous work on the stochastic block model, a commonly used generative model for social and biological networks, and the problem of inferring functional groups or communities from the topology of the network. We use the cavity method of statistical physics to obtain an asymptotically exact analysis of the phase diagram. We describe in detail properties of the detectability/undetectability phase transition and the easy/hard phase transition for the community detection problem. Our analysis translates naturally into a belief propagation algorithm for inferring the group memberships of the nodes in an optimal way, i.e., that maximizes the overlap with the underlying group memberships, and learning the underlying parameters of the block model. Finally, we apply the algorithm to two examples of real-world networks and discuss its performance.

preprint2013arXiv

Blind Calibration in Compressed Sensing using Message Passing Algorithms

Compressed sensing (CS) is a concept that allows one to acquire compressible signals with a small number of measurements. As such it is very attractive for hardware implementations. Therefore, correct calibration of the hardware is a central issue. In this paper we study the so-called blind calibration, i.e. when the training signals that are available to perform the calibration are sparse but unknown. We extend the approximate message passing (AMP) algorithm used in CS to the case of blind calibration. In the calibration-AMP, both the gains on the sensors and the elements of the signals are treated as unknowns. Our algorithm is also applicable to settings in which the sensors distort the measurements in other ways than multiplication by a gain, unlike previously suggested blind calibration algorithms based on convex relaxations. We study numerically the phase diagram of the blind calibration problem, and show that even in cases where convex relaxation is possible, our algorithm requires a smaller number of measurements and/or signals in order to perform well.

preprint2013arXiv

Compressed Sensing under Matrix Uncertainty: Optimum Thresholds and Robust Approximate Message Passing

In compressed sensing one measures sparse signals directly in a compressed form via a linear transform and then reconstructs the original signal. However, it is often the case that the linear transform itself is known only approximately, a situation called matrix uncertainty, and that the measurement process is noisy. Here we present two contributions to this problem: first, we use the replica method to determine the mean-squared error of the Bayes-optimal reconstruction of sparse signals under matrix uncertainty. Second, we consider a robust variant of the approximate message passing algorithm and demonstrate numerically that in the limit of large systems, this algorithm matches the optimal performance in a large region of parameters.

preprint2013arXiv

Model Selection for Degree-corrected Block Models

The proliferation of models for networks raises challenging problems of model selection: the data are sparse and globally dependent, and models are typically high-dimensional and have large numbers of latent variables. Together, these issues mean that the usual model-selection criteria do not work properly for networks. We illustrate these challenges, and show one way to resolve them, by considering the key network-analysis problem of dividing a graph into communities or blocks of nodes with homogeneous patterns of links to the rest of the network. The standard tool for doing this is the stochastic block model, under which the probability of a link between two nodes is a function solely of the blocks to which they belong. This imposes a homogeneous degree distribution within each block; this can be unrealistic, so degree-corrected block models add a parameter for each node, modulating its over-all degree. The choice between ordinary and degree-corrected block models matters because they make very different inferences about communities. We present the first principled and tractable approach to model selection between standard and degree-corrected block models, based on new large-graph asymptotics for the distribution of log-likelihood ratios under the stochastic block model, finding substantial departures from classical results for sparse graphs. We also develop linear-time approximations for log-likelihoods under both the stochastic block model and the degree-corrected model, using belief propagation. Applications to simulated and real networks show excellent agreement with our approximations. Our results thus both solve the practical problem of deciding on degree correction, and point to a general approach to model selection in network analysis.

preprint2013arXiv

Non-adaptive pooling strategies for detection of rare faulty items

We study non-adaptive pooling strategies for detection of rare faulty items. Given a binary sparse N-dimensional signal x, how can one construct a sparse binary MxN pooling matrix F such that the signal can be reconstructed from the smallest possible number M of measurements y=Fx? We show that a very low number of measurements is possible for a random spatially coupled design of the pools F. Our design might find application in genetic screening or compressed genotyping. We show that our results are robust with respect to uncertainty in the matrix F when some elements are mistaken.

preprint2013arXiv

Performance of simulated annealing in p-spin glasses

We perform careful numerical simulations of slow Monte-Carlo annealings in the dense 3-body spin glass model and compare with the predictions from different theories: threshold states, isocomplexity, and following state. We conclude that while isocomplexity and following state both provide excellent agreement with the numerical data, the influence of threshold states, which is still the most commonly considered theory, can be excluded from our data.

preprint2013arXiv

Phase Diagram and Approximate Message Passing for Blind Calibration and Dictionary Learning

We consider dictionary learning and blind calibration for signals and matrices created from a random ensemble. We study the mean-squared error in the limit of large signal dimension using the replica method and unveil the appearance of phase transitions delimiting impossible, possible-but-hard and possible inference regions. We also introduce an approximate message passing algorithm that asymptotically matches the theoretical performance, and show through numerical tests that it performs very well, for the calibration problem, for tractable system sizes.

preprint2013arXiv

Robust error correction for real-valued signals via message-passing decoding and spatial coupling

We revisit the error correction scheme of real-valued signals when the codeword is corrupted by gross errors on a fraction of entries and a small noise on all the entries. Combining the recent developments of approximate message passing and the spatially-coupled measurement matrix in compressed sensing we show that the error correction and its robustness towards noise can be enhanced considerably. We discuss the performance in the large signal limit using previous results on state evolution, as well as for finite size signals through numerical simulations. Even for relatively small sizes, the approach proposed here outperforms convex-relaxation-based decoders.

preprint2013arXiv

Spectral redemption: clustering sparse networks

Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here we introduce a new class of spectral algorithms based on a non-backtracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the non-backtracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.
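To make the operator mentioned in this abstract concrete, here is a minimal sketch that builds a non-backtracking matrix on the directed edges of a small graph, using the usual convention that the entry for (u→v, w→x) is 1 when w = v and x ≠ u. The helper name, the dense double loop, and the choice of the complete graph K4 as a toy input are assumptions for illustration only; the abstract does not specify an implementation.

```python
import numpy as np

def non_backtracking_matrix(edges):
    """Build the non-backtracking matrix of an undirected simple graph.

    `edges` is a list of (u, v) pairs, each undirected edge listed once.
    Rows and columns are indexed by the 2m directed edges.
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {edge: i for i, edge in enumerate(directed)}
    B = np.zeros((len(directed), len(directed)))
    for (u, v), i in index.items():
        for (w, x), j in index.items():
            # Step u -> v may be followed by v -> x, except straight back to u.
            if w == v and x != u:
                B[i, j] = 1.0
    return B, directed

# Hypothetical toy input: the complete graph on 4 nodes (3-regular).
k4_edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
B, directed = non_backtracking_matrix(k4_edges)
leading = max(abs(np.linalg.eigvals(B)))
print(B.shape, leading)
```

In K4 every directed edge has exactly two non-backtracking continuations, so all row sums equal 2 and the leading eigenvalue is 2. The quadratic loop is only suitable for small graphs; a sparse construction would be needed at realistic sizes.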

preprint2013arXiv

The hard-core model on random graphs revisited

We revisit the classical hard-core model, also known as independent set and dual to the vertex cover problem, where one puts particles with a first-neighbor hard-core repulsion on the vertices of a random graph. Although the cases of random graphs with small and with very large average degrees are each quite well understood, they yield qualitatively different results, and our aim here is to reconcile these two cases. We revisit results that can be obtained using the (heuristic) cavity method and show that it provides a closed-form conjecture for the exact density of the densest packing on random regular graphs with degree K>=20, and that for K>16 the nature of the phase transition is the same as for large K. This also shows that the hard-core model is the simplest mean-field lattice model for structural glasses and jamming.

preprint2012arXiv

Comparative Study for Inference of Hidden Classes in Stochastic Block Models

Inference of hidden classes in the stochastic block model is a classical problem with important applications. The most commonly used methods for this problem involve naïve mean field approaches or heuristic spectral methods. Recently, belief propagation was proposed for this problem. In this contribution we perform a comparative study between the three methods on synthetically created networks. We show that belief propagation shows much better performance when compared to naïve mean field and spectral approaches. This applies to accuracy, computational efficiency and the tendency to overfit the data.

preprint2012arXiv

Following states in temperature in the spherical s+p-spin glass model

In many mean-field glassy systems, the low-temperature Gibbs measure is dominated by exponentially many metastable states. We analyze the evolution of the metastable states as temperature changes adiabatically in the solvable case of the spherical $s+p$-spin glass model, extending the work of Barrat, Franz and Parisi [J. Phys. A 30, 5593 (1997)]. We confirm the presence of level crossings, bifurcations, and temperature chaos. For the states that are at equilibrium close to the so-called dynamical temperature $T_d$, we find, however, that the following state method (and the dynamical solution of the model as well) is intrinsically limited by the vanishing of solutions with non-zero overlap at low temperature.

preprint2012arXiv

On the relation between kinetically constrained models of glass dynamics and the random first-order transition theory

In this paper we revisit and extend the mapping between two apparently different classes of models. The first class contains the prototypical models described, at the mean-field level, by the Random First Order Transition (RFOT) theory of the glass transition, called either "random XORSAT problem" (in the information theory community) or "diluted $p$-spin model" (in the spin glass community), undergoing a single-spin flip Glauber dynamics. The models in the second class are Kinetically Constrained Models (KCM): their Hamiltonian is that of independent spins in a constant magnetic field, hence their thermodynamics is completely trivial, but the dynamics is such that only groups of spins can flip together, thus implementing a kinetic constraint that induces a non-trivial dynamical behavior. A mapping between some representatives of these two classes has been known for a long time. Here we formally prove this mapping at the level of the master equation, and we apply it to the particular case of Bethe lattice models. This allows us to show that a RFOT model can be mapped exactly into a KCM. However, the natural order parameter for the RFOT model, namely the spin overlap, is mapped into a very complicated non-local function in the KCM. Therefore, if one were to study the KCM without knowing of the mapping onto the RFOT model, one would guess that its physics is quite different from the RFOT one. Our results instead suggest that these two apparently different descriptions of the glass transition are, at least in some cases, closely related.

preprint2012arXiv

Probabilistic Reconstruction in Compressed Sensing: Algorithms, Phase Diagrams, and Threshold Achieving Matrices

Compressed sensing is a signal processing method that acquires data directly in a compressed form. This allows one to make fewer measurements than what was considered necessary to record a signal, enabling faster or more precise measurement protocols in a wide range of applications. Using an interdisciplinary approach, we have recently proposed in [arXiv:1109.4424] a strategy that allows compressed sensing to be performed at acquisition rates approaching the theoretical optimal limits. In this paper, we give a more thorough presentation of our approach, and introduce many new results. We present the probabilistic approach to reconstruction and discuss its optimality and robustness. We detail the derivation of the message passing algorithm for reconstruction and expectation maximization learning of signal-model parameters. We further develop the asymptotic analysis of the corresponding phase diagrams with and without measurement noise, for different distributions of signals, and discuss the best possible reconstruction performance regardless of the algorithm. We also present new efficient seeding matrices, test them on synthetic data and analyze their performance asymptotically.

preprint2012arXiv

Statistical physics-based reconstruction in compressed sensing

Compressed sensing is triggering a major evolution in signal acquisition. It consists in sampling a sparse signal at a low rate and later using computational power for its exact reconstruction, so that only the necessary information is measured. Currently used reconstruction techniques are, however, limited to acquisition rates larger than the true density of the signal. We design a new procedure which is able to reconstruct the signal exactly with a number of measurements that approaches the theoretical limit in the limit of large systems. It is based on the joint use of three essential ingredients: a probabilistic approach to signal reconstruction, a message-passing algorithm adapted from belief propagation, and a careful design of the measurement matrix inspired by the theory of crystal nucleation. The performance of this new algorithm is analyzed by statistical physics methods. The obtained improvement is confirmed by numerical studies of several cases.

preprint2012arXiv

The Quantum Adiabatic Algorithm applied to random optimization problems: the quantum spin glass perspective

Among various algorithms designed to exploit the specific properties of quantum computers with respect to classical ones, the quantum adiabatic algorithm is a versatile proposition to find the minimal value of an arbitrary cost function (ground state energy). Random optimization problems provide a natural testbed to compare its efficiency with that of classical algorithms. These problems correspond to mean field spin glasses that have been extensively studied in the classical case. This paper reviews recent analytical works that extended these studies to incorporate the effect of quantum fluctuations, and presents also some original results in this direction.

preprint2012arXiv

Ultrametric probe of the spin-glass state in a field

We study the ultrametric structure of phase space of one-dimensional Ising spin glasses with random power-law interaction in an external random field. Although in zero field the model in both the mean-field and non-mean-field universality classes shows an ultrametric signature [Phys. Rev. Lett. 102, 037207 (2009)], when a field is applied ultrametricity seems only present in the mean-field regime. The results for the non-mean field case in an external field agree with data for spin glasses studied within the Migdal-Kadanoff approximation. Our results therefore suggest that the spin-glass state might be fragile to external fields below the upper critical dimension.

preprint2011arXiv

Phase transition in the detection of modules in sparse networks

We present an asymptotically exact analysis of the problem of detecting communities in sparse random networks. Our results are also applicable to detection of functional modules, partitions, and colorings in noisy planted models. Using a cavity method analysis, we unveil a phase transition from a region where the original group assignment is undetectable to one where detection is possible. In some cases, the detectable region splits into an algorithmically hard region and an easy one. Our approach naturally translates into a practical algorithm for detecting modules in sparse networks, and learning the parameters of the underlying model.

preprint2011arXiv

Random-field p-spin glass model on regular random graphs

We investigate in detail the phase diagrams of the p-body +/-J Ising model with and without random fields on random graphs with fixed connectivity. One of our most interesting findings is that a thermodynamic spin glass phase is present in the three-body purely ferromagnetic model in random fields, unlike for the canonical two-body interaction random-field Ising model. We also discuss the location of the phase boundary between the paramagnetic and spin glass phases that does not depend on the change of the ferromagnetic bias. This behavior is explained by a gauge transformation, which shows that gauge-invariant properties generically do not depend on the strength of the ferromagnetic bias for the +/-J Ising model on regular random graphs.

preprint2011arXiv

The nature of the different zero-temperature phases in discrete two-dimensional spin glasses: Entropy, universality, chaos and cascades in the renormalization group flow

The properties of discrete two-dimensional spin glasses depend strongly on the way the zero-temperature limit is taken. We discuss this phenomenon in the context of the Migdal-Kadanoff renormalization group. We see, in particular, how these properties are connected with the presence of a cascade of fixed points in the renormalization group flow. Of particular interest are two unstable fixed points that correspond to two different spin-glass phases at zero temperature. We discuss how these phenomena are related to the presence of entropy fluctuations, temperature chaos, and universality in this model.

preprint2010arXiv

Elusive Glassy Phase in the Random Field Ising Model

We consider the random field Ising model and show rigorously that the spin glass susceptibility at equilibrium is always bounded by the ferromagnetic susceptibility, and therefore that no spin glass phase can be present at equilibrium away from the ferromagnetic critical line. When the magnetization is, however, fixed to values smaller than the equilibrium one, a glassy phase can exist, as we show explicitly on the Bethe lattice.

preprint2010arXiv

Following Gibbs States Adiabatically - The Energy Landscape of Mean Field Glassy Systems

We introduce a generalization of the cavity, or Bethe-Peierls, method that allows one to follow Gibbs states when an external parameter, e.g. the temperature, is adiabatically changed. This allows us to obtain new quantitative results on the static and dynamic behavior of mean field disordered systems such as models of glassy and amorphous materials or random constraint satisfaction problems. As a first application, we discuss the residual energy after a very slow annealing and the behavior of out-of-equilibrium states, and demonstrate the presence of temperature chaos in equilibrium. We also explore the energy landscape, and identify a new transition from a computationally easier canyons-dominated region to a harder valleys-dominated one.

preprint2010arXiv

Generalization of the cavity method for adiabatic evolution of Gibbs states

Mean field glassy systems have a complicated energy landscape and an enormous number of different Gibbs states. In this paper, we introduce a generalization of the cavity method in order to describe the adiabatic evolution of these glassy Gibbs states as an external parameter, such as the temperature, is tuned. We give a general derivation of the method and describe in detail the solution of the resulting equations for the fully connected p-spin model, the XOR-SAT problem and the anti-ferromagnetic Potts glass (or "coloring" problem). As direct results of the states following method, we present a study of very slow Monte-Carlo annealings, the demonstration of the presence of temperature chaos in these systems, and the identification of an easy/hard transition for simulated annealing in constraint optimization problems. We also discuss the relation between our approach and the Franz-Parisi potential, as well as with the reconstruction problem on trees in computer science. A mapping between the states following method and the physics on the Nishimori line is also presented.

preprint2010arXiv

Glassy aspects of melting dynamics (On melting dynamics and the glass transition, Part I)

The following properties are in the present literature associated with the behavior of super-cooled glass-forming liquids: faster than exponential growth of the relaxation time, dynamical heterogeneities, growing point-to-set correlation length, crossover from mean field behavior to activated dynamics. In this paper we argue that these properties are also present in a much simpler situation, namely the melting of the bulk of an ordered phase beyond a first order phase transition point. This is a promising path towards a better theoretical, numerical and experimental understanding of the above phenomena and of the physics of super-cooled liquids. We discuss in detail the analogies and the differences between the glass and the bulk melting transitions.

preprint2010arXiv

Glassy dynamics as a melting process (On melting dynamics and the glass transition, Part II)

There are deep analogies between the melting dynamics in systems with a first order phase transition and the dynamics from equilibrium in super-cooled liquids. For a class of Ising spin models undergoing a first order transition, namely p-spin models on the so-called Nishimori line, it can be shown that the melting dynamics can be exactly mapped to the equilibrium dynamics. In this mapping the dynamical (or mode-coupling) glass transition corresponds to the spinodal point, while the Kauzmann transition corresponds to the first order phase transition itself. Both in mean field and finite dimensional models this mapping provides an exact realization of the random first order theory scenario for the glass transition. The corresponding glassy phenomenology can then be understood in the framework of a standard first order phase transition.

preprint2010arXiv

No spin glass phase in ferromagnetic random-field random-temperature scalar Ginzburg-Landau model

Krzakala, Ricci-Tersenghi and Zdeborova have shown recently that the random field Ising model with non-negative interactions and arbitrary external magnetic field on an arbitrary lattice does not have a static spin glass phase. In this paper we generalize the proof to a soft scalar spin version of the Ising model: the Ginzburg-Landau model with random magnetic field and random temperature-parameter. We do so by proving that the spin glass susceptibility cannot diverge unless the ferromagnetic susceptibility does.

preprint2010arXiv

Quiet Planting in the Locked Constraint Satisfaction Problems

We study the planted ensemble of locked constraint satisfaction problems. We describe the connection between the random and planted ensembles. The use of the cavity method is combined with arguments from reconstruction on trees and first and second moment considerations; in particular the connection with the reconstruction on trees appears to be crucial. Our main result is the location of the hard region in the planted ensemble. In a part of that hard region instances have with high probability a single satisfying assignment.

preprint2009arXiv

Quantum Annealing of Hard Problems

Quantum annealing is analogous to simulated annealing with a tunneling mechanism substituting for thermal activation. Its performance has been tested in numerical simulation with mixed conclusions. There is a class of optimization problems for which the efficiency can be studied analytically using techniques based on the statistical mechanics of spin glasses.

preprint2007arXiv

Phase Transitions in the Coloring of Random Graphs

We consider the problem of coloring the vertices of a large sparse random graph with a given number of colors so that no adjacent vertices have the same color. Using the cavity method, we present a detailed and systematic analytical study of the space of proper colorings (solutions). We show that for a fixed number of colors and as the average vertex degree (number of constraints) increases, the set of solutions undergoes several phase transitions similar to those observed in the mean field theory of glasses. First, at the clustering transition, the entropically dominant part of the phase space decomposes into an exponential number of pure states so that beyond this transition a uniform sampling of solutions becomes hard. Afterward, the space of solutions condenses over a finite number of the largest states and consequently the total entropy of solutions becomes smaller than the annealed one. Another transition takes place when in all the entropically dominant states a finite fraction of nodes freezes so that each of these nodes is allowed a single color in all the solutions inside the state. Eventually, above the coloring threshold, no more solutions are available. We compute all the critical connectivities for Erdős-Rényi and regular random graphs and determine their asymptotic values for a large number of colors. Finally, we discuss the algorithmic consequences of our findings. We argue that the onset of computational hardness is not associated with the clustering transition and we suggest instead that the freezing transition might be the relevant phenomenon. We also discuss the performance of a simple local Walk-COL algorithm and of the belief propagation algorithm in the light of our results.

89 published item(s)

Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula

Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning

Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model

A High Dimensional Statistical Model for Adversarial Training: Geometry and Trade-Offs

Bayesian Inference with Nonlinear Generative Models: Comments on Secure Learning

Perturbative construction of mean-field equations in extensive-rank matrix factorization and denoising

Secure Coding via Gaussian Random Fields

Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

The Gaussian equivalence of generative models for learning with shallow neural networks

Asymptotic errors for convex penalized linear regression beyond Gaussian matrices

Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures

Double Trouble in Double Descent : Bias and Variance(s) in the Lazy Regime

Exact asymptotics for phase retrieval and compressed sensing with random generative priors

Generalisation error in learning with random features and the hidden manifold model

Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

High-temperature Expansions and Message Passing Algorithms

Large-Scale Optical Reservoir Computing for Spatiotemporal Chaotic Systems Prediction

Light-in-the-loop: using a photonics co-processor for scalable training of neural networks

Marvels and Pitfalls of the Langevin Algorithm in Noisy High-dimensional Inference

Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation

Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models

Phase retrieval in high dimensions: Statistical and computational phase transitions

Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning

Reservoir Computing meets Recurrent Kernels and Structured Transforms

Who is Afraid of Big Bad Minima? Analysis of Gradient-Flow in a Spiked Matrix-Tensor Model

Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup

Kernel computations from large-scale random features obtained by Optical Processing Units

On the Universality of Noiseless Linear Estimation with Respect to the Measurement Matrix

The spiked matrix model with generative priors

Entropy and mutual information in models of deep neural networks

Fundamental limits of detection in the spiked Wigner model

Decoding from Pooled Data: Phase Transitions of Message Passing

Multi-Layer Generalized Linear Estimation

Statistical and computational phase transitions in spiked tensor estimation

Clustering from Sparse Pairwise Measurements

Fast phase retrieval for high dimensions: A block-based approach

Intensity-only optical compressive imaging using a multiply scattering material and a double phase retrieval approach

Matrix Completion from Fewer Entries: Spectral Detectability and Rank Estimation

MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel

Phase transitions and sample complexity in Bayes-optimal matrix factorization

Approximate Message Passing with Restricted Boltzmann Machine Priors

Approximate message-passing with spatially coupled structured operators, with applications to compressed sensing and sparse superposition codes

Phase recovery from a Bayesian point of view: the variational approach

Phase Transitions in Sparse PCA

Random Projections through multiple optical scattering: Approximating kernels at the speed of light

Reference-less measurement of the transmission matrix of a highly scattering material using a DMD and phase retrieval techniques

Scampi: a robust approximate message-passing framework for compressive imaging

Spectral Detection in the Censored Block Model

Spectral Detection on Sparse Hypergraphs

Training Restricted Boltzmann Machines via the Thouless-Anderson-Palmer Free Energy

Adaptive Damping and Mean Removal for the Generalized Approximate Message Passing Algorithm

Belief-Propagation Guided Monte-Carlo Sampling

On Convergence of Approximate Message Passing

Replica Analysis and Approximate Message Passing Decoder for Superposition Codes

Reweighted belief propagation and quiet planting for random K-SAT

Sparse Estimation with the Swept Approximated Message-Passing Algorithm

Spectral Clustering of Graphs with the Bethe Hessian

Spectral density of the non-backtracking operator

Variational Free Energies for Compressed Sensing

Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications

Blind Calibration in Compressed Sensing using Message Passing Algorithms

Compressed Sensing under Matrix Uncertainty: Optimum Thresholds and Robust Approximate Message Passing

Model Selection for Degree-corrected Block Models

Non-adaptive pooling strategies for detection of rare faulty items

Performance of simulated annealing in p-spin glasses

Phase Diagram and Approximate Message Passing for Blind Calibration and Dictionary Learning

Robust error correction for real-valued signals via message-passing decoding and spatial coupling

Spectral redemption: clustering sparse networks

The hard-core model on random graphs revisited

Comparative Study for Inference of Hidden Classes in Stochastic Block Models

Following states in temperature in the spherical s+p-spin glass model

On the relation between kinetically constrained models of glass dynamics and the random first-order transition theory

Probabilistic Reconstruction in Compressed Sensing: Algorithms, Phase Diagrams, and Threshold Achieving Matrices