Source author record

Carlo Lucibello

Carlo Lucibello appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


cond-mat.dis-nn Machine Learning cond-mat.stat-mech Neurons and Cognition Discrete Mathematics Graphics Information Theory math.IT

Catalog footprint

What is connected

19 works

8 topics

4 close collaborators


preprint 2023 arXiv

Information-theoretical analysis of the neural code for decoupled face representation

Processing faces accurately and efficiently is a key capability of humans and other animals that engage in sophisticated social tasks. Recent studies reported a decoupled coding for faces in the primate inferotemporal cortex, with two separate neural populations coding for the geometric position of (texture-free) facial landmarks and for the image texture at fixed landmark positions, respectively. Here, we formally assess the efficiency of this decoupled coding by appealing to the information-theoretic notion of description length, which quantifies the amount of information that is saved when encoding novel facial images, with a given precision. We show that although decoupled coding describes the facial images in terms of two sets of principal components (of landmark shape and image texture), it is more efficient (i.e., yields more information compression) than the encoding in terms of the image principal components only, which corresponds to the widely used eigenface method. The advantage of decoupled coding over eigenface coding increases with image resolution and is especially prominent when coding variants of training set images that only differ in facial expressions. Moreover, we demonstrate that decoupled coding entails better performance in three different tasks: the representation of facial images, the (daydream) sampling of novel facial images, and the recognition of facial identities and gender. In summary, our study provides a first-principles perspective on the efficiency and accuracy of the decoupled coding of facial stimuli reported in the primate inferotemporal cortex.

preprint 2022 arXiv

Deep learning via message passing algorithms based on belief propagation

Message-passing algorithms based on the Belief Propagation (BP) equations constitute a well-known distributed computational scheme. It is exact on tree-like graphical models and has also proven to be effective in many problems defined on graphs with loops (from inference to optimization, from signal processing to clustering). The BP-based scheme is fundamentally different from stochastic gradient descent (SGD), on which the current success of deep networks is based. In this paper, we present and adapt to mini-batch training on GPUs a family of BP-based message-passing algorithms with a reinforcement field that biases distributions towards locally entropic solutions. These algorithms are capable of training multi-layer neural networks with discrete weights and activations with performance comparable to SGD-inspired heuristics (BinaryNet) and are naturally well-adapted to continual learning. Furthermore, using these algorithms to estimate the marginals of the weights allows us to make approximate Bayesian predictions that have higher accuracy than point-wise solutions.

preprint 2022 arXiv

Unexpected upper critical dimension for spin glass models in a field predicted by the loop expansion around the Bethe solution at zero temperature

The spin-glass transition in a field in finite dimension is analyzed directly at zero temperature using a perturbative loop expansion around the Bethe lattice solution. The loop expansion is generated by the $M$-layer construction whose first diagrams are evaluated numerically and analytically. The generalized Ginzburg criterion reveals that the upper critical dimension below which mean-field theory fails is $D_U \ge 8$, at variance with the classical result $D_U = 6$ yielded by finite-temperature replica field theory. Our expansion around the Bethe lattice has two crucial differences with respect to the classical one. The finite connectivity $z$ of the lattice is directly included from the beginning in the Bethe lattice, while in the classical computation the finite connectivity is obtained through an expansion in $1/z$. Moreover, if one is interested in the zero temperature ($T = 0$) transition, one can directly expand around the $T = 0$ Bethe transition. The expansion directly at $T = 0$ is not possible in the classical framework because the fully connected spin glass does not have a transition at $T = 0$, being in the broken phase for any value of the external field.

preprint 2021 arXiv

Entropic gradient descent algorithms and wide flat minima

The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities than sharp ones. First, we discuss Gaussian mixture classification models and show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide flat regions. These estimators can be found by applying maximum flatness algorithms either directly on the classifier (which is norm independent) or on the differentiable loss function used in learning. Next, we extend the analysis to the deep learning scenario by extensive numerical validations. Using two algorithms, Entropy-SGD and Replicated-SGD, that explicitly include in the optimization objective a non-local flatness measure known as local entropy, we consistently improve the generalization error for common architectures (e.g. ResNet, EfficientNet). An easy-to-compute flatness measure shows a clear correlation with test accuracy.

preprint 2020 arXiv

Clustering of solutions in the symmetric binary perceptron

The geometrical features of the (non-convex) loss landscape of neural network models are crucial in ensuring successful optimization and, most importantly, the capability to generalize well. While minimizers' flatness consistently correlates with good generalization, there has been little rigorous work in exploring the conditions for the existence of such minimizers, even in toy models. Here we consider a simple neural network model, the symmetric perceptron, with binary weights. Phrasing the learning problem as a constraint satisfaction problem, the analogue of a flat minimizer becomes a large and dense cluster of solutions, while the narrowest minimizers are isolated solutions. We perform the first steps toward the rigorous proof of the existence of a dense cluster in certain regimes of the parameters, by computing the first and second moment upper bounds for the existence of pairs of arbitrarily close solutions. Moreover, we present a non-rigorous derivation of the same bounds for sets of $y$ solutions at fixed pairwise distances.

preprint 2020 arXiv

Critical initialisation in continuous approximations of binary neural networks

The training of stochastic neural network models with binary ($\pm1$) weights and activations via continuous surrogate networks is investigated. We derive new surrogates using a novel derivation based on writing the stochastic neural network as a Markov chain. This derivation also encompasses existing variants of the surrogates presented in the literature. Following this, we theoretically study the surrogates at initialisation. We derive, using mean field theory, a set of scalar equations describing how input signals propagate through the randomly initialised networks. The equations reveal whether so-called critical initialisations exist for each surrogate network, where the network can be trained to arbitrary depth. Moreover, we predict theoretically, and confirm numerically, that common weight initialisation schemes used in standard continuous networks, when applied to the mean values of the stochastic binary weights, yield poor training performance. This study shows that, contrary to common intuition, the means of the stochastic binary weights should be initialised close to $\pm 1$ for deeper networks to be trainable.

preprint 2020 arXiv

Reconstruction of Pairwise Interactions using Energy-Based Models

Pairwise models like the Ising model or the generalized Potts model have found many successful applications in fields like physics, biology, and economics. Closely connected is the problem of inverse statistical mechanics, where the goal is to infer the parameters of such models given observed data. An open problem in this field is the question of how to train these models in the case where the data contain additional higher-order interactions that are not present in the pairwise model. In this work, we propose an approach based on Energy-Based Models and pseudolikelihood maximization to address these complications: we show that hybrid models, which combine a pairwise model and a neural network, can lead to significant improvements in the reconstruction of pairwise interactions. We show these improvements to hold consistently when compared to a standard approach using only the pairwise model and to an approach using only a neural network. This is in line with the general idea that simple interpretable models and complex black-box models are not necessarily a dichotomy: interpolating between these two classes of models can allow one to keep some advantages of both.

preprint 2019 arXiv

Generalized Approximate Survey Propagation for High-Dimensional Estimation

In Generalized Linear Estimation (GLE) problems, we seek to estimate a signal that is observed through a linear transform followed by a component-wise, possibly nonlinear and noisy, channel. In the Bayesian optimal setting, Generalized Approximate Message Passing (GAMP) is known to achieve optimal performance for GLE. However, its performance can significantly degrade whenever there is a mismatch between the assumed and the true generative model, a situation frequently encountered in practice. In this paper, we propose a new algorithm, named Generalized Approximate Survey Propagation (GASP), for solving GLE in the presence of prior or model mis-specifications. As a prototypical example, we consider the phase retrieval problem, where we show that GASP outperforms the corresponding GAMP, reducing the reconstruction threshold and, for certain choices of its parameters, approaching Bayesian optimal performance. Furthermore, we present a set of State Evolution equations that exactly characterize the dynamics of GASP in the high-dimensional limit.

preprint 2019 arXiv

New loop expansion for the Random Magnetic Field Ising Ferromagnets at zero temperature

We apply the perturbative loop expansion around the Bethe solution to the Random Field Ising Model at zero temperature ($T = 0$). A comparison with the standard epsilon-expansion is made, highlighting the key differences that make the new expansion much more appropriate to correctly describe strongly disordered systems, especially those controlled by a $T = 0$ RG fixed point. This new loop expansion produces an effective theory with cubic vertices. We compute the one-loop corrections due to cubic vertices, finding new terms that are absent in the epsilon-expansion. However, these new terms are subdominant with respect to the standard, supersymmetric ones; therefore, dimensional reduction is still valid at this order of the loop expansion.

preprint 2016 arXiv

Learning may need only a few bits of synaptic precision

Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by hardware implementation considerations as well. In this paper we extend a previous large deviations analysis which unveiled the existence of peculiar dense regions in the space of synaptic states which account for the possibility of learning efficiently in networks with binary synapses. We extend the analysis to synapses with multiple states and generally more plausible biological features. The results clearly indicate that the overall qualitative picture is unchanged with respect to the binary case, and very robust to variation of the details of the model. We also provide quantitative results which suggest that the advantages of increasing the synaptic precision (i.e., the number of internal synaptic states) rapidly vanish after the first few bits, and therefore that, for practical applications, only a few bits may be needed for near-optimal performance, consistent with recent biological findings. Finally, we demonstrate how the theoretical analysis can be exploited to design efficient algorithmic search strategies.

preprint 2016 arXiv

Local entropy as a measure for sampling solutions in Constraint Satisfaction Problems

We introduce a novel Entropy-driven Monte Carlo (EdMC) strategy to efficiently sample solutions of random Constraint Satisfaction Problems (CSPs). First, we extend a recent result that, using a large-deviation analysis, shows that the geometry of the space of solutions of the Binary Perceptron Learning Problem (a prototypical CSP) contains regions with a very high density of solutions. Despite being sub-dominant, these regions can be found by optimizing a local entropy measure. Building on these results, we construct a fast solver that relies exclusively on a local entropy estimate, and can be applied to general CSPs. We describe its performance not only for the Perceptron Learning Problem but also for the random $K$-Satisfiability Problem (another prototypical CSP with a radically different structure), and show numerically that a simple zero-temperature Metropolis search in the smooth local entropy landscape can reach sub-dominant clusters of optimal solutions in a small number of steps, while standard Simulated Annealing either requires extremely long cooling procedures or just fails. We also discuss how the EdMC can heuristically be made even more efficient for the cases we studied.

preprint 2016 arXiv

Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost-function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare - but extremely dense and accessible - regions of configurations in the network weight space. We define a novel measure, which we call the "robust ensemble" (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models, and also provide a general algorithmic scheme which is straightforward to implement: define a cost-function given by a sum of a finite number of replicas of the original cost-function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful new algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
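The replicated cost-function described in the abstract above can be illustrated with a minimal sketch. It assumes the centering constraint is implemented as a soft quadratic penalty with strength `gamma`; the function names, the penalty form, and the toy double-well cost are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def replicated_cost(replicas, center, cost, gamma):
    """Sum of per-replica costs plus a quadratic coupling to a center.

    replicas: array of shape (y, n), one weight vector per replica
    center:   array of shape (n,), the driving assignment
    cost:     callable mapping a weight vector to a scalar cost
    gamma:    coupling strength (assumption: soft penalty, not a hard constraint)
    """
    replicas = np.asarray(replicas, dtype=float)
    center = np.asarray(center, dtype=float)
    # Original cost evaluated independently on each replica.
    data_term = sum(cost(w) for w in replicas)
    # Penalty pulling every replica toward the shared center.
    coupling = 0.5 * gamma * np.sum((replicas - center) ** 2)
    return data_term + coupling


def toy_cost(w):
    """Hypothetical double-well cost with minima at w_i = +1 and w_i = -1."""
    return float(np.sum((w ** 2 - 1.0) ** 2))
```

When all replicas coincide with the center, the coupling term vanishes and the value reduces to the number of replicas times the original cost. Increasing `gamma` forces the replicas to agree, which is one way to favour regions where many nearby configurations have low cost.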

preprint 2015 arXiv

Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses

We show that discrete synaptic weights can be efficiently used for learning in large scale neural systems, and lead to unanticipated computational performance. We focus on the representative case of learning random patterns with binary synapses in single layer networks. The standard statistical analysis shows that this problem is exponentially dominated by isolated solutions that are extremely hard to find algorithmically. Here, we introduce a novel method that allows us to find analytical evidence for the existence of subdominant and extremely dense regions of solutions. Numerical experiments confirm these findings. We also show that the dense regions are surprisingly accessible by simple learning protocols, and that these synaptic configurations are robust to perturbations and generalize better than typical solutions. These outcomes extend to synapses with multiple states and to deeper neural architectures. The large deviation measure also suggests how to design novel algorithmic schemes for optimization based on local entropy maximization.

preprint 2014 arXiv

Anomalous finite size corrections in random field models

The presence of a random magnetic field in ferromagnetic systems leads, in the broken phase, to an anomalous $O(\sqrt{1/N})$ convergence of some thermodynamic quantities to their asymptotic limits. Here we show a general method, based on the replica trick, to compute analytically the $O(\sqrt{1/N})$ finite size correction to the average free energy. We apply this method to two mean field Ising models, fully connected and random regular graphs, and compare the results to exact numerical algorithms. We argue that this behaviour is present in finite dimensional models as well.

preprint 2014 arXiv

Finite size corrections to disordered Ising models on Random Regular Graphs

We derive the analytical expression for the first finite size correction to the average free energy of disordered Ising models on random regular graphs. The formula can be physically interpreted as a weighted sum over all non self-intersecting loops in the graph, the weight being the free-energy shift due to the addition of the loop to an infinite tree.

preprint 2014 arXiv

One-dimensional disordered Ising models by replica and cavity methods

Using a formalism based on the spectral decomposition of the replicated transfer matrix for disordered Ising models, we obtain several results that apply both to isolated one-dimensional systems and to locally tree-like graph and factor graph (p-spin) ensembles. We present exact analytical expressions, which can be efficiently approximated numerically, for many types of correlation functions and for the average free energies of open and closed finite chains. All the results achieved, with the exception of those involving closed chains, are then rigorously derived without replicas, using a probabilistic approach in the same spirit as the cavity method.

preprint 2014 arXiv

Scaling hypothesis for the Euclidean bipartite matching problem

We propose a simple yet very predictive form, based on Poisson's equation, for the functional dependence of the cost on the density of points in the Euclidean bipartite matching problem. This leads, for quadratic costs, to the analytic prediction of the large $N$ limit of the average cost in dimension $d=1,2$ and of the subleading correction in higher dimension. A non-trivial scaling exponent, $\gamma_d=\frac{d-2}{d}$, which differs from that of the monopartite problem, is found for the subleading correction. We argue that the same scaling holds true for a generic cost exponent in dimension $d>2$.

preprint 2013 arXiv

Finite size corrections to disordered systems on Erdös-Rényi random graphs

We study the finite size corrections to the free energy density in disordered spin systems on sparse random graphs, using both replica theory and the cavity method. We derive an analytical expression for the $O(1/N)$ corrections in the replica symmetric phase as a linear combination of the free energies of open and closed chains. We perform a numerical check of the formulae on the Random Field Ising Model at zero temperature, by computing finite size corrections to the ground state energy density.

preprint 2013 arXiv

The statistical mechanics of random set packing and a generalization of the Karp-Sipser algorithm

We analyse the asymptotic behaviour of random instances of the Maximum Set Packing (MSP) optimization problem, also known as Maximum Matching or Maximum Strong Independent Set on Hypergraphs. We give an analytical prediction of the MSP size using the 1RSB cavity method from statistical mechanics of disordered systems. We also propose a heuristic algorithm, a generalization of the celebrated Karp-Sipser one, which allows us to rigorously prove that the replica symmetric cavity method prediction is exact for certain problem ensembles and breaks down when a core survives the leaf removal process. The $e$-phenomena threshold discovered by Karp and Sipser, marking the onset of core emergence and of replica symmetry breaking, is elegantly generalized to $c_s = \frac{e}{d-1}$ for one of the ensembles considered, where $d$ is the size of the sets.
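As a quick consistency check on the threshold quoted above, assume that sets of size $d = 2$ play the role of edges of an ordinary graph, so that set packing reduces to matching. Under that assumption, which is an illustration and not a statement from the abstract, the formula recovers the classical Karp-Sipser value, and the threshold decreases for larger sets:

$$
c_s\big|_{d=2} = \frac{e}{2-1} = e \approx 2.718, \qquad c_s\big|_{d=3} = \frac{e}{3-1} = \frac{e}{2} \approx 1.359 .
$$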
