Source author record

Riccardo Zecchina

Riccardo Zecchina appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

31 works

25 topics

4 close collaborators


preprint, 2025, arXiv

Dynamical Learning in Deep Asymmetric Recurrent Neural Networks

We investigate recurrent neural networks with asymmetric interactions and demonstrate that the inclusion of self-couplings or sparse excitatory inter-module connections leads to the emergence of a densely connected manifold of dynamically accessible stable configurations. This representation manifold is exponentially large in system size and is reachable through simple local dynamics, despite constituting a subdominant subset of the global configuration space. We further show that learning can be implemented directly on this structure via a fully local, gradient-free mechanism that selectively stabilizes a single task-relevant network configuration. Unlike error-driven or contrastive learning schemes, this approach does not require explicit comparisons between network states obtained with and without output supervision. Instead, transient supervisory signals bias the dynamics toward the representation manifold, after which local plasticity consolidates the attained configuration, effectively shaping the latent representation space. Numerical evaluations on standard image classification benchmarks indicate performance comparable to that of multilayer perceptrons trained using backpropagation. More generally, these results suggest that the dynamical accessibility of fixed points and the stabilization of internal network dynamics constitute viable alternative principles for learning in recurrent systems, with conceptual links to statistical physics and potential implications for biologically motivated and neuromorphic computing architectures.

preprint, 2022, arXiv

Deep learning via message passing algorithms based on belief propagation

Message-passing algorithms based on the Belief Propagation (BP) equations constitute a well-known distributed computational scheme. It is exact on tree-like graphical models and has also proven to be effective in many problems defined on graphs with loops (from inference to optimization, from signal processing to clustering). The BP-based scheme is fundamentally different from stochastic gradient descent (SGD), on which the current success of deep networks is based. In this paper, we present and adapt to mini-batch training on GPUs a family of BP-based message-passing algorithms with a reinforcement field that biases distributions towards locally entropic solutions. These algorithms are capable of training multi-layer neural networks with discrete weights and activations with performance comparable to SGD-inspired heuristics (BinaryNet) and are naturally well-adapted to continual learning. Furthermore, using these algorithms to estimate the marginals of the weights allows us to make approximate Bayesian predictions that have higher accuracy than point-wise solutions.

preprint, 2022, arXiv

Learning through atypical "phase transitions" in overparameterized neural networks

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from statistical physics of disordered systems to analytically study the computational fallout of overparameterization in non-convex binary neural network models, trained on data generated from a structurally simpler but "hidden" network. As the number of connection weights increases, we follow the changes of the geometrical structure of different minima of the error loss function and relate them to learning and generalization performance. A first transition happens at the so-called interpolation point, when solutions begin to exist (perfect fitting becomes possible). This transition reflects the properties of typical solutions, which however are in sharp minima and hard to sample. After a gap, a second transition occurs, with the discontinuous appearance of a different kind of "atypical" structures: wide regions of the weight space that are particularly solution-dense and have good generalization properties. The two kinds of solutions coexist, with the typical ones being exponentially more numerous, but empirically we find that efficient algorithms sample the atypical, rare ones. This suggests that the atypical phase transition is the relevant one for learning. The results of numerical tests with realistic networks on observables suggested by the theory are consistent with this scenario.

preprint, 2022, arXiv

Unveiling the structure of wide flat minima in neural networks

The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks. Moreover, such algorithms are able to fit the data even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat minima arise as complex extensive structures, from the coalescence of minima around "high-margin" (i.e., locally robust) configurations. Despite being exponentially rare compared to zero-margin ones, high-margin minima tend to concentrate in particular regions. These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to dense regions of solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies.

preprint, 2021, arXiv

Entropic gradient descent algorithms and wide flat minima

The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities with respect to sharp ones. First, we discuss Gaussian mixture classification models and show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide flat regions. These estimators can be found by applying maximum flatness algorithms either directly on the classifier (which is norm independent) or on the differentiable loss function used in learning. Next, we extend the analysis to the deep learning scenario by extensive numerical validations. Using two algorithms, Entropy-SGD and Replicated-SGD, that explicitly include in the optimization objective a non-local flatness measure known as local entropy, we consistently improve the generalization error for common architectures (e.g. ResNet, EfficientNet). An easy to compute flatness measure shows a clear correlation with test accuracy.

preprint, 2020, arXiv

Clustering of solutions in the symmetric binary perceptron

The geometrical features of the (non-convex) loss landscape of neural network models are crucial in ensuring successful optimization and, most importantly, the capability to generalize well. While minimizers' flatness consistently correlates with good generalization, there has been little rigorous work in exploring the condition of existence of such minimizers, even in toy models. Here we consider a simple neural network model, the symmetric perceptron, with binary weights. Phrasing the learning problem as a constraint satisfaction problem, the analogue of a flat minimizer becomes a large and dense cluster of solutions, while the narrowest minimizers are isolated solutions. We perform the first steps toward the rigorous proof of the existence of a dense cluster in certain regimes of the parameters, by computing the first and second moment upper bounds for the existence of pairs of arbitrarily close solutions. Moreover, we present a non-rigorous derivation of the same bounds for sets of $y$ solutions at fixed pairwise distances.

preprint, 2020, arXiv

Shaping the learning landscape in neural networks around wide flat minima

Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points, and that such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far reaching open question. In this paper we study basic non-convex one- and two-layer neural network models which learn random patterns, and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy driven greedy and message passing algorithms which focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian and their generalization performance on real data.

preprint, 2020, arXiv

Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures

We analyze the connection between minimizers with good generalizing properties and high local entropy regions of a threshold-linear classifier in Gaussian mixtures with the mean squared error loss function. We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters. We explore analytically the error-counting loss landscape in the vicinity of a Bayes-optimal solution, and show that the closer we get to such configurations, the higher the local entropy, implying that the Bayes-optimal solution lies inside a wide flat region. We also consider the algorithmically relevant case of targeting wide flat minima of the (differentiable) mean squared error loss. Our analytical and numerical results show not only that in the balanced case the dependence on the norm of the weights is mild, but also, in the unbalanced case, that the performance can be improved.

preprint, 2016, arXiv

Learning may need only a few bits of synaptic precision

Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by hardware implementation considerations as well. In this paper we extend a previous large deviations analysis which unveiled the existence of peculiar dense regions in the space of synaptic states which account for the possibility of learning efficiently in networks with binary synapses. We extend the analysis to synapses with multiple states and generally more plausible biological features. The results clearly indicate that the overall qualitative picture is unchanged with respect to the binary case, and very robust to variation of the details of the model. We also provide quantitative results which suggest that the advantages of increasing the synaptic precision (i.e. the number of internal synaptic states) rapidly vanish after the first few bits, and therefore that, for practical applications, only a few bits may be needed for near-optimal performance, consistent with recent biological findings. Finally, we demonstrate how the theoretical analysis can be exploited to design efficient algorithmic search strategies.

preprint, 2016, arXiv

Local entropy as a measure for sampling solutions in Constraint Satisfaction Problems

We introduce a novel Entropy-driven Monte Carlo (EdMC) strategy to efficiently sample solutions of random Constraint Satisfaction Problems (CSPs). First, we extend a recent result that, using a large-deviation analysis, shows that the geometry of the space of solutions of the Binary Perceptron Learning Problem (a prototypical CSP) contains regions of very high density of solutions. Despite being sub-dominant, these regions can be found by optimizing a local entropy measure. Building on these results, we construct a fast solver that relies exclusively on a local entropy estimate, and can be applied to general CSPs. We describe its performance not only for the Perceptron Learning Problem but also for the random $K$-Satisfiability Problem (another prototypical CSP with a radically different structure), and show numerically that a simple zero-temperature Metropolis search in the smooth local entropy landscape can reach sub-dominant clusters of optimal solutions in a small number of steps, while standard Simulated Annealing either requires extremely long cooling procedures or just fails. We also discuss how the EdMC can heuristically be made even more efficient for the cases we studied.

preprint, 2016, arXiv

Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost-function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare - but extremely dense and accessible - regions of configurations in the network weight space. We define a novel measure, which we call the "robust ensemble" (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models, and also provide a general algorithmic scheme which is straightforward to implement: define a cost-function given by a sum of a finite number of replicas of the original cost-function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful new algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
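The replicated cost-function described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `replicated_cost`, the coupling strength `gamma`, and the use of a soft quadratic penalty in place of the "constraint centering the replicas" are all assumptions made for illustration.

```python
import numpy as np

def replicated_cost(cost, replicas, center, gamma):
    """Sum of the original cost over replicas plus a centering term.

    cost     : callable mapping a weight vector to a scalar (the original cost)
    replicas : array of shape (y, n), one weight vector per replica
    center   : array of shape (n,), the driving assignment
    gamma    : coupling strength (hypothetical name; a soft quadratic
               penalty stands in for the constraint in the abstract)
    """
    replicas = np.asarray(replicas, dtype=float)
    center = np.asarray(center, dtype=float)
    # Sum of a finite number of copies of the original cost-function.
    original = sum(cost(w) for w in replicas)
    # Penalty pulling every replica toward the shared center.
    coupling = 0.5 * gamma * np.sum((replicas - center) ** 2)
    return original + coupling
```

Increasing `gamma` pulls the replicas together, so low values of the total cost favor regions where many nearby configurations all have low original cost, which is the qualitative effect the abstract attributes to the robust ensemble.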

preprint, 2015, arXiv

A three-threshold learning rule approaches the maximal capacity of recurrent neural networks

Understanding the theoretical foundations of how memories are encoded and retrieved in neural populations is a central challenge in neuroscience. A popular theoretical scenario for modeling memory function is the attractor neural network scenario, whose prototype is the Hopfield model. The model has a poor storage capacity, compared with the capacity achieved with perceptron learning algorithms. Here, by transforming the perceptron learning rule, we present an online learning rule for a recurrent neural network that achieves near-maximal storage capacity without an explicit supervisory error signal, relying only upon locally accessible information. The fully-connected network consists of excitatory binary neurons with plastic recurrent connections and non-plastic inhibitory feedback stabilizing the network dynamics; the memory patterns are presented online as strong afferent currents, producing a bimodal distribution for the neuron synaptic inputs. Synapses corresponding to active inputs are modified as a function of the value of the local fields with respect to three thresholds. Above the highest threshold, and below the lowest threshold, no plasticity occurs. In between these two thresholds, potentiation/depression occurs when the local field is above/below an intermediate threshold. We simulated and analyzed a network of binary neurons implementing this rule and measured its storage capacity for different sizes of the basins of attraction. The storage capacity obtained through numerical simulations is shown to be close to the value predicted by analytical calculations. We also measured the dependence of capacity on the strength of external inputs. Finally, we quantified the statistics of the resulting synaptic connectivity matrix, and found that both the fraction of zero weight synapses and the degree of symmetry of the weight matrix increase with the number of stored patterns.
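The three-threshold rule stated in this abstract can be sketched for a single neuron as follows. This is an illustrative reading, not the paper's code: the function name, the fixed step size `eta`, the treatment of fields exactly equal to a threshold, and the clipping of weights at zero (to keep connections excitatory) are assumptions.

```python
import numpy as np

def three_threshold_update(w, x, h, theta_low, theta_mid, theta_high, eta=0.01):
    """Return updated incoming weights of one neuron.

    w : current weights, shape (n,)
    x : binary input activity (0 or 1), shape (n,)
    h : local field of the neuron for the presented pattern
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    # No plasticity above the highest or below the lowest threshold
    # (equality is treated as "no plasticity" here, an assumption).
    if h <= theta_low or h >= theta_high:
        return w.copy()
    if h > theta_mid:
        step = eta       # potentiation above the intermediate threshold
    elif h < theta_mid:
        step = -eta      # depression below the intermediate threshold
    else:
        step = 0.0       # exactly at the threshold: left unchanged (assumption)
    # Only synapses with active inputs are modified.
    w_new = w + step * x
    # Assumed constraint: excitatory weights cannot become negative.
    return np.clip(w_new, 0.0, None)
```

The update uses only the neuron's own local field and its presynaptic activity, which matches the abstract's point that the rule relies on locally accessible information rather than an explicit error signal.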

preprint, 2015, arXiv

Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses

We show that discrete synaptic weights can be efficiently used for learning in large scale neural systems, and lead to unanticipated computational performance. We focus on the representative case of learning random patterns with binary synapses in single layer networks. The standard statistical analysis shows that this problem is exponentially dominated by isolated solutions that are extremely hard to find algorithmically. Here, we introduce a novel method that allows us to find analytical evidence for the existence of subdominant and extremely dense regions of solutions. Numerical experiments confirm these findings. We also show that the dense regions are surprisingly accessible by simple learning protocols, and that these synaptic configurations are robust to perturbations and generalize better than typical solutions. These outcomes extend to synapses with multiple states and to deeper neural architectures. The large deviation measure also suggests how to design novel algorithmic schemes for optimization based on local entropy maximization.

preprint, 2014, arXiv

A cavity approach to optimization and inverse dynamical problems

In these two lectures we shall discuss how the cavity approach can be used efficiently to study optimization problems with global (topological) constraints and how the same techniques can be generalized to study inverse problems in irreversible dynamical processes. These two classes of problems are formally very similar: they both require an efficient procedure to trace over all trajectories of either auxiliary variables which enforce global constraints, or directly dynamical variables defining the inverse dynamical problems. We will mention three basic examples, namely the Minimum Steiner Tree problem, the inverse threshold linear dynamical problem, and the patient-zero problem in epidemic cascades. All these examples are root problems in optimization and inference over networks. They appear in many modern applications and in a variety of different contexts. Credit for these results should be shared with A. Braunstein, A. Ramezanpour, F. Altarelli, L. Dall'Asta, I. Biazzo and A. Lage-Castellanos.

preprint, 2014, arXiv

Bayesian inference of epidemics on networks via Belief Propagation

We study several Bayesian inference problems for irreversible stochastic epidemic models on networks from a statistical physics viewpoint. We derive equations which allow us to accurately compute the posterior distribution of the time evolution of the state of each node given some observations. Unlike most existing methods, we allow very general observation models, including unobserved nodes, state observations made at different or unknown times, and observations of infection times, possibly mixed together. Our method, which is based on the Belief Propagation algorithm, is efficient, naturally distributed, and exact on trees. As a particular case, we consider the problem of finding the "zero patient" of a SIR or SI epidemic given a snapshot of the state of the network at a later unknown time. Numerical simulations show that our method outperforms previous ones on both synthetic and real networks, often by a very large margin.

preprint, 2014, arXiv

Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners

In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code

preprint, 2014, arXiv

The zero-patient problem with noisy observations

A Belief Propagation approach has been recently proposed for the zero-patient problem in SIR epidemics. The zero-patient problem consists in finding the initial source of an epidemic outbreak given observations at a later time. In this work, we study a harder but related inference problem, in which observations are noisy and there is confusion between observed states. In addition to studying the zero-patient problem, we also tackle the problem of completing and correcting the observations, possibly finding undiscovered infected individuals and false test results. Moreover, we devise a set of equations, based on the variational expression of the Bethe free energy, to find the zero patient along with maximum-likelihood epidemic parameters. We show, by means of simulated epidemics, how this method is able to infer details on the past history of an epidemic outbreak based solely on the topology of the contact network and a single snapshot of partial and noisy observations.

preprint, 2013, arXiv

On the performance of a cavity method based algorithm for the Prize-Collecting Steiner Tree Problem on graphs

We study the behavior of an algorithm derived from the cavity method for the Prize-Collecting Steiner Tree (PCST) problem on graphs. The algorithm is based on the zero temperature limit of the cavity equations and as such is formally simple (a fixed point equation resolved by iteration) and distributed (parallelizable). We provide a detailed comparison with state-of-the-art algorithms on a wide range of existing benchmark networks and random graphs. Specifically, we consider an enhanced derivative of the Goemans-Williamson heuristics and the DHEA solver, a Branch and Cut Linear/Integer Programming based approach. The comparison shows that the cavity algorithm outperforms the two algorithms in most large instances both in running time and quality of the solution. Finally we prove a few optimality properties of the solutions provided by our algorithm, including optimality under the two post-processing procedures defined in the Goemans-Williamson derivative and global optimality in some limit cases.

preprint, 2013, arXiv

Optimizing spread dynamics on graphs by message passing

Cascade processes are responsible for many important phenomena in natural and social sciences. Simple models of irreversible dynamics on graphs, in which nodes activate depending on the state of their neighbors, have been successfully applied to describe cascades in a large variety of contexts. Over the last decades, many efforts have been devoted to understand the typical behaviour of the cascades arising from initial conditions extracted at random from some given ensemble. However, the problem of optimizing the trajectory of the system, i.e. of identifying appropriate initial conditions to maximize (or minimize) the final number of active nodes, is still considered to be practically intractable, with the only exception of models that satisfy a sort of diminishing returns property called submodularity. Submodular models can be approximately solved by means of greedy strategies, but by definition they lack cooperative characteristics which are fundamental in many real systems. Here we introduce an efficient algorithm based on statistical physics for the optimization of trajectories in cascade processes on graphs. We show that for a wide class of irreversible dynamics, even in the absence of submodularity, the spread optimization problem can be solved efficiently on large networks. Analytic and algorithmic results on random graphs are complemented by the solution of the spread maximization problem on a real-world network (the Epinions consumer reviews network).

preprint, 2013, arXiv

Perturbation Biology: inferring signaling networks in cellular systems

We present a new experimental-computational technology of inferring network models that predict the response of cells to perturbations and that may be useful in the design of combinatorial therapy against cancer. The experiments are systematic series of perturbations of cancer cell lines by targeted drugs, singly or in combination. The response to perturbation is measured in terms of levels of proteins and phospho-proteins and of cellular phenotype such as viability. Computational network models are derived de novo, i.e., without prior knowledge of signaling pathways, and are based on simple non-linear differential equations. The prohibitively large solution space of all possible network models is explored efficiently using a probabilistic algorithm, belief propagation, which is three orders of magnitude more efficient than Monte Carlo methods. Explicit executable models are derived for a set of perturbation experiments in Skmel-133 melanoma cell lines, which are resistant to the therapeutically important inhibition of Raf kinase. The resulting network models reproduce and extend known pathway biology. They can be applied to discover new molecular interactions and to predict the effect of novel drug perturbations, one of which is verified experimentally. The technology is suitable for application to larger systems in diverse areas of molecular biology.

preprint, 2013, arXiv

Theory and learning protocols for the material tempotron model

Neural networks are able to extract information from the timing of spikes. Here we provide new results on the behavior of the simplest neuronal model which is able to decode information embedded in temporal spike patterns, the so called tempotron. Using statistical physics techniques we compute the capacity for the case of sparse, time-discretized input, and "material" discrete synapses, showing that the device saturates the information theoretic bounds with a statistics of output spikes that is consistent with the statistics of the inputs. We also derive two simple and highly efficient learning algorithms which are able to learn a number of associations which are close to the theoretical limit. The simplest versions of these algorithms correspond to distributed on-line protocols of interest for neuromorphic devices, and can be adapted to address the more biologically relevant continuous-time version of the classification problem, hopefully allowing for the understanding of some aspects of synaptic plasticity.

preprint, 2012, arXiv

Modeling competing endogenous RNAs networks

MicroRNAs (miRNAs) are small RNA molecules, about 22 nucleotides long, which post-transcriptionally regulate their target messenger RNAs (mRNAs). They accomplish key roles in gene regulatory networks, ranging from signaling pathways to tissue morphogenesis, and their aberrant behavior is often associated with the development of various diseases. Recently it has been shown that, in analogy with the better understood case of small RNAs in bacteria, the way miRNAs interact with their targets can be described in terms of a titration mechanism characterized by threshold effects, hypersensitivity of the system near the threshold, and prioritized cross-talk among targets. The latter characteristic has been lately identified as competing endogenous RNA (ceRNA) effect to mark those indirect interactions among targets of a common pool of miRNAs they are in competition for. Here we analyze the equilibrium and out-of-equilibrium properties of a general stochastic model of $M$ miRNAs interacting with $N$ mRNA targets. In particular we are able to describe in detail the peculiar equilibrium and non-equilibrium phenomena that the system displays around the threshold: (i) maximal cross-talk and correlation between targets, (ii) robustness of ceRNA effect with respect to the model's parameters and in particular to the catalyticity of the miRNA-mRNA interaction, and (iii) anomalous response-time to external perturbations.

preprint, 2011, arXiv

3D Protein Structure Predicted from Sequence

The evolutionary trajectory of a protein through sequence space is constrained by function and three-dimensional (3D) structure. Residues in spatial proximity tend to co-evolve, yet attempts to invert the evolutionary record to identify these constraints and use them to computationally fold proteins have so far been unsuccessful. Here, we show that co-variation of residue pairs, observed in a large protein family, provides sufficient information to determine 3D protein structure. Using a data-constrained maximum entropy model of the multiple sequence alignment, we identify pairs of statistically coupled residue positions which are expected to be close in the protein fold, termed contacts inferred from evolutionary information (EICs). To assess the amount of information about the protein fold contained in these coupled pairs, we evaluate the accuracy of predicted 3D structures for proteins of 50-260 residues, from 15 diverse protein families, including a G-protein coupled receptor. These structure predictions are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The resulting low Cα-RMSD error range of 2.7-5.1Å, over at least 75% of the protein, indicates the potential for predicting essentially correct 3D structures for the thousands of protein families that have no known structure, provided they include a sufficiently large number of divergent sample sequences. With the current enormous growth in sequence information based on new sequencing technology, this opens the door to a comprehensive survey of protein 3D structures, including many not currently accessible to the experimental methods of structural genomics. This advance has potential applications in many biological contexts, such as synthetic biology, identification of functional sites in proteins and interpretation of the functional impact of genetic variants.

preprint2011arXiv

Belief-Propagation for Weighted b-Matchings on Arbitrary Graphs and its Relation to Linear Programs with Integer Solutions

We consider the general problem of finding the minimum weight $b$-matching on arbitrary graphs. We prove that, whenever the linear programming (LP) relaxation of the problem has no fractional solutions, then the belief propagation (BP) algorithm converges to the correct solution. We also show that when the LP relaxation has a fractional solution then the BP algorithm can be used to solve the LP relaxation. Our proof is based on the notion of graph covers and extends the analysis of (Bayati-Shah-Sharma 2005 and Huang-Jebara 2007). These results are notable in the following regards: (1) It is one of a very small number of proofs showing correctness of BP without any constraint on the graph structure. (2) Variants of the proof work for both synchronous and asynchronous BP; it is the first proof of convergence and correctness of an asynchronous BP algorithm for a combinatorial optimization problem.
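For orientation, one standard way to write the LP relaxation of a minimum weight $b$-matching problem on a graph $G = (V, E)$ is sketched below. The symbols $w_e$ (edge weight), $x_e$ (edge variable) and $b_v$ (required degree of vertex $v$) are notation assumed here, and the equality constraint assumes the perfect variant of the problem; the abstract itself does not fix either.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{e \in E} w_e \, x_e \\
\text{s.t.}\quad & \sum_{e \ni v} x_e = b_v \quad \forall v \in V, \\
& 0 \le x_e \le 1 \quad \forall e \in E .
\end{aligned}
```

The original integer program restricts every $x_e$ to $\{0, 1\}$. In this notation, a fractional solution is an optimum of the relaxation with some $0 < x_e < 1$.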

preprint2011arXiv

Direct-coupling analysis of residue co-evolution captures native contacts across many protein families

The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced Direct Coupling Analysis (DCA) (Weigt et al. (2009) Proc Natl Acad Sci 106:67). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intra-domain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and inter-domain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, provided the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.

preprint2011arXiv

Efficient data compression from statistical physics of codes over finite fields

In this paper we discuss a novel data compression technique for binary symmetric sources based on the cavity method over a Galois Field of order $q$ (GF($q$)). We present a scheme of low complexity and near-optimal empirical performance. The compression step is based on a reduction of sparse low-density parity-check codes over GF($q$) and is done through the so-called reinforced belief-propagation equations. These reduced codes appear to have a non-trivial geometrical modification of the space of codewords which makes such compression computationally feasible. The computational complexity is $O(d\,n\,q \log q)$ per iteration, where $d$ is the average degree of the check nodes and $n$ is the number of bits. For our code ensemble, decompression can be done in a time linear in the code's length by a simple leaf-removal algorithm.

preprint2011arXiv

Efficient LDPC Codes over GF(q) for Lossy Data Compression

In this paper we consider the lossy compression of a binary symmetric source. We present a scheme that provides a low complexity lossy compressor with near optimal empirical performance. The proposed scheme is based on b-reduced ultra-sparse LDPC codes over GF(q). Encoding is performed by the Reinforced Belief Propagation algorithm, a variant of Belief Propagation. The computational complexity at the encoder is $O(\langle d \rangle\, n\, q \log q)$, where $\langle d \rangle$ is the average degree of the check nodes. For our code ensemble, decoding can be performed iteratively following the inverse steps of the leaf removal algorithm. For a sparse parity-check matrix the number of needed operations is $O(n)$.
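The leaf removal procedure mentioned above can be sketched generically on a check/variable graph. This is an illustrative version under stated assumptions, not the GF(q) decoder of the paper: the function name `leaf_removal`, the input format (each check given as a list of distinct variable indices) and the returned values are all choices made here.

```python
from collections import defaultdict


def leaf_removal(checks):
    """Generic leaf removal on a bipartite check/variable graph.

    checks: list where checks[c] lists the distinct variables in check c.
    Returns (order, core): `order` holds (check, leaf_variable) pairs in
    removal order; `core` lists the checks that could not be removed.
    """
    var_to_checks = defaultdict(set)
    for c, variables in enumerate(checks):
        for v in variables:
            var_to_checks[v].add(c)

    remaining = set(range(len(checks)))
    # A leaf is a variable that appears in exactly one remaining check.
    leaves = [v for v, cs in var_to_checks.items() if len(cs) == 1]
    order = []

    while leaves:
        v = leaves.pop()
        if len(var_to_checks[v]) != 1:
            continue  # stale entry: its check was already removed
        (c,) = var_to_checks[v]
        order.append((c, v))
        remaining.discard(c)
        # Removing check c may turn its other variables into new leaves.
        for u in checks[c]:
            var_to_checks[u].discard(c)
            if len(var_to_checks[u]) == 1:
                leaves.append(u)

    return order, sorted(remaining)


if __name__ == "__main__":
    chain = [[0, 1], [1, 2], [2, 3]]      # tree-like: fully removable
    triangle = [[0, 1], [1, 2], [0, 2]]   # every variable has degree 2
    print(leaf_removal(chain))
    print(leaf_removal(triangle))
```

When the core is empty, walking `order` in reverse lets each check be satisfied by fixing its leaf variable once the other variables of that check are known, and each edge is touched a bounded number of times, consistent with the linear operation count quoted in the abstract.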

preprint2011arXiv

Statistical physics of optimization under uncertainty

Optimization under uncertainty deals with the problem of optimizing stochastic cost functions given some partial information on their inputs. These problems are extremely difficult to solve and yet pervade all areas of technological and natural sciences. We propose a general approach to solve such large-scale stochastic optimization problems and a Survey Propagation-based algorithm that implements it. In the problems we consider, some of the parameters are not known at the time of the first optimization, but are extracted later independently of each other from known distributions. As an illustration, we apply our method to the stochastic bipartite matching problem, in the two-stage and multi-stage cases. The efficiency of our approach, which does not rely on sampling techniques, allows us to validate the analytical predictions with large-scale numerical simulations.

preprint2011arXiv

Stochastic optimization by message passing

Most optimization problems in applied sciences realistically involve uncertainty in the parameters defining the cost function, of which only statistical information is known beforehand. In a recent work we introduced a message passing algorithm based on the cavity method of statistical physics to solve the two-stage matching problem with independently distributed stochastic parameters. In this paper we provide an in-depth explanation of the general method and caveats, show the details of the derivation and resulting algorithm for the matching problem and apply it to a stochastic version of the independent set problem, which is a computationally hard and relevant problem in communication networks. We compare the results with some greedy algorithms and briefly discuss the extension to more complicated stochastic multi-stage problems.

preprint2011arXiv

The stochastic matching problem

The matching problem plays a basic role in combinatorial optimization and in statistical mechanics. In its stochastic variants, optimization decisions have to be taken given only some probabilistic information about the instance. While the deterministic case can be solved in polynomial time, stochastic variants are worst-case intractable. We propose an efficient method to solve stochastic matching problems which combines some features of the survey propagation equations and of the cavity method. We test it on random bipartite graphs, for which we analyze the phase diagram and compare the results with exact bounds. Our approach is shown numerically to be effective on the full range of parameters, and to outperform state-of-the-art methods. Finally we discuss how the method can be generalized to other problems of optimization under uncertainty.

preprint2004arXiv

Minimizing energy below the glass thresholds

Focusing on the optimization version of the random K-satisfiability problem, the MAX-K-SAT problem, we study the performance of the finite energy version of the Survey Propagation (SP) algorithm. We show that a simple (linear time) backtrack decimation strategy is sufficient to reach configurations well below the lower bound for the dynamic threshold energy and very close to the analytic prediction for the optimal ground states. A comparative numerical study on one of the most efficient local search procedures is also given.

31 published items

Dynamical Learning in Deep Asymmetric Recurrent Neural Networks

Deep learning via message passing algorithms based on belief propagation

Learning through atypical "phase transitions" in overparameterized neural networks

Unveiling the structure of wide flat minima in neural networks

Entropic gradient descent algorithms and wide flat minima

Clustering of solutions in the symmetric binary perceptron

Shaping the learning landscape in neural networks around wide flat minima

Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures

Learning may need only a few bits of synaptic precision

Local entropy as a measure for sampling solutions in Constraint Satisfaction Problems

Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes

A three-threshold learning rule approaches the maximal capacity of recurrent neural networks

Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses

A cavity approach to optimization and inverse dynamical problems

Bayesian inference of epidemics on networks via Belief Propagation

Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners

The zero-patient problem with noisy observations

On the performance of a cavity method based algorithm for the Prize-Collecting Steiner Tree Problem on graphs

Optimizing spread dynamics on graphs by message passing

Perturbation Biology: inferring signaling networks in cellular systems

Theory and learning protocols for the material tempotron model

Modeling competing endogenous RNAs networks

3D Protein Structure Predicted from Sequence

Belief-Propagation for Weighted b-Matchings on Arbitrary Graphs and its Relation to Linear Programs with Integer Solutions

Direct-coupling analysis of residue co-evolution captures native contacts across many protein families

Efficient data compression from statistical physics of codes over finite fields

Efficient LDPC Codes over GF(q) for Lossy Data Compression

Statistical physics of optimization under uncertainty

Stochastic optimization by message passing

The stochastic matching problem

Minimizing energy below the glass thresholds