Source author record

Yasser Roudi

Yasser Roudi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

cond-mat.dis-nn · physics.data-an · cond-mat.stat-mech · Neurons and Cognition · Quantitative Methods · Machine Learning · Methodology · nlin.AO · q-fin.ST

Catalog footprint

What is connected

20 works

9 topics

4 close collaborators


preprint · 2026 · arXiv

Activation Functions, Statistics and Learning of Higher-Order Interactions in Restricted Boltzmann Machines

The great success of neural networks in recognizing hidden patterns and correlations in complex data lies in the way they take advantage of the large number of parameters and nonlinear single-unit activation, jointly. Restricted Boltzmann Machines (RBMs) provide a simple yet powerful framework for studying the impact of activation nonlinearities on performance and representation. In this work, we exploit the duality between RBMs and models of interacting binary variables to study the statistics of the interactions induced by RBM ensembles with different hidden unit activation functions. We characterize the space of representable models analytically in terms of moments of the distribution of induced interactions for four commonly used activation functions: Linear, Step, ReLU, and Exponential. Quantitative predictions of the analytical calculations on learning show a very good agreement with results of the simulations of the training process. In particular, our analysis shows that there are certain data structures, namely those generated by models of interacting variables with large interaction terms beyond pairwise, that are difficult to represent, and thus to learn, for any RBM. Yet, we find that rapidly increasing nonlinearities, such as the Exponential function, can facilitate the representation and learning of such data structures for a specific range of parameters that is determined analytically.

preprint · 2022 · arXiv

Bayesian interpolation for power laws in neural data analysis

Power laws arise in a variety of phenomena ranging from matter undergoing phase transition to the distribution of word frequencies in the English language. Usually, their presence is only apparent when data is abundant, and accurately determining their exponents often requires even larger amounts of data. As the scale of recordings in neuroscience becomes larger, an increasing number of studies attempt to characterise potential power-law relationships in neural data. In this paper, we aim to discuss the potential pitfalls that one faces in such efforts and to promote a Bayesian interpolation framework for this purpose. We apply this framework to synthetic data and to data from a recent study of large-scale recordings in mouse primary visual cortex (V1), where the exponent of a power-law scaling in the data played an important role: its value was argued to determine whether the population's stimulus-response relationship is smooth, and experimental data was provided to confirm that this is indeed so. Our analysis shows that with such data types and sizes as we consider here, the best-fit values found for the parameters of the power law and the uncertainty for these estimates are heavily dependent on the noise model assumed for the estimation, the range of the data chosen, and (with all other things being equal) the particular recordings. It is thus challenging to offer a reliable statement about the exponents of the power law. Our analysis, however, shows that this does not affect the conclusions regarding the smoothness of the population response to low-dimensional stimuli but casts doubt on those to natural images. We discuss the implications of this result for the neural code in the V1 and offer the approach discussed here as a framework that future studies, perhaps exploring larger ranges of data, can employ as their starting point to examine power-law scalings in neural data.

preprint · 2022 · arXiv

Quantifying Relevance in Learning and Inference

Learning is a distinctive feature of intelligent behaviour. High-throughput experimental data and Big Data promise to open new windows on complex systems such as cells, the brain or our societies. Yet, the puzzling success of Artificial Intelligence and Machine Learning shows that we still have a poor conceptual understanding of learning. These applications push statistical inference into uncharted territories where data is high-dimensional and scarce, and prior information on "true" models is scant if not totally absent. Here we review recent progress on understanding learning, based on the notion of "relevance". The relevance, as we define it here, quantifies the amount of information that a dataset or the internal representation of a learning machine contains on the generative model of the data. This allows us to define maximally informative samples, on one hand, and optimal learning machines on the other. These are ideal limits of samples and of machines, that contain the maximal amount of information about the unknown generative process, at a given resolution (or level of compression). Both ideal limits exhibit critical features in the statistical sense: Maximally informative samples are characterised by a power-law frequency distribution (statistical criticality) and optimal learning machines by an anomalously large susceptibility. The trade-off between resolution (i.e. compression) and relevance distinguishes the regime of noisy representations from that of lossy compression. These are separated by a special point characterised by Zipf's law statistics. This identifies samples obeying Zipf's law as the most compressed loss-less representations that are optimal in the sense of maximal relevance. Criticality in optimal learning machines manifests in an exponential degeneracy of energy levels, that leads to unusual thermodynamic properties.

preprint · 2020 · arXiv

Efficiency of local learning rules in threshold-linear associative networks

We derive the Gardner storage capacity for associative networks of threshold linear units, and show that with Hebbian learning they can operate closer to such Gardner bound than binary networks, and even surpass it. This is largely achieved through a sparsification of the retrieved patterns, which we analyze for theoretical and empirical distributions of activity. As reaching the optimal capacity via non-local learning rules like backpropagation requires slow and neurally implausible training procedures, our results indicate that one-shot self-organized Hebbian learning can be just as efficient.

preprint · 2017 · arXiv

Sparse model selection in the highly under-sampled regime

We propose a method for recovering the structure of a sparse undirected graphical model when very few samples are available. The method decides about the presence or absence of bonds between pairs of variables by considering one pair at a time and using a closed-form formula, analytically derived by calculating the posterior probability for every possible model explaining a two-body system using Jeffreys prior. The approach does not rely on the optimisation of any cost functions and consequently is much faster than existing algorithms. Despite this time and computational advantage, numerical results show that for several sparse topologies the algorithm is comparable to the best existing algorithms, and is more accurate in the presence of hidden variables. We apply this approach to the analysis of US stock market data and to neural data, in order to show its efficiency in recovering robust statistical dependencies in real data with non-stationary correlations in time and space.

preprint · 2016 · arXiv

Variational perturbation and extended Plefka approaches to dynamics on random networks: the case of the kinetic Ising model

We describe and analyze some novel approaches for studying the dynamics of Ising spin glass models. We first briefly consider the variational approach based on minimizing the Kullback-Leibler divergence between independent trajectories and the real ones and note that this approach only coincides with the mean field equations from the saddle point approximation to the generating functional when the dynamics is defined through a logistic link function, which is the case for the kinetic Ising model with parallel update. We then spend the rest of the paper developing two ways of going beyond the saddle point approximation to the generating functional. In the first one, we develop a variational perturbative approximation to the generating functional by expanding the action around a quadratic function of the local fields and conjugate local fields whose parameters are optimized. We derive analytical expressions for the optimal parameters and show that when the optimization is suitably restricted, we recover the mean field equations that are exact for the fully asymmetric random couplings (Mézard and Sakellariou, 2011). However, without this restriction the results are different. We also describe an extended Plefka expansion in which in addition to the magnetization, we also fix the correlation and response functions. Finally, we numerically study the performance of these approximations for Sherrington-Kirkpatrick type couplings for various coupling strengths, degrees of coupling symmetry and external fields. We show that the dynamical equations derived from the extended Plefka expansion outperform the others in all regimes, although it is computationally more demanding. The unconstrained variational approach does not perform well in the small coupling regime, while it approaches dynamical TAP equations of (Roudi and Hertz, 2011) for strong couplings.

preprint · 2015 · arXiv

U.S. stock market interaction network as learned by the Boltzmann Machine

We study the historical dynamics of the joint equilibrium distribution of stock returns in the U.S. stock market using a Boltzmann distribution model parametrized by external fields and pairwise couplings. Within the Boltzmann learning framework for statistical inference, we analyze the historical behavior of the parameters inferred using exact and approximate learning algorithms. Since the model and inference methods require binary variables, we study the effect of mapping continuous returns to the discrete domain. The analysis shows that binarization preserves the market correlation structure. Properties of the distributions of external fields and couplings, as well as the industry sector clustering structure, are studied for different historical dates and moving window sizes. We find that a heavy positive tail in the distribution of couplings is responsible for the sparse market clustering structure. We also show that discrepancies between the model parameters might be used as a precursor of financial instabilities.

preprint · 2014 · arXiv

Belief-Propagation and replicas for inference and learning in a kinetic Ising model with hidden spins

We propose a new algorithm for inferring the state of hidden spins and reconstructing the connections in a synchronous kinetic Ising model, given the observed history. Focusing on the case in which the hidden spins are conditionally independent of each other given the state of observable spins, we show that calculating the likelihood of the data can be simplified by introducing a set of replicated auxiliary spins. Belief Propagation (BP) and Susceptibility Propagation (SusP) can then be used to infer the states of hidden variables and learn the couplings. We study the convergence and performance of this algorithm for networks with both Gaussian-distributed and binary bonds. We also study how the algorithm behaves as the fraction of hidden nodes and the amount of data are changed, showing that it outperforms the TAP equations for reconstructing the connections.

preprint · 2014 · arXiv

Correlations and functional connections in a population of grid cells

We study the statistics of spike trains of simultaneously recorded grid cells in freely behaving rats. We evaluate pairwise correlations between these cells and, using a generalized linear model (kinetic Ising model), study their functional connectivity. Even when we account for the covariations in firing rates due to overlapping fields, both the pairwise correlations and functional connections decay as a function of the shortest distance between the vertices of the spatial firing pattern of pairs of grid cells, i.e. their phase difference. The functional connectivity takes positive values between cells with nearby phases and approaches zero or negative values for larger phase differences. We also find similar results when, in addition to correlations due to overlapping fields, we account for correlations due to theta oscillations and head directional inputs. The inferred connections between neurons can be both negative and positive regardless of whether the cells share common spatial firing characteristics, that is, whether they belong to the same modules, or not. The mean strength of these inferred connections is close to zero, but the strongest inferred connections are found between cells of the same module. Taken together, our results suggest that grid cells in the same module do indeed form a local network of interconnected neurons with a functional connectivity that supports a role for attractor dynamics in the generation of the grid pattern.

preprint · 2013 · arXiv

Maximum likelihood reconstruction for Ising models with asynchronous updates

We describe how the couplings in an asynchronous kinetic Ising model can be inferred. We consider two cases, one in which we know both the spin history and the update times and one in which we only know the spin history. For the first case, we show that one can average over all possible choices of update times to obtain a learning rule that depends only on spin correlations and can also be derived from the equations of motion for the correlations. For the second case, the same rule can be derived within a further decoupling approximation. We study all methods numerically for fully asymmetric Sherrington-Kirkpatrick models, varying the data length, system size, temperature, and external field. Good convergence is observed in accordance with the theoretical expectations.

preprint · 2013 · arXiv

On sampling and modeling complex systems

The study of complex systems is limited by the fact that only a few variables are accessible for modeling and sampling, which are not necessarily the most relevant ones to explain the system's behavior. In addition, empirical data typically under-sample the space of possible states. We study a generic framework where a complex system is seen as a system of many interacting degrees of freedom, which are known only in part, that optimize a given function. We show that the underlying distribution with respect to the known variables has the Boltzmann form, with a temperature that depends on the number of unknown variables. In particular, when the unknown part of the objective function decays faster than exponential, the temperature decreases as the number of variables increases. We show in the representative case of the Gaussian distribution, that models are predictable only when the number of relevant variables is less than a critical threshold. As a further consequence, we show that the information that a sample contains on the behavior of the system is quantified by the entropy of the frequency with which different states occur. This allows us to characterize the properties of maximally informative samples: in the under-sampling regime, the most informative frequency size distributions have power-law behavior and Zipf's law emerges at the crossover between the under-sampled regime and the regime where the sample contains enough statistics to make inference on the behavior of the system. These ideas are illustrated in some applications, showing that they can be used to identify relevant variables or to select most informative representations of data, e.g. in data clustering.
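The "entropy of the frequency with which different states occur" can be illustrated with a small sketch. It assumes the definitions commonly used in this line of work, which are not spelled out on this page: with k_s the number of times state s appears in a sample of size N, and m_k the number of distinct states seen exactly k times, the resolution is H[s] = -Σ_s (k_s/N) log(k_s/N) and the relevance is H[k] = -Σ_k (k m_k/N) log(k m_k/N). The function name `resolution_and_relevance` is hypothetical.

```python
import math
from collections import Counter


def resolution_and_relevance(sample):
    """Return (H[s], H[k]) in nats for a sequence of hashable states.

    Assumed definitions (not taken from this page):
      H[s] = -sum_s (k_s / N) log(k_s / N)
      H[k] = -sum_k (k m_k / N) log(k m_k / N)
    """
    n = len(sample)
    counts = Counter(sample)                  # k_s: occurrences of each state
    multiplicity = Counter(counts.values())   # m_k: number of states seen k times
    h_s = -sum((k / n) * math.log(k / n) for k in counts.values())
    h_k = -sum((k * m / n) * math.log(k * m / n) for k, m in multiplicity.items())
    return h_s, h_k


# Two toy samples: one where every state has the same frequency,
# one where every observed frequency is distinct.
uniform = resolution_and_relevance("aabb")
distinct = resolution_and_relevance("aaabbc")
print(uniform, distinct)
```

When all states occur equally often the frequency carries no information, so the relevance vanishes; when every state has its own frequency, the relevance equals the resolution.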

preprint · 2012 · arXiv

L$_1$ Regularization for Reconstruction of a non-equilibrium Ising Model

The couplings in a sparse asymmetric, asynchronous Ising network are reconstructed using an exact learning algorithm. L$_1$ regularization is used to remove the spurious weak connections that would otherwise be found by simply minimizing minus the log likelihood of a finite data set. In order to see how L$_1$ regularization works in detail, we perform the calculation in several ways including (1) by iterative minimization of a cost function equal to minus the log likelihood of the data plus an L$_1$ penalty term, and (2) an approximate scheme based on a quadratic expansion of the cost function around its minimum. In these schemes, we track how connections are pruned as the strength of the L$_1$ penalty is increased from zero to large values. The performance of the methods for various coupling strengths is quantified using ROC curves.

preprint · 2011 · arXiv

Dynamical TAP equations for non-equilibrium Ising spin glasses

We derive and study dynamical TAP equations for Ising spin glasses obeying both synchronous and asynchronous dynamics using a generating functional approach. The system can have an asymmetric coupling matrix, and the external fields can be time-dependent. In the synchronously updated model, the TAP equations take the form of self consistent equations for magnetizations at time $t+1$, given the magnetizations at time $t$. In the asynchronously updated model, the TAP equations determine the time derivatives of the magnetizations at each time, again via self consistent equations, given the current values of the magnetizations. Numerical simulations suggest that the TAP equations become exact for large systems.
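For orientation, the synchronous-update equations are usually quoted in the following form. This is a sketch from the published literature rather than from this page, and the notation is an assumption: $m_i(t)$ is the magnetization of spin $i$ at time $t$, $J_{ij}$ the (possibly asymmetric) couplings, and $h_i(t)$ the external field.

```latex
m_i(t+1) = \tanh\!\Big[\, h_i(t) + \sum_j J_{ij}\, m_j(t)
           \;-\; m_i(t+1) \sum_j J_{ij}^{2}\,\big(1 - m_j(t)^{2}\big) \Big]
```

The equation is self-consistent because $m_i(t+1)$ appears on both sides. Dropping the last term inside the brackets gives the naive mean-field update.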

preprint · 2011 · arXiv

Effect of coupling asymmetry on mean-field solutions of direct and inverse Sherrington-Kirkpatrick model

We study how the degree of symmetry in the couplings influences the performance of three mean field methods used for solving the direct and inverse problems for generalized Sherrington-Kirkpatrick models. In this context, the direct problem is predicting the potentially time-varying magnetizations. The three theories include the first and second order Plefka expansions, referred to as naive mean field (nMF) and TAP, respectively, and a mean field theory which is exact for fully asymmetric couplings. We call the last of these simply MF theory. We show that for the direct problem, nMF performs worse than the other two approximations, TAP outperforms MF when the coupling matrix is nearly symmetric, while MF works better when it is strongly asymmetric. For the inverse problem, MF performs better than both TAP and nMF, although an ad hoc adjustment of TAP can make it comparable to MF. For high temperatures the performance of TAP and MF approach each other.

preprint · 2011 · arXiv

Ising Models for Inferring Network Structure From Spike Data

Now that spike trains from many neurons can be recorded simultaneously, there is a need for methods to decode these data to learn about the networks that these neurons are part of. One approach to this problem is to adjust the parameters of a simple model network to make its spike trains resemble the data as much as possible. The connections in the model network can then give us an idea of how the real neurons that generated the data are connected and how they influence each other. In this chapter we describe how to do this for the simplest kind of model: an Ising network. We derive algorithms for finding the best model connection strengths for fitting a given data set, as well as faster approximate algorithms based on mean field theory. We test the performance of these algorithms on data from model networks and experiments.

preprint · 2011 · arXiv

Mean Field Theory For Non-Equilibrium Network Reconstruction

There has been recent progress on the problem of inferring the structure of interactions in complex networks when they are in stationary states satisfying detailed balance, but little has been done for non-equilibrium systems. Here we introduce an approach to this problem, considering, as an example, the question of recovering the interactions in an asymmetrically-coupled, synchronously-updated Sherrington-Kirkpatrick model. We derive an exact iterative inversion algorithm and develop efficient approximations based on dynamical mean-field and Thouless-Anderson-Palmer equations that express the interactions in terms of equal-time and one time step-delayed correlation functions.
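A minimal sketch of how interactions can be expressed through equal-time and one-step-delayed correlations, using the simplest (naive mean-field) variant. It assumes the commonly quoted inversion formula J ≈ A⁻¹ D C⁻¹, with A_ii = 1 − m_i², C the equal-time covariance and D the one-step-delayed covariance. The formula, the function names (`simulate`, `invert`, `nmf_inverse`, `pearson`), the system size, the coupling scale and the zero external field are all illustrative assumptions, not details given on this page. Plain Python is used to keep the example dependency-free.

```python
import math
import random


def simulate(J, steps, rng):
    """Synchronous Glauber dynamics with zero field; returns the spin history."""
    n = len(J)
    s = [rng.choice((-1, 1)) for _ in range(n)]
    history = [s]
    for _ in range(steps):
        h = [sum(J[i][j] * s[j] for j in range(n)) for i in range(n)]
        # P(s_i(t+1) = +1) = 1 / (1 + exp(-2 h_i(t)))
        s = [1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * hi)) else -1 for hi in h]
        history.append(s)
    return history


def invert(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(M)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def nmf_inverse(history):
    """Naive mean-field estimate J = A^{-1} D C^{-1} from a spin history."""
    n, T = len(history[0]), len(history) - 1
    m = [sum(s[i] for s in history[:-1]) / T for i in range(n)]
    C = [[0.0] * n for _ in range(n)]  # equal-time covariance
    D = [[0.0] * n for _ in range(n)]  # one-step-delayed covariance
    for t in range(T):
        now = [history[t][i] - m[i] for i in range(n)]
        nxt = [history[t + 1][i] - m[i] for i in range(n)]
        for i in range(n):
            for j in range(n):
                C[i][j] += now[i] * now[j] / T
                D[i][j] += nxt[i] * now[j] / T
    Cinv = invert(C)
    return [[sum(D[i][k] * Cinv[k][j] for k in range(n)) / (1.0 - m[i] ** 2)
             for j in range(n)] for i in range(n)]


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)
n, g = 5, 0.3  # illustrative size and coupling scale
J_true = [[rng.gauss(0.0, g / math.sqrt(n)) for _ in range(n)] for _ in range(n)]
J_hat = nmf_inverse(simulate(J_true, 10000, rng))
corr = pearson([x for row in J_true for x in row], [x for row in J_hat for x in row])
print("correlation between true and inferred couplings:", corr)
```

The couplings here are drawn independently for each ordered pair, so the matrix is asymmetric. The naive formula is only expected to work well for weak couplings and long recordings.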

preprint · 2011 · arXiv

Role of correlations in population coding

Correlations among spikes, both on the same neuron and across neurons, are ubiquitous in the brain. For example cross-correlograms can have large peaks, at least in the periphery, and smaller -- but still non-negligible -- ones in cortex, and auto-correlograms almost always exhibit non-trivial temporal structure at a range of timescales. Although this has been known for over forty years, it's still not clear what role these correlations play in the brain -- and, indeed, whether they play any role at all. The goal of this chapter is to shed light on this issue by reviewing some of the work on this subject.

preprint · 2010 · arXiv

Dynamics and Performance of Susceptibility Propagation on Synthetic Data

We study the performance and convergence properties of the Susceptibility Propagation (SusP) algorithm for solving the Inverse Ising problem. We first study how the temperature parameter (T) in a Sherrington-Kirkpatrick model generating the data influences the performance and convergence of the algorithm. We find that in the high-temperature regime (T>4), the algorithm performs well and its quality is only limited by the quality of the supplied data. In the low-temperature regime (T<4), we find that the algorithm typically does not converge, yielding diverging values for the couplings. However, we show that by stopping the algorithm at the right time before divergence becomes serious, good reconstruction can be achieved down to T~2. We then show that dense connectivity, loopiness of the connectivity, and high absolute magnetization all have deteriorating effects on the performance of the algorithm. When absolute magnetization is high, we show that other methods can work better than SusP. Finally, we show that for neural data with high absolute magnetization, SusP performs less well than TAP inversion.

preprint · 2008 · arXiv

Pairwise maximum entropy models for studying large biological systems: when they can and when they can't work

One of the most critical problems we face in the study of biological systems is building accurate statistical descriptions of them. This problem has been particularly challenging because biological systems typically contain large numbers of interacting elements, which precludes the use of standard brute force approaches. Recently, though, several groups have reported that there may be an alternate strategy. The reports show that reliable statistical models can be built without knowledge of all the interactions in a system; instead, pairwise interactions can suffice. These findings, however, are based on the analysis of small subsystems. Here we ask whether the observations will generalize to systems of realistic size, that is, whether pairwise models will provide reliable descriptions of true biological systems. Our results show that, in most cases, they will not. The reason is that there is a crossover in the predictive power of pairwise models: If the size of the subsystem is below the crossover point, then the results have no predictive power for large systems. If the size is above the crossover point, the results do have predictive power. This work thus provides a general framework for determining the extent to which pairwise models can be used to predict the behavior of whole biological systems. Applied to neural data, the size of most systems studied so far is below the crossover point.

preprint · 2007 · arXiv

A balanced memory network

A fundamental problem in neuroscience is understanding how working memory -- the ability to store information at intermediate timescales, like 10s of seconds -- is implemented in realistic neuronal networks. The most likely candidate mechanism is the attractor network, and a great deal of effort has gone toward investigating it theoretically. Yet, despite almost a quarter century of intense work, attractor networks are not fully understood. In particular, there are still two unanswered questions. First, how is it that attractor networks exhibit irregular firing, as is observed experimentally during working memory tasks? And second, how many memories can be stored under biologically realistic conditions? Here we answer both questions by studying an attractor neural network in which inhibition and excitation balance each other. Using mean field analysis, we derive a three-variable description of attractor networks. From this description it follows that irregular firing can exist only if the number of neurons involved in a memory is large. The same mean field analysis also shows that the number of memories that can be stored in a network scales with the number of excitatory connections, a result that has been suggested for simple models but never shown for realistic ones. Both of these predictions are verified using simulations with large networks of spiking neurons.
