Researcher profile

Yasser Roudi

Yasser Roudi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

Activation Functions, Statistics and Learning of Higher-Order Interactions in Restricted Boltzmann Machines

The great success of neural networks in recognizing hidden patterns and correlations in complex data lies in the way they take advantage of the large number of parameters and nonlinear single-unit activation, jointly. Restricted Boltzmann Machines (RBMs) provide a simple yet powerful framework for studying the impact of activation nonlinearities on performance and representation. In this work, we exploit the duality between RBMs and models of interacting binary variables to study the statistics of the interactions induced by RBM ensembles with different hidden unit activation functions. We characterize the space of representable models analytically in terms of moments of the distribution of induced interactions for four commonly used activation functions: Linear, Step, ReLU, and Exponential. Quantitative predictions of the analytical calculations on learning show a very good agreement with results of the simulations of the training process. In particular, our analysis shows that there are certain data structures, namely those generated by models of interacting variables with large interaction terms beyond pairwise, that are difficult to represent, and thus to learn, for any RBM. Yet, we find that rapidly increasing nonlinearities, such as the Exponential function, can facilitate the representation and learning of such data structures for a specific range of parameters that is determined analytically.

preprint2022arXiv

Bayesian interpolation for power laws in neural data analysis

Power laws arise in a variety of phenomena ranging from matter undergoing phase transition to the distribution of word frequencies in the English language. Usually, their presence is only apparent when data is abundant, and accurately determining their exponents often requires even larger amounts of data. As the scale of recordings in neuroscience becomes larger, an increasing number of studies attempt to characterise potential power-law relationships in neural data. In this paper, we aim to discuss the potential pitfalls that one faces in such efforts and to promote a Bayesian interpolation framework for this purpose. We apply this framework to synthetic data and to data from a recent study of large-scale recordings in mouse primary visual cortex (V1), where the exponent of a power-law scaling in the data played an important role: its value was argued to determine whether the population's stimulus-response relationship is smooth, and experimental data was provided to confirm that this is indeed so. Our analysis shows that with such data types and sizes as we consider here, the best-fit values found for the parameters of the power law and the uncertainty for these estimates are heavily dependent on the noise model assumed for the estimation, the range of the data chosen, and (with all other things being equal) the particular recordings. It is thus challenging to offer a reliable statement about the exponents of the power law. Our analysis, however, shows that this does not affect the conclusions regarding the smoothness of the population response to low-dimensional stimuli but casts doubt on those to natural images. We discuss the implications of this result for the neural code in the V1 and offer the approach discussed here as a framework that future studies, perhaps exploring larger ranges of data, can employ as their starting point to examine power-law scalings in neural data.

preprint2022arXiv

Quantifying Relevance in Learning and Inference

Learning is a distinctive feature of intelligent behaviour. High-throughput experimental data and Big Data promise to open new windows on complex systems such as cells, the brain or our societies. Yet, the puzzling success of Artificial Intelligence and Machine Learning shows that we still have a poor conceptual understanding of learning. These applications push statistical inference into uncharted territories where data is high-dimensional and scarce, and prior information on "true" models is scant if not totally absent. Here we review recent progress on understanding learning, based on the notion of "relevance". The relevance, as we define it here, quantifies the amount of information that a dataset or the internal representation of a learning machine contains on the generative model of the data. This allows us to define maximally informative samples, on one hand, and optimal learning machines on the other. These are ideal limits of samples and of machines, that contain the maximal amount of information about the unknown generative process, at a given resolution (or level of compression). Both ideal limits exhibit critical features in the statistical sense: Maximally informative samples are characterised by a power-law frequency distribution (statistical criticality) and optimal learning machines by an anomalously large susceptibility. The trade-off between resolution (i.e. compression) and relevance distinguishes the regime of noisy representations from that of lossy compression. These are separated by a special point characterised by Zipf's law statistics. This identifies samples obeying Zipf's law as the most compressed loss-less representations that are optimal in the sense of maximal relevance. Criticality in optimal learning machines manifests in an exponential degeneracy of energy levels, that leads to unusual thermodynamic properties.

preprint2020arXiv

Efficiency of local learning rules in threshold-linear associative networks

We derive the Gardner storage capacity for associative networks of threshold linear units, and show that with Hebbian learning they can operate closer to such Gardner bound than binary networks, and even surpass it. This is largely achieved through a sparsification of the retrieved patterns, which we analyze for theoretical and empirical distributions of activity. As reaching the optimal capacity via non-local learning rules like backpropagation requires slow and neurally implausible training procedures, our results indicate that one-shot self-organized Hebbian learning can be just as efficient.

preprint2010arXiv

Dynamics and Performance of Susceptibility Propagation on Synthetic Data

We study the performance and convergence properties of the Susceptibility Propagation (SusP) algorithm for solving the Inverse Ising problem. We first study how the temperature parameter (T) in a Sherrington-Kirkpatrick model generating the data influences the performance and convergence of the algorithm. We find that at the high temperature regime (T>4), the algorithm performs well and its quality is only limited by the quality of the supplied data. In the low temperature regime (T<4), we find that the algorithm typically does not converge, yielding diverging values for the couplings. However, we show that by stopping the algorithm at the right time before divergence becomes serious, good reconstruction can be achieved down to T~2. We then show that dense connectivity, loopiness of the connectivity, and high absolute magnetization all have deteriorating effects on the performance of the algorithm. When absolute magnetization is high, we show that other methods can be work better than SusP. Finally, we show that for neural data with high absolute magnetization, SusP performs less well than TAP inversion.