Source author record

Guillaume Lajoie

Guillaume Lajoie appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Neurons and Cognition math.DS nlin.CD Artificial Intelligence Neural and Evolutionary Computing eess.SP Biological Physics

Catalog footprint

What is connected

17works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules

To unveil how the brain learns, ongoing work seeks biologically-plausible approximations of gradient descent algorithms for training recurrent neural networks (RNNs). Yet, beyond task accuracy, it is unclear if such learning rules converge to solutions that exhibit different levels of generalization than their nonbiologically-plausible counterparts. Leveraging results from deep learning theory based on loss landscape curvature, we ask: how do biologically-plausible gradient approximations affect generalization? We first demonstrate that state-of-the-art biologically-plausible learning rules for training RNNs exhibit worse and more variable generalization performance compared to their machine learning counterparts that follow the true gradient more closely. Next, we verify that such generalization performance is correlated significantly with loss landscape curvature, and we show that biologically-plausible learning rules tend to approach high-curvature regions in synaptic weight space. Using tools from dynamical systems, we derive theoretical arguments and present a theorem explaining this phenomenon. This predicts our numerical results, and explains why biologically-plausible rules lead to worse and more variable generalization properties. Finally, we suggest potential remedies that could be used by the brain to mitigate this effect. To our knowledge, our analysis is the first to identify the reason for this generalization gap between artificial and biologically-plausible learning rules, which can help guide future investigations into how the brain learns solutions that generalize.

preprint2022arXiv

Clarifying MCMC-based training of modern EBMs : Contrastive Divergence versus Maximum Likelihood

The Energy-Based Model (EBM) framework is a very general approach to generative modeling that tries to learn and exploit probability distributions only defined though unnormalized scores. It has risen in popularity recently thanks to the impressive results obtained in image generation by parameterizing the distribution with Convolutional Neural Networks (CNN). However, the motivation and theoretical foundations behind modern EBMs are often absent from recent papers and this sometimes results in some confusion. In particular, the theoretical justifications behind the popular MCMC-based learning algorithm Contrastive Divergence (CD) are often glossed over and we find that this leads to theoretical errors in recent influential papers (Du & Mordatch, 2019; Du et al., 2020). After offering a first-principles introduction of MCMC-based training, we argue that the learning algorithm they use can in fact not be described as CD and reinterpret theirs methods in light of a new interpretation. Finally, we discuss the implications of our new interpretation and provide some illustrative experiments.

preprint2022arXiv

Compositional Attention: Disentangling Search and Retrieval

Multi-head, key-value attention is the backbone of the widely successful Transformer model and its variants. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interactions, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Importantly, standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner through an additional soft competition stage between the query-key combination and value pairing. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval, and can easily be implemented in lieu of standard attention heads in any network architecture.

preprint2022arXiv

Efficient and robust multi-task learning in the brain with modular latent primitives

Biological agents do not have infinite resources to learn new things. For this reason, a central aspect of human learning is the ability to recycle previously acquired knowledge in a way that allows for faster, less resource-intensive acquisition of new skills. In spite of that, how neural networks in the brain leverage existing knowledge to learn new computations is not well understood. In this work, we study this question in artificial recurrent neural networks (RNNs) trained on a corpus of commonly used neuroscience tasks. Combining brain-inspired inductive biases we call functional and structural, we propose a system that learns new tasks by building on top of pre-trained latent dynamics organised into separate recurrent modules. These modules, acting as prior knowledge acquired previously through evolution or development, are pre-trained on the statistics of the full corpus of tasks so as to be independent and maximally informative. The resulting model, we call a Modular Latent Primitives (MoLaP) network, allows for learning multiple tasks while keeping parameter counts, and updates, low. We also show that the skills acquired with our approach are more robust to a broad range of perturbations compared to those acquired with other multi-task learning strategies, and that generalisation to new tasks is facilitated. This work offers a new perspective on achieving efficient multi-task learning in the brain, illustrating the benefits of leveraging pre-trained latent dynamical primitives.

preprint2022arXiv

Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance

In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observations in many domains. Further, in many cases the target entities for analysis are actually signals on such graphs. We propose to compare and organize such datasets of graph signals by using an earth mover's distance (EMD) with a geodesic cost over the underlying graph. Typically, EMD is computed by optimizing over the cost of transporting one probability distribution to another over an underlying metric space. However, this is inefficient when computing the EMD between many signals. Here, we propose an unbalanced graph EMD that efficiently embeds the unbalanced EMD on an underlying graph into an $L^1$ space, whose metric we call unbalanced diffusion earth mover's distance (UDEMD). Next, we show how this gives distances between graph signals that are robust to noise. Finally, we apply this to organizing patients based on clinical notes, embedding cells modeled as signals on a gene graph, and organizing genes modeled as signals over a large cell graph. In each case, we show that UDEMD-based embeddings find accurate distances that are highly efficient compared to other methods.

preprint2022arXiv

Exploring the Geometry and Topology of Neural Network Loss Landscapes

Recent work has established clear links between the generalization performance of trained neural networks and the geometry of their loss landscape near the local minima to which they converge. This suggests that qualitative and quantitative examination of the loss landscape geometry could yield insights about neural network generalization performance during training. To this end, researchers have proposed visualizing the loss landscape through the use of simple dimensionality reduction techniques. However, such visualization methods have been limited by their linear nature and only capture features in one or two dimensions, thus restricting sampling of the loss landscape to lines or planes. Here, we expand and improve upon these in three ways. First, we present a novel "jump and retrain" procedure for sampling relevant portions of the loss landscape. We show that the resulting sampled data holds more meaningful information about the network's ability to generalize. Next, we show that non-linear dimensionality reduction of the jump and retrain trajectories via PHATE, a trajectory and manifold-preserving method, allows us to visualize differences between networks that are generalizing well vs poorly. Finally, we combine PHATE trajectories with a computational homology characterization to quantify trajectory differences.

preprint2022arXiv

Is a Modular Architecture Enough?

Inspired from human cognition, machine learning systems are gradually revealing advantages of sparser and more modular architectures. Recent work demonstrates that not only do some modular architectures generalize well, but they also lead to better out-of-distribution generalization, scaling properties, learning speed, and interpretability. A key intuition behind the success of such systems is that the data generating system for most real-world settings is considered to consist of sparsely interacting parts, and endowing models with similar inductive biases will be helpful. However, the field has been lacking in a rigorous quantitative assessment of such systems because these real-world data distributions are complex and unknown. In this work, we provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, as well as the sub-optimality of current end-to-end learned modular systems as opposed to their claimed potential.

preprint2022arXiv

On Neural Architecture Inductive Biases for Relational Tasks

Current deep learning approaches have shown good in-distribution generalization performance, but struggle with out-of-distribution generalization. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as we find in many intelligence tests. Recent work has explored how forcing relational representations to remain distinct from sensory representations, as it seems to be the case in the brain, can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by 'partitioned' representations of relations and sensory details, and how this inductive bias can help recompose learned relational structure in newly encountered settings. We introduce a simple architecture based on similarity scores which we name Compositional Relational Network (CoRelNet). Using this model, we investigate a series of inductive biases that ensure abstract relations are learned and represented distinctly from sensory data, and explore their effects on out-of-distribution generalization for a series of relational psychophysics tasks. We find that simple architectural choices can outperform existing models in out-of-distribution generalization. Together, these results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing out-of-distribution relational computations.

preprint2021arXiv

Learning shared neural manifolds from multi-subject FMRI data

Functional magnetic resonance imaging (fMRI) is a notoriously noisy measurement of brain activity because of the large variations between individuals, signals marred by environmental differences during collection, and spatiotemporal averaging required by the measurement resolution. In addition, the data is extremely high dimensional, with the space of the activity typically having much lower intrinsic dimension. In order to understand the connection between stimuli of interest and brain activity, and analyze differences and commonalities between subjects, it becomes important to learn a meaningful embedding of the data that denoises, and reveals its intrinsic structure. Specifically, we assume that while noise varies significantly between individuals, true responses to stimuli will share common, low-dimensional features between subjects which are jointly discoverable. Similar approaches have been exploited previously but they have mainly used linear methods such as PCA and shared response modeling (SRM). In contrast, we propose a neural network called MRMD-AE (manifold-regularized multiple decoder, autoencoder), that learns a common embedding from multiple subjects in an experiment while retaining the ability to decode to individual raw fMRI signals. We show that our learned common space represents an extensible manifold (where new points not seen during training can be mapped), improves the classification accuracy of stimulus features of unseen timepoints, as well as improves cross-subject translation of fMRI signals. We believe this framework can be used for many downstream applications such as guided brain-computer interface (BCI) training in the future.

preprint2020arXiv

Advantages of biologically-inspired adaptive neural activation in RNNs during learning

Dynamic adaptation in single-neuron response plays a fundamental role in neural coding in biological neural networks. Yet, most neural activation functions used in artificial networks are fixed and mostly considered as an inconsequential architecture choice. In this paper, we investigate nonlinear activation function adaptation over the large time scale of learning, and outline its impact on sequential processing in recurrent neural networks. We introduce a novel parametric family of nonlinear activation functions, inspired by input-frequency response curves of biological neurons, which allows interpolation between well-known activation functions such as ReLU and sigmoid. Using simple numerical experiments and tools from dynamical systems and information theory, we study the role of neural activation features in learning dynamics. We find that activation adaptation provides distinct task-specific solutions and in some cases, improves both learning speed and performance. Importantly, we find that optimal activation features emerging from our parametric family are considerably different from typical functions used in the literature, suggesting that exploiting the gap between these usual configurations can help learning. Finally, we outline situations where neural activation adaptation alone may help mitigate changes in input statistics in a given task, suggesting mechanisms for transfer learning optimization.

preprint2020arXiv

Internal representation dynamics and geometry in recurrent neural networks

The efficiency of recurrent neural networks (RNNs) in dealing with sequential data has long been established. However, unlike deep, and convolution networks where we can attribute the recognition of a certain feature to every layer, it is unclear what "sub-task" a single recurrent step or layer accomplishes. Our work seeks to shed light onto how a vanilla RNN implements a simple classification task by analysing the dynamics of the network and the geometric properties of its hidden states. We find that early internal representations are evocative of the real labels of the data but this information is not directly accessible to the output layer. Furthermore the network's dynamics and the sequence length are both critical to correct classifications even when there is no additional task relevant information provided.

preprint2020arXiv

On Lyapunov Exponents for RNNs: Understanding Information Propagation Using Dynamical Systems Tools

Recurrent neural networks (RNNs) have been successfully applied to a variety of problems involving sequential data, but their optimization is sensitive to parameter initialization, architecture, and optimizer hyperparameters. Considering RNNs as dynamical systems, a natural way to capture stability, i.e., the growth and decay over long iterates, are the Lyapunov Exponents (LEs), which form the Lyapunov spectrum. The LEs have a bearing on stability of RNN training dynamics because forward propagation of information is related to the backward propagation of error gradients. LEs measure the asymptotic rates of expansion and contraction of nonlinear system trajectories, and generalize stability analysis to the time-varying attractors structuring the non-autonomous dynamics of data-driven RNNs. As a tool to understand and exploit stability of training dynamics, the Lyapunov spectrum fills an existing gap between prescriptive mathematical approaches of limited scope and computationally-expensive empirical approaches. To leverage this tool, we implement an efficient way to compute LEs for RNNs during training, discuss the aspects specific to standard RNN architectures driven by typical sequential datasets, and show that the Lyapunov spectrum can serve as a robust readout of training stability across hyperparameters. With this exposition-oriented contribution, we hope to draw attention to this understudied, but theoretically grounded tool for understanding training stability in RNNs.

preprint2016arXiv

Dynamic signal tracking in a simple V1 spiking model

This work is part of an effort to understand the neural basis for our visual system's ability, or failure, to accurately track moving visual signals. We consider here a ring model of spiking neurons, intended as a simplified computational model of a single hypercolumn of the primary visual cortex. Signals that consist of edges with time-varying orientations localized in space are considered. Our model is calibrated to produce spontaneous and driven firing rates roughly consistent with experiments, and our two main findings, for which we offer dynamical explanation on the level of neuronal interactions, are the following: (1) We have documented consistent transient overshoots in signal perception following signal switches due to emergent interactions of the E- and I-populations, and (2) for continuously moving signals, we have found that accuracy is considerably lower at reversals of orientation than when continuing in the same direction (as when the signal is a rotating bar). To measure performance, we use two metrics, called fidelity and reliability, to compare signals reconstructed by the system to the ones presented, and to assess trial-to-trial variability. We propose that the same population mechanisms responsible for orientation selectivity also impose constraints on dynamic signal tracking that manifest in perception failures consistent with psychophysical observations.

preprint2016arXiv

Revisiting chaos in stimulus-driven spiking networks: signal encoding and discrimination

Highly connected recurrent neural networks often produce chaotic dynamics, meaning their precise activity is sensitive to small perturbations. What are the consequences for how such networks encode streams of temporal stimuli? On the one hand, chaos is a strong source of randomness, suggesting that small changes in stimuli will be obscured by intrinsically generated variability. On the other hand, recent work shows that the type of chaos that occurs in spiking networks can have a surprisingly low-dimensional structure, suggesting that there may be "room" for fine stimulus features to be precisely resolved. Here we show that strongly chaotic networks produce patterned spikes that reliably encode time-dependent stimuli: using a decoder sensitive to spike times on timescales of 10's of ms, one can easily distinguish responses to very similar inputs. Moreover, recurrence serves to distribute signals throughout chaotic networks so that small groups of cells can encode substantial information about signals arriving elsewhere. A conclusion is that the presence of strong chaos in recurrent networks does not prohibit precise stimulus encoding.

preprint2014arXiv

Structured chaos shapes spike-response noise entropy in balanced neural networks

Large networks of sparsely coupled, excitatory and inhibitory cells occur throughout the brain. A striking feature of these networks is that they are chaotic. How does this chaos manifest in the neural code? Specifically, how variable are the spike patterns that such a network produces in response to an input signal? To answer this, we derive a bound for the entropy of multi-cell spike pattern distributions in large recurrent networks of spiking neurons responding to fluctuating inputs. The analysis is based on results from random dynamical systems theory and is complimented by detailed numerical simulations. We find that the spike pattern entropy is an order of magnitude lower than what would be extrapolated from single cells. This holds despite the fact that network coupling becomes vanishingly sparse as network size grows -- a phenomenon that depends on ``extensive chaos," as previously discovered for balanced networks without stimulus drive. Moreover, we show how spike pattern entropy is controlled by temporal features of the inputs. Our findings provide insight into how neural networks may encode stimuli in the presence of inherently chaotic dynamics.

preprint2013arXiv

Chaos and reliability in balanced spiking networks with temporal drive

Biological information processing is often carried out by complex networks of interconnected dynamical units. A basic question about such networks is that of reliability: if the same signal is presented many times with the network in different initial states, will the system entrain to the signal in a repeatable way? Reliability is of particular interest in neuroscience, where large, complex networks of excitatory and inhibitory cells are ubiquitous. These networks are known to autonomously produce strongly chaotic dynamics - an obvious threat to reliability. Here, we show that such chaos persists in the presence of weak and strong stimuli, but that even in the presence of chaos, intermittent periods of highly reliable spiking often coexist with unreliable activity. We elucidate the local dynamical mechanisms involved in this intermittent reliability, and investigate the relationship between this phenomenon and certain time-dependent attractors arising from the dynamics. A conclusion is that chaotic dynamics do not have to be an obstacle to precise spike responses, a fact with implications for signal coding in large networks.

preprint2011arXiv

Shared inputs, entrainment, and desynchrony in elliptic bursters: from slow passage to discontinuous circle maps

What input signals will lead to synchrony vs. desynchrony in a group of biological oscillators? This question connects with both classical dynamical systems analyses of entrainment and phase locking and with emerging studies of stimulation patterns for controlling neural network activity. Here, we focus on the response of a population of uncoupled, elliptically bursting neurons to a common pulsatile input. We extend a phase reduction from the literature to capture inputs of varied strength, leading to a circle map with discontinuities of various orders. In a combined analytical and numerical approach, we apply our results to both a normal form model for elliptic bursting and to a biophysically-based neuron model from the basal ganglia. We find that, depending on the period and amplitude of inputs, the response can either appear chaotic (with provably positive Lyaponov exponent for the associated circle maps), or periodic with a broad range of phase-locked periods. Throughout, we discuss the critical underlying mechanisms, including slow-passage effects through Hopf bifurcation, the role and origin of discontinuities, and the impact of noise

Guillaume Lajoie

What is connected

Connect this record

See the researcher in context

Building this map preview

17 published item(s)

Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules

Clarifying MCMC-based training of modern EBMs : Contrastive Divergence versus Maximum Likelihood

Compositional Attention: Disentangling Search and Retrieval

Efficient and robust multi-task learning in the brain with modular latent primitives

Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance

Exploring the Geometry and Topology of Neural Network Loss Landscapes

Is a Modular Architecture Enough?

On Neural Architecture Inductive Biases for Relational Tasks

Learning shared neural manifolds from multi-subject FMRI data

Advantages of biologically-inspired adaptive neural activation in RNNs during learning

Internal representation dynamics and geometry in recurrent neural networks

On Lyapunov Exponents for RNNs: Understanding Information Propagation Using Dynamical Systems Tools

Dynamic signal tracking in a simple V1 spiking model

Revisiting chaos in stimulus-driven spiking networks: signal encoding and discrimination

Structured chaos shapes spike-response noise entropy in balanced neural networks

Chaos and reliability in balanced spiking networks with temporal drive

Shared inputs, entrainment, and desynchrony in elliptic bursters: from slow passage to discontinuous circle maps