Source author record

David J. Schwab

David J. Schwab appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: cond-mat.stat-mech, Biological Physics, cond-mat.dis-nn, Machine Learning, Neurons and Cognition, Molecular Networks, Cell Behavior, Neural and Evolutionary Computing, Quantitative Methods, Biomolecules, cond-mat.soft, math.DS, nlin.CD, nlin.PS, physics.data-an, Populations and Evolution

Catalog footprint

What is connected

17 works

16 topics

4 close collaborators

preprint, 2021, arXiv

Inferring couplings in networks across order-disorder phase transitions

Statistical inference is central to many scientific endeavors, yet how it works remains unresolved. Answering this requires a quantitative understanding of the intrinsic interplay between statistical models, inference methods and data structure. To this end, we characterize the efficacy of direct coupling analysis (DCA)--a highly successful method for analyzing amino acid sequence data--in inferring pairwise interactions from samples of ferromagnetic Ising models on random graphs. Our approach allows for physically motivated exploration of qualitatively distinct data regimes separated by phase transitions. We show that inference quality depends strongly on the nature of generative models: optimal accuracy occurs at an intermediate temperature where the detrimental effects from macroscopic order and thermal noise are minimal. Importantly our results indicate that DCA does not always outperform its local-statistics-based predecessors; while DCA excels at low temperatures, it becomes inferior to simple correlation thresholding at virtually all temperatures when data are limited. Our findings offer new insights into the regime in which DCA operates so successfully and more broadly how inference interacts with data structure.
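
The comparison between correlation thresholding and DCA described above can be illustrated with a small sketch. It is not the authors' setup: the abstract does not say which DCA variant is used, so naive mean-field inversion (couplings estimated as the negative inverse correlation matrix) stands in as an assumption. The six-spin chain, the coupling value 0.5, and the helper names `exact_correlations` and `top_k_edges` are likewise illustrative. Correlations are computed exactly by enumerating all states, so there is no sampling noise.

```python
import itertools
import numpy as np

def exact_correlations(J, beta=1.0):
    """Connected correlations of an Ising model, by enumerating all 2^n states."""
    n = J.shape[0]
    states = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    # E(s) = -sum_{i<j} J_ij s_i s_j; J is symmetric with zero diagonal.
    energies = -0.5 * np.einsum("ki,ij,kj->k", states, J, states)
    weights = np.exp(-beta * energies)
    probs = weights / weights.sum()
    mean = probs @ states
    second = states.T @ (states * probs[:, None])
    return second - np.outer(mean, mean)

def top_k_edges(score, k):
    """Return the k pairs (i < j) with the largest score."""
    n = score.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ranked = sorted(pairs, key=lambda p: score[p], reverse=True)
    return set(ranked[:k])

# Hypothetical test case: ferromagnetic chain of 6 spins, coupling 0.5.
n = 6
J_true = np.zeros((n, n))
for i in range(n - 1):
    J_true[i, i + 1] = J_true[i + 1, i] = 0.5
true_edges = {(i, i + 1) for i in range(n - 1)}

C = exact_correlations(J_true)

# Local-statistics baseline: keep the strongest pairwise correlations.
edges_threshold = top_k_edges(np.abs(C), len(true_edges))

# Naive mean-field inversion (assumed DCA variant): J_ij ~ -(C^{-1})_ij.
J_mf = -np.linalg.inv(C)
edges_mf = top_k_edges(J_mf, len(true_edges))

print(edges_threshold == true_edges, edges_mf == true_edges)
```

With exact statistics on this easy example both estimators recover the chain. The regimes the abstract discusses (limited data, temperatures near the transition) would require sampling from larger random graphs, which this sketch does not attempt.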

preprint, 2020, arXiv

Energy consumption and cooperation for optimal sensing

The reliable detection of environmental molecules in the presence of noise is an important cellular function, yet the underlying computational mechanisms are not well understood. We introduce a model of two interacting sensors which allows for the principled exploration of signal statistics, cooperation strategies and the role of energy consumption in optimal sensing, quantified through the mutual information between the signal and the sensors. Here we report that in general the optimal sensing strategy depends both on the noise level and the statistics of the signals. For joint, correlated signals, energy consuming (nonequilibrium), asymmetric couplings result in maximum information gain in the low-noise, high-signal-correlation limit. Surprisingly we also find that energy consumption is not always required for optimal sensing. We generalise our model to incorporate time integration of the sensor state by a population of readout molecules, and demonstrate that sensor interaction and energy consumption remain important for optimal sensing.

preprint, 2020, arXiv

Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs

Recurrent neural networks (RNNs) are powerful dynamical models for data with complex temporal structure. However, training RNNs has traditionally proved challenging due to exploding or vanishing of gradients. RNN models such as LSTMs and GRUs (and their variants) significantly mitigate these issues associated with training by introducing various types of gating units into the architecture. While these gates empirically improve performance, how the addition of gates influences the dynamics and trainability of GRUs and LSTMs is not well understood. Here, we take the perspective of studying randomly initialized LSTMs and GRUs as dynamical systems, and ask how the salient dynamical properties are shaped by the gates. We leverage tools from random matrix theory and mean-field theory to study the state-to-state Jacobians of GRUs and LSTMs. We show that the update gate in the GRU and the forget gate in the LSTM can lead to an accumulation of slow modes in the dynamics. Moreover, the GRU update gate can poise the system at a marginally stable point. The reset gate in the GRU and the output and input gates in the LSTM control the spectral radius of the Jacobian, and the GRU reset gate also modulates the complexity of the landscape of fixed-points. Furthermore, for the GRU we obtain a phase diagram describing the statistical properties of fixed-points. We also provide a preliminary comparison of training performance to the various dynamical regimes realized by varying hyperparameters. Looking to the future, we have introduced a powerful set of techniques which can be adapted to a broad class of RNNs, to study the influence of various architectural choices on dynamics, and potentially motivate the principled discovery of novel architectures.

preprint, 2020, arXiv

Superlinear Precision and Memory in Simple Population Codes

The brain constructs population codes to represent stimuli through widely distributed patterns of activity across neurons. An important figure of merit of population codes is how much information about the original stimulus can be decoded from them. Fisher information is widely used to quantify coding precision and specify optimal codes, because of its relationship to mean squared error (MSE) under certain assumptions. When neural firing is sparse, however, optimizing Fisher information can result in codes that are highly sub-optimal in terms of MSE. We find that this discrepancy arises from the non-local component of error not accounted for by the Fisher information. Using this insight, we construct optimal population codes by directly minimizing the MSE. We study the scaling properties of MSE with coding parameters, focusing on the tuning curve width. We find that the optimal tuning curve width for coding no longer scales as the inverse population size, and the quadratic scaling of precision with system size predicted by Fisher information alone no longer holds. However, superlinearity is still preserved with only a logarithmic slowdown. We derive analogous results for networks storing the memory of a stimulus through continuous attractor dynamics, and show that similar scaling properties optimize memory and representation.

preprint, 2020, arXiv

The Early Phase of Neural Network Training

Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here, we examine the changes that deep neural networks undergo during this early phase of training. We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset. We find that, within this framework, deep networks are not robust to reinitializing with random weights while maintaining signs, and that weight distributions are highly non-independent even after only a few hundred iterations. Despite this behavior, pre-training with blurred inputs or an auxiliary self-supervised task can approximate the changes in supervised networks, suggesting that these changes are not inherently label-dependent, though labels significantly accelerate this process. Together, these results help to elucidate the network changes occurring during this pivotal initial period of learning.

preprint, 2016, arXiv

Comment on "Why does deep and cheap learning work so well?" [arXiv:1608.08225]

In a recent paper, "Why does deep and cheap learning work so well?", Lin and Tegmark claim to show that the mapping between deep belief networks and the variational renormalization group derived in [arXiv:1410.3831] is invalid, and present a "counterexample" that claims to show that this mapping does not hold. In this comment, we show that these claims are incorrect and stem from a misunderstanding of the variational RG procedure proposed by Kadanoff. We also explain why the "counterexample" of Lin and Tegmark is compatible with the mapping proposed in [arXiv:1410.3831].

preprint, 2015, arXiv

Landauer in the age of synthetic biology: energy consumption and information processing in biochemical networks

A central goal of synthetic biology is to design sophisticated synthetic cellular circuits that can perform complex computations and information processing tasks in response to specific inputs. The tremendous advances in our ability to understand and manipulate cellular information processing networks raise several fundamental physics questions: How do the molecular components of cellular circuits exploit energy consumption to improve information processing? Can one utilize ideas from thermodynamics to improve the design of synthetic cellular circuits and modules? Here, we summarize recent theoretical work addressing these questions. Energy consumption in cellular circuits serves five basic purposes: (1) increasing specificity, (2) manipulating dynamics, (3) reducing variability, (4) amplifying signal, and (5) erasing memory. We demonstrate these ideas using several simple examples and discuss the implications of these theoretical ideas for the emerging field of synthetic biology. We conclude by discussing how it may be possible to overcome these limitations using "post-translational" synthetic biology that exploits reversible protein modification.

preprint, 2014, arXiv

A binary Hopfield network with $1/\log(n)$ information rate and applications to grid cell decoding

A Hopfield network is an auto-associative, distributive model of neural memory storage and retrieval. A form of error-correcting code, the Hopfield network can learn a set of patterns as stable points of the network dynamic, and retrieve them from noisy inputs -- thus Hopfield networks are their own decoders. Unlike in coding theory, where the information rate of a good code (in the Shannon sense) is finite but the cost of decoding does not play a role in the rate, the information rate of Hopfield networks trained with state-of-the-art learning algorithms is of the order ${\log(n)}/{n}$, a quantity that tends to zero asymptotically with $n$, the number of neurons in the network. For specially constructed networks, the best information rate currently achieved is of order ${1}/{\sqrt{n}}$. In this work, we design simple binary Hopfield networks that have asymptotically vanishing error rates at an information rate of ${1}/{\log(n)}$. These networks can be added as the decoders of any neural code with noisy neurons. As an example, we apply our network to a binary neural decoder of the grid cell code to attain information rate ${1}/{\log(n)}$.
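
The idea that a Hopfield network acts as its own decoder can be sketched with the classic outer-product (Hebbian) rule. This is the textbook construction, not the network designed in the paper above. The stored patterns are rows of a Hadamard matrix, an assumption chosen so that they are mutually orthogonal and recovery from a few flipped bits is guaranteed. The helper names `hadamard` and `recall` are illustrative.

```python
import numpy as np

def hadamard(order):
    """Sylvester construction of a 2^order x 2^order matrix with +/-1 entries."""
    h = np.array([[1]])
    for _ in range(order):
        h = np.kron(h, np.array([[1, 1], [1, -1]]))
    return h

n = 64
# Three mutually orthogonal +/-1 patterns (rows 1..3 of the 64x64 matrix).
patterns = hadamard(6)[1:4]

# Outer-product learning rule with zero self-coupling.
weights = patterns.T @ patterns / n
np.fill_diagonal(weights, 0.0)

def recall(state, weights, sweeps=5):
    """Synchronous sign updates; ties are resolved to +1."""
    for _ in range(sweeps):
        state = np.where(weights @ state >= 0, 1, -1)
    return state

# Corrupt the first stored pattern by flipping 5 of its 64 bits.
noisy = patterns[0].copy()
noisy[:5] *= -1

recovered = recall(noisy, weights)
print(np.array_equal(recovered, patterns[0]))
```

Only three patterns are stored in 64 neurons here, far below the capacity limits that the abstract is concerned with; the sketch shows the retrieval mechanism rather than the information-rate result.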

preprint, 2014, arXiv

An exact mapping between the Variational Renormalization Group and Deep Learning

Deep learning is a broad set of techniques that uses multiple layers of representation to automatically learn relevant features directly from structured data. Recently, such techniques have yielded record-breaking results on a diverse set of difficult machine learning tasks in computer vision, speech recognition, and natural language processing. Despite the enormous success of deep learning, relatively little is understood theoretically about why these techniques are so successful at feature learning and compression. Here, we show that deep learning is intimately related to one of the most important and successful techniques in theoretical physics, the renormalization group (RG). RG is an iterative coarse-graining scheme that allows for the extraction of relevant features (i.e. operators) as a physical system is examined at different length scales. We construct an exact mapping between the variational renormalization group, first introduced by Kadanoff, and deep learning architectures based on Restricted Boltzmann Machines (RBMs). We illustrate these ideas using the nearest-neighbor Ising Model in one and two dimensions. Our results suggest that deep learning algorithms may be employing a generalized RG-like scheme to learn relevant features from data.

preprint, 2014, arXiv

From Intracellular Signaling to Population Oscillations: Bridging Scales in Collective Behavior

Collective behavior in cellular populations is coordinated by biochemical signaling networks within individual cells. Connecting the dynamics of these intracellular networks to the population phenomena they control poses a considerable challenge because of network complexity and our limited knowledge of kinetic parameters. However, from physical systems we know that behavioral changes in the individual constituents of a collectively-behaving system occur in a limited number of well-defined classes, and these can be described using simple models. Here we apply such an approach to the emergence of collective oscillations in cellular populations of the social amoeba Dictyostelium discoideum. Through direct tests of our model with quantitative in vivo measurements of single-cell and population signaling dynamics, we show how a simple model can effectively describe a complex molecular signaling network and its effects at multiple size and temporal scales. The model predicts novel noise-driven single-cell and population-level signaling phenomena that we then experimentally observe. Our results suggest that like physical systems, collective behavior in biology may be universal and described using simple mathematical models.

preprint, 2014, arXiv

Quantifying the role of population subdivision in evolution on rugged fitness landscapes

Natural selection drives populations towards higher fitness, but crossing fitness valleys or plateaus may facilitate progress up a rugged fitness landscape involving epistasis. We investigate quantitatively the effect of subdividing an asexual population on the time it takes to cross a fitness valley or plateau. We focus on a generic and minimal model that includes only population subdivision into equivalent demes connected by global migration, and does not require significant size changes of the demes, environmental heterogeneity or specific geographic structure. We determine the optimal speedup of valley or plateau crossing that can be gained by subdivision, if the process is driven by the deme that crosses fastest. We show that isolated demes have to be in the sequential fixation regime for subdivision to significantly accelerate crossing. Using Markov chain theory, we obtain analytical expressions for the conditions under which optimal speedup is achieved: valley or plateau crossing by the subdivided population is then as fast as that of its fastest deme. We verify our analytical predictions through stochastic simulations. We demonstrate that subdivision can substantially accelerate the crossing of fitness valleys and plateaus in a wide range of parameters extending beyond the optimal window. We study the effect of varying the degree of subdivision of a population, and investigate the trade-off between the magnitude of the optimal speedup and the width of the parameter range over which it occurs. Our results also hold for weakly beneficial intermediate mutations. We extend our work to the case of a population connected by migration to one or several smaller islands. Our results demonstrate that subdivision with migration alone can significantly accelerate the crossing of fitness valleys and plateaus, and shed light onto the quantitative conditions necessary for this to occur.

preprint, 2014, arXiv

Zipf's law and criticality in multivariate data without fine-tuning

The joint probability distribution of many degrees of freedom in biological systems, such as firing patterns in neural networks or antibody sequence composition in zebrafish, often follows Zipf's law, where a power law is observed on a rank-frequency plot. This behavior has recently been shown to imply that these systems reside near a unique critical point where the extensive parts of the entropy and energy are exactly equal. Here we show analytically, and via numerical simulations, that Zipf-like probability distributions arise naturally if there is an unobserved variable (or variables) that affects the system, e.g., for neural networks, an input stimulus that causes individual neurons in the network to fire at time-varying rates. In statistics and machine learning, these models are called latent-variable or mixture models. Our model shows that no fine-tuning is required, i.e. Zipf's law arises generically without tuning parameters to a point, and gives insight into the ubiquity of Zipf's law in a wide range of systems.

preprint, 2012, arXiv

The Energetic Costs of Cellular Computation

Cells often perform computations in response to environmental cues. A simple example is the classic problem, first considered by Berg and Purcell, of determining the concentration of a chemical ligand in the surrounding medium. On general theoretical grounds (Landauer's Principle), it is expected that such computations require cells to consume energy. Here, we explicitly calculate the energetic costs of computing ligand concentration for a simple two-component cellular network that implements a noisy version of the Berg-Purcell strategy. We show that learning about external concentrations necessitates the breaking of detailed balance and consumption of energy, with greater learning requiring more energy. Our calculations suggest that the energetic costs of cellular computation may be an important constraint on networks designed to function in resource-poor environments, such as the spore germination networks of bacteria.

preprint, 2011, arXiv

Kuramoto model with coupling through an external medium

Synchronization of coupled oscillators is often described using the Kuramoto model. Here we study a generalization of the Kuramoto model where oscillators communicate with each other through an external medium. This generalized model exhibits interesting new phenomena such as bistability between synchronization and incoherence and a qualitatively new form of synchronization where the external medium exhibits small-amplitude oscillations. We conclude by discussing the relationship of the model to other variations of the Kuramoto model including the Kuramoto model with a bimodal frequency distribution and the Millennium Bridge problem.
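
As background for the abstract above, the standard Kuramoto model (all-to-all sine coupling, without the external medium studied in the paper) can be sketched in a few lines. Everything specific here is an assumption made for illustration: Gaussian natural frequencies with unit variance, forward-Euler integration, and the parameter values. The function name `kuramoto_order_parameter` is hypothetical.

```python
import numpy as np

def kuramoto_order_parameter(coupling, n=500, dt=0.01, steps=2000, seed=0):
    """Integrate the standard Kuramoto model and return the late-time
    average of the order parameter r = |mean(exp(i * theta))|."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0, n)           # natural frequencies (assumed Gaussian)
    theta = rng.uniform(0.0, 2 * np.pi, n)    # random initial phases
    history = []
    for _ in range(steps):
        z = np.exp(1j * theta).mean()
        r, psi = np.abs(z), np.angle(z)
        # Mean-field form: d(theta_i)/dt = omega_i + K * r * sin(psi - theta_i)
        theta = theta + dt * (omega + coupling * r * np.sin(psi - theta))
        history.append(r)
    return float(np.mean(history[-500:]))

r_incoherent = kuramoto_order_parameter(0.0)  # uncoupled oscillators
r_locked = kuramoto_order_parameter(4.0)      # coupling well above onset
print(r_incoherent, r_locked)
```

For unit-variance Gaussian frequencies the onset of synchronization in the infinite-size limit lies near a coupling of 1.6, so a coupling of 4 sits well inside the synchronized regime and the uncoupled case stays incoherent. The bistability and small-amplitude oscillations reported in the abstract arise only in the generalized model and are not reproduced by this sketch.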

preprint, 2010, arXiv

Dynamical quorum-sensing and synchronization of nonlinear oscillators coupled through an external medium

Many biological and physical systems exhibit population-density dependent transitions to synchronized oscillations in a process often termed "dynamical quorum sensing". Synchronization frequently arises through chemical communication via signaling molecules distributed through an external medium. We study a simple theoretical model for dynamical quorum sensing: a heterogeneous population of limit-cycle oscillators diffusively coupled through a common medium. We show that this model exhibits a rich phase diagram with four qualitatively distinct mechanisms fueling population-dependent transitions to global oscillations, including a new type of transition we term "dynamic death". We derive a single pair of analytic equations that allows us to calculate all phase boundaries as a function of population density and show that the model reproduces many of the qualitative features of recent experiments on BZ catalytic particles as well as synthetically engineered bacteria.

preprint, 2009, arXiv

Statistical Mechanics of Integral Membrane Protein Assembly

During the synthesis of integral membrane proteins (IMPs), the hydrophobic amino acids of the polypeptide sequence are partitioned mostly into the membrane interior and hydrophilic amino acids mostly into the aqueous exterior. We analyze the minimum free energy state of polypeptide sequences partitioned into alpha-helical transmembrane (TM) segments and the role of thermal fluctuations using a many-body statistical mechanics model. Results suggest that IMP TM segment partitioning shares important features with general theories of protein folding. For random polypeptide sequences, the minimum free energy state at room temperature is characterized by fluctuations in the number of TM segments with very long relaxation times. Simple assembly scenarios do not produce a unique number of TM segments and jamming phenomena interfere with segment placement. For sequences corresponding to IMPs, the minimum free energy structure with the wildtype number of segments is free of number fluctuations due to an anomalous gap in the energy spectrum, and simple assembly scenarios produce this structure. There is a threshold number of random point mutations beyond which the size of this gap is reduced so that the wildtype groundstate is destabilized and number fluctuations reappear.

preprint, 2008, arXiv

Rhythmogenic neuronal networks, pacemakers, and k-cores

Neuronal networks are controlled by a combination of the dynamics of individual neurons and the connectivity of the network that links them together. We study a minimal model of the preBotzinger complex, a small neuronal network that controls the breathing rhythm of mammals through periodic firing bursts. We show that the properties of such a randomly connected network of identical excitatory neurons are fundamentally different from those of uniformly connected neuronal networks as described by mean-field theory. We show that (i) the connectivity properties of the networks determine the location of emergent pacemakers that trigger the firing bursts and (ii) the collective desensitization that terminates the firing bursts is again determined by the network connectivity, through k-core clusters of neurons.
