Source author record

Jacob A. Zavatone-Veth

Jacob A. Zavatone-Veth appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Machine Learning, cond-mat.dis-nn, Neurons and Cognition

Catalog footprint

What is connected

6 works

3 topics

4 close collaborators

preprint, 2022, arXiv

Contrasting random and learned features in deep Bayesian linear regression

Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch. We show that both models display sample-wise double-descent behavior in the presence of label noise. Random feature models can also display model-wise double-descent if there are narrow bottleneck layers, while deep networks do not show these divergences. Random feature models can have particular widths that are optimal for generalization at a given data density, while making neural networks as wide or as narrow as possible is always optimal. Moreover, we show that the leading-order correction to the kernel-limit learning curve cannot distinguish between random feature models and deep networks in which all layers are trained. Taken together, our findings begin to elucidate how architectural details affect generalization performance in this simple class of deep regression models.

preprint, 2022, arXiv

On neural network kernels and the storage capacity problem

In this short note, we reify the connection between work on the storage capacity problem in wide two-layer treelike neural networks and the rapidly-growing body of literature on kernel limits of wide neural networks. Concretely, we observe that the "effective order parameter" studied in the statistical mechanics literature is exactly equivalent to the infinite-width Neural Network Gaussian Process Kernel. This correspondence connects the expressivity and trainability of wide two-layer neural networks.
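As a rough illustration of what an infinite-width Neural Network Gaussian Process kernel is, the sketch below compares the averaged hidden-layer overlap of a wide random two-layer ReLU network with its known closed-form limit (the degree-1 arc-cosine kernel). The ReLU activation, the standard Gaussian weights, the fully connected rather than treelike architecture, and the function names are assumptions made for this example; it does not reproduce the note's calculation.

```python
import numpy as np

def nngp_relu_closed_form(x1, x2):
    """Closed form of E[relu(w.x1) * relu(w.x2)] for w ~ N(0, I)."""
    n1, n2 = np.linalg.norm(x1), np.linalg.norm(x2)
    cos_theta = np.clip(x1 @ x2 / (n1 * n2), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return n1 * n2 * (np.sin(theta) + (np.pi - theta) * cos_theta) / (2 * np.pi)

def nngp_relu_empirical(x1, x2, width, rng):
    """Average overlap of ReLU hidden units in one random network of finite width."""
    W = rng.standard_normal((width, x1.size))  # hypothetical N(0, 1) first-layer weights
    h1 = np.maximum(W @ x1, 0.0)
    h2 = np.maximum(W @ x2, 0.0)
    return float(h1 @ h2) / width

rng = np.random.default_rng(0)
x1 = rng.standard_normal(5)
x2 = rng.standard_normal(5)
x1, x2 = x1 / np.linalg.norm(x1), x2 / np.linalg.norm(x2)

print("closed form:", nngp_relu_closed_form(x1, x2))
for width in (10, 1_000, 100_000):
    print(width, nngp_relu_empirical(x1, x2, width, rng))
```

The empirical overlap fluctuates at small widths and settles toward the closed-form value as the width grows.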

preprint, 2021, arXiv

Activation function dependence of the storage capacity of treelike neural networks

The expressive power of artificial neural networks crucially depends on the nonlinearity of their activation functions. Though a wide variety of nonlinear activation functions have been proposed for use in artificial neural networks, a detailed understanding of their role in determining the expressive power of a network has not emerged. Here, we study how activation functions affect the storage capacity of treelike two-layer networks. We relate the boundedness or divergence of the capacity in the infinite-width limit to the smoothness of the activation function, elucidating the relationship between previously studied special cases. Our results show that nonlinearity can both increase capacity and decrease the robustness of classification, and provide simple estimates for the capacity of networks with several commonly used activation functions. Furthermore, they generate a hypothesis for the functional benefit of dendritic spikes in branched neurons.

preprint, 2021, arXiv

Depth induces scale-averaging in overparameterized linear Bayesian neural networks

Inference in deep Bayesian neural networks is only fully understood in the infinite-width limit, where the posterior flexibility afforded by increased depth washes out and the posterior predictive collapses to a shallow Gaussian process. Here, we interpret finite deep linear Bayesian neural networks as data-dependent scale mixtures of Gaussian process predictors across output channels. We leverage this observation to study representation learning in these networks, allowing us to connect limiting results obtained in previous studies within a unified framework. In total, these results advance our analytical understanding of how depth affects inference in a simple class of Bayesian neural networks.

preprint, 2021, arXiv

Exact marginal prior distributions of finite Bayesian neural networks

Bayesian neural networks are theoretically well-understood only in the infinite-width limit, where Gaussian priors over network weights yield Gaussian priors over network outputs. Recent work has suggested that finite Bayesian networks may outperform their infinite counterparts, but their non-Gaussian function space priors have been characterized only through perturbative approaches. Here, we derive exact solutions for the function space priors for individual input examples of a class of finite fully-connected feedforward Bayesian neural networks. For deep linear networks, the prior has a simple expression in terms of the Meijer $G$-function. The prior of a finite ReLU network is a mixture of the priors of linear networks of smaller widths, corresponding to different numbers of active units in each layer. Our results unify previous descriptions of finite network priors in terms of their tail decay and large-width behavior.
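A small numerical sketch can make the non-Gaussianity of finite-width priors concrete. It samples the prior output of a linear network with one hidden layer at a single fixed input and estimates the excess kurtosis, which is zero for a Gaussian. The single hidden layer, the unit-variance scaling of the hidden units, and the function names are assumptions chosen for this illustration; the code does not evaluate the Meijer $G$-function expression from the abstract.

```python
import numpy as np

def sample_linear_prior(width, n_samples, rng):
    """Sample f = w2 . h / sqrt(width), with h the hidden layer at one fixed input.

    Assumes unit-variance Gaussian hidden units and readout weights.
    Conditional on h, f is Gaussian with variance |h|^2 / width, and |h|^2 is
    chi-square with `width` degrees of freedom, so f is sampled as a scale mixture.
    """
    scale_sq = rng.chisquare(df=width, size=n_samples) / width
    return np.sqrt(scale_sq) * rng.standard_normal(n_samples)

def excess_kurtosis(samples):
    """Sample excess kurtosis; zero in expectation for Gaussian data."""
    centered = samples - samples.mean()
    return float(np.mean(centered**4) / np.mean(centered**2) ** 2 - 3.0)

rng = np.random.default_rng(0)
for width in (2, 8, 64, 512):
    f = sample_linear_prior(width, 400_000, rng)
    # Under these assumptions the exact excess kurtosis is 6 / width.
    print(width, excess_kurtosis(f), 6 / width)
```

Under these assumptions the prior is heavier-tailed than a Gaussian at small width, and the excess kurtosis shrinks toward zero as the width grows, consistent with the Gaussian infinite-width limit.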

preprint, 2021, arXiv

Parallel locomotor control strategies in mice and flies

Our understanding of the neural basis of locomotor behavior can be informed by careful quantification of animal movement. Classical descriptions of legged locomotion have defined discrete locomotor gaits, characterized by distinct patterns of limb movement. Recent technical advances have enabled increasingly detailed characterization of limb kinematics across many species, imposing tighter constraints on neural control. Here, we highlight striking similarities between coordination patterns observed in two genetic model organisms: the laboratory mouse and Drosophila. Both species exhibit continuously-variable coordination patterns with similar low-dimensional structure, suggesting shared principles for limb coordination and descending neural control.
