Source author record

Yasaman Bahri

Yasaman Bahri appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, cond-mat.str-el, Artificial Intelligence, cond-mat.dis-nn, cond-mat.mes-hall, cond-mat.quant-gas, Neural and Evolutionary Computing, quant-ph

Catalog footprint

What is connected

7 works

8 topics

4 close collaborators

preprint, 2020, arXiv

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables, for instance, test set predictions that would have resulted from a fully Bayesian, infinitely wide trained FCN to be computed without ever instantiating the FCN, but by instead evaluating the corresponding GP. In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and achieve state-of-the-art results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible. Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs with and without weight sharing are identical. As a consequence, translation equivariance, beneficial in finite channel CNNs trained with stochastic gradient descent (SGD), is guaranteed to play no role in the Bayesian treatment of the infinite channel limit - a qualitative difference between the two regimes that is not present in the FCN case. We confirm experimentally that, while in some scenarios the performance of SGD-trained finite CNNs approaches that of the corresponding GPs as the channel count increases, with careful tuning SGD-trained CNNs can significantly outperform their corresponding GPs, suggesting advantages from SGD training compared to fully Bayesian parameter estimation.
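As a rough illustration of the Monte Carlo idea in this abstract, the sketch below estimates a GP kernel by averaging over randomly drawn hidden units and compares it with a known closed form. It is not the paper's CNN procedure: the one-hidden-layer ReLU network, the standard normal weights, and the names `mc_kernel` and `analytic_kernel` are assumptions made here for illustration.

```python
import math
import random


def mc_kernel(x1, x2, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[relu(w.x1) * relu(w.x2)] with w ~ N(0, I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # One random hidden unit (assumed standard normal weights, no bias).
        w = [rng.gauss(0.0, 1.0) for _ in x1]
        a = sum(wi * xi for wi, xi in zip(w, x1))
        b = sum(wi * xi for wi, xi in zip(w, x2))
        total += max(a, 0.0) * max(b, 0.0)
    return total / n_samples


def analytic_kernel(x1, x2):
    """Closed form of the same expectation (degree-1 arc-cosine kernel)."""
    n1 = math.sqrt(sum(v * v for v in x1))
    n2 = math.sqrt(sum(v * v for v in x2))
    dot = sum(a * b for a, b in zip(x1, x2))
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    theta = math.acos(cos_t)
    return n1 * n2 * (math.sin(theta) + (math.pi - theta) * cos_t) / (2.0 * math.pi)


x1 = [1.0, 0.0, 0.0]
x2 = [0.6, 0.8, 0.0]
estimate = mc_kernel(x1, x2)
exact = analytic_kernel(x1, x2)
print(f"Monte Carlo: {estimate:.4f}  analytic: {exact:.4f}")
```

The sampling estimate only needs forward passes through random networks, which is why the same approach remains usable for architectures whose analytic kernel is impractical to evaluate.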

preprint, 2020, arXiv

Infinite attention: NNGP and NTK for deep attention networks

There is a growing amount of literature on the relationship between wide neural networks (NNs) and Gaussian processes (GPs), identifying an equivalence between the two for a variety of NN architectures. This equivalence enables, for instance, accurate approximation of the behaviour of wide Bayesian NNs without MCMC or variational approximations, or characterisation of the distribution of randomly initialised wide NNs optimised by gradient descent without ever running an optimiser. We provide a rigorous extension of these results to NNs involving attention layers, showing that unlike single-head attention, which induces non-Gaussian behaviour, multi-head attention architectures behave as GPs as the number of heads tends to infinity. We further discuss the effects of positional encodings and layer normalisation, and propose modifications of the attention mechanism which lead to improved results for both finite and infinitely wide NNs. We evaluate attention kernels empirically, leading to a moderate improvement upon the previous state-of-the-art on CIFAR-10 for GPs without trainable kernels and advanced data preprocessing. Finally, we introduce new features to the Neural Tangents library (Novak et al., 2020) allowing applications of NNGP/NTK models, with and without attention, to variable-length sequences, with an example on the IMDb reviews dataset.

preprint, 2020, arXiv

The large learning rate phase of deep learning: the catapult mechanism

The choice of initial learning rate can have a profound effect on the performance of deep networks. We present a class of neural networks with solvable training dynamics, and confirm their predictions empirically in practical deep learning settings. The networks exhibit sharply distinct behaviors at small and large learning rates. The two regimes are separated by a phase transition. In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks. At large learning rates the model captures qualitatively distinct phenomena, including the convergence of gradient descent dynamics to flatter minima. One key prediction of our model is a narrow range of large, stable learning rates. We find good agreement between our model's predictions and training dynamics in realistic deep learning settings. Furthermore, we find that the optimal performance in such settings is often found in the large learning rate phase. We believe our results shed light on characteristics of models trained at different learning rates. In particular, they fill a gap between existing wide neural network theory, and the nonlinear, large learning rate, training dynamics relevant to practice.
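The contrast between the two learning rate regimes can be seen in a toy simulation. The abstract does not specify the solvable model, so the sketch below assumes a two-layer linear network f = u.v / sqrt(width) trained on a single example with target 0, and tracks the curvature (|u|^2 + |v|^2) / width of the loss f^2 / 2. The function name `train`, the initial weights, and both learning rates are illustrative choices, not values from the record.

```python
import math


def train(eta, steps=60, width=100):
    """Full-batch gradient descent on L = f^2 / 2 with f = u.v / sqrt(width)."""
    # Deterministic start (an arbitrary choice): for width=100 this gives
    # f = 1 and curvature = 2.
    u = [1.0] * width
    n_pos = (11 * width) // 20
    v = [1.0 if i < n_pos else -1.0 for i in range(width)]
    losses, curvatures = [], []
    for _ in range(steps):
        f = sum(a * b for a, b in zip(u, v)) / math.sqrt(width)
        curvature = (sum(a * a for a in u) + sum(b * b for b in v)) / width
        losses.append(0.5 * f * f)
        curvatures.append(curvature)
        # dL/du = f * v / sqrt(width) and dL/dv = f * u / sqrt(width);
        # both updates use the weights from before the step.
        g = eta * f / math.sqrt(width)
        u, v = ([a - g * b for a, b in zip(u, v)],
                [b - g * a for a, b in zip(u, v)])
    return losses, curvatures


small_losses, small_curv = train(eta=0.5)   # eta * curvature = 1 at the start
large_losses, large_curv = train(eta=1.5)   # eta * curvature = 3 at the start

print("small rate: peak loss", max(small_losses), "final curvature", small_curv[-1])
print("large rate: peak loss", max(large_losses), "final curvature", large_curv[-1])
```

In this toy setting the small rate decays monotonically and leaves the curvature almost unchanged, while the large rate first increases the loss and then settles at a noticeably smaller curvature, which is one way to picture convergence to a flatter minimum.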

preprint, 2019, arXiv

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.
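The linear model mentioned in this abstract is the first-order Taylor expansion f_lin(x; theta) = f(x; theta_0) + grad f(x; theta_0) . (theta - theta_0). A minimal sketch of that construction follows, assuming a tiny one-hidden-layer tanh network with scalar input and hand-written gradients; the names `network`, `gradient` and `linearized` and all numeric values are hypothetical and not taken from the paper.

```python
import math
import random


def network(params, x):
    """One-hidden-layer tanh network, scalar input and output."""
    w_in, w_out = params
    width = len(w_in)
    return sum(b * math.tanh(a * x) for a, b in zip(w_in, w_out)) / math.sqrt(width)


def gradient(params, x):
    """Gradient of the network output with respect to both weight lists."""
    w_in, w_out = params
    scale = 1.0 / math.sqrt(len(w_in))
    g_in = [scale * b * x * (1.0 - math.tanh(a * x) ** 2)
            for a, b in zip(w_in, w_out)]
    g_out = [scale * math.tanh(a * x) for a in w_in]
    return g_in, g_out


def linearized(params, params0, x):
    """First-order Taylor expansion of the network around params0."""
    g_in, g_out = gradient(params0, x)
    w_in, w_out = params
    w_in0, w_out0 = params0
    delta = sum(g * (p - p0) for g, p, p0 in zip(g_in, w_in, w_in0))
    delta += sum(g * (p - p0) for g, p, p0 in zip(g_out, w_out, w_out0))
    return network(params0, x) + delta


rng = random.Random(0)
width = 8
params0 = ([rng.gauss(0.0, 1.0) for _ in range(width)],
           [rng.gauss(0.0, 1.0) for _ in range(width)])
# A small parameter displacement, standing in for a short stretch of training.
params = ([a + 1e-3 for a in params0[0]], [b - 1e-3 for b in params0[1]])
x = 0.5
gap = abs(network(params, x) - linearized(params, params0, x))
print("network vs. linearized gap:", gap)
```

The gap is second order in the parameter displacement, so it shrinks quickly as the parameters stay closer to their initial values; the abstract's claim is that this closeness holds throughout training in the infinite width limit.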

preprint, 2014, arXiv

Detecting Majorana fermions in quasi-one-dimensional topological phases using nonlocal order parameters

Topological phases which host Majorana fermions cannot be identified via local order parameters. We give simple nonlocal order parameters to distinguish quasi-one-dimensional (1D) topological superconductors of spinless fermions, for any interacting model in the absence of time reversal symmetry. These string or "brane" order parameters are natural for measurements in cold atom systems using quantum gas microscopy. We propose them as a way to identify symmetry-protected topological phases of Majorana fermions in cold atom experiments via bulk rather than edge degrees of freedom. Subsequently, we study two-dimensional (2D) topological superconductors via the quasi-1D limit of coupling $N$ identical chains on the cylinder. We classify the symmetric, interacting topological phases protected by the additional $\mathbb{Z}_N$ translation symmetry. The phases include quasi-1D analogs of (i) the $p+ip$ chiral topological superconductor, which can be distinguished up to the 2D Chern number mod 2, and (ii) the 2D weak topological superconductor. We devise general rules for constructing nonlocal order parameters which distinguish the phases. These rules encode the signature of the fermionic topological phase in the symmetry properties of the terminating operators of the nonlocal string or brane. The nonlocal order parameters for some of these phases simply involve a product of the string order parameters for the individual chains. Finally, we give a physical picture of one of the topological phases as a condensate of certain defects, which motivates the form of the nonlocal order parameter and is reminiscent of higher dimensional constructions of topological phases.

preprint, 2014, arXiv

Stable non-Fermi liquid phase of itinerant spin-orbit coupled ferromagnets

Direct coupling between gapless bosons and a Fermi surface results in the destruction of Landau quasiparticles and a breakdown of Fermi liquid theory. Such a non-Fermi liquid phase arises in spin-orbit coupled ferromagnets with spontaneously broken continuous symmetries due to strong coupling between rotational Goldstone modes and itinerant electrons. These systems provide an experimentally accessible context for studying non-Fermi liquid physics. Possible examples include low-density Rashba coupled electron gases, which have a natural tendency towards spontaneous ferromagnetism, or topological insulator surface states with proximity-induced ferromagnetism. Crucially, unlike the related case of a spontaneous nematic distortion of the Fermi surface, for which the non-Fermi liquid regime is expected to be masked by a superconducting dome, we show that the non-Fermi liquid phase in spin-orbit coupled ferromagnets is stable.

preprint, 2013, arXiv

Localization and topology protected quantum coherence at the edge of 'hot' matter

Topological phases are often characterized by special edge states confined near the boundaries by an energy gap in the bulk. On raising temperature, these edge states are lost in a clean system due to mobile thermal excitations. Recently, however, it has been established that disorder can localize an isolated many-body system, potentially allowing for a sharply defined topological phase even in a highly excited state. Here we show this to be the case for the topological phase of a one-dimensional magnet with quenched disorder, which features spin one-half excitations at the edges. The time evolution of a simple, highly excited, initial state is used to reveal quantum coherent edge spins. In particular, we demonstrate, using theoretical arguments and numerical simulation, the coherent revival of an edge spin over a time scale that grows exponentially bigger with system size. This is in sharp contrast to the general expectation that quantum bits strongly coupled to a 'hot' many-body system will rapidly lose coherence.
