Source author record

Ethan Dyer

Ethan Dyer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

hep-th, Machine Learning, Computer Vision, gr-qc, cond-mat.dis-nn, cond-mat.str-el, Artificial Intelligence, Computation and Language, cond-mat.stat-mech, eess.IV, hep-ph

Catalog footprint

What is connected

16 works

11 topics

4 close collaborators


arXiv preprint, 2022

Solving Quantitative Reasoning Problems with Language Models

Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.

arXiv preprint, 2021

When Do Curricula Work?

Inspired by human learning, researchers have proposed ordering examples during training based on their difficulty. Both curriculum learning, exposing a network to easier examples early in training, and anti-curriculum learning, showing the most difficult examples first, have been suggested as improvements to standard i.i.d. training. In this work, we set out to investigate the relative benefits of ordered learning. We first investigate the implicit curricula resulting from architectural and optimization bias and find that samples are learned in a highly consistent order. Next, to quantify the benefit of explicit curricula, we conduct extensive experiments over thousands of orderings spanning three kinds of learning: curriculum, anti-curriculum, and random-curriculum, in which the size of the training dataset is dynamically increased over time but the examples are randomly ordered. We find that for standard benchmark datasets, curricula have only marginal benefits, and that randomly ordered samples perform as well as or better than curricula and anti-curricula, suggesting that any benefit is entirely due to the dynamic training set size. Inspired by common use cases of curriculum learning in practice, we investigate the role of a limited training time budget and noisy data in the success of curriculum learning. Our experiments demonstrate that curriculum, but not anti-curriculum, can indeed improve performance either with a limited training time budget or in the presence of noisy data.
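
The three orderings compared above can be sketched as follows. This is a minimal illustration, not the authors' code: the per-example difficulty scores (higher means harder), the function names, and the linear pacing function that starts from 10% of the data are all assumptions, and the paper's actual scoring and pacing functions may differ.

```python
import random


def order_examples(scores, kind, seed=0):
    """Return example indices in the order a schedule would introduce them.

    `scores` is a hypothetical per-example difficulty (higher = harder).
    """
    idx = list(range(len(scores)))
    if kind == "curriculum":          # easiest examples first
        return sorted(idx, key=lambda i: scores[i])
    if kind == "anti-curriculum":     # hardest examples first
        return sorted(idx, key=lambda i: scores[i], reverse=True)
    if kind == "random-curriculum":   # growing pool, but in random order
        random.Random(seed).shuffle(idx)
        return idx
    raise ValueError(f"unknown ordering: {kind}")


def linear_pacing(step, total_steps, n, start_frac=0.1):
    """Hypothetical linear pacing: number of examples available at `step`."""
    if step >= total_steps:
        return n
    frac = start_frac + (1.0 - start_frac) * step / total_steps
    return max(1, min(n, int(frac * n)))


def training_pool(order, step, total_steps):
    """Examples the model may sample from at `step`: the first k in `order`."""
    return order[: linear_pacing(step, total_steps, len(order))]


scores = [0.9, 0.1, 0.5, 0.3]
for kind in ("curriculum", "anti-curriculum", "random-curriculum"):
    order = order_examples(scores, kind)
    print(kind, training_pool(order, 0, 10), training_pool(order, 10, 10))
```

All three schedules share the same growing pool size; only the order in which examples enter the pool differs. That is what lets the random-curriculum baseline isolate the effect of the dynamic training set size from the effect of ordering by difficulty.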

arXiv preprint, 2020

Affinity and Diversity: Quantifying Mechanisms of Data Augmentation

Though data augmentation has become a standard component of deep neural network training, the underlying mechanism behind the effectiveness of these techniques remains poorly understood. In practice, augmentation policies are often chosen using heuristics of either distribution shift or augmentation diversity. Inspired by these, we seek to quantify how data augmentation improves model generalization. To this end, we introduce interpretable and easy-to-compute measures: Affinity and Diversity. We find that augmentation performance is predicted not by either of these alone but by jointly optimizing the two.

arXiv preprint, 2020

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics

A central challenge in developing versatile machine learning systems is catastrophic forgetting: a model trained on tasks in sequence will suffer significant performance drops on earlier tasks. Despite the ubiquity of catastrophic forgetting, there is limited understanding of the underlying process and its causes. In this paper, we address this important knowledge gap, investigating how forgetting affects representations in neural network models. Through representational analysis techniques, we find that deeper layers are disproportionately the source of forgetting. Supporting this, a study of methods to mitigate forgetting illustrates that they act to stabilize deeper layers. These insights enable the development of an analytic argument and empirical picture relating the degree of forgetting to representational similarity between tasks. Consistent with this picture, we observe maximal forgetting occurs for task sequences with intermediate similarity. We perform empirical studies on the standard split CIFAR-10 setup and also introduce a novel CIFAR-100 based task approximating realistic input distribution shift.
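
The abstract does not name its representational analysis technique. One widely used measure of representational similarity is linear centered kernel alignment (CKA), which compares two sets of activations recorded on the same inputs and is invariant to orthogonal transformations and isotropic scaling of either set. The plain-Python sketch below is a swapped-in illustration, not necessarily the paper's exact procedure; the function names and example data are hypothetical.

```python
import math


def center_columns(X):
    """Subtract each feature's (column's) mean over the examples (rows)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def cross(A, B):
    """Return A^T B for row-major lists A (n x p) and B (n x q)."""
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def frob_sq(M):
    """Squared Frobenius norm of a list-of-lists matrix."""
    return sum(v * v for row in M for v in row)


def linear_cka(X, Y):
    """Linear CKA between activations X (n x p) and Y (n x q) on the same n inputs.

    CKA = ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F), with Xc, Yc centered.
    """
    Xc, Yc = center_columns(X), center_columns(Y)
    num = frob_sq(cross(Yc, Xc))
    den = math.sqrt(frob_sq(cross(Xc, Xc))) * math.sqrt(frob_sq(cross(Yc, Yc)))
    return num / den


X = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [4.0, 4.0]]
rotated = [[-b, a] for a, b in X]  # 90-degree rotation of the feature space
print(linear_cka(X, X), linear_cka(X, rotated))
```

Under the same assumptions, one could record a layer's activations on held-out inputs from the first task before and after training on the second task, then compare the two with `linear_cka` layer by layer. On the abstract's account, deeper layers would be expected to show the lowest similarity.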

arXiv preprint, 2020

Asymptotics of Wide Convolutional Neural Networks

Wide neural networks have proven to be a rich class of architectures for both theory and practice. Motivated by the observation that finite width convolutional networks appear to outperform infinite width networks, we study scaling laws for wide CNNs and networks with skip connections. Following the approach of Dyer & Gur-Ari (2019), we present a simple diagrammatic recipe to derive the asymptotic width dependence for many quantities of interest. These scaling relationships provide a solvable description for the training dynamics of wide convolutional networks. We test these relations across a broad range of architectures. In particular, we find that the difference in performance between finite and infinite width models vanishes at a definite rate with respect to model width. Nonetheless, this relation is consistent with finite width models generalizing either better or worse than their infinite width counterparts, and we provide examples where the relative performance depends on the optimization details.

arXiv preprint, 2020

The large learning rate phase of deep learning: the catapult mechanism

The choice of initial learning rate can have a profound effect on the performance of deep networks. We present a class of neural networks with solvable training dynamics, and confirm their predictions empirically in practical deep learning settings. The networks exhibit sharply distinct behaviors at small and large learning rates. The two regimes are separated by a phase transition. In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks. At large learning rates the model captures qualitatively distinct phenomena, including the convergence of gradient descent dynamics to flatter minima. One key prediction of our model is a narrow range of large, stable learning rates. We find good agreement between our model's predictions and training dynamics in realistic deep learning settings. Furthermore, we find that the optimal performance in such settings is often found in the large learning rate phase. We believe our results shed light on characteristics of models trained at different learning rates. In particular, they fill a gap between existing wide neural network theory, and the nonlinear, large learning rate, training dynamics relevant to practice.
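
The two phases can be illustrated with the simplest solvable case we understand the paper to analyze: a two-layer linear network f = v·u x/√n (the "uv model") trained by gradient descent on a single example with x = 1, target y = 0 and squared loss f²/2. Under these assumptions, and by our own derivation, the weight updates reduce to closed-form recursions for the output f and the NTK eigenvalue λ = (|u|² + |v|²)/n: f ← f(1 − ηλ + η²f²/n) and λ ← λ + (ηf²/n)(ηλ − 4). The sketch iterates them; the initial values f0 = 0.5, λ0 = 1 and width n = 100 are arbitrary choices, and the thresholds 2/λ0 and 4/λ0 below are specific to this toy model.

```python
def uv_model_gd(eta, f0=0.5, lam0=1.0, n=100, steps=200, blowup=1e6):
    """Iterate gradient-descent updates of the uv model on one example.

    f   : network output (target y = 0, squared loss f**2 / 2)
    lam : NTK eigenvalue (|u|^2 + |v|^2) / n
    Returns (final f, final lam, peak |f|, status).
    """
    f, lam, peak = f0, lam0, abs(f0)
    for _ in range(steps):
        # Both updates use the values of f and lam from the previous step.
        f, lam = (
            f * (1.0 - eta * lam + eta * eta * f * f / n),
            lam + (eta * f * f / n) * (eta * lam - 4.0),
        )
        peak = max(peak, abs(f))
        if abs(f) > blowup:  # stop before floats overflow
            return f, lam, peak, "diverged"
    status = "converged" if abs(f) < 1e-6 else "not converged"
    return f, lam, peak, status


for eta in (0.5, 3.0, 5.0):  # eta * lam0 below 2, between 2 and 4, above 4
    f, lam, peak, status = uv_model_gd(eta)
    print(f"eta={eta}: {status}, final lam={lam:.3f}, peak |f|={peak:.2f}")
```

With ηλ0 < 2 the output decays while λ stays close to λ0, matching the small learning rate phase. With 2 < ηλ0 < 4 the loss first grows (the "catapult"), λ falls below 2/η, and training then converges; the smaller final λ is the toy-model counterpart of the flatter minima mentioned in the abstract. With ηλ0 > 4 the dynamics diverge, so in this model the large, stable learning rates lie in the narrow window between 2/λ0 and 4/λ0.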

arXiv preprint, 2016

Linking dynamical heterogeneity to static amorphous order

Glass-forming liquids grow dramatically sluggish upon cooling. This slowdown has long been thought to be accompanied by a growing correlation length. Characteristic dynamical and static length scales, however, have been observed to grow at different rates, which obscures how the two are related to each other and to the slowdown. Here, we show the existence of a direct link between dynamical sluggishness and static point-to-set correlations, holding at the local level as we probe different environments within a liquid. This link, which is stronger and more general than that observed with locally preferred structures, suggests the existence of an intimate relationship between structure and dynamics in a broader range of glass-forming liquids than previously thought.

arXiv preprint, 2016

Scaling dimensions of monopole operators in the $\mathbb{CP}^{N_b - 1}$ theory in $2+1$ dimensions

We study monopole operators at the conformal critical point of the $\mathbb{CP}^{N_b - 1}$ theory in $2+1$ spacetime dimensions. Using the state-operator correspondence and a saddle point approximation, we compute the scaling dimensions of these operators to next-to-leading order in $1/N_b$. We find remarkable agreement between our results and numerical studies of quantum antiferromagnets on two-dimensional lattices with SU($N_b$) global symmetry, using the mapping of the monopole operators to valence bond solid order parameters of the lattice antiferromagnet.

arXiv preprint, 2016

Small Black Holes and Near-Extremal CFTs

Pure theories of AdS$_3$ quantum gravity are conjectured to be dual to CFTs with sparse spectra of light primary operators. The sparsest possible spectrum consistent with modular invariance includes only black hole states above the vacuum. Witten conjectured the existence of a family of extremal CFTs, which realize this spectrum for all admissible values of the central charge. We consider the quantum corrections to the classical spectrum, and propose a specific modification of Witten's conjecture which takes into account the existence of "small" black hole states. These have zero classical horizon area, with a calculable entropy attributed solely to loop effects. Our conjecture passes various consistency checks, especially when generalized to include theories with supersymmetry. In theories with $\mathcal{N}=2$ supersymmetry, this "near-extremal CFT" proposal precisely evades the no-go results of Gaberdiel et al.

arXiv preprint, 2016

Universal Bounds on Charged States in 2d CFT and 3d Gravity

We derive an explicit bound on the dimension of the lightest charged state in two dimensional conformal field theories with a global abelian symmetry. We find that the bound scales with $c$ and provide examples that parametrically saturate this bound. We also prove that any such theory must contain a state with charge-to-mass ratio above a minimal lower bound. We comment on the implications for charged states in three dimensional theories of gravity.

arXiv preprint, 2015

An Extremal N=2 Superconformal Field Theory

We provide an example of an extremal chiral ${\cal N}=2$ superconformal field theory at $c=24$. The construction is based on a ${\mathbb Z}_2$ orbifold of the theory associated to the $A_{1}^{24}$ Niemeier lattice. The state space is governed by representations of the sporadic group $M_{23}$.

arXiv preprint, 2014

Critical Exponents for Supercooled Liquids

We compute critical exponents governing universal features of supercooled liquids through the effective theory of an overlap field. The correlation length diverges with the Ising exponent; the size of dynamically heterogeneous patches grows more rapidly; and the relaxation time obeys a generalized Vogel-Fulcher-Tammann relation.

arXiv preprint, 2014

Super-Rényi Entropy & Wilson Loops for N=4 SYM and their Gravity Duals

We compute the supersymmetric Rényi entropies across a spherical entanglement surface in N=4 SU(N) SYM theory using localization on the four-dimensional ellipsoid. We extract the leading result at large N and λ, and match its universal part to a gravity calculation involving a hyperbolically sliced supersymmetric black hole solution of N=4^+ SU(2) × U(1) gauged supergravity in five dimensions. We repeat the analysis in the presence of a Wilson loop insertion and find again a perfect match with the dual string theory. Understanding the Wilson loop operator requires knowledge of the full ten-dimensional IIB supergravity solution, which we elaborate upon.

arXiv preprint, 2013

GLSMs for non-Kahler Geometries

We identify a simple mechanism by which H-flux satisfying the modified Bianchi identity arises in garden-variety (0,2) gauged linear sigma models. Taking suitable limits leads to effective gauged linear sigma models with Green-Schwarz anomaly cancellation. We test the quantum-consistency of a class of such effective theories by constructing an off-shell superconformal algebra, providing evidence that these models run to good CFTs in the deep IR.

arXiv preprint, 2013

Monopole Taxonomy in Three-Dimensional Conformal Field Theories

We study monopole operators at the infrared fixed points of Abelian and non-Abelian gauge theories with N_f fermion flavors in three dimensions. At large N_f, independent monopole operators can be defined via the state-operator correspondence only for stable monopole backgrounds. In Abelian theories, every monopole background is stable. In the non-Abelian case, we find that many (but not all) backgrounds are stable in each topological class. We calculate the infrared scaling dimensions of the corresponding operators through next-to-leading order in 1/N_f. In the case of U(N_c) QCD with N_f fundamental fermions (and in particular in the QED case, N_c = 1), we find that the monopole operators transform as non-trivial irreducible representations of the SU(N_f) flavor symmetry group.

arXiv preprint, 2009

Boundary Terms, Variational Principles and Higher Derivative Modified Gravity

We discuss the criteria that must be satisfied by a well-posed variational principle. We clarify the role of Gibbons-Hawking-York type boundary terms in the actions of higher derivative models of gravity, such as F(R) gravity, and argue that the correct boundary terms are the naive ones obtained through the correspondence with scalar-tensor theory, despite the fact that variations of normal derivatives of the metric must be fixed on the boundary. We show in the case of F(R) gravity that these boundary terms reproduce the correct ADM energy in the Hamiltonian formalism, and the correct entropy for black holes in the semi-classical approximation.
