Source author record

Guy Gur-Ari

Guy Gur-Ari appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

hep-th, Machine Learning, hep-ph, cond-mat.str-el, Artificial Intelligence, Computation and Language, hep-lat, Neural and Evolutionary Computing

Catalog footprint

What is connected

15 works

8 topics

4 close collaborators

preprint, 2022, arXiv

Solving Quantitative Reasoning Problems with Language Models

Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.

preprint, 2021, arXiv

On the training dynamics of deep networks with $L_2$ regularization

We study the role of $L_2$ regularization in deep learning, and uncover simple relations between the performance of the model, the $L_2$ coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks. We derive the gradient flow dynamics of such networks, and compare the role of $L_2$ regularization in this context with that of linear models.

preprint, 2021, arXiv

Show Your Work: Scratchpads for Intermediate Computation with Language Models

Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, we find that these same models are able to perform complex multi-step computations -- even in the few-shot regime -- when asked to perform the operation "step by step", showing the results of intermediate computations. In particular, we train transformers to perform multi-step computations by asking them to emit intermediate computation steps into a "scratchpad". On a series of increasingly complex tasks ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations.
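As a rough illustration of the scratchpad idea described in the abstract above, the following Python sketch writes out the intermediate steps of long addition, one digit column per line. The function name `addition_scratchpad` and the line format are assumptions made for this example; they are not the scratchpad format used in the paper.

```python
from typing import List, Tuple


def addition_scratchpad(a: int, b: int) -> Tuple[List[str], int]:
    """Return (scratchpad lines, final sum) for non-negative integers a and b.

    Hypothetical helper for illustration only: each line records one
    digit column of schoolbook addition, including the carry.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative integers")

    # Least-significant digit first, so index i is the 10**i column.
    digits_a = [int(c) for c in reversed(str(a))]
    digits_b = [int(c) for c in reversed(str(b))]

    carry = 0
    result_digits = []
    lines = []
    for i in range(max(len(digits_a), len(digits_b))):
        x = digits_a[i] if i < len(digits_a) else 0
        y = digits_b[i] if i < len(digits_b) else 0
        s = x + y + carry
        lines.append(
            f"{x} + {y} + carry {carry} = {s} -> write {s % 10}, carry {s // 10}"
        )
        result_digits.append(s % 10)
        carry = s // 10

    if carry:
        result_digits.append(carry)
        lines.append(f"final carry {carry}")

    total = int("".join(str(d) for d in reversed(result_digits)))
    return lines, total


if __name__ == "__main__":
    steps, answer = addition_scratchpad(29, 57)
    for step in steps:
        print(step)
    print("answer:", answer)
```

The point of the format is that every intermediate quantity (digit sum, written digit, carry) appears explicitly before the final answer, which is the kind of step-by-step trace the abstract refers to.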

preprint, 2020, arXiv

On the asymptotics of wide networks with polynomial activations

We consider an existing conjecture addressing the asymptotic behavior of neural networks in the large width limit. The results that follow from this conjecture include tight bounds on the behavior of wide networks during stochastic gradient descent, and a derivation of their finite-width dynamics. We prove the conjecture for deep networks with polynomial activation functions, greatly extending the validity of these results. Finally, we point out a difference in the asymptotic behavior of networks with analytic (and non-linear) activation functions and those with piecewise-linear activations such as ReLU.

preprint, 2020, arXiv

The large learning rate phase of deep learning: the catapult mechanism

The choice of initial learning rate can have a profound effect on the performance of deep networks. We present a class of neural networks with solvable training dynamics, and confirm their predictions empirically in practical deep learning settings. The networks exhibit sharply distinct behaviors at small and large learning rates. The two regimes are separated by a phase transition. In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks. At large learning rates the model captures qualitatively distinct phenomena, including the convergence of gradient descent dynamics to flatter minima. One key prediction of our model is a narrow range of large, stable learning rates. We find good agreement between our model's predictions and training dynamics in realistic deep learning settings. Furthermore, we find that the optimal performance in such settings is often found in the large learning rate phase. We believe our results shed light on characteristics of models trained at different learning rates. In particular, they fill a gap between existing wide neural network theory, and the nonlinear, large learning rate, training dynamics relevant to practice.

preprint, 2016, arXiv

Chaos in Classical D0-Brane Mechanics

We study chaos in the classical limit of the matrix quantum mechanical system describing D0-brane dynamics. We determine a precise value of the largest Lyapunov exponent, and, with less precision, calculate the entire spectrum of Lyapunov exponents. We verify that these approach a smooth limit as $N \rightarrow \infty$. We show that a classical analog of scrambling occurs with fast scrambling scaling, $t_* \sim \log S$. These results confirm the k-locality property of matrix mechanics discussed by Sekino and Susskind.
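For readers unfamiliar with the term, the largest Lyapunov exponent measures the average exponential rate at which nearby trajectories separate. The sketch below estimates it for the logistic map at r = 4, a standard one-dimensional toy system chosen here only for illustration. It is not the D0-brane matrix model studied in the paper, and the function name and parameters are hypothetical. For this map the exponent is known to equal ln 2.

```python
import math


def largest_lyapunov_logistic(r: float = 4.0,
                              x0: float = 0.1234,
                              n_steps: int = 20_000,
                              n_burn: int = 1_000) -> float:
    """Estimate the Lyapunov exponent of x -> r * x * (1 - x).

    Illustrative toy example (an assumption of this sketch, not the
    paper's system): the exponent is the orbit average of
    log |f'(x)|, with f'(x) = r * (1 - 2 * x).
    """
    x = x0
    # Discard a transient so the average samples the long-time behavior.
    for _ in range(n_burn):
        x = r * x * (1.0 - x)

    total = 0.0
    for _ in range(n_steps):
        # Guard against log(0) in the unlikely event that x == 0.5 exactly.
        deriv = max(abs(r * (1.0 - 2.0 * x)), 1e-300)
        total += math.log(deriv)
        x = r * x * (1.0 - x)
    return total / n_steps


if __name__ == "__main__":
    estimate = largest_lyapunov_logistic()
    print("estimate:", estimate, "ln 2:", math.log(2))
```

A positive exponent signals chaos. In a many-degree-of-freedom system such as the one in the abstract, there is a whole spectrum of such exponents rather than a single number.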

preprint, 2016, arXiv

Transport in Chern-Simons-Matter Theories

The frequency-dependent longitudinal and Hall conductivities, $\sigma_{xx}$ and $\sigma_{xy}$, are dimensionless functions of $\omega/T$ in 2+1 dimensional CFTs at nonzero temperature. These functions characterize the spectrum of charged excitations of the theory and are basic experimental observables. We compute these conductivities for large $N$ Chern-Simons theory with fermion matter. The computation is exact in the 't Hooft coupling $\lambda$ at $N = \infty$. We describe various physical features of the conductivity, including an explicit relation between the weight of the delta function at $\omega = 0$ in $\sigma_{xx}$ and the existence of infinitely many higher spin conserved currents in the theory. We also compute the conductivities perturbatively in Chern-Simons theory with scalar matter and show that the resulting functions of $\omega/T$ agree with the strong coupling fermionic result. This provides a new test of the conjectured 3d bosonization duality. In matching the Hall conductivities we resolve an outstanding puzzle by carefully treating an extra anomaly that arises in the regularization scheme used.

preprint, 2015, arXiv

Brane Inflation and Moduli Stabilization on Twisted Tori

We consider supergravity compactifications on 6-dimensional twisted tori, which are 5-torus fibrations of the circle. The motion of branes on such manifolds can lead to power-law potentials at low energy, that may be useful for inflation. We classify the possible low energy potentials one can obtain by wrapping branes on different cycles of the fibre. Turning to the problem of moduli stabilization in such models, we prove a no-go result for solutions with parametrically small cosmological constant, under certain assumptions for the orientifolds and D-branes. We also consider the role of discrete Wilson lines in moduli stabilization on general closed manifolds, and show that gauge invariance restricts their contributions to the effective potential. We derive the allowed discrete Wilson lines in massive Type IIA supergravity on twisted tori. We conclude with a detailed example, computing the effective potentials in a class of models involving a twisted torus and an orientifold 6-plane.

preprint, 2015, arXiv

Three Dimensional Bosonization From Supersymmetry

Three dimensional bosonization is a conjectured duality between non-supersymmetric Chern-Simons theories coupled to matter fields in the fundamental representation of the gauge group. There is a well-established supersymmetric version of this duality, which involves Chern-Simons theories with ${\cal N} = 2$ supersymmetry coupled to fundamental chiral multiplets. Assuming that the supersymmetric duality is valid, we prove that non-supersymmetric bosonization holds for all planar correlators of single-trace operators. The main tool we employ is a double-trace flow from the supersymmetric theory to an IR fixed point, in which the scalars and fermions are effectively decoupled in the planar limit. A generalization of this technique can be used to derive the duality mapping of all renormalizable couplings, in non-supersymmetric theories with both a scalar and a fermion. Our results do not rely on an explicit computation of planar diagrams.

preprint, 2013, arXiv

The Thermal Free Energy in Large N Chern-Simons-Matter Theories

We compute the thermal free energy in large N U(N) Chern-Simons-matter theories with matter fields (scalars and/or fermions) in the fundamental representation, in the large temperature limit. We note that in these theories the eigenvalue distribution of the holonomy of the gauge field along the thermal circle does not localize even at very high temperatures, and this affects the computation significantly. We verify that our results are consistent with the conjectured dualities between Chern-Simons-matter theories with scalar fields and with fermion fields, as well as with the strong-weak coupling duality of the N=2 supersymmetric Chern-Simons-matter theory.

preprint, 2012, arXiv

Correlation Functions of Large N Chern-Simons-Matter Theories and Bosonization in Three Dimensions

We consider the conformal field theory of N complex massless scalars in 2+1 dimensions, coupled to a U(N) Chern-Simons theory at level k. This theory has a 't Hooft large N limit, keeping fixed λ = N/k. We compute some correlation functions in this theory exactly as a function of λ, in the large N (planar) limit. We show that the results match with the general predictions of Maldacena and Zhiboedov for the correlators of theories that have high-spin symmetries in the large N limit. It has been suggested in the past that this theory is dual (in the large N limit) to the Legendre transform of the theory of fermions coupled to a Chern-Simons gauge field, and our results allow us to find the precise mapping between the two theories. We find that in the large N limit the theory of N scalars coupled to a U(N)_k Chern-Simons theory is equivalent to the Legendre transform of the theory of k fermions coupled to a U(k)_N Chern-Simons theory, thus providing a bosonization of the latter theory. We conjecture that perhaps this duality is valid also for finite values of N and k, where on the fermionic side we should now have (for N_f flavors) a U(k)_{N-N_f/2} theory. Similar results hold for real scalars (fermions) coupled to the O(N)_k Chern-Simons theory.

preprint, 2012, arXiv

Correlators of Large N Fermionic Chern-Simons Vector Models

We consider the large N limit of three-dimensional U(N)_k Chern-Simons theory coupled to a Dirac fermion in the fundamental representation. In this limit, we compute several correlators to all orders in the 't Hooft coupling N/k. It was suggested recently that this theory is dual to the Legendre-transformed theory of scalar fields coupled to Chern-Simons gauge interactions. Our results show that this duality holds for any value of the 't Hooft coupling, at least at the level of the planar 3-point functions. In addition, we determine the sign in the duality transformation of the Chern-Simons level, as well as the relation between the "triple-trace" deformation which exists in the bosonic Chern-Simons theory and in the Legendre-transformed fermionic theory.

preprint, 2012, arXiv

Three-Prong Distribution of Massive Narrow QCD Jets

We study the planar-flow distributions of narrow, highly boosted, massive QCD jets. Using the factorization properties of QCD in the collinear limit, we compute the planar-flow jet function from the one-to-three splitting function at tree-level. We derive the leading-log behavior of the jet function analytically. We also compare our semi-analytic jet function with parton-shower predictions using various generators.

preprint, 2011, arXiv

Classification of Energy Flow Observables in Narrow Jets

We present a classification of energy flow variables for highly collimated jets. Observables are constructed by taking moments of the energy flow and forming scalars of a suitable Lorentz subgroup. The jet shapes are naturally arranged in an expansion in both angular and energy resolution, allowing us to derive the natural observables for describing an N-particle jet. We classify the leading variables that characterize jets with up to 4 particles. We rediscover the familiar jet mass, angularities, and planar flow, which dominate the lowest order substructure variables. We also discover several new observables and we briefly discuss their physical interpretation.

preprint, 2011, arXiv

d=3 Bosonic Vector Models Coupled to Chern-Simons Gauge Theories

We study three dimensional O(N)_k and U(N)_k Chern-Simons theories coupled to a scalar field in the fundamental representation, in the large N limit. For infinite k this is just the singlet sector of the O(N) (U(N)) vector model, which is conjectured to be dual to Vasiliev's higher spin gravity theory on AdS_4. For large k and N we obtain a parity-breaking deformation of this theory, controlled by the 't Hooft coupling lambda = 4πN/k. For infinite N we argue (and show explicitly at two-loop order) that the theories with finite lambda are conformally invariant, and also have an exactly marginal (ϕ^2)^3 deformation. For large but finite N and small 't Hooft coupling lambda, we show that there is still a line of fixed points parameterized by the 't Hooft coupling lambda. We show that, at infinite N, the interacting non-parity-invariant theory with finite lambda has the same spectrum of primary operators as the free theory, consisting of an infinite tower of conserved higher-spin currents and a scalar operator with scaling dimension Δ=1; however, the correlation functions of these operators do depend on lambda. Our results suggest that there should exist a family of higher spin gravity theories, parameterized by lambda, and continuously connected to Vasiliev's theory. For finite N the higher spin currents are not conserved.
