Researcher profile

Andrey Gromov

Andrey Gromov contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
14 works
0 followers
10 topics
4 close collaborators

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Published work

14 published items

Preprint · 2026 · arXiv

Learning Rate Transfer in Normalized Transformers

The Normalized Transformer, or nGPT (arXiv:2410.01131), achieves impressive training speedups and does not require weight decay or learning rate warmup. However, despite having hyperparameters that explicitly scale with model size, we observe that nGPT does not exhibit learning rate transfer across model dimension and token horizon. To rectify this, we combine numerical experiments with a principled use of alignment exponents (arXiv:2407.05872) to revisit and modify the $μ$P approach to hyperparameter transfer (arXiv:2011.14522). The result is a novel nGPT parameterization we call $ν$GPT. Through extensive empirical validation, we find that $ν$GPT exhibits learning rate transfer across width, depth, and token horizon.

Preprint · 2026 · arXiv

On the origin of neural scaling laws: from random graphs to natural language

Scaling laws have played a major role in the modern AI revolution, giving practitioners predictive power over how model performance improves with increasing data, compute, and number of model parameters. This has spurred intense interest in the origin of neural scaling laws, with a common suggestion being that they arise from power-law structure already present in the data. In this paper we study scaling laws for transformers trained to predict random walks (bigrams) on graphs with tunable complexity. We demonstrate that this simplified setting already gives rise to neural scaling laws, even in the absence of power-law structure in the data correlations. We further dial down the complexity of natural language systematically by training on sequences sampled from increasingly simplified generative language models, from 4-, 2-, and 1-layer transformer language models down to language bigrams, revealing a monotonic evolution of the scaling exponents. Our results also include scaling laws obtained from training on random walks on random graphs drawn from the Erdős-Rényi and scale-free Barabási-Albert ensembles. Finally, we revisit conventional scaling laws for language modeling: we demonstrate that several essential results can be reproduced using two-layer transformers with a context length of 50, provide a critical analysis of various fits used in prior literature, demonstrate an alternative method for obtaining compute-optimal curves compared with current practice in the published literature, and provide preliminary evidence that maximal update parameterization may be more parameter-efficient than standard parameterization.
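The random-walk setting above can be sketched in a few lines. This is a hypothetical data generator, not the paper's code. It assumes an unbiased walk on an undirected Erdős-Rényi graph; the helpers `erdos_renyi`, `random_walk`, and `entropy_rate` are illustrative names. It also computes the walk's entropy rate, which for a connected graph is the irreducible per-token loss of a model that has learned the bigram rule exactly.

```python
import numpy as np

def erdos_renyi(n, p, rng):
    """Symmetric 0/1 adjacency matrix of a G(n, p) graph without self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)   # each pair kept with prob. p
    return (upper | upper.T).astype(np.int8)

def random_walk(adj, length, rng):
    """Assumed unbiased walk: each step jumps to a uniformly chosen neighbor."""
    node = int(rng.integers(adj.shape[0]))
    while adj[node].sum() == 0:                    # avoid starting on an isolated node
        node = int(rng.integers(adj.shape[0]))
    walk = [node]
    for _ in range(length - 1):
        neighbors = np.flatnonzero(adj[walk[-1]])
        walk.append(int(rng.choice(neighbors)))
    return walk

def entropy_rate(adj):
    """Entropy rate in nats/token, H = sum_i pi_i log d_i with pi_i = d_i / 2m.

    Assumes a connected graph so that the stationary distribution is unique.
    """
    deg = adj.sum(axis=1).astype(float)
    deg = deg[deg > 0]
    pi = deg / deg.sum()
    return float(np.sum(pi * np.log(deg)))

rng = np.random.default_rng(0)
adj = erdos_renyi(100, 0.1, rng)
sequence = random_walk(adj, 50, rng)               # one training sequence of node tokens
print(sequence[:10], entropy_rate(adj))
```

Tuning `n` and `p` (or swapping in a scale-free generator) changes the graph's complexity. The bigram structure of the data stays the same.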

Preprint · 2023 · arXiv

Grokking modular arithmetic

We present a simple neural network that can learn modular arithmetic tasks and exhibits a sudden jump in generalization known as "grokking". Concretely, we present (i) fully connected two-layer networks that exhibit grokking on various modular arithmetic tasks under vanilla gradient descent with the MSE loss function in the absence of any regularization; (ii) evidence that grokking modular arithmetic corresponds to learning specific feature maps whose structure is determined by the task; (iii) analytic expressions for the weights (and thus for the feature maps) that solve a large class of modular arithmetic tasks; and (iv) evidence that these feature maps are also found by vanilla gradient descent as well as AdamW, thereby establishing complete interpretability of the representations learnt by the network.
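One way to see how analytic weights can solve modular addition is the following sketch. It is in the spirit of item (iii) but is not quoted from the paper. The quadratic activation, the cosine weights with random frequencies $f_k$ and phases, and the names `analytic_modular_adder` and `predict` are all assumptions made for illustration.

With $\theta_k = 2\pi f_k/p$, squaring the sum of the two input cosines produces a term $\cos(\theta_k(a+b) + \phi^{(1)}_k + \phi^{(2)}_k)$. The phase-matched readout turns this into $\tfrac12\cos(\theta_k(q-a-b))$, which adds up coherently over neurons only when $q \equiv a+b \pmod p$. The remaining terms carry random phases and average out.

```python
import numpy as np

def analytic_modular_adder(p, n_hidden, rng):
    """Hypothetical weights for a two-layer net computing (a + b) mod p.

    Assumed form (illustration only): neuron k has frequency f_k in {1..p-1}
    and random phases phi1_k, phi2_k; input weights are cos(theta_k * n + phase)
    and the readout uses the summed phase phi1_k + phi2_k.
    """
    freq = rng.integers(1, p, size=n_hidden)                 # f_k, never 0
    phi1 = rng.uniform(0.0, 2 * np.pi, size=n_hidden)
    phi2 = rng.uniform(0.0, 2 * np.pi, size=n_hidden)
    theta = 2 * np.pi * np.outer(freq, np.arange(p)) / p     # (n_hidden, p)
    U = np.cos(theta + phi1[:, None])                        # acts on one-hot a
    V = np.cos(theta + phi2[:, None])                        # acts on one-hot b
    W = np.cos(theta + (phi1 + phi2)[:, None]).T             # readout, (p, n_hidden)
    return U, V, W

def predict(U, V, W, a, b):
    """Forward pass on one-hot inputs a, b with a quadratic activation."""
    hidden = (U[:, a] + V[:, b]) ** 2                        # (n_hidden,)
    logits = W @ hidden                                      # (p,)
    return int(np.argmax(logits))

rng = np.random.default_rng(0)
p = 23
U, V, W = analytic_modular_adder(p, n_hidden=4096, rng=rng)
accuracy = np.mean([predict(U, V, W, a, b) == (a + b) % p
                    for a in range(p) for b in range(p)])
print(f"accuracy over all {p * p} pairs: {accuracy:.3f}")
```

With a few thousand neurons, the coherent term grows linearly in `n_hidden`, while the random-phase terms grow only like its square root. This gap is why the argmax picks out $(a+b) \bmod p$.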

Preprint · 2022 · arXiv

AutoInit: Automatic Initialization via Jacobian Tuning

Good initialization is essential for training Deep Neural Networks (DNNs). Such an initialization is often found by trial and error, which must be repeated every time an architecture is substantially modified, or it is inherited from smaller networks, leading to suboptimal initialization. In this work we introduce a new, inexpensive algorithm that finds a good initialization automatically for general feed-forward DNNs. The algorithm uses the Jacobian between adjacent network blocks to tune the network hyperparameters to criticality. We solve the dynamics of the algorithm for fully connected networks with ReLU and derive conditions for its convergence. We then extend the discussion to more general architectures with BatchNorm and residual connections. Finally, we apply our method to ResMLP and VGG architectures, where the automatic one-shot initialization found by our method shows good performance on vision tasks.
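For a plain ReLU MLP, the criticality condition can be sketched directly. The snippet is a hypothetical illustration, not the AutoInit algorithm itself. For each block Jacobian $J = W_l\,\mathrm{diag}(\mathrm{relu}'(h_l))$ it measures $\chi = \lVert J\rVert_F^2/N$ and rescales $W_l$ so that $\chi = 1$.

Because $\chi$ is exactly quadratic in a layer's weight scale, one rescaling per layer suffices in this special case. The paper's iterative procedure and its extensions to BatchNorm and residual connections are not reproduced. The names `jacobian_chi` and `tune_relu_mlp` are illustrative.

```python
import numpy as np

def jacobian_chi(W, h):
    """chi = ||W diag(relu'(h))||_F^2 / N, averaged over the batch rows of h.

    Uses ||W diag(m)||_F^2 = sum_j m_j ||W[:, j]||^2 for a 0/1 mask m.
    """
    mask = (h > 0).astype(float)                  # relu'(h), shape (batch, N)
    col_norms = (W ** 2).sum(axis=0)              # ||W[:, j]||^2, shape (N,)
    return float((mask @ col_norms).mean() / W.shape[0])

def tune_relu_mlp(depth, width, batch, rng):
    """Hypothetical one-shot tuning of h_{l+1} = W_l relu(h_l) to chi = 1 per block."""
    W0 = rng.normal(size=(width, width)) / np.sqrt(width)
    pre = [rng.normal(size=(batch, width)) @ W0.T]    # h_1 for a random input batch
    weights = [W0]
    for _ in range(depth - 1):
        W = 5.0 * rng.normal(size=(width, width))     # deliberately badly scaled start
        W /= np.sqrt(jacobian_chi(W, pre[-1]))        # chi scales as (weight scale)^2
        weights.append(W)
        pre.append(np.maximum(pre[-1], 0.0) @ W.T)    # pre-activations of next block
    return weights, pre

rng = np.random.default_rng(0)
weights, pre = tune_relu_mlp(depth=6, width=1024, batch=64, rng=rng)
for W in weights[1:]:
    print(W.shape[1] * np.mean(W ** 2))               # effective sigma_w^2 of each block
```

Since roughly half the ReLU units are active, the tuned effective variance $\sigma_w^2 = N\,\overline{W^2}$ should land near the familiar He value of 2.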

Preprint · 2022 · arXiv

Fracton Matter

We review the burgeoning field of "fractons": a class of models in which quasiparticles are strictly immobile or display restricted mobility that can be understood through generalized multipolar symmetries and the associated conservation laws. Focusing on just one corner of this fast-growing subject, we demonstrate how one class of such theories, symmetric tensor and coupled-vector gauge theories, surprisingly emerges from the familiar elasticity of a two-dimensional quantum crystal. The disclination and dislocation crystal defects map onto the charges and dipoles of the fracton gauge theory, respectively. This fracton-elasticity duality leads to predictions of fractonic phases and quantum phase transitions to their descendants, which are duals of the commensurate crystal, supersolid, smectic, and hexatic liquid crystals, as well as amorphous solids, quasicrystals, and elastic membranes. We show how these dual gauge theories provide a field-theoretic description of quantum melting transitions through a generalized Higgs mechanism. We demonstrate how they can be equivalently constructed as gauged models with global multipole symmetries. We expect extensions of such gauge-elasticity dualities to generalized elasticity theories to provide a route to the discovery of new fractonic models and their potential experimental realizations.

Preprint · 2022 · arXiv

Very high-energy collective states of partons in fractional quantum Hall liquids

The low-energy physics of fractional quantum Hall (FQH) states, a paradigm of strongly correlated topological phases of matter, is to a large extent captured by weakly interacting quasiparticles known as composite fermions (CFs). In this paper, based on numerical simulations and effective field theory, we argue that some high-energy states in the FQH spectra necessitate a different description based on parton quasiparticles. We show that Jain states at filling factor $\nu = n/(2pn \pm 1)$ with integers $n, p \geq 2$ support two kinds of collective modes: in addition to the well-known Girvin-MacDonald-Platzman (GMP) mode, they host a high-energy collective mode, which is interpreted as the GMP mode of partons. We elucidate observable signatures of the parton mode in the dynamics following a geometric quench. We construct a microscopic wave function for the parton mode and demonstrate agreement between its variational energy and exact diagonalization. Using the parton construction, we derive a field theory of the Jain states and show that the previously proposed effective theories follow from our approach. Our results point to partons being "real" quasiparticles which, in a way reminiscent of quarks, only become observable at sufficiently high energies.

Preprint · 2020 · arXiv

A Duality Between U(1) Haah Code and 3D Smectic A Phase

We describe a duality between multipole gauge theories and spatially ordered phases. Our main example is a duality between the multipole gauge theory description of the U(1) Haah code and the smectic A phase in three spatial dimensions. We show how multipole symmetries restrict the mobility of dislocations and disclinations in the smectic A phase. Finally, we exhibit a 2D version of the duality.

Preprint · 2020 · arXiv

Fracton hydrodynamics

We introduce new classes of hydrodynamic theories inspired by the recently discovered fracton phases of quantum matter. Fracton phases are characterized by elementary excitations (fractons) with restricted mobility. The hydrodynamic theories we introduce describe thermalization in systems with fracton-like mobility constraints, including fluids where charge and dipole moment are both locally conserved, and fluids where charge is conserved along every line or plane of a lattice. Each of these fluids is subdiffusive and constitutes a new universality class of hydrodynamic behavior. There are infinitely many such classes, each with distinct subdiffusive exponents, all of which are captured by our formalism. Our framework naturally explains recent results on dynamics with constrained quantum circuits, as well as recent experiments with ultracold atoms in tilted optical lattices. We identify crisp experimental signatures of these novel hydrodynamics and explain how they may be realized in near-term ultracold atom experiments.
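The link between dipole conservation and subdiffusion can be sketched for the simplest isotropic case. This is a minimal derivation under stated assumptions, not the paper's general formalism. The constitutive relation $J_{ij} = B\,\partial_i\partial_j\rho$ with $B > 0$ is assumed here for illustration.

```latex
\begin{aligned}
\partial_t \rho + \partial_i J_i &= 0, \qquad
\frac{d}{dt}\int x_i\,\rho\,d^d x = \int J_i\,d^d x = 0
\;\Rightarrow\; J_i = \partial_j J_{ij}, \\
J_{ij} &= B\,\partial_i \partial_j \rho \quad (B > 0,\ \text{assumed})
\;\Rightarrow\; \partial_t \rho = -B\,\nabla^4 \rho, \\
\hat\rho(\mathbf{k}, t) &= \hat\rho(\mathbf{k}, 0)\, e^{-B k^4 t}
\;\Rightarrow\; \omega = -iBk^4, \qquad \rho(\mathbf{0}, t) \propto t^{-d/4}.
\end{aligned}
```

Compare ordinary diffusion, where $\omega = -iDk^2$ and $\rho(\mathbf{0}, t) \propto t^{-d/2}$. A localized charge spreads more slowly here, which is the subdiffusive behavior the abstract refers to.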

Preprint · 2020 · arXiv

On duality between Cosserat elasticity and fractons

We present a dual formulation of the Cosserat theory of elasticity. In this theory, a local element of an elastic body is described in terms of local displacement and local orientation. Upon the duality transformation, these degrees of freedom map onto a coupled theory of a vector-valued one-form gauge field and an ordinary $U(1)$ gauge field. We discuss the degrees of freedom in the corresponding gauge theories, the defect matter, and the coupling to curved space.

Preprint · 2020 · arXiv

Quench dynamics of collective modes in fractional quantum Hall bilayers

We introduce different types of quenches to probe the non-equilibrium dynamics and multiple collective modes of bilayer fractional quantum Hall states. We show that applying an electric field in one layer induces oscillations of a spin-1 degree of freedom, whose frequency matches the long-wavelength limit of the dipole mode. On the other hand, oscillations of the long-wavelength limit of the quadrupole mode, i.e., the spin-2 graviton, as well as the combination of two spin-1 states, can be activated by a sudden change of band mass anisotropy. We construct an effective field theory to describe the quench dynamics of these collective modes. In particular, we derive the dynamics for both the spin-2 and the spin-1 states and demonstrate their excellent agreement with numerics.

Preprint · 2020 · arXiv

Vortices and Fractons

We discuss a simple and experimentally available realization of fracton physics. We note that superfluid vortices form a Hamiltonian system that conserves the total dipole moment and the trace of the quadrupole moment of vorticity, thereby establishing a relation to a traceless scalar charge theory in two spatial dimensions. Next, we consider the limit where the number of vortices is large and show that the emergent vortex hydrodynamics also conserves these moments. Finally, we show that the motions of vortices and of fractons on curved surfaces agree, thereby opening a route to the experimental study of the interplay between fracton physics and curved space. Our conclusions also apply to charged particles in a strong magnetic field.
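The two conserved moments can be checked numerically with the standard point-vortex equations. This is a minimal sketch under assumptions: point vortices in an unbounded plane, circulations $\Gamma_i$, and fourth-order Runge-Kutta time stepping. The helper names are illustrative.

The script tracks the dipole moment $\sum_i \Gamma_i \mathbf{r}_i$ and the quadrupole trace $\sum_i \Gamma_i |\mathbf{r}_i|^2$. Both are exact invariants of the continuous dynamics.

```python
import numpy as np

def vortex_velocity(pos, gamma):
    """u_i = sum_{j != i} Gamma_j / (2 pi) * z_hat x (r_i - r_j) / |r_i - r_j|^2."""
    d = pos[:, None, :] - pos[None, :, :]          # r_i - r_j, shape (n, n, 2)
    r2 = (d ** 2).sum(axis=-1)
    np.fill_diagonal(r2, np.inf)                   # drop the self-interaction
    coef = gamma[None, :] / (2 * np.pi * r2)       # Gamma_j / (2 pi r_ij^2)
    u = -(coef * d[..., 1]).sum(axis=1)            # z_hat x (dx, dy) = (-dy, dx)
    v = (coef * d[..., 0]).sum(axis=1)
    return np.stack([u, v], axis=1)

def rk4_step(pos, gamma, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = vortex_velocity(pos, gamma)
    k2 = vortex_velocity(pos + 0.5 * dt * k1, gamma)
    k3 = vortex_velocity(pos + 0.5 * dt * k2, gamma)
    k4 = vortex_velocity(pos + dt * k3, gamma)
    return pos + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def moments(pos, gamma):
    """Dipole moment and quadrupole trace of the vorticity distribution."""
    dipole = (gamma[:, None] * pos).sum(axis=0)    # sum_i Gamma_i r_i
    quad_trace = float((gamma * (pos ** 2).sum(axis=1)).sum())  # sum_i Gamma_i |r_i|^2
    return dipole, quad_trace

pos = np.array([[0.0, 0.0], [1.0, 0.2], [-0.8, 0.5], [0.3, -1.0], [-0.5, -0.7]])
gamma = np.array([1.0, -1.0, 2.0, 0.5, -1.5])
before = moments(pos, gamma)
for _ in range(200):
    pos = rk4_step(pos, gamma, dt=1e-3)
after = moments(pos, gamma)
print(before, after)
```

The dipole moment is preserved to round-off, because the pairwise velocity terms cancel in $\sum_i \Gamma_i \mathbf{u}_i$ and RK4 preserves linear invariants. The quadrupole trace drifts only at the level of the integration error.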

Preprint · 2019 · arXiv

Anisotropic odd viscosity via time-modulated drive

At equilibrium, the structure and response of ordered phases are typically determined by the spontaneous breaking of spatial symmetries. Out of equilibrium, spatial order itself can become a dynamically emergent concept. In this article, we show that spatially anisotropic viscous coefficients and stresses can be designed in a far-from-equilibrium fluid by applying to its constituents a time-modulated drive. If the drive induces a rotation whose rate is slowed down when the constituents point along specific directions, anisotropic structures and mechanical responses arise at long timescales. We demonstrate that the viscous response of such anisotropic driven fluids can acquire a tensorial, dissipationless component called anisotropic odd (or Hall) viscosity. Classical fluids with internal torques can display additional components of the odd viscosity neglected in previous studies of quantum Hall fluids that assumed angular momentum conservation. We show that these anisotropic and angular momentum-violating odd-viscosity coefficients can change even the bulk flow of an incompressible fluid by acting as a source of vorticity. In addition, shear distortions in the shape of an inclusion result in torques.

Preprint · 2019 · arXiv

Effective response theory for Floquet topological systems

We present an effective field theory approach to the topological response of Floquet systems with symmetry group $G$. This is achieved by introducing a background $G$ gauge field in the Schwinger-Keldysh formalism, which is suitable for far-from-equilibrium systems. We carry out this program for chiral topological Floquet systems (anomalous Floquet-Anderson insulators) in two spatial dimensions and for the group cohomology models of topological Floquet unitaries. These response actions serve as many-body topological invariants for topological Floquet unitaries. The effective action approach also leads us to propose novel topological response functions.