Source author record

Aristide Baratin

Aristide Baratin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

gr-qc hep-th Machine Learning Artificial Intelligence math-ph math.MP math.QA math.CT

Catalog footprint

What is connected

17 works

8 topics

4 close collaborators

preprint 2026 arXiv

Layerwise LQR for Geometry-Aware Optimization of Deep Networks

Geometry-aware optimizers such as Newton and natural gradient can improve conditioning in deep learning, but scalable variants such as K-FAC, Shampoo, and related preconditioners usually impose structural approximations early, often discarding cross-layer interactions induced by the network computation. We introduce Layerwise LQR (LLQR), a framework for learning structured inverse preconditioners under a global layerwise optimal-control objective. The starting point is an exact equivalence: the steepest-descent step under a broad class of divergence-induced quadratic models--including Newton, Gauss-Newton, Fisher/natural-gradient, and intermediate-layer metrics--can be written as a finite-horizon Linear Quadratic Regulator (LQR) problem. This formulation serves as a reference that exposes the layerwise dynamics and cost matrices encoding the original dense geometry. We then derive a scalable relaxation that learns diagonal, (E-)Kronecker-factored, or other structured inverse preconditioners by minimizing the LQR objective and reusing them across iterations. The resulting optimizer wraps standard methods while retaining a principled connection to second-order geometry, without forming or inverting the global curvature matrix. Experiments on ResNets and Transformers show that LLQR improves optimization dynamics and often translates these gains into improved final test performance, while adding only modest wall-clock overhead. It establishes LLQR as a practical framework for geometry-aware second-order methods and a reference for evaluating scalable approximations.
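As background for the finite-horizon LQR formulation mentioned in this abstract, here is a minimal scalar sketch of the standard backward Riccati recursion. It is a generic textbook LQR example, not the LLQR construction: the dynamics `a`, `b`, the costs `q`, `r`, `q_final`, and the function names are illustrative assumptions.

```python
def scalar_lqr_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for x[t+1] = a*x[t] + b*u[t] (all scalars)."""
    p = q_final                      # cost-to-go coefficient at the final time
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)   # feedback gain, with u = -k * x
        p = q + a * p * (a - b * k)         # Riccati update for the previous step
        gains.append(k)
    gains.reverse()                  # gains[t] is the gain applied at time t
    return gains, p                  # p is now the cost-to-go coefficient at t = 0


def rollout_cost(a, b, q, r, q_final, gains, x0):
    """Simulate the closed loop and accumulate the quadratic cost."""
    x, cost = x0, 0.0
    for k in gains:
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q_final * x * x


# Hypothetical toy problem: an unstable scalar system over 10 steps.
gains, p0 = scalar_lqr_gains(a=1.2, b=1.0, q=1.0, r=0.5, q_final=1.0, horizon=10)
optimal = rollout_cost(1.2, 1.0, 1.0, 0.5, 1.0, gains, x0=1.0)
uncontrolled = rollout_cost(1.2, 1.0, 1.0, 0.5, 1.0, [0.0] * 10, x0=1.0)
```

In this toy setting the simulated closed-loop cost should match `p0 * x0**2`, and it should be lower than the cost of applying no control.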

preprint 2026 arXiv

Navigating Potholes with Geometry-Aware Sharpness Minimization

Sharpness-aware minimization (SAM) encourages flat minima by perturbing parameters along directions of high loss curvature, but treats all parameter directions uniformly, ignoring the underlying loss geometry. We introduce LLQR+SAM, which combines SAM with a learned preconditioner obtained from the recently proposed LLQR framework, a second-order method that recasts steepest descent as a layerwise linear-quadratic regulator problem. The preconditioner is updated sparsely and maintained as a slow exponential moving average, so it captures a smoothed, low-resolution picture of the loss landscape geometry. The SAM perturbation then operates on top of this learned geometry, probing curvature at a faster timescale. We show that this two-timescale structure is not merely a computational convenience: theoretically, the preconditioner amplifies the SAM escape signal in directions that are flat under the average geometry but locally sharp (potholes). Wide, flat basins, by contrast, remain stable. Empirically, LLQR+SAM gives consistent gains over both SAM and LLQR alone across standard vision and sequence modeling benchmarks, supporting the view that slow learned geometry and fast sharpness correction are genuinely complementary.
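To make the idea of a SAM perturbation acting on top of a preconditioner concrete, here is a minimal sketch on a toy quadratic loss. It is one possible way to combine the two ingredients, not the LLQR+SAM algorithm: the fixed diagonal preconditioner stands in for a learned one, and `CURVATURE`, `lr`, `rho` and all function names are assumptions made for illustration.

```python
import math

CURVATURE = (10.0, 1.0)   # toy quadratic loss 0.5 * sum(h_i * w_i^2)


def loss(w):
    return 0.5 * sum(h * x * x for h, x in zip(CURVATURE, w))


def grad(w):
    return [h * x for h, x in zip(CURVATURE, w)]


def preconditioned_sam_step(w, precond, lr=0.5, rho=0.05):
    """One SAM-style step in which a diagonal preconditioner shapes both phases."""
    direction = [p * g for p, g in zip(precond, grad(w))]
    norm = math.sqrt(sum(d * d for d in direction))
    scale = rho / norm if norm > 0.0 else 0.0
    probe = [x + scale * d for x, d in zip(w, direction)]   # ascent probe of radius rho
    sharp_grad = grad(probe)                                 # gradient at the probe point
    return [x - lr * p * g for x, p, g in zip(w, precond, sharp_grad)]


def run(steps=20):
    w = [1.0, 1.0]
    # Assumed fixed preconditioner; a learned, slowly updated one would replace it.
    precond = [1.0 / h for h in CURVATURE]
    for _ in range(steps):
        w = preconditioned_sam_step(w, precond)
    return w
```

On this quadratic the iterates settle into a small neighbourhood of the minimum whose size is set by `rho`, which is the expected behaviour of a fixed-radius SAM probe.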

preprint 2025 arXiv

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

When training neural networks, dying neurons -- units becoming inactive or saturated -- are traditionally seen as harmful. This paper sheds new light on this phenomenon. By exploring the impact of various hyperparameter configurations on dying neurons during training, we gather insights on how to improve upon sparse training approaches to pruning. We introduce Demon Pruning (DemP), a method that controls the proliferation of dead neurons through a combination of noise injection on active units and a one-cycle schedule regularization strategy, dynamically leading to network sparsity. Experiments on CIFAR-10 and ImageNet datasets demonstrate that DemP outperforms existing dense-to-sparse structured pruning methods, achieving better accuracy-sparsity tradeoffs and accelerating training by up to 3.56$\times$. These findings provide a novel perspective on dying neurons as a resource for efficient model compression and optimization.

preprint 2021 arXiv

On the Regularity of Attention

Attention is a powerful component of modern neural networks across a wide variety of domains. In this paper, we seek to quantify the regularity (i.e. the amount of smoothness) of the attention operation. To accomplish this goal, we propose a new mathematical framework that uses measure theory and integral operators to model attention. We show that this framework is consistent with the usual definition, and that it captures the essential properties of attention. Then we use this framework to prove that, on compact domains, the attention operation is Lipschitz continuous and provide an estimate of its Lipschitz constant. Additionally, by focusing on a specific type of attention, we extend these Lipschitz continuity results to non-compact domains. We also discuss the effects regularity can have on NLP models, and applications to invertible and infinitely-deep networks.
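The Lipschitz claim on compact domains can be illustrated numerically with a toy scalar attention whose keys and values are fixed. This is a sketch under simplifying assumptions, not the measure-theoretic framework or the estimate from the paper: the keys, values, interval `[-1, 1]` and function names are all hypothetical. For this toy map the derivative with respect to the query is a covariance of keys and values under the softmax weights, so `max|k| * max|v|` is a crude Lipschitz bound.

```python
import math


def attention(query, keys, values):
    """Scalar softmax attention: weights softmax(query * k_i), output sum_i w_i * v_i."""
    scores = [query * k for k in keys]
    top = max(scores)                              # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return sum(e * v for e, v in zip(exps, values)) / total


def largest_slope(keys, values, low=-1.0, high=1.0, points=41):
    """Largest |f(x) - f(y)| / |x - y| over a grid on the compact interval [low, high]."""
    grid = [low + (high - low) * i / (points - 1) for i in range(points)]
    outs = [attention(x, keys, values) for x in grid]
    return max(
        abs(outs[i] - outs[j]) / abs(grid[i] - grid[j])
        for i in range(points)
        for j in range(i + 1, points)
    )


# Hypothetical keys and values chosen only for illustration.
keys = [-2.0, 0.5, 1.5]
values = [1.0, -3.0, 2.0]
empirical = largest_slope(keys, values)
bound = max(abs(k) for k in keys) * max(abs(v) for v in values)
```

The empirical slope over the grid should stay below `bound`. The gap between the two gives a rough sense of how loose the crude bound is for this particular choice of keys and values.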

preprint 2020 arXiv

A Mathematical Theory of Attention

Attention is a powerful component of modern neural networks across a wide variety of domains. However, despite its ubiquity in machine learning, there is a gap in our understanding of attention from a theoretical point of view. We propose a framework to fill this gap by building a mathematically equivalent model of attention using measure theory. With this model, we are able to interpret self-attention as a system of self-interacting particles, we shed light on self-attention from a maximum entropy perspective, and we show that attention is actually Lipschitz-continuous (with an appropriate metric) under suitable assumptions. We then apply these insights to the problem of mis-specified input data; infinitely-deep, weight-sharing self-attention networks; and more general Lipschitz estimates for a specific type of attention studied in concurrent work.

preprint 2015 arXiv

A 2-categorical state sum model

It has long been argued that higher categories provide the proper algebraic structure underlying state sum invariants of 4-manifolds. This idea has been refined recently, by proposing to use 2-groups and their representations as specific examples of 2-categories. The challenge has been to make these proposals fully explicit. Here we give a concrete realization of this program. Building upon our earlier work with Baez and Wise on the representation theory of 2-groups, we construct a four-dimensional state sum model based on a categorified version of the Euclidean group. We define and explicitly compute the simplex weights, which may be viewed as a categorified analogue of Racah-Wigner 6$j$-symbols. These weights solve a hexagon equation that encodes the formal invariance of the state sum under the Pachner moves of the triangulation. This result unravels the combinatorial formulation of the Feynman amplitudes of quantum field theory on flat spacetime proposed in [1], which was shown to lead after gauge-fixing to Korepanov's invariant of 4-manifolds.

preprint 2014 arXiv

Melonic phase transition in group field theory

Group field theories have recently been shown to admit a 1/N expansion dominated by so-called 'melonic graphs', dual to triangulated spheres. In this note, we deepen the analysis of this melonic sector. We obtain a combinatorial formula for the melonic amplitudes in terms of a graph polynomial related to a higher dimensional generalization of the Kirchhoff tree-matrix theorem. Simple bounds on these amplitudes show the existence of a phase transition driven by melonic interaction processes. We restrict our study to the Boulatov-Ooguri models, which describe topological BF theories and are the basis for the construction of four dimensional models of quantum gravity.
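For readers unfamiliar with the classical Kirchhoff theorem referred to in this abstract, here is a minimal sketch of the ordinary graph version: the number of spanning trees equals the determinant of the graph Laplacian with one row and column removed. It does not implement the higher dimensional generalization used in the paper, and the function name and example graphs are illustrative assumptions.

```python
from fractions import Fraction


def spanning_tree_count(num_vertices, edges):
    """Count spanning trees as the determinant of a reduced graph Laplacian."""
    n = num_vertices
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete the last row and column, then compute the determinant exactly.
    m = [row[:-1] for row in lap[:-1]]
    det = Fraction(1)
    for col in range(n - 1):
        pivot = next((r for r in range(col, n - 1) if m[r][col] != 0), None)
        if pivot is None:
            return 0                      # singular: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                    # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n - 1):
            factor = m[r][col] / m[col][col]
            for c in range(col, n - 1):
                m[r][c] -= factor * m[col][c]
    return int(det)


# Example graphs: the complete graph on 4 vertices and a triangle.
complete_graph_4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
triangle = [(0, 1), (1, 2), (0, 2)]
```

Exact rational arithmetic via `Fraction` avoids rounding issues, so the determinant comes out as an exact integer count.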

preprint 2014 arXiv

Weighting bubbles in group field theory

Group field theories (GFT) are higher dimensional generalizations of matrix models whose Feynman diagrams are dual to triangulations. Here we propose a modification of GFT models that includes extra field indices keeping track of the bubbles of the graphs in the Feynman evaluations. In dimension three, our model exhibits new symmetries, interpreted as the action of the vertex translations of the triangulation. The extra field indices have an elegant algebraic interpretation: they encode the structure of a semi-simple algebra. Remarkably, when the algebra is chosen to be associative, the new structure contributes a topological invariant from each bubble of the graph to the Feynman amplitudes.

preprint 2012 arXiv

Non-commutative flux representation for loop quantum gravity

The Hilbert space of loop quantum gravity is usually described in terms of cylindrical functionals of the gauge connection, the electric fluxes acting as non-commuting derivation operators. It has long been believed that this non-commutativity prevents a dual flux (or triad) representation of loop quantum gravity from existing. We show here, instead, that such a representation can be explicitly defined, by means of a non-commutative Fourier transform defined on the loop gravity state space. In this dual representation, flux operators act by *-multiplication and holonomy operators act by translation. We describe the gauge invariant dual states and discuss their geometrical meaning. Finally, we apply the construction to the simpler case of a U(1) gauge group and compare the resulting flux representation with the triad representation used in loop quantum cosmology.

preprint 2012 arXiv

The Holst Spin Foam Model via Cubulations

Spin foam models are an attempt at a covariant, or path integral, formulation of canonical loop quantum gravity. The construction of such models usually relies on the Plebanski formulation of general relativity as a constrained BF theory and is based on the discretization of the action on a simplicial triangulation, which may be viewed as an ultraviolet regulator. The triangulation dependence can be removed by means of group field theory techniques, which allows one to sum over all triangulations. The main tasks for these models are the correct quantum implementation of the Plebanski constraints, the existence of a semiclassical sector implementing additional "Regge-like" constraints arising from simplicial triangulations, and the definition of the physical inner product of loop quantum gravity via group field theory. Here we propose a new approach to tackle these issues stemming directly from the Holst action for general relativity, which is also a proper starting point for canonical loop quantum gravity. The discretization is performed by means of a "cubulation" of the manifold rather than a triangulation. We give a direct interpretation of the resulting spin foam model as a generating functional for the n-point functions on the physical Hilbert space at finite regulator. This paper focuses on ideas and tasks to be performed before the model can be taken seriously. However, our analysis reveals some interesting features of this model: first, the structure of its amplitudes differs from the standard spin foam models. Second, the tetrad n-point functions admit a "Wick-like" structure. Third, the restriction to simple representations does not automatically occur -- unless one makes use of the time gauge, just as in the classical theory.

preprint 2011 arXiv

Diffeomorphisms in group field theories

We study the issue of diffeomorphism symmetry in group field theories (GFT), using the recently introduced noncommutative metric representation. In the colored Boulatov model for 3d gravity, we identify a field (quantum) symmetry which ties together the vertex translation invariance of discrete gravity, the flatness constraint of canonical quantum gravity, and the topological (coarse-graining) identities for the 6j-symbols. We also show how, for the GFT graphs dual to manifolds, the invariance of the Feynman amplitudes encodes the discrete residual action of diffeomorphisms in simplicial gravity path integrals. We extend the results to GFT models for higher dimensional BF theories and discuss various insights that they provide on the GFT formalism itself.

preprint 2011 arXiv

Group field theory and simplicial gravity path integrals: A model for Holst-Plebanski gravity

In a recent work, a dual formulation of group field theories as non-commutative quantum field theories has been proposed, providing an exact duality between spin foam models and non-commutative simplicial path integrals for constrained BF theories. In light of this new framework, we define a model for 4d gravity which includes the Immirzi parameter gamma. It reproduces the Barrett-Crane amplitudes when gamma goes to infinity, but differs from existing models otherwise; in particular it does not require any rationality condition for gamma. We formulate the amplitudes both as BF simplicial path integrals with explicit non-commutative B variables, and in spin foam form in terms of Wigner 15j-symbols. Finally, we briefly discuss the correlation between neighboring simplices, often argued to be a problematic feature, for example, in the Barrett-Crane model.

preprint 2011 arXiv

Infinite-Dimensional Representations of 2-Groups

A "2-group" is a category equipped with a multiplication satisfying laws like those of a group. Just as groups have representations on vector spaces, 2-groups have representations on "2-vector spaces", which are categories analogous to vector spaces. Unfortunately, Lie 2-groups typically have few representations on the finite-dimensional 2-vector spaces introduced by Kapranov and Voevodsky. For this reason, Crane, Sheppeard and Yetter introduced certain infinite-dimensional 2-vector spaces called "measurable categories" (since they are closely related to measurable fields of Hilbert spaces), and used these to study infinite-dimensional representations of certain Lie 2-groups. Here we continue this work. We begin with a detailed study of measurable categories. Then we give a geometrical description of the measurable representations, intertwiners and 2-intertwiners for any skeletal measurable 2-group. We study tensor products and direct sums for representations, and various concepts of subrepresentation. We describe direct sums of intertwiners, and sub-intertwiners - features not seen in ordinary group representation theory. We study irreducible and indecomposable representations and intertwiners. We also study "irretractable" representations - another feature not seen in ordinary group representation theory. Finally, we argue that measurable categories equipped with some extra structure deserve to be considered "separable 2-Hilbert spaces", and compare this idea to a tentative definition of 2-Hilbert spaces as representation categories of commutative von Neumann algebras.

preprint 2011 arXiv

Quantum simplicial geometry in the group field theory formalism: reconsidering the Barrett-Crane model

A dual formulation of group field theories, obtained by a Fourier transform mapping functions on a group to functions on its Lie algebra, has been proposed recently. In the case of the Ooguri model for SO(4) BF theory, the variables of the dual field are thus so(4) bivectors, which have a direct interpretation as the discrete B variables. Here we study a modification of the model by means of a constraint operator implementing the simplicity of the bivectors, in such a way that projected fields describe metric tetrahedra. This involves an extension of the usual GFT framework, where boundary operators are labelled by projected spin network states. By construction, the Feynman amplitudes are simplicial path integrals for constrained BF theory. We show that the spin foam formulation of these amplitudes corresponds to a variant of the Barrett-Crane model for quantum gravity. We then re-examine the arguments against the Barrett-Crane model(s), in light of our construction.

preprint 2011 arXiv

Ten questions on Group Field Theory (and their tentative answers)

We provide a short and non-technical summary of our current knowledge and some possible perspectives on the group field theory formalism for quantum gravity, in the form of a (partial) FAQ (with answers). Some of the questions and answers relate to aspects of the formalism that concern loop quantum gravity. This summary also aims at giving a brief, rough guide to the recent literature on group field theory (and tensor models).

preprint 2010 arXiv

Group field theory with non-commutative metric variables

We introduce a dual formulation of group field theories, making them a type of non-commutative field theory. In this formulation, the variables of the field are Lie algebra variables with a clear interpretation in terms of simplicial geometry. For Ooguri-type models, the Feynman amplitudes are simplicial path integrals for BF theories. This formulation suggests ways to impose the simplicity constraints involved in BF formulations of 4d gravity directly at the level of the group field theory action. We illustrate this by giving a new GFT definition of the Barrett-Crane model.

preprint 2009 arXiv

2-Group Representations for Spin Foams

Just as 3d state sum models, including 3d quantum gravity, can be built using categories of group representations, "2-categories of 2-group representations" may provide interesting state sum models for 4d quantum topology, if not quantum gravity. Here we focus on the "Euclidean 2-group", built from the rotation group SO(4) and its action on the group of translations of 4d Euclidean space. We explain its infinite-dimensional unitary representations, and construct a model based on the resulting representation 2-category. This model, with clear geometric content and explicit "metric data" on triangulation edges, shows up naturally in an attempt to write the amplitudes of ordinary quantum field theory in a background independent way.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint