Source author record

Jan E. Gerken

Jan E. Gerken appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Machine Learning, hep-th, math.NT, Artificial Intelligence, Computer Vision

Catalog footprint

What is connected

6 works

5 topics

4 close collaborators

Preprint, 2026, arXiv

Criticality and Saturation in Orthogonal Neural Networks

It has been known for a long time that initializing weight matrices to be orthogonal instead of having i.i.d. Gaussian components can improve training performance. This phenomenon can be analyzed using finite-width corrections, where the infinite-width statistics are supplemented by a power series in $1/\mathrm{width}$. In particular, recent empirical results by Day et al. show that the tensors appearing in this treatment stabilize for large depth, as opposed to the tensors of i.i.d.-initialized networks. In this article, we derive explicit layer-wise recursion relations for the tensors appearing in the finite-width expansion of the network statistics in the case of orthogonal initializations. We also provide an extension of recently introduced Feynman diagrams for the corresponding recursions in the i.i.d. case which are valid to all orders in $1/\mathrm{width}$. Finally, we show explicitly that the recursions we derive reproduce the stability of the finite-width tensors which was observed for activation functions with vanishing fixed point. This work therefore provides a theoretical explanation for the stability of nonlinear networks of finite width initialized with orthogonal weights, closing a long-standing gap in the literature. We validate our theoretical results experimentally by showing that numerical solutions of our recursion relations and their analytical large-depth expansions agree excellently with Monte-Carlo estimates from network ensembles.
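As a rough illustration of the difference between the two initializations mentioned in the abstract, the sketch below propagates a unit vector through a stack of purely linear layers and compares the spread of the output norm across an ensemble. It is not the recursion derived in the paper: the linear (activation-free) setting, the width `n`, the depth, and the helper names `gaussian_init`, `orthogonal_init` and `final_norm` are all assumptions chosen for the example.

```python
import numpy as np

def gaussian_init(n, rng):
    # i.i.d. Gaussian entries with variance 1/n: norms are preserved only on average.
    return rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))

def orthogonal_init(n, rng):
    # QR decomposition of a Gaussian matrix; the sign correction on the columns
    # makes the orthogonal factor uniformly (Haar) distributed.
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))

def final_norm(init, n, depth, rng):
    # Push a unit vector through `depth` freshly sampled linear layers.
    x = np.ones(n) / np.sqrt(n)
    for _ in range(depth):
        x = init(n, rng) @ x
    return float(np.linalg.norm(x))

rng = np.random.default_rng(0)
n, depth, samples = 32, 20, 50  # illustrative values, not taken from the paper

orth_norms = [final_norm(orthogonal_init, n, depth, rng) for _ in range(samples)]
gauss_norms = [final_norm(gaussian_init, n, depth, rng) for _ in range(samples)]

print("orthogonal: std of output norm =", np.std(orth_norms))
print("gaussian:   std of output norm =", np.std(gauss_norms))
```

In this linear toy setting every orthogonal layer preserves the norm exactly, so the ensemble spread is at the level of floating-point error, while the Gaussian ensemble fluctuates at finite width. The nonlinear case treated in the abstract is more subtle and is not captured by this sketch.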

Preprint, 2026, arXiv

From Layers to Networks: Comparing Neural Representations via Diffusion Geometry

Diffusion geometry is a manifold learning framework that uses random walks defined by Markov transition matrices to characterize the geometry of a dataset at multiple scales. We use diffusion geometry for neural representations, incorporating tools from multi-view learning into this field for the first time. Our key technical observation is that a broad class of similarity measures based on representational similarity matrices (RSMs) admits a closed-form equivalent formulation in terms of row-stochastic Markov matrices, opening the door to manipulations from diffusion geometry. As a first application, we develop multi-scale variants of Centered Kernel Alignment and Distance Correlation, which utilise the $t^{th}$ power of the underlying transition matrix to probe the data geometry at adjustable diffusion scales. Going further, we introduce variants of these measures which fuse the Markov matrices of several layers via alternating diffusion into a single operator that captures the network's joint sample geometry, allowing similarity to be computed across multiple layers and shifting the comparison from layer-to-layer to network-to-network. We perform extensive numerical experiments, evaluating our measures on the Representational Similarity (ReSi) benchmark comprising 14 architectures trained on 7 datasets across three different domains. Our methods achieve SoTA results in accuracy and output correlation for both language and vision tasks across different models. We furthermore show SoTA performance on an additional benchmark evaluating on out-of-distribution data.
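To make the idea of probing representations at an adjustable diffusion scale concrete, here is a minimal sketch of a CKA-style score computed on the $t$-th power of row-stochastic Markov matrices. The Gaussian kernel, the `bandwidth` parameter, the Frobenius-based alignment, and the function names `markov_matrix`, `centered` and `diffusion_cka` are assumptions made for this example; the paper's actual definitions may differ.

```python
import numpy as np

def markov_matrix(acts, bandwidth=1.0):
    # Gaussian affinities between samples (rows of `acts`), then row-normalised
    # so that each row is a probability distribution over samples.
    sq = np.sum(acts**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * acts @ acts.T, 0.0)
    k = np.exp(-d2 / (2.0 * bandwidth**2))
    return k / k.sum(axis=1, keepdims=True)

def centered(m):
    # Double centering H m H with H = I - (1/n) 11^T, as in CKA.
    n = m.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ m @ h

def diffusion_cka(acts_x, acts_y, t=1, bandwidth=1.0):
    # Compare the t-step transition matrices of two representations via the
    # normalised Frobenius inner product of their centered versions.
    px = np.linalg.matrix_power(markov_matrix(acts_x, bandwidth), t)
    py = np.linalg.matrix_power(markov_matrix(acts_y, bandwidth), t)
    cx, cy = centered(px), centered(py)
    return float(np.sum(cx * cy) / (np.linalg.norm(cx) * np.linalg.norm(cy)))

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 8))                   # hypothetical layer activations
rot, _ = np.linalg.qr(rng.normal(size=(8, 8)))
y = x @ rot                                    # same geometry, rotated features
z = rng.normal(size=(20, 8))                   # unrelated representation

print("rotated copy:", diffusion_cka(x, y, t=2))
print("unrelated:   ", diffusion_cka(x, z, t=2))
```

Because the kernel depends only on pairwise distances, the score in this sketch is unchanged by rotations of the feature space. For very large `t` the transition matrix approaches its stationary limit and the centered matrices shrink towards zero, so the ratio becomes numerically unreliable; small to moderate `t` is the regime this sketch is meant for.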

Preprint, 2022, arXiv

Diffeomorphic Counterfactuals with Generative Models

Counterfactuals can explain classification decisions of neural networks in a human interpretable way. We propose a simple but effective method to generate such counterfactuals. More specifically, we perform a suitable diffeomorphic coordinate transformation and then perform gradient ascent in these coordinates to find counterfactuals which are classified with great confidence as a specified target class. We propose two methods to leverage generative models to construct such suitable coordinate systems that are either exactly or approximately diffeomorphic. We analyze the generation process theoretically using Riemannian differential geometry and validate the quality of the generated counterfactuals using various qualitative and quantitative measures.

Preprint, 2022, arXiv

Equivariance versus Augmentation for Spherical Images

We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. We compare the performance of the group equivariant networks known as S2CNNs and standard non-equivariant CNNs trained with an increasing amount of data augmentation. The chosen architectures can be considered baseline references for the respective design paradigms. Our models are trained and evaluated on single or multiple items from the MNIST or FashionMNIST dataset projected onto the sphere. For the task of image classification, which is inherently rotationally invariant, we find that by considerably increasing the amount of data augmentation and the size of the networks, it is possible for the standard CNNs to reach at least the same performance as the equivariant network. In contrast, for the inherently equivariant task of semantic segmentation, the non-equivariant networks are consistently outperformed by the equivariant networks with significantly fewer parameters. We also analyze and compare the inference latency and training times of the different networks, enabling detailed tradeoff considerations between equivariant architectures and data augmentation for practical problems. The equivariant spherical networks used in the experiments are available at https://github.com/JanEGerken/sem_seg_s2cnn .

Preprint, 2020, arXiv

All-order differential equations for one-loop closed-string integrals and modular graph forms

We investigate generating functions for the integrals over world-sheet tori appearing in closed-string one-loop amplitudes of bosonic, heterotic and type-II theories. These closed-string integrals are shown to obey homogeneous and linear differential equations in the modular parameter of the torus. We spell out the first-order Cauchy-Riemann and second-order Laplace equations for the generating functions for any number of external states. The low-energy expansion of such torus integrals introduces infinite families of non-holomorphic modular forms known as modular graph forms. Our results generate homogeneous first- and second-order differential equations for arbitrary such modular graph forms and can be viewed as a step towards all-order low-energy expansions of closed-string integrals.

Preprint, 2020, arXiv

Generating series of all modular graph forms from iterated Eisenstein integrals

We study generating series of torus integrals that contain all so-called modular graph forms relevant for massless one-loop closed-string amplitudes. By analysing the differential equation of the generating series we construct a solution for its low-energy expansion to all orders in the inverse string tension $α'$. Our solution is expressed through initial data involving multiple zeta values and certain real-analytic functions of the modular parameter of the torus. These functions are built from real and imaginary parts of holomorphic iterated Eisenstein integrals and should be closely related to Brown's recent construction of real-analytic modular forms. We study the properties of our real-analytic objects in detail and give explicit examples to a fixed order in the $α'$-expansion. In particular, our solution allows for a counting of linearly independent modular graph forms at a given weight, confirming previous partial results and giving predictions for higher, hitherto unexplored weights. It also sheds new light on the topic of uniform transcendentality of the $α'$-expansion.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint