Researcher profile

Maarten V. de Hoop

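Maarten V. de Hoop's listed work spans inverse problems for acoustic and elastic waves, geodesic and mixed ray transforms, numerical methods for wave propagation, and the theory of operator learning with neural networks and transformers.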

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
13 works
0 followers
13 topics
4 close collaborators

Published work

13 published items

preprint, 2026, arXiv

Dimension-Uniform Discretization Analysis of Preconditioned Annealed Langevin Dynamics for Multimodal Gaussian Mixtures

Obtaining stable diffusion-based samplers in high- and infinite-dimensional settings is challenging because errors can accumulate across high-frequency coordinates and make the dynamics unstable under refinement of the finite-dimensional approximation of the underlying function-space problem. Discretization is a typical source of such errors, and preconditioning with a suitable spectral decay is one way to control their accumulation. In this paper, we study this problem for preconditioned annealed Langevin dynamics (ALD) applied to Gaussian mixtures. We first show that Euler-Maruyama (EM) discretization, by treating the stiff linear part of the annealed score with a forward Euler step, imposes a stability constraint coupling the preconditioner with the annealed covariance scale. Together with the conditions ensuring dimension-uniform control of the annealed dynamics, this constraint forces the initial smoothed law to remain uniformly close to the target across dimensions. We then consider an exponential-integrator scheme that integrates the stiff linear part of the annealed score exactly. Under explicit spectral summability conditions coupling the smoothing covariance, the component covariance spectra, and the preconditioner, we prove a dimension-uniform Kullback-Leibler (KL) bound for this scheme. This bound can be made arbitrarily small, uniformly in dimension, by allowing enough time for annealing and then refining the time mesh accordingly. Importantly, these conditions allow regimes in which the KL divergence between the target and the initial smoothed law diverges with dimension, showing that the restrictions imposed by EM are scheme-dependent rather than intrinsic to ALD.
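The stability contrast described in this abstract can be illustrated on a scalar toy problem. The sketch below is an assumption-laden simplification, not the paper's scheme: it drops the noise and the annealing, and models one coordinate's stiff linear part as dx/dt = -lam * x, where `lam`, `h`, and the function names are hypothetical. A forward Euler step multiplies x by (1 - h * lam), which stays bounded only when h * lam is at most 2, while an exponential-integrator step multiplies x by exp(-h * lam), which decays for every positive h * lam.

```python
import math


def forward_euler_step(x, lam, h):
    """One forward Euler step for dx/dt = -lam * x."""
    return (1.0 - h * lam) * x


def exponential_step(x, lam, h):
    """One exact (exponential-integrator) step for dx/dt = -lam * x."""
    return math.exp(-h * lam) * x


def iterate(step, x0, lam, h, n_steps):
    """Apply the given one-step map n_steps times starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = step(x, lam, h)
    return x


if __name__ == "__main__":
    h = 0.1  # fixed step size (hypothetical value)
    # Larger lam plays the role of a stiffer, higher-frequency coordinate.
    for lam in (1.0, 10.0, 30.0, 100.0):
        euler = iterate(forward_euler_step, 1.0, lam, h, 50)
        expo = iterate(exponential_step, 1.0, lam, h, 50)
        print(f"lam={lam:6.1f}  euler={euler: .3e}  exponential={expo: .3e}")
```

With a fixed step, the Euler iterates grow without bound once h * lam exceeds 2, whereas the exponential iterates decay for every coordinate. This is the toy analogue of a step-size constraint that tightens as stiffer coordinates are added.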

preprint, 2026, arXiv

Function graph transformers universally approximate operators between function spaces

We study the approximation of nonlinear operators between function spaces by transformers. Our approach is to lift functions to measures supported on their graphs and leverage a recently introduced measure-theoretic view of transformers. A function $h$ is represented by its graph measure $\gamma_h$, with finite tokens $\{(x_j,h(x_j))\}_{j=1}^N$ being its empirical approximations. We show that this framework elegantly models discretization refinement via convergence of measures and provides a natural setting for operator learning. Within this framework, we introduce function graph transformers, a graph-preserving subclass of measure-theoretic transformers that maps graph measures to graph measures, which is to say that outputs remain single-valued functions. Crucially, this additional structure does not reduce generality: we prove that the resulting graph-preserving maps can be approximated by finite compositions of standard softmax self-attention layers and pointwise MLPs, yielding universal approximation results for broad classes of nonlinear operators. Unlike existing theoretical approaches to operator learning with transformers, the measure-theoretic framework also accommodates regularized negative-order Sobolev inputs for which discretization invariance is particularly challenging, as well as query points on different output domains. Overall, function graph transformers provide a continuum viewpoint and mathematical toolkit for transformer-based operator learning, clarifying the roles of positional encodings, graph structure, and regularization, and ensuring consistency across discretizations.
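The token representation in this abstract can be made concrete with a small sketch. It assumes, purely for illustration, that the graph measure of h is the image of the uniform measure on [0, 1] under x -> (x, h(x)), and that tokens sit at midpoints of a uniform grid. The helper names and the choices h(x) = x^2 and f(x, y) = x + y are hypothetical. Averaging a test function over the tokens then approximates its integral against the graph measure, and the error shrinks as the grid is refined.

```python
def graph_tokens(h, n):
    """Return n tokens (x_j, h(x_j)) at the midpoints of a uniform grid on [0, 1]."""
    xs = [(j + 0.5) / n for j in range(n)]
    return [(x, h(x)) for x in xs]


def empirical_integral(f, tokens):
    """Integrate f against the empirical measure that puts mass 1/N on each token."""
    return sum(f(x, y) for x, y in tokens) / len(tokens)


def h(x):
    """Example function whose graph is sampled (hypothetical choice)."""
    return x * x


def f(x, y):
    """Example test function on the graph (hypothetical choice)."""
    return x + y


# Exact value of the integral of f(x, h(x)) over [0, 1]: 1/2 + 1/3.
exact = 5.0 / 6.0

if __name__ == "__main__":
    for n in (10, 100, 1000):
        approx = empirical_integral(f, graph_tokens(h, n))
        print(n, abs(approx - exact))
```

The printed errors decrease under refinement, which is the sense in which finer tokenizations of the same function approach a single limiting object.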

preprint, 2026, arXiv

On Observation Time for Recovering Latent Hawkes Networks

Dynamics of interacting systems in engineering, society, and nature often evolve over latent networks that govern which entities can interact. We study the problem of inferring these networks from event-based observations, which arise naturally in finance, seismology, and neuroscience. While there is substantial algorithmic work addressing this important problem, theoretical results are scarce. In this paper we ask the following fundamental question: what is the minimum time that one must observe the dynamics in order to exactly recover the underlying network, as a function of the number $d$ of interacting entities? For a class of stationary Hawkes processes with sparse, weak interactions, we prove that an observation time of order $\log d$ is sufficient and necessary. For the upper bound we construct a two-stage estimator that uses clipped and binned event data for screening, followed by a least-squares refinement, and apply concentration bounds derived from the Poisson cluster representation. For the lower bound we combine Fano's inequality with Jacod's Girsanov formula for point processes on a suitable subclass of networks.

preprint, 2026, arXiv

Training Infinitely Deep and Wide Transformers

Transformers have become the dominant architecture in modern machine learning, yet the theoretical understanding of their training dynamics remains limited. This paper develops a rigorous mathematical framework for analyzing gradient-based training of transformers in the mean-field regime, where both the depth (number of layers) and width (number of attention heads) tend to infinity. While ResNet training can be understood as controlling a neural ODE, transformer training corresponds to controlling a neural PDE, due to the coupling of multiple token distributions through the attention mechanism. Our mean-field model features two types of measure representations: token distributions evolving through layers and attention parameters at each layer. We establish well-posedness of the forward pass through infinitely deep transformers, characterizing token evolution via flow maps that satisfy ODEs in function spaces. Using adjoint sensitivity analysis, we derive an explicit formula for the conditional Wasserstein gradient of the training risk, involving adjoint variables governed by backward ODEs. We prove the existence and uniqueness of gradient flow curves in the conditional Wasserstein metric space, establishing a rigorous foundation for gradient-based transformer training. A key technical contribution is providing necessary and sufficient conditions for injectivity of the Neural Tangent Kernel (NTK) for attention mechanisms: we show that NTK injectivity is equivalent to linear independence of log-sum-exp functions modulo affine functions, a condition satisfied by diverse token distributions, including discrete distributions, uniform distributions, and Gaussian mixtures. Under this NTK injectivity assumption, we prove that gradient flow converges to global minima when the initial loss is sufficiently small, eliminating spurious local minima from the optimization landscape.

preprint, 2022, arXiv

Deep learning architectures for nonlinear operator functions and nonlinear inverse problems

We develop a theoretical analysis for special neural network architectures, termed operator recurrent neural networks, for approximating nonlinear functions whose inputs are linear operators. Such functions commonly arise in solution algorithms for inverse boundary value problems. Traditional neural networks treat input data as vectors, and thus they do not effectively capture the multiplicative structure associated with the linear operators that correspond to the data in such inverse problems. We therefore introduce a new family that resembles a standard neural network architecture, but where the input data acts multiplicatively on vectors. Motivated by compact operators appearing in boundary control and the analysis of inverse boundary value problems for the wave equation, we promote structure and sparsity in selected weight matrices in the network. After describing this architecture, we study its representation properties as well as its approximation properties. We furthermore show that an explicit regularization can be introduced that can be derived from the mathematical analysis of the mentioned inverse problems, and which leads to certain guarantees on the generalization properties. We observe that the sparsity of the weight matrices improves the generalization estimates. Lastly, we discuss how operator recurrent networks can be viewed as a deep learning analogue to deterministic algorithms such as boundary control for reconstructing the unknown wavespeed in the acoustic wave equation from boundary measurements.

preprint, 2022, arXiv

Recovery of piecewise smooth density and Lamé parameters from high-frequency exterior Cauchy data

We consider an isotropic elastic medium occupying a bounded domain D whose density and Lamé parameters are piecewise smooth. In the elastic wave initial value inverse problem, we are given the solution operator for the elastic wave equation, but only outside the domain D and only for initial data supported outside D, and we study the recovery of the density and Lamé parameters. For known density, results have recently been obtained using the scattering control method to recover wave speeds. Here, we extend this result to include the recovery of the density in addition to the Lamé parameters under certain geometric conditions using techniques from microlocal analysis and a connection to local tensor tomography.

preprint, 2022, arXiv

Recovery of wave speeds and density of mass across a heterogeneous smooth interface from acoustic and elastic wave reflection operators

We revisit the problem of recovering wave speeds and density across a curved interface from reflected wave amplitudes. Such amplitudes have been exploited for decades in (exploration) seismology in this context. However, the analysis in seismology has been based on linearization and mostly flat interfaces. Here, we present a nonlinear analysis allowing curved interfaces, establish uniqueness and provide a reconstruction, while making the notion of amplitude precise through a procedure rooted in microlocal analysis.

preprint, 2022, arXiv

The Cost-Accuracy Trade-Off In Operator Learning With Neural Networks

The term 'surrogate modeling' in computational science and engineering refers to the development of computationally efficient approximations for expensive simulations, such as those arising from numerical solution of partial differential equations (PDEs). Surrogate modeling is an enabling methodology for many-query computations in science and engineering, which include iterative methods in optimization and sampling methods in uncertainty quantification. Over the last few years, several approaches to surrogate modeling for PDEs using neural networks have emerged, motivated by successes in using neural networks to approximate nonlinear maps in other areas. In principle, the relative merits of these different approaches can be evaluated by understanding, for each one, the cost required to achieve a given level of accuracy. However, the absence of a complete theory of approximation error for these approaches makes it difficult to assess this cost-accuracy trade-off. The purpose of the paper is to provide a careful numerical study of this issue, comparing a variety of different neural network architectures for operator approximation across a range of problems arising from PDE models in continuum mechanics.

preprint, 2021, arXiv

A foliated and reversible Finsler manifold is determined by its broken scattering relation

The broken scattering relation consists of the total lengths of broken geodesics that start from the boundary, change direction once inside the manifold, and propagate to the boundary. We show that if two reversible Finsler manifolds satisfying a convex foliation condition have the same broken scattering relation, then they are isometric. This implies that some anisotropic material parameters of the Earth can in principle be reconstructed from single scattering measurements at the surface.

preprint, 2020, arXiv

Full Reciprocity-Gap Waveform Inversion in the frequency domain, enabling sparse-source acquisition

The quantitative reconstruction of sub-surface Earth properties from the propagation of waves follows an iterative minimization of a misfit functional. In marine seismic exploration, the observed data usually consist of measurements of the pressure field, but dual-sensor devices also provide the normal velocity. Consequently, a reciprocity-based misfit functional is specifically designed, and defines the Full Reciprocity-gap Waveform Inversion (FRgWI) method. Compared to the more traditional least-squares approaches, this misfit functional provides additional features; in particular, the observational and computational acquisitions can be different. Therefore, the positions and wavelets of the sources from which the measurements are acquired are not needed in the reconstruction procedure and, in fact, the numerical acquisition (for the simulations) can be arbitrarily chosen. Based on three-dimensional experiments, FRgWI is shown to behave better than Full Waveform Inversion (FWI) in the same context. It allows for arbitrary numerical acquisitions in two ways. When few measurements are given, a dense numerical acquisition (compared to the observational one) can be used to compensate. Conversely, with a dense observational acquisition, a sparse computational one is shown to be sufficient, for instance with multiple-point sources, hence reducing the numerical cost. FRgWI displays accurate reconstructions in both situations and appears more robust with respect to cross-talk than the least-squares shot-stacking.

preprint, 2020, arXiv

Generic uniqueness and stability for the mixed ray transform

We consider the mixed ray transform of tensor fields on a three-dimensional compact simple Riemannian manifold with boundary. We prove the injectivity of the transform, up to natural obstructions, and establish stability estimates for the normal operator on generic three-dimensional simple manifolds in the case of 1+1 and 2+2 tensor fields. We show how the anisotropic perturbations of averaged isotropic travel-times of qS-polarized elastic waves provide partial information about the mixed ray transform of 2+2 tensor fields. If, in addition, we include the measurement of the shear wave amplitude, the complete mixed ray transform can be recovered. We also show how one can obtain the mixed ray transform from an anisotropic perturbation of the Dirichlet-to-Neumann map of an isotropic elastic wave equation on a smooth and bounded domain in three-dimensional Euclidean space.

preprint, 2019, arXiv

A weight-adjusted discontinuous Galerkin method for the poroelastic wave equation: penalty fluxes and micro-heterogeneities

We introduce a high-order weight-adjusted discontinuous Galerkin (WADG) scheme for the numerical solution of three-dimensional (3D) wave propagation problems in anisotropic porous media. We use a coupled first-order symmetric stress-velocity formulation. Careful attention is directed at (a) the derivation of an energy-stable penalty-based numerical flux, which offers high-order accuracy in the presence of material discontinuities, and (b) proper treatment of micro-heterogeneities (sub-element variations) in the numerical scheme. The use of a penalty-based numerical flux avoids the diagonalization of Jacobian matrices into polarized wave constituents necessary when solving element-wise Riemann problems. Micro-heterogeneities are accurately and stably incorporated in the numerical scheme using easily invertible weight-adjusted mass matrices. The convergence of the proposed numerical scheme is proven and verified using convergence studies against analytical plane wave solutions. The proposed method is also compared against an existing implementation using the spectral element method to solve the poroelastic wave equation.

preprint, 2018, arXiv

Inverting the local geodesic ray transform of higher rank tensors

Consider a Riemannian manifold in dimension $n\geq 3$ with strictly convex boundary. We prove the local invertibility, up to potential fields, of the geodesic ray transform on tensor fields of rank four near a boundary point. This problem is closely related to elastic qP-wave tomography. Under the condition that the manifold can be foliated with a continuous family of strictly convex hypersurfaces, the local invertibility implies a global result. One can straightforwardly adapt the proof to show similar results for tensor fields of arbitrary rank.