Researcher profile

Takashi Furuya

Takashi Furuya contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

Approximation of Maximally Monotone Operators : A Graph Convergence Perspective

Operator learning has been highly successful for continuous mappings between infinite-dimensional spaces, such as PDE solution operators. However, many operators of interest-including differential operators-are discontinuous or set-valued, and lie outside classical approximation frameworks. We propose a paradigm shift by formulating approximation via graph convergence (Painlevé-Kuratowski convergence), which is well-suited for closed operators. We show that uniform and $L^p$ approximation are fundamentally inadequate in this setting. Focusing on maximally monotone operators, we prove that any such operator can be approximated in the sense of local graph convergence by continuous encoder-decoder architectures, and further construct structure-preserving approximations that retain maximal monotonicity via resolvent-based parameterizations.

preprint2026arXiv

Approximation Theory of Laplacian-Based Neural Operators for Reaction-Diffusion System

Neural operators provide a framework for learning solution operators of partial differential equations (PDEs), enabling efficient surrogate modeling for complex systems. While universal approximation results are now well understood, approximation analysis specific to nonlinear reaction-diffusion systems remains limited. In this paper, we study neural operators applied to the solution mapping from initial conditions to time-dependent solutions of a generalized Gierer-Meinhardt reaction-diffusion system, a prototypical model of nonlinear pattern formation. Our main results establish explicit approximation error bounds in terms of network depth, width, and spectral rank by exploiting the Laplacian spectral representation of the Green's function underlying the PDE. We show that the required parameter complexity grows at most polynomially with respect to the target accuracy, demonstrating that Laplacian eigenfunction-based neural operator architectures alleviate the curse of parametric complexity encountered in generic operator learning. Numerical experiments on the Gierer-Meinhardt system support the theoretical findings.

preprint2026arXiv

Function graph transformers universally approximate operators between function spaces

We study the approximation of nonlinear operators between function spaces by transformers. Our approach is to lift functions to measures supported on their graphs and leverage a recently introduced measure-theoretic view of transformers. A function $h$ is represented by its graph measure $γ_h$, with finite tokens $\{(x_j,h(x_j))\}_{j=1}^N$ being its empirical approximations. We show that this framework elegantly models discretization refinement via convergence of measures and provides a natural setting for operator learning. Within this framework, we introduce function graph transformers, a graph-preserving subclass of measure-theoretic transformers that maps graph measures to graph measures, which is to say that outputs remain single-valued functions. Crucially, this additional structure does not reduce generality: we prove that the resulting graph-preserving maps can be approximated by finite compositions of standard softmax self-attention layers and pointwise MLPs, yielding universal approximation results for broad classes of nonlinear operators. Unlike existing theoretical approaches to operator learning with transformers, the measure-theoretic framework also accommodates regularized negative-order Sobolev inputs for which discretization invariance is particularly challenging, as well as query points on different output domains. Overall, function graph transformers provide a continuum viewpoint and mathematical toolkit for transformer-based operator learning, clarifying the roles of positional encodings, graph structure, regularization, and ensuring consistency across discretizations.

preprint2026arXiv

Training Infinitely Deep and Wide Transformers

Transformers have become the dominant architecture in modern machine learning, yet the theoretical understanding of their training dynamics remains limited. This paper develops a rigorous mathematical framework for analyzing gradient-based training of transformers in the mean-field regime, where both the depth (number of layers) and width (number of attention heads) tend to infinity. While ResNet training can be understood as controlling a neural ODE, transformer training corresponds to controlling a neural PDE, due to the coupling of multiple token distributions through the attention mechanism. Our mean-field model features two types of measure representations: token distributions evolving through layers and attention parameters at each layer. We establish well-posedness of the forward pass through infinitely deep transformers, characterizing token evolution via flow maps that satisfy ODEs in function spaces. Using adjoint sensitivity analysis, we derive an explicit formula for the conditional Wasserstein gradient of the training risk, involving adjoint variables governed by backward ODEs. We prove the existence and uniqueness of gradient flow curves in the conditional Wasserstein metric space, establishing a rigorous foundation for gradient-based transformer training. A key technical contribution is providing necessary and sufficient conditions for injectivity of the Neural Tangent Kernel (NTK) for attention mechanisms: we show that NTK injectivity is equivalent to linear independence of log-sum-exp functions modulo affine functions, a condition satisfied by diverse token distributions, including discrete distributions, uniform distributions, and Gaussian mixtures. Under this NTK injectivity assumption, we prove that gradient flow converges to global minima when the initial loss is sufficiently small, eliminating spurious local minima from the optimization landscape.

preprint2022arXiv

Inverse medium scattering problems with Kalman filter techniques

We study the inverse medium scattering problem to reconstruct the unknown inhomogeneous medium from the far-field patterns of scattered waves. The inverse scattering problem is generally ill-posed and nonlinear, and the iterative optimization method is often adapted. A natural iterative approach to this problem is to place all available measurements and mappings into one long vector and mapping, respectively, and to iteratively solve the linearized large system equation using the Tikhonov regularization method, which is called the Levenberg-Marquardt scheme. However, this is computationally expensive because we must construct the larger system equations when the number of available measurements increases. In this paper, we propose two reconstruction algorithms based on the Kalman filter. One is the algorithm equivalent to the Levenberg-Marquardt scheme, and the other is inspired by the Extended Kalman Filter. For the algorithm derivation, we iteratively apply the Kalman filter to the linearized equation for our nonlinear equation. Our proposed algorithms sequentially update the state and the weight of the norm for the state space, which avoids the construction of a large system equation and retains the information of past updates. Finally, we provide numerical examples to demonstrate our proposed algorithms.