Researcher profile

Jinchao Xu

Jinchao Xu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
13 works
0 followers
11 topics
4 close collaborators

Published work

13 published items

preprint | 2026 | arXiv

Solving High-Dimensional PDEs Using Linearized Neural Networks

Linearized shallow neural networks that are constructed by fixing the hidden-layer parameters have recently shown strong performance in solving partial differential equations (PDEs). Such models, widely used in the random feature method (RFM) and extreme learning machines (ELM), transform network training into a linear least-squares problem. In this paper, we conduct a numerical study of the variational (Galerkin) and collocation formulations for these linearized networks. Our numerical results reveal that, in the variational formulation, the associated linear systems are severely ill-conditioned, forming the primary computational bottleneck in scaling the neural network size, even when direct solvers are employed. In contrast, collocation methods combined with robust least-squares solvers exhibit better numerical stability and achieve higher accuracy as we increase neuron numbers. This behavior is consistently observed for both ReLU$^k$ and $\tanh$ activations, with $\tanh$ networks exhibiting even worse conditioning. Furthermore, we demonstrate that random sampling of the hidden layer parameters, commonly used in RFM and ELM, is not necessary for achieving high accuracy. For ReLU$^k$ activations, this follows from existing theory and is verified numerically in this paper, while for $\tanh$ activations, we introduce two deterministic schemes that achieve comparable accuracy.
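As a rough illustration of the collocation formulation described in this abstract, the following Python sketch solves a 1D Poisson problem with a tanh network whose hidden-layer parameters are fixed, so that only a linear least-squares problem remains. The test problem, the sampling ranges for `w` and `b`, and the function names are assumptions made for illustration. They are not taken from the paper, and the paper's deterministic parameter schemes are not reproduced here.

```python
import numpy as np

def solve_poisson_collocation(n_neurons=60, n_points=200, seed=0):
    """Least-squares collocation for -u'' = f on (0, 1), u(0) = u(1) = 0,
    using features tanh(w*x + b) whose hidden parameters (w, b) stay fixed."""
    rng = np.random.default_rng(seed)
    # Hypothetical sampling choice: slopes in [-4, 4], transition centers in [0, 1].
    w = rng.uniform(-4.0, 4.0, n_neurons)
    b = -w * rng.uniform(0.0, 1.0, n_neurons)

    x = np.linspace(0.0, 1.0, n_points)[1:-1]        # interior collocation points
    t = np.tanh(np.outer(x, w) + b)                  # feature values at the points
    # d^2/dx^2 tanh(w x + b) = -2 w^2 t (1 - t^2), hence -phi'' = 2 w^2 t (1 - t^2).
    pde_rows = 2.0 * w**2 * t * (1.0 - t**2)
    rhs_pde = np.pi**2 * np.sin(np.pi * x)           # f for the exact solution sin(pi x)

    bc_rows = np.tanh(np.outer([0.0, 1.0], w) + b)   # rows enforcing u(0) = u(1) = 0
    A = np.vstack([pde_rows, bc_rows])
    rhs = np.concatenate([rhs_pde, [0.0, 0.0]])

    # SVD-based least squares copes with the ill-conditioning of A.
    coef, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return w, b, coef

def evaluate_network(w, b, coef, x):
    """Evaluate the linearized network sum_j coef[j] * tanh(w[j]*x + b[j])."""
    return np.tanh(np.outer(x, w) + b) @ coef

w, b, coef = solve_poisson_collocation()
x_test = np.linspace(0.0, 1.0, 501)
max_error = np.max(np.abs(evaluate_network(w, b, coef, x_test) - np.sin(np.pi * x_test)))
print(f"max error: {max_error:.2e}")
```

The only trained quantity is `coef`; changing `n_neurons` changes the size of the least-squares system but not the structure of the solve.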

preprint | 2022 | arXiv

A Priori Analysis of Stable Neural Network Solutions to Numerical PDEs

Methods for solving PDEs using neural networks have recently become a very important topic. We provide an a priori error analysis for such methods which is based on the $\mathcal{K}_1(\mathbb{D})$-norm of the solution. We show that the resulting constrained optimization problem can be efficiently solved using a greedy algorithm, which replaces stochastic gradient descent. Following this, we show that the error arising from discretizing the energy integrals is bounded both in the deterministic case, i.e. when using numerical quadrature, and also in the stochastic case, i.e. when sampling points to approximate the integrals. In the latter case, we use a Rademacher complexity analysis, and in the former we use standard numerical quadrature bounds. This extends existing results to methods which use a general dictionary of functions to learn solutions to PDEs and importantly gives a consistent analysis which incorporates the optimization, approximation, and generalization aspects of the problem. In addition, the Rademacher complexity analysis is simplified and generalized, which enables application to a wide range of problems.

preprint | 2022 | arXiv

A sharp Korn's inequality for piecewise $H^1$ space and its application

In this paper, we revisit Korn's inequality for the piecewise $H^1$ space based on general polygonal or polyhedral decompositions of the domain. Our Korn's inequality is expressed with minimal jump terms. These minimal jump terms are identified by characterizing the restriction of rigid body modes to the edges/faces of the partition. Such minimal jump conditions are also shown to be sharp for achieving Korn's inequality. The sharpness of our result and the explicitly given minimal conditions can be used to test immediately whether any given finite element space satisfies Korn's inequality, as well as to build or modify nonconforming finite elements so that Korn's inequality holds.

preprint | 2022 | arXiv

Approximation Properties of Deep ReLU CNNs

This paper focuses on establishing $L^2$ approximation properties for deep ReLU convolutional neural networks (CNNs) in two-dimensional space. The analysis is based on a decomposition theorem for convolutional kernels with a large spatial size and multiple channels. Given the decomposition result, the property of the ReLU activation function, and a specific structure for channels, a universal approximation theorem for deep ReLU CNNs with classic structure is obtained by showing its connection with one-hidden-layer ReLU neural networks (NNs). Furthermore, approximation properties are obtained for one version of neural networks with ResNet, pre-act ResNet, and MgNet architectures based on connections between these networks.

preprint | 2022 | arXiv

Characterization of the Variation Spaces Corresponding to Shallow Neural Networks

We study the variation space corresponding to a dictionary of functions in $L^2(Ω)$ for a bounded domain $Ω\subset \mathbb{R}^d$. Specifically, we compare the variation space, which is defined in terms of a convex hull, with related notions based on integral representations. This allows us to show that three important notions relating to the approximation theory of shallow neural networks, the Barron space, the spectral Barron space, and the Radon BV space, are actually variation spaces with respect to certain natural dictionaries.

preprint | 2022 | arXiv

Extended Regularized Dual Averaging Methods for Stochastic Optimization

We introduce a new algorithm, extended regularized dual averaging (XRDA), for solving regularized stochastic optimization problems, which generalizes the regularized dual averaging (RDA) method. The main novelty of the method is that it allows a flexible control of the backward step size. For instance, the backward step size used in RDA grows without bound, while for XRDA the backward step size can be kept bounded. We demonstrate experimentally that additional control over the backward step size can significantly improve the convergence rate of the algorithm while preserving desired properties of the iterates, such as sparsity. Theoretically, we show that the XRDA method achieves the same convergence rate as RDA for general convex objectives.

preprint | 2022 | arXiv

Optimal Convergence Rates for the Orthogonal Greedy Algorithm

We analyze the orthogonal greedy algorithm when applied to dictionaries $\mathbb{D}$ whose convex hull has small entropy. We show that if the metric entropy of the convex hull of $\mathbb{D}$ decays at a rate of $O(n^{-\frac{1}{2}-α})$ for $α> 0$, then the orthogonal greedy algorithm converges at the same rate on the variation space of $\mathbb{D}$. This improves upon the well-known $O(n^{-\frac{1}{2}})$ convergence rate of the orthogonal greedy algorithm in many cases, most notably for dictionaries corresponding to shallow neural networks. These results hold under no additional assumptions on the dictionary beyond the decay rate of the entropy of its convex hull. In addition, they are robust to noise in the target function and can be extended to convergence rates on the interpolation spaces of the variation norm. We show empirically that the predicted rates are obtained for the dictionary corresponding to shallow neural networks with Heaviside activation function in two dimensions. Finally, we show that these improved rates are sharp and prove a negative result showing that the iterates generated by the orthogonal greedy algorithm cannot in general be bounded in the variation norm of $\mathbb{D}$.
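For orientation, the orthogonal greedy algorithm analyzed in this abstract can be sketched on a finite, discretized dictionary as follows. This is a minimal sketch under stated assumptions: the 1D Heaviside dictionary, the target function, and all names are chosen here for illustration and do not reproduce the paper's two-dimensional experiments.

```python
import numpy as np

def orthogonal_greedy(dictionary, target, n_steps):
    """Orthogonal greedy algorithm; dictionary columns are normalized elements."""
    selected = []
    residual = target.copy()
    residual_norms = [np.linalg.norm(residual)]
    approx = np.zeros_like(target)
    for _ in range(n_steps):
        # Greedy step: pick the element most correlated with the current residual.
        k = int(np.argmax(np.abs(dictionary.T @ residual)))
        if k not in selected:
            selected.append(k)
        # Orthogonal step: project the target onto the span of all selected elements.
        basis = dictionary[:, selected]
        coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
        approx = basis @ coef
        residual = target - approx
        residual_norms.append(np.linalg.norm(residual))
    return selected, approx, residual_norms

# Hypothetical example: Heaviside functions H(x - t) sampled on a 1D grid.
x = np.linspace(0.0, 1.0, 200)
thresholds = np.linspace(0.0, 1.0, 50, endpoint=False)
dictionary = (x[:, None] >= thresholds[None, :]).astype(float)
dictionary /= np.linalg.norm(dictionary, axis=0)

target = x**2
selected, approx, residual_norms = orthogonal_greedy(dictionary, target, n_steps=10)
print(selected)
print(residual_norms[-1] / residual_norms[0])
```

Because each step re-projects the target onto a growing span, the residual norm cannot increase from one iteration to the next.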

preprint | 2022 | arXiv

ReLU Deep Neural Networks from the Hierarchical Basis Perspective

We study ReLU deep neural networks (DNNs) by investigating their connections with the hierarchical basis method in finite element methods. First, we show that the approximation schemes of ReLU DNNs for $x^2$ and $xy$ are composition versions of the hierarchical basis approximation for these two functions. Based on this fact, we obtain a geometric interpretation and systematic proof for the approximation result of ReLU DNNs for polynomials, which plays an important role in a series of recent exponential approximation results of ReLU DNNs. Through our investigation of connections between ReLU DNNs and the hierarchical basis approximation for $x^2$ and $xy$, we show that ReLU DNNs with this special structure can be applied only to approximate quadratic functions. Furthermore, we obtain a concise representation to explicitly reproduce any linear finite element function on a two-dimensional uniform mesh by using ReLU DNNs with only two hidden layers.
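The composition structure mentioned in this abstract can be illustrated with the well-known ReLU approximation of $x^2$ on $[0, 1]$: subtracting scaled compositions of a hat function from $x$ reproduces the piecewise linear interpolant of $x^2$ on a uniform grid with $2^m$ cells. The sketch below assumes this standard construction; the function names are illustrative and the code is not taken from the paper.

```python
def relu(x):
    return max(x, 0.0)

def hat(x):
    """Hat function on [0, 1] written with three ReLU units."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def relu_square(x, depth):
    """Approximate x**2 on [0, 1] by x - sum_{s=1}^{depth} hat^(s)(x) / 4**s,
    where hat^(s) is the s-fold composition of hat with itself."""
    approx, g = x, x
    for s in range(1, depth + 1):
        g = hat(g)               # sawtooth with 2**(s-1) teeth
        approx -= g / 4.0**s     # hierarchical correction on level s
    return approx

depth = 5
grid = [i / 1000 for i in range(1001)]
max_error = max(abs(relu_square(x, depth) - x * x) for x in grid)
bound = 4.0 ** -(depth + 1)      # interpolation error h**2 / 4 with h = 2**-depth
print(max_error, bound)
```

Each extra level of composition halves the mesh size of the underlying interpolant, which is why the error bound shrinks by a factor of four per level.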

preprint | 2021 | arXiv

Approximation Rates for Neural Networks with General Activation Functions

We prove some new results concerning the approximation rate of neural networks with general activation functions. Our first result concerns the rate of approximation of a two-layer neural network with a polynomially decaying non-sigmoidal activation function. We extend the dimension-independent approximation rates previously obtained to this new class of activation functions. Our second result gives a weaker, but still dimension-independent, approximation rate for a larger class of activation functions, removing the polynomial decay assumption. This result applies to any bounded, integrable activation function. Finally, we show that a stratified sampling approach can be used to improve the approximation rate for polynomially decaying activation functions under mild additional assumptions.

preprint | 2020 | arXiv

An Abstract Stabilization Method with Applications to Nonlinear Incompressible Elasticity

In this paper, we propose and analyze an abstract stabilized mixed finite element framework that can be applied to nonlinear incompressible elasticity problems. In the abstract stabilized framework, we prove that any mixed finite element method that satisfies the discrete inf-sup condition can be modified so that it is stable and optimally convergent as long as the mixed continuous problem is stable. Furthermore, we apply the abstract stabilized framework to nonlinear incompressible elasticity problems and present numerical experiments to verify the theoretical results.

preprint | 2020 | arXiv

Constrained Linear Data-feature Mapping for Image Classification

In this paper, we propose a constrained linear data-feature mapping model as an interpretable mathematical model for image classification using convolutional neural networks (CNNs) such as ResNet. From this viewpoint, we establish detailed connections, at a technical level, between the traditional iterative schemes for constrained linear systems and the architecture of the basic blocks of ResNet. Under these connections, we propose some natural modifications of ResNet-type models which have fewer parameters but still maintain almost the same accuracy as the corresponding original models. Some numerical experiments are shown to demonstrate the validity of this constrained learning data-feature mapping assumption.

preprint | 2020 | arXiv

Robust block preconditioners for poroelasticity

In this paper we study the linear systems arising from discretized poroelasticity problems. We formulate one block preconditioner for the two-field Biot model and several preconditioners for the classical three-field Biot model under the unified relationship framework between well-posedness and preconditioners. By the unified theory, we show that all the considered preconditioners are uniformly optimal with respect to material and discretization parameters. Numerical tests demonstrate the robustness of these preconditioners.

preprint | 2018 | arXiv

ReLU Deep Neural Networks and Linear Finite Elements

In this paper, we investigate the relationship between deep neural networks (DNNs) with the rectified linear unit (ReLU) as the activation function and continuous piecewise linear (CPWL) functions, especially CPWL functions from the simplicial linear finite element method (FEM). We first consider the special case of FEM. By exploring the DNN representation of its nodal basis functions, we present a ReLU DNN representation of CPWL functions in FEM. We theoretically establish that at least $2$ hidden layers are needed in a ReLU DNN to represent any linear finite element function in $Ω\subseteq \mathbb{R}^d$ when $d\ge2$. Consequently, for $d=2,3$, which are often encountered in scientific and engineering computing, two hidden layers are necessary and sufficient for any CPWL function to be represented by a ReLU DNN. Then we include a detailed account of how a general CPWL function in $\mathbb R^d$ can be represented by a ReLU DNN with at most $\lceil\log_2(d+1)\rceil$ hidden layers, and we also give an estimate of the number of neurons needed in such a representation. Furthermore, using the relationship between DNN and FEM, we theoretically argue that a special class of DNN models with low bit-width is still expected to have adequate representation power in applications. Finally, as a proof of concept, we present some numerical results for using ReLU DNNs to solve a two-point boundary value problem to demonstrate the potential of applying DNNs to the numerical solution of partial differential equations.
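The link between ReLU networks and linear finite elements is easiest to see in one dimension, where a single hidden layer already reproduces any CPWL function on an interval (the two-hidden-layer lower bound in the abstract concerns $d\ge2$). The following sketch converts nodal values on a 1D mesh into ReLU weights. It relies on this standard 1D identity rather than on the paper's construction, and the function names are hypothetical.

```python
def relu(x):
    return max(x, 0.0)

def cpwl_to_relu(nodes, values):
    """Return (bias, weights, breakpoints) such that, on [nodes[0], nodes[-1]],
    u(x) = bias + sum_i weights[i] * relu(x - breakpoints[i])
    equals the piecewise linear interpolant of (nodes, values)."""
    slopes = [
        (values[i + 1] - values[i]) / (nodes[i + 1] - nodes[i])
        for i in range(len(nodes) - 1)
    ]
    # Each weight is the jump in slope at the corresponding breakpoint.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, len(slopes))]
    return values[0], weights, nodes[:-1]

def evaluate_relu_net(bias, weights, breakpoints, x):
    """Evaluate the one-hidden-layer ReLU network at a point x."""
    return bias + sum(w * relu(x - t) for w, t in zip(weights, breakpoints))

# Nodal basis (hat) function attached to the node 0.5 on a uniform mesh.
nodes = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [0.0, 0.0, 1.0, 0.0, 0.0]
bias, weights, breakpoints = cpwl_to_relu(nodes, values)
print(weights)
print(evaluate_relu_net(bias, weights, breakpoints, 0.375))
```

One hidden neuron per mesh cell is used here; neurons whose weight is zero (no change of slope) could be dropped.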