Source author record

Jerry Zhijian Yang

Jerry Zhijian Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, math.NA, Numerical Analysis, physics.comp-ph, math.OC, Neural and Evolutionary Computing

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators

preprint, 2026, arXiv

Approximation Error Upper and Lower Bounds for Hölder Class with Transformers

We explore the expressive power of Transformers by establishing precise approximation error upper and lower bounds for the Hölder class. Specifically, a new approximation upper bound is derived for the standard Transformer architecture equipped with Softmax operators, ReLU activation functions, and residual connections. We prove that a Transformer network composed of at most $\mathcal{O}(\varepsilon^{-{d_{0}}/α})$ blocks can approximate any bounded Hölder function with $d_{0}$-dimensional input and smoothness $α\in(0,1]$ to any accuracy $\varepsilon>0$. For approximation lower bounds, leveraging the VC-dimension upper bound, we are the first to rigorously prove that Transformers require at least $Ω(\varepsilon^{-{d_{0}}/({4α})})$ blocks to achieve $\varepsilon$ approximation accuracy. As a final step, we extend the derived results for standard Transformers to a general regression task and establish the corresponding excess risk rates, demonstrating Transformers' empirical effectiveness in real-world settings.

preprint, 2022, arXiv

A rate of convergence of Physics Informed Neural Networks for the linear second order elliptic PDEs

In recent years, physics-informed neural networks (PINNs) have been shown empirically to be a powerful tool for solving PDEs. However, numerical analysis of PINNs is still missing. In this paper, we prove a convergence rate of PINNs for second-order elliptic equations with Dirichlet boundary conditions by establishing upper bounds on the number of training samples and on the depth and width of the deep neural networks needed to achieve a desired accuracy. The error of PINNs is decomposed into approximation error and statistical error, where the approximation error is given in the $C^2$ norm with $\mathrm{ReLU}^{3}$ networks (deep networks with activation function $\max\{0,x^3\}$) and the statistical error is estimated by Rademacher complexity. We derive a bound on the Rademacher complexity of the non-Lipschitz composition of the gradient norm with a $\mathrm{ReLU}^{3}$ network, which is of independent interest.
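
The $\mathrm{ReLU}^{3}$ activation named in this abstract can be sketched in a few lines. This is only an illustration, not the authors' code; the function names `relu3` and `relu3_second_derivative` are hypothetical. It checks that $\max\{0,x^3\}$ coincides with $\max\{0,x\}^3$ and that the second derivative $6\max\{0,x\}$ is continuous at the origin, which is what makes a $C^2$ norm meaningful for such networks.

```python
def relu3(x: float) -> float:
    """ReLU cubed: max(0, x) ** 3, which equals max(0, x ** 3)."""
    return max(0.0, x) ** 3


def relu3_second_derivative(x: float) -> float:
    """Second derivative of ReLU^3: 6 * max(0, x), which is continuous at 0."""
    return 6.0 * max(0.0, x)


# The two ways of writing the activation agree on these sample points.
samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
assert all(relu3(x) == max(0.0, x ** 3) for x in samples)

# The second derivative approaches 0 from both sides of the origin.
print(relu3_second_derivative(-1e-9), relu3_second_derivative(1e-9))
```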

preprint, 2022, arXiv

Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality in Approximation on Hölder Class

In this paper, we construct neural networks with ReLU, sine and $2^x$ as activation functions. For general continuous $f$ defined on $[0,1]^d$ with continuity modulus $ω_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}(ω_f(\sqrt{d})\cdot2^{-M}+ω_{f}\left(\frac{\sqrt{d}}{N}\right))$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to the widths of the networks. As a consequence, we can construct a ReLU-sine-$2^x$ network with depth $5$ and width $\max\left\{\left\lceil2d^{3/2}\left(\frac{3μ}{ε}\right)^{1/α}\right\rceil,2\left\lceil\log_2\frac{3μd^{α/2}}{2ε}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_μ^α([0,1]^d)$ within a given tolerance $ε>0$ measured in the $L^p$ norm, $p\in[1,\infty)$, where $\mathcal{H}_μ^α([0,1]^d)$ denotes the Hölder continuous function class defined on $[0,1]^d$ with order $α\in (0,1]$ and constant $μ> 0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_μ^α([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, so SGD can be applied to train them.
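
The width expression in this abstract is plain arithmetic, so it can be evaluated directly. The sketch below does only that, under the parameter ranges stated in the abstract; the function name `relu_sine_exp_width` is hypothetical, and the code builds no network.

```python
import math


def relu_sine_exp_width(d: int, mu: float, alpha: float, eps: float) -> int:
    """Evaluate the width expression quoted in the abstract (depth is fixed at 5)."""
    if d < 1 or mu <= 0.0 or eps <= 0.0 or not (0.0 < alpha <= 1.0):
        raise ValueError("expected d >= 1, mu > 0, 0 < alpha <= 1, eps > 0")
    # First term: ceil(2 * d^(3/2) * (3*mu/eps)^(1/alpha))
    term_poly = math.ceil(2.0 * d ** 1.5 * (3.0 * mu / eps) ** (1.0 / alpha))
    # Second term: 2 * ceil(log2(3*mu*d^(alpha/2) / (2*eps))) + 2
    term_log = 2 * math.ceil(math.log2(3.0 * mu * d ** (alpha / 2.0) / (2.0 * eps))) + 2
    return max(term_poly, term_log)


# d = 1, mu = 1, alpha = 1, eps = 0.5: the terms are 12 and 6, so the width is 12.
print(relu_sine_exp_width(d=1, mu=1.0, alpha=1.0, eps=0.5))
```

In this expression the dimension $d$ enters the first term only through the factor $d^{3/2}$, while the exponent on $1/ε$ is $1/α$ and does not depend on $d$.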

preprint, 2022, arXiv

Global Optimization via Schrödinger-Föllmer Diffusion

We study the problem of finding global minimizers of $V(x):\mathbb{R}^d\rightarrow\mathbb{R}$ approximately via sampling from a probability distribution $μ_σ$ with density $p_σ(x)=\dfrac{\exp(-V(x)/σ)}{\int_{\mathbb R^d} \exp(-V(y)/σ) dy }$ with respect to the Lebesgue measure for $σ\in (0,1]$ small enough. We analyze a sampler based on the Euler-Maruyama discretization of the Schrödinger-Föllmer diffusion processes with stochastic approximation under appropriate assumptions on the step size $s$ and the potential $V$. We prove that the output of the proposed sampler is an approximate global minimizer of $V(x)$ with high probability at the cost of sampling $\mathcal{O}(d^{3})$ standard normal random variables. Numerical studies illustrate the effectiveness of the proposed method and its superiority to the Langevin method.
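
The reason sampling from $μ_σ$ helps with optimization is that $p_σ$ concentrates near the global minimizer of $V$ as $σ$ shrinks. The sketch below illustrates only that concentration on a one-dimensional grid; it is not the Schrödinger-Föllmer sampler analyzed in the paper. The potential `double_well`, the truncation to $[-3,3]$, the grid size and the helper name `gibbs_mass_near_minimizer` are all assumptions made for illustration.

```python
import math


def gibbs_mass_near_minimizer(V, sigma, lo=-3.0, hi=3.0, n=6001, radius=0.25):
    """Return (grid argmin of V, mass of p_sigma within `radius` of it) on a 1-D grid."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    vals = [V(x) for x in xs]
    v_min = min(vals)
    x_star = xs[vals.index(v_min)]
    # Subtract v_min before exponentiating; it cancels in the normalized density.
    weights = [math.exp(-(v - v_min) / sigma) for v in vals]
    total = sum(weights)
    near = sum(w for x, w in zip(xs, weights) if abs(x - x_star) <= radius)
    # The uniform grid spacing cancels in the ratio of the two sums.
    return x_star, near / total


def double_well(x):
    """Hypothetical test potential: two wells, with the global minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


for sigma in (1.0, 0.1, 0.01):
    x_star, mass = gibbs_mass_near_minimizer(double_well, sigma)
    print(f"sigma={sigma}: argmin ~ {x_star:.3f}, mass within 0.25 = {mass:.3f}")
```

For this potential the printed mass near the minimizer grows toward 1 as $σ$ decreases, which is the behavior the abstract relies on when it takes $σ$ small enough.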

preprint, 2014, arXiv

Calculation of Cauchy stress tensor in molecular dynamics system with a generalized Irving-Kirkwood formulism

The Irving and Kirkwood formulism (IK formulism) provides a way to compute continuum mechanics quantities at a given location in terms of molecular variables. To make the approach more practical in computer simulation, Hardy proposed using a spatial kernel function that couples continuum quantities with atomistic information. To reduce irrational fluctuations, Murdoch proposed using a temporal kernel function to smooth the physical quantities obtained in Hardy's approach. In this paper, we generalize the original IK formulism to systematically incorporate both spatial and temporal averaging. The Cauchy stress tensor is derived in this generalized IK formulism (g-IK formulism). Analysis is given to illuminate the connection and difference between the g-IK formulism and the traditional temporal post-processing approach. The relationship between the Cauchy stress and the first Piola-Kirchhoff stress is restudied in the framework of the g-IK formulism. Numerical experiments using molecular dynamics are conducted to examine the analysis results.
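
The spatial-kernel idea attributed to Hardy can be illustrated with the simplest continuum quantity, mass density, written as a kernel-weighted sum over atoms, $ρ(x)=\sum_i m_i\,φ(x-x_i)$. The sketch below is a one-dimensional illustration under stated assumptions, not the g-IK stress formula from the paper: the triangular kernel, the lattice and the names `hat_kernel` and `hardy_density` are hypothetical, and no temporal kernel is included.

```python
def hat_kernel(r: float, h: float) -> float:
    """Triangular kernel with half-width h; it integrates to 1 over the real line."""
    return max(0.0, 1.0 - abs(r) / h) / h


def hardy_density(x: float, positions, masses, h: float) -> float:
    """Kernel-weighted mass density at x: sum_i m_i * phi(x - x_i)."""
    return sum(m * hat_kernel(x - xi, h) for xi, m in zip(positions, masses))


# Hypothetical 1-D chain: unit masses on a lattice with spacing a = 0.5.
a = 0.5
positions = [a * i for i in range(-40, 41)]
masses = [1.0] * len(positions)

# With h an integer multiple of a, the estimate at an interior point matches
# the uniform density m / a = 2 up to floating-point rounding.
rho = hardy_density(0.13, positions, masses, h=4 * a)
print(rho)
```

In this toy setting the choice of kernel width relative to the lattice spacing decides whether the estimate reproduces the uniform density, which gives a feel for why kernel choice matters in this kind of averaging.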

preprint, 2013, arXiv

Accurate Evaluations of Strain and Stress in Atomistic Simulations of Crystalline Solids

In this paper, we study the accuracy of Irving-Kirkwood-type formulas for approximating continuum quantities from atomistic simulations. Such formulas are derived by expressing the displacement, deformation gradient and stress in terms of certain kernel functions. We propose two criteria for choosing the kernel functions to significantly improve the sampling accuracy. We present a simple procedure to construct kernel functions that meet these criteria. Further, numerical tests on homogeneous and non-homogeneous systems validate our analysis.
