Source author record

Shijun Zhang

Shijun Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, math.NA, Numerical Analysis

Catalog footprint

What is connected

2 works

3 topics

2 close collaborators

Preprint · 2022 · arXiv

Deep Network Approximation in Terms of Intrinsic Parameters

One of the arguments used to explain the success of deep learning is the powerful approximation capacity of deep neural networks. Such capacity is generally accompanied by explosive growth in the number of parameters, which, in turn, leads to high computational costs. It is of great interest to ask whether we can achieve successful deep learning with a small number of learnable parameters adapting to the target function. From an approximation perspective, this paper shows that the number of parameters that need to be learned can be significantly smaller than people typically expect. First, we theoretically design ReLU networks with a few learnable parameters to achieve an attractive approximation. We prove by construction that, for any Lipschitz continuous function $f$ on $[0,1]^d$ with a Lipschitz constant $λ>0$, a ReLU network with $n+2$ intrinsic parameters (those depending on $f$) can approximate $f$ with an exponentially small error $5λ\sqrt{d}\,2^{-n}$. This result is generalized to generic continuous functions. Furthermore, we show that the idea of learning a small number of parameters to achieve a good approximation can be observed numerically. We conduct several experiments to verify that training a small subset of the parameters can also achieve good results for classification problems if the other parameters are pre-specified or pre-trained from a related problem.
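To make the stated bound concrete, the following minimal Python sketch evaluates $5λ\sqrt{d}\,2^{-n}$ and inverts it to find how many intrinsic parameters ($n+2$) the bound requires for a target tolerance. The function names `error_bound` and `intrinsic_params_for`, the tolerance argument `eps`, and the assumption that $n \ge 1$ are illustrative choices, not taken from the paper; the sketch only does arithmetic on the quoted bound and does not construct the network.

```python
import math


def error_bound(lam: float, d: int, n: int) -> float:
    """Evaluate the quoted bound 5 * lam * sqrt(d) * 2**(-n)."""
    return 5.0 * lam * math.sqrt(d) * 2.0 ** (-n)


def intrinsic_params_for(lam: float, d: int, eps: float) -> int:
    """Smallest n + 2 for which the quoted bound is at most eps.

    Assumes n >= 1 (hypothetical convention for this sketch).
    """
    # Solve 5 * lam * sqrt(d) * 2**(-n) <= eps for n.
    n = max(1, math.ceil(math.log2(5.0 * lam * math.sqrt(d) / eps)))
    # Guard against floating-point rounding in the logarithm.
    while error_bound(lam, d, n) > eps:
        n += 1
    return n + 2


if __name__ == "__main__":
    for eps in (1e-1, 1e-3, 1e-6):
        print(eps, intrinsic_params_for(lam=1.0, d=4, eps=eps))
```

Because the bound decays like $2^{-n}$, the required count grows only logarithmically in $1/\text{eps}$ and in $λ\sqrt{d}$, which is the sense in which the error is exponentially small in the number of intrinsic parameters.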

Preprint · 2021 · arXiv

Deep Network Approximation Characterized by Number of Neurons

This paper quantitatively characterizes the approximation power of deep feed-forward neural networks (FNNs) in terms of the number of neurons. It is shown by construction that ReLU FNNs with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+1\}\big)$ and depth $\mathcal{O}(L)$ can approximate an arbitrary Hölder continuous function of order $α\in (0,1]$ on $[0,1]^d$ with a nearly tight approximation rate $\mathcal{O}\big(\sqrt{d} N^{-2α/d}L^{-2α/d}\big)$ measured in $L^p$-norm for any $N,L\in \mathbb{N}^+$ and $p\in[1,\infty]$. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $ω_f(\cdot)$, the constructive approximation rate is $\mathcal{O}\big(\sqrt{d}\,ω_f( N^{-2/d}L^{-2/d})\big)$. We also extend our analysis to $f$ on irregular domains or those localized in an $\varepsilon$-neighborhood of a $d_{\mathcal{M}}$-dimensional smooth manifold $\mathcal{M}\subseteq [0,1]^d$ with $d_{\mathcal{M}}\ll d$. In particular, in the case of an essentially low-dimensional domain, we show an approximation rate $\mathcal{O}\big(ω_f(\tfrac{\varepsilon}{1-δ}\sqrt{\tfrac{d}{d_δ}}+\varepsilon)+\sqrt{d}\,ω_f(\tfrac{\sqrt{d}}{(1-δ)\sqrt{d_δ}}N^{-2/d_δ}L^{-2/d_δ})\big)$ for ReLU FNNs to approximate $f$ in the $\varepsilon$-neighborhood, where $d_δ=\mathcal{O}\big(d_{\mathcal{M}}\tfrac{\ln (d/δ)}{δ^2}\big)$ for any $δ\in(0,1)$, which serves as the relative error with which a projection of $\mathcal{M}$ onto a $d_δ$-dimensional domain approximates an isometry.
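As a rough illustration of how the Hölder rate depends on the ambient dimension, the sketch below evaluates the expression inside $\mathcal{O}\big(\sqrt{d} N^{-2α/d}L^{-2α/d}\big)$ and the expression inside the $d_δ$ estimate. The function names `holder_rate` and `reduced_dim` are hypothetical, and the hidden constants in the $\mathcal{O}(\cdot)$ terms are set to 1 purely for illustration, so the outputs show scaling trends rather than actual error values.

```python
import math


def holder_rate(d: int, alpha: float, N: int, L: int) -> float:
    """sqrt(d) * N**(-2*alpha/d) * L**(-2*alpha/d), hidden constant taken as 1."""
    return math.sqrt(d) * (N * L) ** (-2.0 * alpha / d)


def reduced_dim(d: int, d_M: int, delta: float) -> float:
    """d_M * ln(d / delta) / delta**2, hidden constant taken as 1."""
    return d_M * math.log(d / delta) / delta ** 2


if __name__ == "__main__":
    # Same width and depth parameters, increasing ambient dimension d.
    for d in (2, 8, 32, 128):
        print(d, holder_rate(d, alpha=1.0, N=16, L=16))
    # Effective dimension when the data sit near a 2-dimensional manifold.
    print(reduced_dim(d=1024, d_M=2, delta=0.5))
```

For fixed $N$ and $L$ the expression degrades quickly as $d$ grows, since the exponent $-2α/d$ tends to zero. This is the motivation for the low-dimensional variant of the rate, in which $d_δ$ replaces $d$ in the exponent.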