Researcher profile

Haizhang Zhang

Haizhang Zhang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
16 works
0 followers
6 topics
4 close collaborators

Published work

16 published items

preprint, 2023, arXiv

Convergence of Deep ReLU Networks

We explore convergence of deep neural networks with the popular ReLU activation function, as the depth of the networks tends to infinity. To this end, we introduce the notion of activation domains and activation matrices of a ReLU network. By replacing applications of the ReLU activation function by multiplications with activation matrices on activation domains, we obtain an explicit expression of the ReLU network. We then identify the convergence of the ReLU networks as convergence of a class of infinite products of matrices. Sufficient and necessary conditions for convergence of these infinite products of matrices are studied. As a result, we establish necessary conditions for ReLU networks to converge: the sequence of weight matrices must converge to the identity matrix and the sequence of bias vectors must converge to zero as the depth of the ReLU networks increases to infinity. Moreover, we obtain sufficient conditions in terms of the weight matrices and bias vectors at hidden layers for pointwise convergence of deep ReLU networks. These results provide mathematical insights into the design strategy of the well-known deep residual networks in image classification.
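The activation-matrix idea in this abstract can be illustrated with a minimal sketch. It assumes the activation matrix is the diagonal 0/1 matrix that records which pre-activations are positive for a given input. The helper names (`matvec`, `relu_layer`, `activation_diagonal`, `layer_via_activation_matrix`, `network`) and the toy weights are hypothetical and not taken from the paper.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu_layer(W, b, x):
    """Standard layer: ReLU applied componentwise to W x + b."""
    return [max(0.0, z + bi) for z, bi in zip(matvec(W, x), b)]


def activation_diagonal(W, b, x):
    """Diagonal of the assumed 0/1 activation matrix selected by the input x."""
    return [1.0 if z + bi > 0 else 0.0 for z, bi in zip(matvec(W, x), b)]


def layer_via_activation_matrix(W, b, x):
    """The same layer written as D (W x + b), with D the diagonal activation matrix."""
    pre = [z + bi for z, bi in zip(matvec(W, x), b)]
    d = activation_diagonal(W, b, x)
    return [di * p for di, p in zip(d, pre)]


def network(layers, x, layer_fn):
    """Compose the layers in order, using the given layer implementation."""
    for W, b in layers:
        x = layer_fn(W, b, x)
    return x


# Toy weights chosen for illustration only (not from the paper).
layers = [
    ([[1.0, -0.5], [0.25, 1.0]], [0.1, -2.0]),
    ([[1.0, 0.1], [-0.1, 1.0]], [0.05, 0.0]),
]
x0 = [1.0, 1.0]

out_relu = network(layers, x0, relu_layer)
out_matrix = network(layers, x0, layer_via_activation_matrix)
```

For a fixed input, both evaluations agree, which is why the network can be written as a product of matrices on each region where the activation pattern stays constant.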

preprint, 2022, arXiv

Convergence Analysis of Deep Residual Networks

Various powerful deep neural network architectures have contributed greatly to the exciting successes of deep learning in the past two decades. Among them, deep Residual Networks (ResNets) are of particular importance because they demonstrated great usefulness in computer vision by winning first place in many deep learning competitions. Also, ResNets were the first class of neural networks in the development history of deep learning that are really deep. It is of mathematical interest and practical meaning to understand the convergence of deep ResNets. We aim at characterizing the convergence of deep ResNets as the depth tends to infinity in terms of the parameters of the networks. Toward this purpose, we first give a matrix-vector description of general deep neural networks with shortcut connections and formulate an explicit expression for the networks by using the notions of activation domains and activation matrices. The convergence is then reduced to the convergence of two series involving infinite products of non-square matrices. By studying the two series, we establish a sufficient condition for pointwise convergence of ResNets. Our result is able to give justification for the design of ResNets. We also conduct experiments on benchmark machine learning data to verify our results.

preprint, 2022, arXiv

Convergence of Deep Convolutional Neural Networks

Convergence of deep neural networks as the depth of the networks tends to infinity is fundamental in building the mathematical foundation for deep learning. In a previous study, we investigated this question for deep ReLU networks with a fixed width. This does not cover the important convolutional neural networks where the widths are increasing from layer to layer. For this reason, we first study convergence of general ReLU networks with increasing widths and then apply the results obtained to deep convolutional neural networks. It turns out the convergence reduces to convergence of infinite products of matrices with increasing sizes, which has not been considered in the literature. We establish sufficient conditions for convergence of such infinite products of matrices. Based on the conditions, we present sufficient conditions for piecewise convergence of general deep ReLU networks with increasing widths, as well as pointwise convergence of deep ReLU convolutional neural networks.

preprint, 2016, arXiv

Convergence Analysis of the Gaussian Regularized Shannon Sampling Formula

We consider the reconstruction of a bandlimited function from its finite localized sample data. Truncating the classical Shannon sampling series results in an unsatisfactory convergence rate due to the slow decay of the sinc function. To overcome this drawback, a simple and highly effective method, called the Gaussian regularization of the Shannon series, was proposed in engineering and has received remarkable attention. It works by multiplying the sinc function in the Shannon series with a regularized Gaussian function. L. Qian (Proc. Amer. Math. Soc., 2003) established the convergence rate of $O(\sqrt{n}\exp(-\frac{π-δ}2n))$ for this method, where $δ<π$ is the bandwidth and $n$ is the number of sample data. C. Micchelli et al. (J. Complexity, 2009) proposed a different regularized method and obtained the corresponding convergence rate of $O(\frac1{\sqrt{n}}\exp(-\frac{π-δ}2n))$. This latter rate is by far the best among all regularized methods for the Shannon series. However, their regularized method involves solving a linear system and is implicit and more complicated. The main objective of this note is to show that the Gaussian regularization of the Shannon series can also achieve the same best convergence rate as that of C. Micchelli et al. We also show that the Gaussian regularization method can improve the convergence rate for the useful average sampling. Finally, the outstanding performance of numerical experiments justifies our results.
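The method described in this abstract, multiplying each sinc term of the truncated Shannon series by a Gaussian, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formula. The sample window $j=-n+1,\dots,n$, the Gaussian variance $n/(π-δ)$, the test function, and all function names are choices made here for the example.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with the removable singularity at 0."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def truncated_shannon(f, t, n):
    """Plain truncated Shannon series using integer samples j = -n+1, ..., n."""
    return sum(f(j) * sinc(t - j) for j in range(-n + 1, n + 1))


def gaussian_regularized_shannon(f, t, n, delta):
    """Truncated Shannon series with each sinc term damped by a Gaussian.

    The variance r2 = n / (pi - delta) is an assumed choice that balances the
    truncation and regularization errors; the paper may use a different one.
    """
    r2 = n / (math.pi - delta)
    return sum(
        f(j) * sinc(t - j) * math.exp(-((t - j) ** 2) / (2.0 * r2))
        for j in range(-n + 1, n + 1)
    )


def bandlimited_example(t, delta=math.pi / 2):
    """Hypothetical test function sin(delta t) / (delta t), bandlimited to [-delta, delta]."""
    return 1.0 if t == 0 else math.sin(delta * t) / (delta * t)


# Compare both reconstructions at one point between samples.
t, n, delta = 0.5, 20, math.pi / 2
exact = bandlimited_example(t)
plain_error = abs(truncated_shannon(bandlimited_example, t, n) - exact)
gauss_error = abs(gaussian_regularized_shannon(bandlimited_example, t, n, delta) - exact)
print(plain_error, gauss_error)
```

For this oversampled test function, the Gaussian-damped sum should be markedly more accurate than the plain truncated sum with the same samples, in line with the exponential rates quoted above.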

preprint, 2014, arXiv

Existence of the Bedrosian Identity for Singular Integral Operators

The Hilbert transform $H$ satisfies the Bedrosian identity $H(fg)=fHg$ whenever the supports of the Fourier transforms of $f,g\in L^2(R)$ are respectively contained in $A=[-a,b]$ and $B=R\setminus(-b,a)$, $0\le a,b\le+\infty$. Attracted by this interesting result arising from the time-frequency analysis, we investigate the existence of such an identity for a general bounded singular integral operator on $L^2(R^d)$ and for general support sets $A$ and $B$. A geometric characterization of the support sets for the existence of the Bedrosian identity is established. Moreover, the support sets for the partial Hilbert transforms are all found. In particular, for the Hilbert transform to satisfy the Bedrosian identity, the support sets must be given as above.

preprint, 2014, arXiv

Exponential Approximation of Bandlimited Functions from Average Oversampling

Weighted average sampling is more practical and numerically more stable than sampling at single points as in the classical Shannon sampling framework. Using the frame theory, one can completely reconstruct a bandlimited function from its suitably-chosen average sample data. When only finitely many sample data are available, truncating the complete reconstruction series with the standard dual frame results in very slow convergence. We present in this note a method of reconstructing a bandlimited function from finite average oversampling with an exponentially-decaying approximation error.

preprint, 2014, arXiv

Exponential Approximation of Bandlimited Random Processes from Oversampling

The Shannon sampling theorem for bandlimited wide sense stationary random processes was established in 1957. The theorem and its extensions to various random processes have been widely studied since then. However, truncation of the Shannon series suffers the drawback of slow convergence. Specifically, it is well-known that the mean-square approximation error of the truncated series at $n$ points sampled at the exact Nyquist rate is of the order $O(\frac1{\sqrt{n}})$. We consider the reconstruction of bandlimited random processes from finite oversampling points, namely, the distance between consecutive points is less than the Nyquist sampling rate. The optimal deterministic linear reconstruction method and the associated intrinsic approximation error are studied. It is found that one can achieve exponentially-decaying (but not faster) approximation errors from oversampling. Two practical reconstruction methods with exponential approximation ability are also presented.

preprint, 2014, arXiv

Exponential Approximation of Multivariate Bandlimited Functions from Average Oversampling

Instead of sampling a function at a single point, average sampling takes the weighted sum of function values around the point. Such a sampling strategy is more practical and more stable. In this note, we present an explicit method with an exponentially-decaying approximation error to reconstruct a multivariate bandlimited function from its finite average oversampling data. The key problem in our analysis is how to extend a function so that its Fourier transform decays at an optimal rate to zero at infinity.

preprint, 2013, arXiv

Universalities of Reproducing Kernels Revisited

Kernel methods have been widely applied to machine learning and other questions of approximating an unknown function from its finite sample data. To ensure arbitrary accuracy of such approximation, various denseness conditions are imposed on the selected kernel. This note contributes to the study of universal, characteristic, and $C_0$-universal kernels. We first give a simple and direct description of the differences and relations among these three kinds of universalities of kernels. We then focus on translation-invariant and weighted polynomial kernels. A simpler and shorter proof of the known characterization of characteristic translation-invariant kernels will be presented. The main purpose of the note is to give a delicate discussion on the universalities of weighted polynomial kernels.

preprint, 2012, arXiv

Multidimensional Analytic Signals and the Bedrosian Identity

The analytic signal method via the Hilbert transform is a key tool in signal analysis and processing, especially in time-frequency analysis. Imaging and other applications to multidimensional signals call for extension of the method to higher dimensions. We justify the usage of partial Hilbert transforms to define multidimensional analytic signals from both engineering and mathematical perspectives. The important associated Bedrosian identity $T(fg)=fTg$ for partial Hilbert transforms $T$ is then studied. Characterizations and several necessity theorems are established. We also make use of the identity to construct basis functions for time-frequency analysis.

preprint, 2012, arXiv

Optimal Sampling Points in Reproducing Kernel Hilbert Spaces

The recent developments of basis pursuit and compressed sensing seek to extract information from as few samples as possible. In such applications, since the number of samples is restricted, one should deploy the sampling points wisely. We are motivated to study the optimal distribution of finite sampling points. Formulation under the framework of optimal reconstruction yields a minimization problem. In the discrete case, we estimate the distance between the optimal subspace resulting from a general Karhunen-Loeve transform and the kernel space to obtain another algorithm that is computationally favorable. Numerical experiments are then presented to illustrate the performance of the algorithms for the searching of optimal sampling points.

preprint, 2012, arXiv

Reproducing Kernel Banach Spaces with the l1 Norm

Targeting sparse learning, we construct Banach spaces B of functions on an input space X with the properties that (1) B possesses an l1 norm in the sense that it is isometrically isomorphic to the Banach space of integrable functions on X with respect to the counting measure; (2) point evaluations are continuous linear functionals on B and are representable through a bilinear form with a kernel function; (3) regularized learning schemes on B satisfy the linear representer theorem. Examples of kernel functions admissible for the construction of such spaces are given.

preprint, 2012, arXiv

Vector-valued Reproducing Kernel Banach Spaces with Applications to Multi-task Learning

Motivated by multi-task machine learning with Banach spaces, we propose the notion of vector-valued reproducing kernel Banach spaces (RKBS). Basic properties of the spaces and the associated reproducing kernels are investigated. We also present feature map constructions and several concrete examples of vector-valued RKBS. The theory is then applied to multi-task machine learning. In particular, the representer theorem and characterization equations for the minimizer of regularized learning schemes in vector-valued RKBS are established.

preprint, 2011, arXiv

On the Inclusion Relation of Reproducing Kernel Hilbert Spaces

To help understand various reproducing kernels used in applied sciences, we investigate the inclusion relation of two reproducing kernel Hilbert spaces. Characterizations in terms of feature maps of the corresponding reproducing kernels are established. A full table of inclusion relations among widely-used translation invariant kernels is given. Concrete examples for Hilbert-Schmidt kernels are presented as well. We also discuss the preservation of such a relation under various operations of reproducing kernels. Finally, we briefly discuss the special inclusion with a norm equivalence.

preprint, 2011, arXiv

Refinement of Operator-valued Reproducing Kernels

This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given one as a subspace. The study is motivated by the need to update the current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting occurs. Numerical simulations confirm that the established refinement kernel method is able to meet this need. Various characterizations are provided based on feature maps and vector-valued integral representations of operator-valued reproducing kernels. Concrete examples of refining translation invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided. Other examples include refinement of Hessian of scalar-valued translation-invariant kernels and transformation kernels. Existence and properties of operator-valued reproducing kernels preserved during the refinement process are also investigated.

preprint, 2011, arXiv

Reproducing Kernel Banach Spaces with the l1 Norm II: Error Analysis for Regularized Least Square Regression

A typical approach in estimating the learning rate of a regularized learning scheme is to bound the approximation error by the sum of the sampling error, the hypothesis error and the regularization error. Using a reproducing kernel space that satisfies the linear representer theorem brings the advantage of discarding the hypothesis error from the sum automatically. Following this direction, we illustrate how reproducing kernel Banach spaces with the l1 norm can be applied to improve the learning rate estimate of l1-regularization in machine learning.