Researcher profile

Yuesheng Xu

Yuesheng Xu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
15 works
0 followers
8 topics
4 close collaborators

Published work

15 published items

preprint · 2026 · arXiv

Adaptive Multi-Grade Deep Learning for Highly Oscillatory Fredholm Integral Equations of the Second Kind

This paper studies the use of Multi-Grade Deep Learning (MGDL) for solving highly oscillatory Fredholm integral equations of the second kind. We provide rigorous error analyses of continuous and discrete MGDL models, showing that the discrete model retains the convergence and stability of its continuous counterpart under sufficiently small quadrature error. We identify the DNN training error as the primary source of approximation error, motivating a novel adaptive MGDL algorithm that selects the network grade based on training performance. Numerical experiments with highly oscillatory (including wavenumber 500) and singular solutions confirm the accuracy, effectiveness and robustness of the proposed approach.

preprint · 2023 · arXiv

Convergence of Deep ReLU Networks

We explore convergence of deep neural networks with the popular ReLU activation function, as the depth of the networks tends to infinity. To this end, we introduce the notion of activation domains and activation matrices of a ReLU network. By replacing applications of the ReLU activation function by multiplications with activation matrices on activation domains, we obtain an explicit expression of the ReLU network. We then identify the convergence of the ReLU networks as convergence of a class of infinite products of matrices. Sufficient and necessary conditions for convergence of these infinite products of matrices are studied. As a result, we establish necessary conditions for ReLU networks to converge: the sequence of weight matrices must converge to the identity matrix and the sequence of bias vectors must converge to zero as the depth of the ReLU networks increases to infinity. Moreover, we obtain sufficient conditions in terms of the weight matrices and bias vectors at hidden layers for pointwise convergence of deep ReLU networks. These results provide mathematical insights into the design strategy of the well-known deep residual networks in image classification.
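The activation-matrix idea in this abstract can be illustrated with a small sketch. It is not code from the paper: the function names and the 2x2 example are hypothetical, and it only shows the elementary fact that, for a fixed input, applying ReLU to a pre-activation vector equals multiplying that vector by a diagonal 0/1 matrix.

```python
# Illustrative sketch only; all names here are hypothetical, not from the paper.

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def pre_activation(W, b, x):
    """Compute z = W x + b."""
    return [zi + bi for zi, bi in zip(matvec(W, x), b)]

def relu_layer(W, b, x):
    """Apply ReLU componentwise to W x + b."""
    return [max(0.0, zi) for zi in pre_activation(W, b, x)]

def activation_matrix(W, b, x):
    """Diagonal 0/1 matrix D with D[i][i] = 1 exactly when (W x + b)_i > 0."""
    z = pre_activation(W, b, x)
    n = len(z)
    return [[1.0 if (i == j and z[i] > 0) else 0.0 for j in range(n)]
            for i in range(n)]

def layer_via_activation_matrix(W, b, x):
    """Same layer output, written as the matrix product D (W x + b)."""
    return matvec(activation_matrix(W, b, x), pre_activation(W, b, x))

# Hypothetical example data.
W = [[1.0, -2.0], [0.5, 1.0]]
b = [0.0, -1.0]
x = [1.0, 1.0]

via_relu = relu_layer(W, b, x)
via_matrix = layer_via_activation_matrix(W, b, x)
```

On the set of inputs that share the same 0/1 pattern, the layer acts as an affine map, which is why a deep network can be written as a product of matrices there.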

preprint · 2022 · arXiv

A Fast Convergent Ordered-Subsets Algorithm with Subiteration-Dependent Preconditioners for PET Image Reconstruction

We investigated the imaging performance of a fast convergent ordered-subsets algorithm with subiteration-dependent preconditioners (SDPs) for positron emission tomography (PET) image reconstruction. In particular, we considered the use of SDP with the block sequential regularized expectation maximization (BSREM) approach with the relative difference prior (RDP) regularizer due to its prior clinical adaptation by vendors. Because the RDP regularization promotes smoothness in the reconstructed image, the directions of the gradients in smooth areas more accurately point toward the objective function's minimizer than those in variable areas. Motivated by this observation, two SDPs have been designed to increase iteration step-sizes in the smooth areas and reduce iteration step-sizes in the variable areas relative to a conventional expectation maximization preconditioner. The momentum technique used for convergence acceleration can be viewed as a special case of SDP. We have proved the global convergence of SDP-BSREM algorithms by assuming certain characteristics of the preconditioner. By means of numerical experiments using both simulated and clinical PET data, we have shown that the SDP-BSREM algorithms substantially improve the convergence rate, as compared to conventional BSREM and a vendor's implementation as Q.Clear. Specifically, SDP-BSREM algorithms converge 35%-50% faster in reaching the same objective function value than conventional BSREM and commercial Q.Clear algorithms. Moreover, we showed in phantoms with hot, cold and background regions that the SDP-BSREM algorithms approached the values of a highly converged reference image faster than conventional BSREM and commercial Q.Clear algorithms.

preprint · 2022 · arXiv

Convergence of Deep Convolutional Neural Networks

Convergence of deep neural networks as the depth of the networks tends to infinity is fundamental in building the mathematical foundation for deep learning. In a previous study, we investigated this question for deep ReLU networks with a fixed width. This does not cover the important convolutional neural networks where the widths are increasing from layer to layer. For this reason, we first study convergence of general ReLU networks with increasing widths and then apply the results obtained to deep convolutional neural networks. It turns out the convergence reduces to convergence of infinite products of matrices with increasing sizes, which has not been considered in the literature. We establish sufficient conditions for convergence of such infinite products of matrices. Based on the conditions, we present sufficient conditions for piecewise convergence of general deep ReLU networks with increasing widths, as well as pointwise convergence of deep ReLU convolutional neural networks.

preprint · 2022 · arXiv

Inverting Incomplete Fourier Transforms by a Sparse Regularization Model and Applications in Seismic Wavefield Modeling

We propose a sparse regularization model for inversion of incomplete Fourier transforms and apply it to seismic wavefield modeling. The objective function of the proposed model employs the Moreau envelope of the $\ell_0$ norm under a tight framelet system as a regularization to promote sparsity. This model leads to a non-smooth, non-convex optimization problem for which traditional iteration schemes are inefficient or even divergent. By exploiting special structures of the $\ell_0$ norm, we identify a local minimizer of the proposed non-convex optimization problem with a global minimizer of a convex optimization problem, which provides insights for the development of efficient and convergence-guaranteed algorithms to solve it. We characterize the solution of the regularization model in terms of a fixed-point of a map defined by the proximity operator of the $\ell_0$ norm and develop a fixed-point iteration algorithm to solve it. By connecting the map with an $\alpha$-averaged nonexpansive operator, we prove that the sequence generated by the proposed fixed-point proximity algorithm converges to a local minimizer of the proposed model. Our numerical examples confirm that the proposed model significantly outperforms the existing model based on the $\ell_1$ norm. The seismic wavefield modeling in the frequency domain requires solving a series of Helmholtz equations with large wave numbers, which is a computationally intensive task. Applying the proposed sparse regularization model to the seismic wavefield modeling requires data of only a few low frequencies, avoiding solving the Helmholtz equation with large wave numbers. Numerical results show that the proposed method performs better than the existing method based on the $\ell_1$ norm in terms of the SNR values and visual quality of the restored synthetic seismograms.

preprint · 2022 · arXiv

Parameter Choices for Sparse Regularization with the $\ell_1$ Norm

We consider a regularization problem whose objective function consists of a convex fidelity term and a regularization term determined by the $\ell_1$ norm composed with a linear transform. Empirical results show that the regularization with the $\ell_1$ norm can promote sparsity of a regularized solution. It is the goal of this paper to understand theoretically the effect of the regularization parameter on the sparsity of the regularized solutions. We establish a characterization of the sparsity under the transform matrix of the solution. When the fidelity term has a special structure and the transform matrix coincides with an identity matrix, the resulting characterization can be taken as a regularization parameter choice strategy with which the regularization problem has a solution having a sparsity of a certain level. We study choices of the regularization parameter so that the regularization term alleviates the ill-posedness and promotes sparsity of the resulting regularized solution. Numerical experiments demonstrate that choices of the regularization parameters can balance the sparsity of the solutions of the regularization problem and its approximation to the minimizer of the fidelity function.
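How a regularization parameter controls sparsity can be seen in the simplest textbook case. The sketch below is not the paper's characterization: it assumes a quadratic fidelity term (1/2)||x - y||^2 and an identity transform, for which the minimizer of (1/2)||x - y||^2 + lam * ||x||_1 is the well-known soft-thresholding of y. The function names and data are hypothetical.

```python
# Illustrative sketch only, under the assumptions stated in the lead-in.

def soft_threshold(y, lam):
    """Minimizer of 0.5 * ||x - y||^2 + lam * ||x||_1, computed componentwise."""
    out = []
    for yi in y:
        mag = abs(yi) - lam
        if mag <= 0:
            out.append(0.0)          # entries with |y_i| <= lam are set to zero
        else:
            out.append(mag if yi > 0 else -mag)
    return out

def count_nonzeros(x):
    """Number of nonzero entries (the sparsity level)."""
    return sum(1 for xi in x if xi != 0.0)

def lambda_for_sparsity(y, k):
    """Smallest lam for which soft_threshold(y, lam) has at most k nonzeros."""
    mags = sorted((abs(yi) for yi in y), reverse=True)
    return mags[k] if k < len(mags) else 0.0

# Hypothetical data: ask for a solution with at most 2 nonzero entries.
y = [3.0, -0.5, 1.5, 0.2, -2.0]
lam = lambda_for_sparsity(y, 2)
x = soft_threshold(y, lam)
```

In this special case the parameter choice reduces to reading off the (k+1)-th largest magnitude of the data; more general fidelity terms and transforms do not admit such a direct rule.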

preprint · 2022 · arXiv

Sparse Deep Neural Network for Nonlinear Partial Differential Equations

More competent learning models are demanded for data processing due to increasingly greater amounts of data available in applications. Data that we encounter often have certain embedded sparsity structures. That is, if they are represented in an appropriate basis, their energies can concentrate on a small number of basis functions. This paper is devoted to a numerical study of adaptive approximation of solutions of nonlinear partial differential equations whose solutions may have singularities, by deep neural networks (DNNs) with a sparse regularization with multiple parameters. Noting that DNNs have an intrinsic multi-scale structure which is favorable for adaptive representation of functions, by employing a penalty with multiple parameters, we develop DNNs with a multi-scale sparse regularization (SDNN) for effectively representing functions having certain singularities. We then apply the proposed SDNN to numerical solutions of the Burgers equation and the Schrödinger equation. Numerical examples confirm that solutions generated by the proposed SDNN are sparse and accurate.

preprint · 2020 · arXiv

Multiplicative Noise Removal: Nonlocal Low-Rank Model and Its Proximal Alternating Reweighted Minimization Algorithm

The goal of this paper is to develop a novel numerical method for efficient multiplicative noise removal. The nonlocal self-similarity of natural images implies that the matrices formed by their nonlocal similar patches are low-rank. By exploiting this low-rank prior with application to multiplicative noise removal, we propose a nonlocal low-rank model for this task and develop a proximal alternating reweighted minimization (PARM) algorithm to solve the optimization problem resulting from the model. Specifically, we utilize a generalized nonconvex surrogate of the rank function to regularize the patch matrices and develop a new nonlocal low-rank model, which is a nonconvex nonsmooth optimization problem having a patchwise data fidelity and a generalized nonlocal low-rank regularization term. To solve this optimization problem, we propose the PARM algorithm, which has a proximal alternating scheme with a reweighted approximation of its subproblem. A theoretical analysis of the proposed PARM algorithm is conducted to guarantee its global convergence to a critical point. Numerical experiments demonstrate that the proposed method for multiplicative noise removal significantly outperforms existing methods such as the benchmark SAR-BM3D method in terms of the visual quality of the denoised images, and the PSNR (the peak-signal-to-noise ratio) and SSIM (the structural similarity index measure) values.

preprint · 2020 · arXiv

Representer Theorems in Banach Spaces: Minimum Norm Interpolation, Regularized Learning and Semi-Discrete Inverse Problems

Constructing or learning a function from a finite number of sampled data points (measurements) is a fundamental problem in science and engineering. This is often formulated as a minimum norm interpolation problem, regularized learning problem or, in general, a semi-discrete inverse problem, in certain functional spaces. The choice of an appropriate space is crucial for solutions of these problems. Motivated by sparse representations of the reconstructed functions such as compressed sensing and sparse learning, much of the recent research interest has been directed to considering these problems in certain Banach spaces in order to obtain their sparse solutions, which is a feasible approach to overcome challenges coming from the big data nature of most practical applications. It is the goal of this paper to provide a systematic study of the representer theorems for these problems in Banach spaces. There are a few existing results for these problems in a Banach space, with all of them regarding implicit representer theorems. We aim at obtaining explicit representer theorems based on which convenient solution methods will then be developed. For the minimum norm interpolation, the explicit representer theorems enable us to express the infimum in terms of the norm of the linear combination of the interpolation functionals. For the purpose of developing efficient computational algorithms, we establish the fixed-point equation formulation of solutions of these problems. We reveal that unlike in a Hilbert space, in general, solutions of these problems in a Banach space may not be able to be reduced to truly finite dimensional problems (with certain infinite dimensional components hidden). We demonstrate how this obstacle can be removed, reducing the original problem to a truly finite dimensional one, in the special case when the Banach space is $\ell_1(\mathbb{N})$.

preprint · 2016 · arXiv

Two-step Fixed-point proximity algorithms for multi-block separable convex problems

Multi-block separable convex problems recently received considerable attention. This class of optimization problems minimizes a separable convex objective function with linear constraints. The algorithmic challenges come from the fact that the classic alternating direction method of multipliers (ADMM) for the problem is not necessarily convergent. However, it is observed that ADMM outperforms numerically many of its variants with guaranteed theoretical convergence. The goal of this paper is to develop convergent and computationally efficient algorithms for solving multi-block separable convex problems. We first characterize the solutions of the optimization problems by proximity operators of the convex functions involved in their objective function. We then design a two-step fixed-point iterative scheme for solving these problems based on the characterization. We further prove convergence of the iterative scheme and show that it has O(1/k) convergence rate in the ergodic sense and the sense of the partial primal-dual gap, where k denotes the iteration number. Moreover, we derive specific two-step fixed-point proximity algorithms (2SFPPA) from the proposed iterative scheme and establish their global convergence. Numerical experiments for solving the sparse MRI problem demonstrate the numerical efficiency of the proposed 2SFPPA.

preprint · 2015 · arXiv

Computing Highly Oscillatory Integrals

We develop two classes of composite moment-free numerical quadratures for computing highly oscillatory integrals having integrable singularities and stationary points. The first class of the quadrature rules has a polynomial order of convergence and the second class has an exponential order of convergence. We first modify the moment-free Filon-type method for the oscillatory integrals without a singularity or a stationary point to accelerate their convergence. We then consider the oscillatory integrals without a singularity or a stationary point and then those with singularities and stationary points. The composite quadrature rules are developed based on partitioning the integration domain according to the wave number and the singularity of the integrand. The integral defined on a subinterval has either a weak singularity without rapid oscillation or oscillation without a singularity. The classical quadrature rules for weakly singular integrals using graded points are employed for the singular integral without rapid oscillation and the modified moment-free Filon-type method is used for the oscillatory integrals without a singularity. Unlike the existing methods, the proposed methods do not have to compute the inverse of the oscillator. Numerical experiments are presented to demonstrate the approximation accuracy and the computational efficiency of the proposed methods. Numerical results show that the proposed methods outperform methods published most recently.

preprint · 2015 · arXiv

Operator Reproducing Kernel Hilbert Spaces

Motivated by the need of processing functional-valued data, or more generally, operator-valued data, we introduce the notion of the operator reproducing kernel Hilbert space (ORKHS). This space admits a unique operator reproducing kernel which reproduces a family of continuous linear operators on the space. The theory of ORKHSs and the associated operator reproducing kernels are established. A special class of ORKHSs, known as the perfect ORKHSs, are studied, which reproduce the family of the standard point-evaluation operators and at the same time another different family of continuous linear operators. The perfect ORKHSs are characterized in terms of features, especially for those with respect to integral operators. In particular, several specific examples of the perfect ORKHSs are presented. We apply the theory of ORKHSs to sampling and regularized learning, where operator-valued data are considered. Specifically, a general complete reconstruction formula from linear operator values is established in the framework of ORKHSs. The average sampling and the reconstruction of vector-valued functions are considered in specific ORKHSs. We also investigate in the ORKHSs setting the regularized learning schemes, which learn a target element from operator-valued data. The desired representer theorems of the learning problems are established to demonstrate the key roles played by the ORKHSs and the operator reproducing kernels in machine learning from operator-valued data. We finally point out that the continuity of linear operators, used to obtain the operator-valued data, on an ORKHS is necessary for the stability of the numerical reconstruction algorithm using the resulting data.

preprint · 2015 · arXiv

Oscillation Preserving Galerkin Methods for Fredholm Integral Equations of the Second Kind with Oscillatory Kernels

Solutions of Fredholm integral equations of the second kind with oscillatory kernels likely exhibit oscillation. Standard numerical methods applied to solving equations of this type have poor numerical performance due to the influence of the highly rapid oscillation in the solutions. Understanding of the oscillation of the solutions is still inadequate in the literature and thus it requires further investigation. For this purpose, we introduce a notion to describe the degree of oscillation of an oscillatory function based on the dependence of its norm in a certain function space on the wavenumber. Based on this new notion, we construct structured oscillatory spaces with oscillatory structures. The structured spaces with a specific oscillatory structure can capture the oscillatory components of the solutions of Fredholm integral equations with oscillatory kernels. We then further propose oscillation preserving Galerkin methods for solving the equations by incorporating the standard approximation subspace of spline functions with a finite number of oscillatory functions which capture the oscillation of the exact solutions of the integral equations. We prove that the proposed methods have the optimal convergence order uniformly with respect to the wavenumber and they are numerically stable. A numerical example is presented to confirm the theoretical estimates.

preprint · 2011 · arXiv

Efficient First Order Methods for Linear Composite Regularizers

A wide class of regularization problems in machine learning and statistics employ a regularization term which is obtained by composing a simple convex function ω with a linear transformation. This setting includes Group Lasso methods, the Fused Lasso and other total variation methods, multi-task learning methods and many more. In this paper, we present a general approach for computing the proximity operator of this class of regularizers, under the assumption that the proximity operator of the function ω is known in advance. Our approach builds on a recent line of research on optimal first order optimization methods and uses fixed point iterations for numerically computing the proximity operator. It is more general than current approaches and, as we show with numerical simulations, computationally more efficient than available first order methods which do not achieve the optimal rate. In particular, our method outperforms state of the art O(1/T) methods for overlapping Group Lasso and matches optimal O(1/T^2) methods for the Fused Lasso and tree structured Group Lasso.

preprint · 2011 · arXiv

Refinement of Operator-valued Reproducing Kernels

This paper studies the construction of a refinement kernel for a given operator-valued reproducing kernel such that the vector-valued reproducing kernel Hilbert space of the refinement kernel contains that of the given one as a subspace. The study is motivated by the need to update the current operator-valued reproducing kernel in multi-task learning when underfitting or overfitting occurs. Numerical simulations confirm that the established refinement kernel method is able to meet this need. Various characterizations are provided based on feature maps and vector-valued integral representations of operator-valued reproducing kernels. Concrete examples of refining translation invariant and finite Hilbert-Schmidt operator-valued reproducing kernels are provided. Other examples include refinement of Hessian of scalar-valued translation-invariant kernels and transformation kernels. Existence and properties of operator-valued reproducing kernels preserved during the refinement process are also investigated.