Source author record

Xiliang Lu

Xiliang Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.NA, Machine Learning, Numerical Analysis, Information Theory, math.IT, math.OC, Computation, eess.IV, math.AP, Neural and Evolutionary Computing, quant-ph

Catalog footprint

What is connected

17 works

11 topics

4 close collaborators

preprint 2026 arXiv

Approximation Error Upper and Lower Bounds for Hölder Class with Transformers

We explore the expressive power of Transformers by establishing precise approximation error upper and lower bounds for the Hölder class. Specifically, a new approximation upper bound is derived for the standard Transformer architecture equipped with Softmax operators, ReLU activation functions, and residual connections. We prove that a Transformer network composed of at most $\mathcal{O}(\varepsilon^{-{d_{0}}/α})$ blocks can approximate any bounded Hölder function with $d_{0}$-dimensional input and smoothness $α\in(0,1]$ to any accuracy $\varepsilon>0$. For the approximation lower bounds, leveraging the VC-dimension upper bound, we are the first to rigorously prove that Transformers require at least $Ω(\varepsilon^{-{d_{0}}/({4α})})$ blocks to achieve approximation accuracy $\varepsilon$. As a final step, we extend the derived results for standard Transformers to a general regression task and establish the corresponding excess risk rates, demonstrating Transformers' empirical effectiveness in real-world settings.
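To get a feel for the gap between the two bounds, the following minimal sketch evaluates the scalings $\varepsilon^{-d_0/α}$ and $\varepsilon^{-d_0/(4α)}$. The hidden constants are set to 1 and the function names are hypothetical; neither comes from the abstract, so the numbers indicate growth rates only, not actual block counts.

```python
def upper_block_scaling(eps: float, d0: int, alpha: float) -> float:
    """Growth rate eps^(-d0/alpha) of the upper bound (constant assumed to be 1)."""
    return eps ** (-d0 / alpha)


def lower_block_scaling(eps: float, d0: int, alpha: float) -> float:
    """Growth rate eps^(-d0/(4*alpha)) of the lower bound (constant assumed to be 1)."""
    return eps ** (-d0 / (4 * alpha))


# Illustrative values only: input dimension 2, Hölder smoothness 1.
for eps in (0.1, 0.01):
    up = upper_block_scaling(eps, d0=2, alpha=1.0)
    low = lower_block_scaling(eps, d0=2, alpha=1.0)
    print(f"eps={eps}: upper ~ {up:.1f}, lower ~ {low:.2f}")
```

With unit constants the upper scaling is the fourth power of the lower one, which is the size of the gap left open between the two bounds.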

preprint 2022 arXiv

A rate of convergence of Physics Informed Neural Networks for the linear second order elliptic PDEs

In recent years, physics-informed neural networks (PINNs) have been shown empirically to be a powerful tool for solving PDEs. However, numerical analysis of PINNs is still missing. In this paper, we prove the convergence rate of PINNs for second order elliptic equations with Dirichlet boundary condition, by establishing upper bounds on the number of training samples, depth and width of the deep neural networks needed to achieve the desired accuracy. The error of PINNs is decomposed into approximation error and statistical error, where the approximation error is given in $C^2$ norm with $\mathrm{ReLU}^{3}$ networks (deep networks with activation function $\max\{0,x^3\}$) and the statistical error is estimated by Rademacher complexity. We derive the bound on the Rademacher complexity of the non-Lipschitz composition of gradient norm with $\mathrm{ReLU}^{3}$ network, which is of immense independent interest.
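The $\mathrm{ReLU}^{3}$ activation mentioned above can be written down directly. The sketch below is an illustration with hypothetical function names: it checks that $\max\{0,x^3\}$ coincides with $\max\{0,x\}^3$, and it gives the second derivative $6\max\{0,x\}$, which is continuous. That continuity is what makes it meaningful to measure the approximation error in the $C^2$ norm.

```python
def relu3(x: float) -> float:
    """ReLU^3 activation: max(0, x) cubed."""
    return max(0.0, x) ** 3


def relu3_second_derivative(x: float) -> float:
    """Second derivative of ReLU^3: 6 * max(0, x), continuous at x = 0."""
    return 6.0 * max(0.0, x)


# max(0, x)^3 and max(0, x^3) agree because cubing preserves the sign of x.
samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
same = all(relu3(x) == max(0.0, x ** 3) for x in samples)
print("forms agree on samples:", same)
```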

preprint 2022 arXiv

Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality in Approximation on Hölder Class

In this paper, we construct neural networks with ReLU, sine and $2^x$ as activation functions. For general continuous $f$ defined on $[0,1]^d$ with continuity modulus $ω_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}(ω_f(\sqrt{d})\cdot2^{-M}+ω_{f}\left(\frac{\sqrt{d}}{N}\right))$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to widths of the networks. As a consequence, we can construct a ReLU-sine-$2^x$ network with depth $5$ and width $\max\left\{\left\lceil2d^{3/2}\left(\frac{3μ}{ε}\right)^{1/α}\right\rceil,2\left\lceil\log_2\frac{3μd^{α/2}}{2ε}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_μ^α([0,1]^d)$ within a given tolerance $ε>0$ measured in the $L^p$ norm with $p\in[1,\infty)$, where $\mathcal{H}_μ^α([0,1]^d)$ denotes the Hölder continuous function class defined on $[0,1]^d$ with order $α\in (0,1]$ and constant $μ> 0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_μ^α([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to apply SGD for training.
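The width formula in the abstract can be evaluated directly. The sketch below is a worked example with a hypothetical function name and illustrative parameter values. It shows that for fixed $μ$, $ε$ and $α$ the width grows only polynomially in $d$ (through the $d^{3/2}$ factor) while the depth stays at 5.

```python
import math


def relu_sine_exp_width(d: int, mu: float, eps: float, alpha: float) -> int:
    """Width from the abstract: the max of the two ceiling terms."""
    first = math.ceil(2 * d * math.sqrt(d) * (3 * mu / eps) ** (1 / alpha))
    second = 2 * math.ceil(math.log2(3 * mu * d ** (alpha / 2) / (2 * eps))) + 2
    return max(first, second)


# Illustrative values only (not taken from the paper).
for d in (4, 16, 64):
    print(d, relu_sine_exp_width(d, mu=1.0, eps=0.5, alpha=1.0))
```

Multiplying $d$ by 4 multiplies the dominant term by $4^{3/2}=8$, with no exponential dependence on the dimension.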

preprint 2022 arXiv

Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning

Efficient quantum compiling tactics greatly enhance the capability of quantum computers to execute complicated quantum algorithms. Due to its fundamental importance, a plethora of quantum compilers has been designed in past years. However, there are several caveats to current protocols, which are low optimality, high inference time, limited scalability, and lack of universality. To compensate for these defects, here we devise an efficient and practical quantum compiler assisted by advanced deep reinforcement learning (RL) techniques, i.e., data generation, deep Q-learning, and AQ* search. In this way, our protocol is compatible with various quantum machines and can be used to compile multi-qubit operators. We systematically evaluate the performance of our proposal in compiling quantum operators with both inverse-closed and inverse-free universal basis sets. In the task of single-qubit operator compiling, our proposal outperforms other RL-based quantum compilers in the measure of compiling sequence length and inference time. Meanwhile, the output solution is near-optimal, guaranteed by the Solovay-Kitaev theorem. Notably, for the inverse-free universal basis set, the achieved sequence length complexity is comparable with the inverse-based setting and dramatically advances previous methods. These empirical results contribute to improving the inverse-free Solovay-Kitaev theorem. In addition, for the first time, we demonstrate how to leverage RL-based quantum compilers to accomplish two-qubit operator compiling. The achieved results open an avenue for integrating RL with quantum compiling to unify efficiency and practicality and thus facilitate the exploration of quantum advantages.

preprint 2022 arXiv

Global Optimization via Schrödinger-Föllmer Diffusion

We study the problem of finding global minimizers of $V(x):\mathbb{R}^d\rightarrow\mathbb{R}$ approximately via sampling from a probability distribution $μ_σ$ with density $p_σ(x)=\dfrac{\exp(-V(x)/σ)}{\int_{\mathbb R^d} \exp(-V(y)/σ) dy }$ with respect to the Lebesgue measure for $σ\in (0,1]$ small enough. We analyze a sampler based on the Euler-Maruyama discretization of the Schrödinger-Föllmer diffusion processes with stochastic approximation under appropriate assumptions on the step size $s$ and the potential $V$. We prove that the output of the proposed sampler is an approximate global minimizer of $V(x)$ with high probability at the cost of sampling $\mathcal{O}(d^{3})$ standard normal random variables. Numerical studies illustrate the effectiveness of the proposed method and its superiority to the Langevin method.
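The reason sampling from $p_σ$ approximates global minimization is that the density concentrates around the global minimizer as $σ$ decreases. The sketch below illustrates this on a one-dimensional grid. It is not the Euler-Maruyama sampler analyzed in the paper: the double-well potential, the grid and the function names are all hypothetical choices made for illustration.

```python
import math


def gibbs_weights(V, grid, sigma):
    """Normalized weights proportional to exp(-V(x)/sigma) on a grid."""
    vmin = min(V(x) for x in grid)  # shift by the minimum for numerical stability
    w = [math.exp(-(V(x) - vmin) / sigma) for x in grid]
    total = sum(w)
    return [wi / total for wi in w]


def mass_near(grid, weights, center, radius):
    """Total weight of grid points within `radius` of `center`."""
    return sum(w for x, w in zip(grid, weights) if abs(x - center) <= radius)


def V(x):
    """Hypothetical double well: global minimum near x = -1, local one near x = 1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


grid = [-2.0 + 0.01 * i for i in range(401)]
for sigma in (1.0, 0.05):
    w = gibbs_weights(V, grid, sigma)
    print(f"sigma={sigma}: mass near global minimizer = {mass_near(grid, w, -1.04, 0.3):.3f}")
```

For the smaller $σ$ almost all of the mass sits in the deeper well, so a sample drawn from $p_σ$ is close to the global minimizer with high probability.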

preprint 2022 arXiv

Imaging Anisotropic Conductivities from Current Densities

In this paper, we propose and analyze a reconstruction algorithm for imaging an anisotropic conductivity tensor in a second-order elliptic PDE with a nonzero Dirichlet boundary condition from internal current densities. It is based on a regularized output least-squares formulation with the standard $L^2(Ω)^{d,d}$ penalty, which is then discretized by the standard Galerkin finite element method. We establish the continuity and differentiability of the forward map with respect to the conductivity tensor in the $L^p(Ω)^{d,d}$-norms, the existence of minimizers and optimality systems of the regularized formulation using the concept of H-convergence. Further, we provide a detailed analysis of the discretized problem, especially the convergence of the discrete approximations with respect to the mesh size, using the discrete counterpart of H-convergence. In addition, we develop a projected Newton algorithm for solving the first-order optimality system. We present extensive two-dimensional numerical examples to show the efficiency of the proposed method.

preprint 2022 arXiv

Imaging Conductivity from Current Density Magnitude using Neural Networks

Conductivity imaging represents one of the most important tasks in medical imaging. In this work we develop a neural network based reconstruction technique for imaging the conductivity from the magnitude of the internal current density. It is achieved by formulating the problem as a relaxed weighted least-gradient problem, and then approximating its minimizer by standard fully connected feedforward neural networks. We derive bounds on two components of the generalization error, i.e., approximation error and statistical error, explicitly in terms of properties of the neural networks (e.g., depth, total number of parameters, and the bound of the network parameters). We illustrate the performance and distinct features of the approach on several numerical experiments. Numerically, it is observed that the approach enjoys remarkable robustness with respect to the presence of data noise.

preprint 2021 arXiv

Convergence Rate Analysis for Deep Ritz Method

Using deep neural networks to solve PDEs has attracted a lot of attention recently. However, the understanding of why the deep learning method works is falling far behind its empirical success. In this paper, we provide a rigorous numerical analysis of the deep Ritz method (DRM) for second order elliptic equations with Neumann boundary conditions. We establish the first nonasymptotic convergence rate in $H^1$ norm for DRM using deep networks with $\mathrm{ReLU}^2$ activation functions. In addition to providing a theoretical justification of DRM, our study also sheds light on how to set the hyperparameters of depth and width to achieve the desired convergence rate in terms of the number of training samples. Technically, we derive bounds on the approximation error of deep $\mathrm{ReLU}^2$ networks in $H^1$ norm and on the Rademacher complexity of the non-Lipschitz composition of gradient norm and $\mathrm{ReLU}^2$ network, both of which are of independent interest.

preprint 2020 arXiv

A Support Detection and Root Finding Approach for Learning High-dimensional Generalized Linear Models

Feature selection is important for modeling high-dimensional data, where the number of variables can be much larger than the sample size. In this paper, we develop a support detection and root finding procedure to learn high-dimensional sparse generalized linear models, and denote this method by GSDAR. Based on the KKT condition for $\ell_0$-penalized maximum likelihood estimations, GSDAR generates a sequence of estimators iteratively. Under some restricted invertibility conditions on the maximum likelihood function and a sparsity assumption on the target coefficients, the estimation error of the proposed estimator decays exponentially to the optimal order. Moreover, the oracle estimator can be recovered if the target signal is stronger than the detectable level. We conduct simulations and real data analysis to illustrate the advantages of our proposed method over several existing methods, including Lasso and MCP.

preprint 2020 arXiv

Robust Decoding from Binary Measurements with Cardinality Constraint Least Squares

The main goal of 1-bit compressive sampling is to decode $n$ dimensional signals with sparsity level $s$ from $m$ binary measurements. This is a challenging task due to the presence of nonlinearity, noise and sign flips. In this paper, the cardinality constraint least squares is proposed as a desired decoder. We prove that, up to a constant $c$, with high probability, the proposed decoder achieves a minimax estimation error as long as $m \geq \mathcal{O}( s\log n)$. Computationally, we utilize a generalized Newton algorithm (GNA) to solve the cardinality constraint minimization problem at the cost of solving a small least squares problem at each iteration. We prove that, with high probability, the $\ell_{\infty}$ norm of the estimation error between the output of GNA and the underlying target decays to $\mathcal{O}(\sqrt{\frac{\log n }{m}}) $ after at most $\mathcal{O}(\log s)$ iterations. Moreover, the underlying support can be recovered with high probability in $\mathcal{O}(\log s)$ steps provided that the target signal is detectable. Extensive numerical simulations and comparisons with state-of-the-art methods are presented to illustrate the robustness of our proposed decoder and the efficiency of the GNA algorithm.

preprint 2016 arXiv

Alternating Direction Method of Multipliers for Linear Inverse Problems

In this paper we propose an iterative method using the alternating direction method of multipliers (ADMM) strategy to solve linear inverse problems in Hilbert spaces with a general convex penalty term. When the data is given exactly, we give a convergence analysis of our ADMM algorithm without assuming the existence of a Lagrange multiplier. In case the data contains noise, we show that our method is a regularization method as long as it is terminated by a suitable stopping rule. Various numerical simulations are performed to test the efficiency of the method.

preprint 2016 arXiv

Group Sparse Recovery via the $\ell^0(\ell^2)$ Penalty: Theory and Algorithm

In this work we propose and analyze a novel approach for group sparse recovery. It is based on regularized least squares with an $\ell^0(\ell^2)$ penalty, which penalizes the number of nonzero groups. One distinct feature of the approach is that it has the built-in decorrelation mechanism within each group, and thus can handle challenging strong inner-group correlation. We provide a complete analysis of the regularized model, e.g., existence of a global minimizer, invariance property, support recovery, and properties of block coordinatewise minimizers. Further, the regularized problem admits an efficient primal dual active set algorithm with a provable finite-step global convergence. At each iteration, it involves solving a least-squares problem on the active set only, and exhibits a fast local convergence, which makes the method extremely efficient for recovering group sparse signals. Extensive numerical experiments are presented to illustrate salient features of the model and the efficiency and accuracy of the algorithm. A comparative study indicates its competitiveness with existing approaches.

preprint 2015 arXiv

A simple finite element method for the boundary value problem with a Riemann-Liouville derivative

We consider a boundary value problem involving a Riemann-Liouville fractional derivative of order $α\in (3/2,2)$ on the unit interval $(0,1)$. The standard Galerkin finite element approximation converges slowly due to the presence of the singular term $x^{α-1}$ in the solution representation. In this work, we develop a simple technique, by transforming it into a second-order two-point boundary value problem with nonlocal low order terms, whose solution can be used to reconstruct directly the solution to the original problem. The stability of the variational formulation and the optimal regularity pickup of the solution are analyzed. A novel Galerkin finite element method with piecewise linear or quadratic finite elements is developed, and $L^2(D)$ error estimates are provided. The approach is then applied to the corresponding fractional Sturm-Liouville problem, and error estimates of the eigenvalue approximations are given. Extensive numerical results fully confirm our theoretical study.

preprint 2014 arXiv

A fast nonstationary iterative method with convex penalty for inverse problems in Hilbert spaces

In this paper we consider the computation of approximate solutions for inverse problems in Hilbert spaces. In order to capture the special features of solutions, non-smooth convex functions are introduced as penalty terms. By exploiting the Hilbert space structure of the underlying problems, we propose a fast iterative regularization method which reduces to the classical nonstationary iterated Tikhonov regularization when the penalty term is chosen to be the square of the norm. Each iteration of the method consists of two steps: the first step involves only the operator from the problem while the second step involves only the penalty term. This splitting character has the advantage of making the computation efficient. In case the data is corrupted by noise, a stopping rule is proposed to terminate the method and the corresponding regularization property is established. Finally, we test the performance of the method by reporting various numerical simulations, including image deblurring, the determination of the source term in the Poisson equation, and the de-autoconvolution problem.

preprint 2014 arXiv

A Primal Dual Active Set with Continuation Algorithm for the $\ell^0$-Regularized Optimization Problem

We develop a primal dual active set with continuation algorithm for solving the $\ell^0$-regularized least-squares problem that frequently arises in compressed sensing. The algorithm couples the primal dual active set method with a continuation strategy on the regularization parameter. At each inner iteration, it first identifies the active set from both primal and dual variables, and then updates the primal variable by solving a (typically small) least-squares problem defined on the active set, from which the dual variable can be updated explicitly. Under certain conditions on the sensing matrix, i.e., mutual incoherence property or restricted isometry property, and the noise level, the finite step global convergence of the algorithm is established. Extensive numerical examples are presented to illustrate the efficiency and accuracy of the algorithm and the convergence analysis.
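The inner iteration and the continuation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the objective $\tfrac12\|Ψx-y\|^2+λ\|x\|_0$ with unit-norm columns, for which the active set is taken as $\{i: |x_i+d_i|>\sqrt{2λ}\}$ with dual variable $d=Ψ^T(y-Ψx)$. The geometric schedule for $λ$, the starting value and all function names are hypothetical.

```python
import math


def solve(M, b):
    """Solve a small dense system M z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z


def pdas_step(cols, y, x, lam):
    """One inner iteration; cols[j] is the j-th column (assumed to have unit norm)."""
    m, n = len(y), len(cols)
    resid = [y[i] - sum(cols[j][i] * x[j] for j in range(n)) for i in range(m)]
    d = [sum(cols[j][i] * resid[i] for i in range(m)) for j in range(n)]  # dual variable
    thresh = math.sqrt(2.0 * lam)
    active = [j for j in range(n) if abs(x[j] + d[j]) > thresh]
    x_new = [0.0] * n  # primal variable is zero off the active set
    if active:
        # Least squares on the active set via normal equations
        # (assumes the active columns are linearly independent).
        gram = [[sum(cols[p][i] * cols[q][i] for i in range(m)) for q in active] for p in active]
        rhs = [sum(cols[p][i] * y[i] for i in range(m)) for p in active]
        for p, v in zip(active, solve(gram, rhs)):
            x_new[p] = v
    return x_new, active


def pdas_continuation(cols, y, lam_min, rho=0.7, max_inner=5):
    """Decrease lam geometrically, warm-starting each inner loop from the last x."""
    m = len(y)
    x = [0.0] * len(cols)
    # Hypothetical starting value: for larger lam the first active set is empty.
    lam = 0.5 * max(abs(sum(c[i] * y[i] for i in range(m))) for c in cols) ** 2
    while lam * rho >= lam_min:
        lam *= rho
        prev = None
        for _ in range(max_inner):
            x, active = pdas_step(cols, y, x, lam)
            if active == prev:  # active set has stabilized
                break
            prev = active
    return x


# Toy under-determined system (4 measurements, 6 unknowns), noise-free data.
r = 1.0 / math.sqrt(2.0)
cols = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [r, r, 0, 0], [0, 0, r, r]]
y = [2.0, 0.0, 0.0, -3.0]  # generated by x = (2, 0, 0, -3, 0, 0)
x_hat = pdas_continuation(cols, y, lam_min=0.01)
print(x_hat)
```

Recomputing $d$ from the current $x$ at the start of each step matches the explicit dual update in the abstract: after the least-squares solve, $d$ vanishes on the active set and equals $Ψ_I^T(y-Ψx)$ elsewhere.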

preprint 2013 arXiv

A Primal Dual Active Set Algorithm with Continuation for Compressed Sensing

The success of compressed sensing relies essentially on the ability to efficiently find an approximately sparse solution to an under-determined linear system. In this paper, we develop an efficient algorithm for the sparsity promoting $\ell_1$-regularized least squares problem by coupling the primal dual active set strategy with a continuation technique (on the regularization parameter). In the active set strategy, we first determine the active set from primal and dual variables, and then update the primal and dual variables by solving a low-dimensional least squares problem on the active set, which makes the algorithm very efficient. The continuation technique globalizes the convergence of the algorithm, with provable global convergence under the restricted isometry property (RIP). Further, we adopt two alternative methods, i.e., a modified discrepancy principle and a Bayesian information criterion, to choose the regularization parameter. Numerical experiments indicate that our algorithm is very competitive with state-of-the-art algorithms in terms of accuracy and efficiency.

preprint 2013 arXiv

An Analysis of Finite Element Approximation in Electrical Impedance Tomography

We present a finite element analysis of electrical impedance tomography for reconstructing the conductivity distribution from electrode voltage measurements by means of Tikhonov regularization. Two popular choices of the penalty term, i.e., $H^1(Ω)$-norm smoothness penalty and total variation seminorm penalty, are considered. A piecewise linear finite element method is employed for discretizing the forward model, i.e., the complete electrode model, the conductivity, and the penalty functional. The convergence of the finite element approximations for the Tikhonov model on both polyhedral and smooth curved domains is established. This provides rigorous justifications for the ad hoc discretization procedures in the literature.
