Source author record

Yuling Jiao

Yuling Jiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, math.NA, Information Theory, math.IT, math.OC, math.ST, Numerical Analysis, Statistics Theory, Computation, Methodology, Neural and Evolutionary Computing, quant-ph

Catalog footprint

What is connected

19 works

12 topics

4 close collaborators


Preprint, 2026, arXiv

Approximation Error Upper and Lower Bounds for Hölder Class with Transformers

We explore the expressive power of Transformers by establishing precise approximation error upper and lower bounds for the Hölder class. Specifically, we derive a new approximation upper bound for the standard Transformer architecture equipped with Softmax operators, ReLU activation functions and residual connections. We prove that a Transformer network composed of at most $\mathcal{O}(\varepsilon^{-d_{0}/α})$ blocks can approximate any bounded Hölder function with $d_{0}$-dimensional input and smoothness $α\in(0,1]$ to any accuracy $\varepsilon>0$. For the lower bound, we leverage an upper bound on the VC dimension and are the first to rigorously prove that Transformers require at least $Ω(\varepsilon^{-d_{0}/(4α)})$ blocks to achieve approximation accuracy $\varepsilon$. Finally, we extend these results for standard Transformers to a general regression task and establish the corresponding excess risk rates, which offer a theoretical explanation for the empirical effectiveness of Transformers in real-world settings.

Preprint, 2023, arXiv

Deep Nonparametric Regression on Approximate Manifolds: Non-Asymptotic Error Bounds with Polynomial Prefactors

We study the properties of nonparametric least squares regression using deep neural networks. We derive non-asymptotic upper bounds for the prediction error of the empirical risk minimizer of feedforward deep neural regression. Our error bounds achieve the minimax optimal rate and improve significantly on existing ones in that they depend polynomially, rather than exponentially, on the dimension of the predictor. We show that the neural regression estimator can circumvent the curse of dimensionality under the assumption that the predictor is supported on an approximate low-dimensional manifold or a set with low Minkowski dimension. We also establish the optimal convergence rate under the exact manifold support assumption. We investigate how the prediction error of the neural regression estimator depends on the structure of neural networks and propose a notion of network relative efficiency between two types of neural networks, which provides a quantitative measure for evaluating the relative merits of different network structures. To establish these results, we derive a novel approximation error bound for Hölder smooth functions with a positive smoothness index using ReLU-activated neural networks, which may be of independent interest. Our results are derived under weaker assumptions on the data distribution and the neural network structure than those in the existing literature.

Preprint, 2022, arXiv

A rate of convergence of Physics Informed Neural Networks for the linear second order elliptic PDEs

In recent years, physics-informed neural networks (PINNs) have been shown empirically to be a powerful tool for solving PDEs. However, a numerical analysis of PINNs is still missing. In this paper, we prove a convergence rate for PINNs applied to second order elliptic equations with Dirichlet boundary conditions by establishing upper bounds on the number of training samples and on the depth and width of the deep neural networks needed to achieve a desired accuracy. The error of PINNs is decomposed into an approximation error and a statistical error: the approximation error is bounded in the $C^2$ norm for $\mathrm{ReLU}^{3}$ networks (deep networks with activation function $\max\{0,x^3\}$), and the statistical error is estimated via Rademacher complexity. We derive a bound on the Rademacher complexity of the non-Lipschitz composition of the gradient norm with a $\mathrm{ReLU}^{3}$ network, which may be of independent interest.
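To make the loss in this decomposition concrete, here is a minimal NumPy sketch of the empirical PINN risk for a 1D model problem. It assumes $-u''=f$ on $(0,1)$ with Dirichlet data $u=g$ at both endpoints and a one-hidden-layer $\mathrm{ReLU}^3$ network; the 1D setting and the names `relu3_net` and `pinn_loss` are illustrative assumptions, not the paper's code. Because $\mathrm{ReLU}^3$ is $C^2$, the second derivative can be written in closed form instead of using automatic differentiation.

```python
import numpy as np

def relu3_net(x, w, b, a):
    """One-hidden-layer ReLU^3 network u(x) = sum_k a_k * max(0, w_k*x + b_k)^3 on 1D inputs.

    Returns u(x) and its exact second derivative u''(x); ReLU^3 is C^2, so u'' is continuous."""
    z = np.outer(x, w) + b          # pre-activations, shape (n_points, width)
    r = np.maximum(z, 0.0)
    u = (r ** 3) @ a
    u_xx = (6.0 * w ** 2 * r) @ a   # d^2/dx^2 max(0, w x + b)^3 = 6 w^2 max(0, w x + b)
    return u, u_xx

def pinn_loss(w, b, a, f, g, n_interior=1000, rng=None):
    """Monte Carlo PINN risk for -u'' = f on (0, 1) with Dirichlet data u = g at x = 0 and x = 1."""
    rng = np.random.default_rng() if rng is None else rng
    x_in = rng.uniform(0.0, 1.0, n_interior)       # interior collocation samples
    _, u_xx = relu3_net(x_in, w, b, a)
    pde_residual = np.mean((-u_xx - f(x_in)) ** 2)
    x_bd = np.array([0.0, 1.0])                     # the boundary of (0, 1) is two points
    u_bd, _ = relu3_net(x_bd, w, b, a)
    boundary_residual = np.mean((u_bd - g(x_bd)) ** 2)
    return pde_residual + boundary_residual
```

Training would minimize `pinn_loss` over `(w, b, a)`; `n_interior` plays the role of the number of training samples in the error bound.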

Preprint, 2022, arXiv

An error analysis of generative adversarial networks for learning distributions

This paper studies how well generative adversarial networks (GANs) learn probability distributions from finite samples. Our main results establish the convergence rates of GANs under a collection of integral probability metrics defined through Hölder classes, including the Wasserstein distance as a special case. We also show that GANs are able to adaptively learn data distributions that have low-dimensional structure or Hölder densities, when the network architectures are chosen properly. In particular, for distributions concentrated around a low-dimensional set, we show that the learning rates of GANs do not depend on the high ambient dimension, but on the lower intrinsic dimension. Our analysis is based on a new oracle inequality decomposing the estimation error into the generator and discriminator approximation error and the statistical error, which may be of independent interest.

Preprint, 2022, arXiv

Deep Dimension Reduction for Supervised Representation Learning

The goal of supervised representation learning is to construct effective data representations for prediction. Among all the characteristics of an ideal nonparametric representation of high-dimensional complex data, sufficiency, low dimensionality and disentanglement are some of the most essential ones. We propose a deep dimension reduction approach to learning representations with these characteristics. The proposed approach is a nonparametric generalization of the sufficient dimension reduction method. We formulate the ideal representation learning task as that of finding a nonparametric representation that minimizes an objective function characterizing conditional independence and promoting disentanglement at the population level. We then estimate the target representation at the sample level nonparametrically using deep neural networks. We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero. Our extensive numerical experiments using simulated and real benchmark data demonstrate that the proposed methods have better performance than several existing dimension reduction methods and the standard deep learning models in the context of classification and regression.

Preprint, 2022, arXiv

Deep Generative Survival Analysis: Nonparametric Estimation of Conditional Survival Function

We propose a deep generative approach to nonparametric estimation of conditional survival and hazard functions with right-censored data. The key idea of the proposed method is to first learn a conditional generator for the joint conditional distribution of the observed time and censoring indicator given the covariates, and then construct the Kaplan-Meier and Nelson-Aalen estimators based on this conditional generator for the conditional hazard and survival functions. Our method combines ideas from the recently developed deep generative learning and classical nonparametric estimation in survival analysis. We analyze the convergence properties of the proposed method and establish the consistency of the generative nonparametric estimators of the conditional survival and hazard functions. Our numerical experiments validate the proposed method and demonstrate its superior performance in a range of simulated models. We also illustrate the applications of the proposed method in constructing prediction intervals for survival times with the PBC (Primary Biliary Cholangitis) and SUPPORT (Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments) datasets.
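The estimation step described above can be sketched as follows, assuming the conditional generator has already produced Monte Carlo draws of (observed time, censoring indicator) at a fixed covariate value; the plain arrays stand in for those draws, and `km_na` is a hypothetical helper. It applies the textbook Kaplan-Meier product-limit and Nelson-Aalen formulas at each distinct event time.

```python
import numpy as np

def km_na(times, events):
    """Kaplan-Meier survival and Nelson-Aalen cumulative hazard at each distinct event time.

    times: observed times; events: 1 if the event was observed, 0 if censored."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, cumhaz = [], []
    s, h = 1.0, 0.0
    for t in event_times:
        at_risk = np.sum(times >= t)                # still under observation just before t
        deaths = np.sum((times == t) & events)      # observed events at t
        s *= 1.0 - deaths / at_risk                 # Kaplan-Meier product-limit step
        h += deaths / at_risk                       # Nelson-Aalen increment
        surv.append(s)
        cumhaz.append(h)
    return event_times, np.array(surv), np.array(cumhaz)
```

For example, `km_na([1, 2, 2, 3], [1, 0, 1, 1])` gives survival 0.75, 0.5 and 0 at times 1, 2 and 3, since the censored subject at time 2 still counts as at risk at time 2.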

Preprint, 2022, arXiv

Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality in Approximation on Hölder Class

In this paper, we construct neural networks with ReLU, sine and $2^x$ as activation functions. For general continuous $f$ defined on $[0,1]^d$ with continuity modulus $ω_f(\cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $\mathcal{O}(ω_f(\sqrt{d})\cdot2^{-M}+ω_{f}\left(\frac{\sqrt{d}}{N}\right))$, where $M,N\in \mathbb{N}^{+}$ denote the hyperparameters related to the widths of the networks. As a consequence, we can construct a ReLU-sine-$2^x$ network of depth $5$ and width $\max\left\{\left\lceil2d^{3/2}\left(\frac{3μ}ε\right)^{1/α}\right\rceil,2\left\lceil\log_2\frac{3μd^{α/2}}{2ε}\right\rceil+2\right\}$ that approximates $f\in \mathcal{H}_μ^α([0,1]^d)$ within a given tolerance $ε>0$ measured in the $L^p$ norm, $p\in[1,\infty)$, where $\mathcal{H}_μ^α([0,1]^d)$ denotes the Hölder continuous function class defined on $[0,1]^d$ with order $α\in (0,1]$ and constant $μ> 0$. Therefore, ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $\mathcal{H}_μ^α([0,1]^d)$. In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, which allows the networks to be trained with SGD.
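As a quick arithmetic check of the width bound quoted above, this sketch evaluates the formula for given $d$, $μ$, $α$ and $ε$ (the function name is ours). For fixed $μ$, $α$ and $ε$, the first term grows like $d^{3/2}$ and the second only logarithmically in $d$; $d$ never enters the exponent of $1/ε$, which is the sense in which the curse of dimensionality is avoided.

```python
import math

def relu_sine_2x_width(d, mu, alpha, eps):
    """Width from the stated bound for a depth-5 ReLU-sine-2^x network approximating
    f in H^alpha_mu([0,1]^d) to tolerance eps in L^p."""
    first = math.ceil(2 * d ** 1.5 * (3 * mu / eps) ** (1 / alpha))
    second = 2 * math.ceil(math.log2(3 * mu * d ** (alpha / 2) / (2 * eps))) + 2
    return max(first, second)
```

With $μ=1$, $α=1$ and $ε=0.5$, going from $d=1$ to $d=4$ raises the width from 12 to 96, a factor of $4^{3/2}=8$.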

Preprint, 2022, arXiv

Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning

Efficient quantum compiling tactics greatly enhance the capability of quantum computers to execute complicated quantum algorithms. Because of its fundamental importance, a plethora of quantum compilers have been designed in recent years. However, current protocols have several shortcomings: low optimality, high inference time, limited scalability and lack of universality. To address these defects, we devise an efficient and practical quantum compiler assisted by advanced deep reinforcement learning (RL) techniques, namely data generation, deep Q-learning and AQ* search. As a result, our protocol is compatible with various quantum machines and can be used to compile multi-qubit operators. We systematically evaluate the performance of our proposal in compiling quantum operators with both inverse-closed and inverse-free universal basis sets. In single-qubit operator compiling, our proposal outperforms other RL-based quantum compilers in compiled sequence length and inference time. Meanwhile, the output solution is near-optimal, as guaranteed by the Solovay-Kitaev theorem. Notably, for the inverse-free universal basis set, the achieved sequence length complexity is comparable with the inverse-based setting and dramatically improves on previous methods. These empirical results contribute to improving the inverse-free Solovay-Kitaev theorem. In addition, we demonstrate for the first time how to leverage RL-based quantum compilers to accomplish two-qubit operator compiling. These results open an avenue for integrating RL with quantum compiling to unify efficiency and practicality and thus facilitate the exploration of quantum advantages.

Preprint, 2022, arXiv

Estimation of Non-Crossing Quantile Regression Process with Deep ReQU Neural Networks

We propose a penalized nonparametric approach to estimating the quantile regression process (QRP) in a nonseparable model using rectifier quadratic unit (ReQU) activated deep neural networks, and we introduce a novel penalty function to enforce non-crossing of quantile regression curves. We establish non-asymptotic excess risk bounds for the estimated QRP and derive the mean integrated squared error for the estimated QRP under mild smoothness and regularity conditions. To establish these non-asymptotic risk and estimation error bounds, we also develop a new error bound for approximating $C^s$ smooth functions with $s >0$ and their derivatives using ReQU-activated neural networks. This is a new approximation result for ReQU networks that is of independent interest and may be useful in other problems. Our numerical experiments demonstrate that the proposed method is competitive with or outperforms two existing methods for nonparametric quantile regression, based on reproducing kernels and random forests.
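One way to picture a penalized fit of the whole quantile process is sketched below. It assumes a one-hidden-layer ReQU network $q(x,τ)$ that takes the quantile level as an extra input, and it uses a hinge penalty on negative $\partial q/\partial τ$ to discourage crossing; the paper's actual penalty function is not stated here, so this penalty, the architecture and all names are illustrative assumptions. ReQU is continuously differentiable, which is what makes the exact $τ$-derivative available.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))

def requ_qrp(x, tau, W, b, a):
    """Hypothetical network q(x, tau) = sum_k a_k * max(0, W_k0*x + W_k1*tau + b_k)^2,
    returned with its exact partial derivative in tau."""
    z = np.outer(x, W[:, 0]) + np.outer(tau, W[:, 1]) + b
    r = np.maximum(z, 0.0)
    q = (r ** 2) @ a
    dq_dtau = (2.0 * r * W[:, 1]) @ a
    return q, dq_dtau

def penalized_qrp_risk(x, y, W, b, a, lam, rng):
    """Check loss at random quantile levels plus a penalty on q decreasing in tau (assumed form)."""
    tau = rng.uniform(0.0, 1.0, len(x))             # one random level per observation
    q, dq_dtau = requ_qrp(x, tau, W, b, a)
    fit = np.mean(check_loss(y - q, tau))
    crossing = np.mean(np.maximum(-dq_dtau, 0.0))   # zero when q is nondecreasing in tau at the samples
    return fit + lam * crossing
```

Sampling $τ$ per observation estimates the average check loss over $τ\in(0,1)$, so a single network is fitted for all levels at once rather than one curve per level.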

Preprint, 2022, arXiv

Global Optimization via Schrödinger-Föllmer Diffusion

We study the problem of approximately finding global minimizers of $V(x):\mathbb{R}^d\rightarrow\mathbb{R}$ by sampling from a probability distribution $μ_σ$ with density $p_σ(x)=\dfrac{\exp(-V(x)/σ)}{\int_{\mathbb R^d} \exp(-V(y)/σ) dy }$ with respect to the Lebesgue measure, for $σ\in (0,1]$ small enough. We analyze a sampler based on the Euler-Maruyama discretization of the Schrödinger-Föllmer diffusion process with stochastic approximation, under appropriate assumptions on the step size $s$ and the potential $V$. We prove that the output of the proposed sampler is an approximate global minimizer of $V(x)$ with high probability, at the cost of sampling $\mathcal{O}(d^{3})$ standard normal random variables. Numerical studies illustrate the effectiveness of the proposed method and its superiority to the Langevin method.
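A minimal sketch of such a sampler follows, under these assumptions: the diffusion starts at $X_0=0$, runs for $t\in[0,1]$ with diffusion coefficient $\sqrt{σ}$, and its drift $σ\nabla\log \mathbb{E}_Z[f(x+\sqrt{σ(1-t)}Z)]$, with $f\propto dμ_σ/dN(0,σI)$, is replaced by a self-normalized Monte Carlo estimate, which is how we read "stochastic approximation". The function name, its defaults and the Monte Carlo sample size `n_mc` are ours; `step` plays the role of $s$.

```python
import numpy as np

def sf_minimize(V, grad_V, d, sigma=0.1, step=0.01, n_mc=100, rng=None):
    """Euler-Maruyama sketch of a Schrödinger-Föllmer diffusion whose time-1 law is
    mu_sigma ∝ exp(-V/sigma); the returned X_1 serves as an approximate global minimizer.

    V and grad_V act row-wise on arrays of shape (m, d)."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(1.0 / step))
    x = np.zeros(d)                                   # X_0 = 0
    for k in range(n_steps):
        t = k * step
        # Monte Carlo points y_i = x + sqrt(sigma * (1 - t)) * Z_i with Z_i ~ N(0, I_d)
        y = x + np.sqrt(sigma * (1.0 - t)) * rng.standard_normal((n_mc, d))
        # log f(y) = (|y|^2 / 2 - V(y)) / sigma, with f proportional to dmu_sigma / dN(0, sigma I)
        log_f = (0.5 * np.sum(y ** 2, axis=1) - V(y)) / sigma
        wts = np.exp(log_f - log_f.max())
        wts /= wts.sum()                              # self-normalized weights, computed stably
        # drift = E[f(y) * (y - grad V(y))] / E[f(y)], estimated with the weights above
        drift = wts @ (y - grad_V(y))
        x = x + step * drift + np.sqrt(sigma * step) * rng.standard_normal(d)
    return x
```

For a quadratic $V(x)=\tfrac12\|x-a\|^2$ every term $y-\nabla V(y)$ equals $a$, so the estimated drift is exactly $a$ and $X_1\sim N(a,σI)$, which concentrates at the minimizer as $σ\to0$; this makes a convenient sanity check.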

Preprint, 2021, arXiv

Convergence Rate Analysis for Deep Ritz Method

Using deep neural networks to solve PDEs has attracted a lot of attention recently. However, the theoretical understanding of why deep learning methods work lags far behind their empirical success. In this paper, we provide a rigorous numerical analysis of the deep Ritz method (DRM) \cite{wan11} for second order elliptic equations with Neumann boundary conditions. We establish the first nonasymptotic convergence rate in the $H^1$ norm for DRM using deep networks with $\mathrm{ReLU}^2$ activation functions. In addition to providing a theoretical justification of DRM, our study also sheds light on how to set the depth and width hyper-parameters to achieve the desired convergence rate in terms of the number of training samples. Technically, we derive bounds on the approximation error of deep $\mathrm{ReLU}^2$ networks in the $H^1$ norm and on the Rademacher complexity of the non-Lipschitz composition of the gradient norm and a $\mathrm{ReLU}^2$ network, both of which are of independent interest.
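DRM minimizes a Monte Carlo estimate of the Ritz energy over a network class. The sketch below only evaluates that estimate for a given trial function and its derivative. It assumes the 1D model problem $-u''+u=f$ on $(0,1)$ with homogeneous Neumann data $u'(0)=u'(1)=0$, whose energy is $\int_0^1(\tfrac12 u'^2+\tfrac12 u^2-fu)\,dx$; the model problem and the name `ritz_energy` are our assumptions, and a $\mathrm{ReLU}^2$ network and its derivative would be passed in as `u` and `du`.

```python
import numpy as np

def ritz_energy(u, du, f, n=100_000, rng=None):
    """Monte Carlo estimate of E(u) = ∫_0^1 (u'^2/2 + u^2/2 - f u) dx, the Ritz energy of
    -u'' + u = f on (0, 1) with homogeneous Neumann data (no boundary term is needed)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(0.0, 1.0, n)                # uniform training samples in the domain
    return np.mean(0.5 * du(x) ** 2 + 0.5 * u(x) ** 2 - f(x) * u(x))
```

With $f=(1+π^2)\cos(πx)$, the exact solution $u=\cos(πx)$ has energy $-(1+π^2)/4$, and a scaled trial function such as $1.2\,u$ has strictly higher energy, as expected of the minimizer.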

Preprint, 2020, arXiv

$\ell_0$-Regularized High-dimensional Accelerated Failure Time Model

We develop a constructive approach for $\ell_0$-penalized estimation in the sparse accelerated failure time (AFT) model with high-dimensional covariates. Our proposed method is based on Stute's weighted least squares criterion combined with $\ell_0$-penalization. It is a computational algorithm that iteratively generates a sequence of solutions, based on active sets derived from primal and dual information and on root finding according to the KKT conditions. We refer to the proposed method as AFT-SDAR (for support detection and root finding). An important aspect of our theoretical results is that they directly concern the sequence of solutions generated by the AFT-SDAR algorithm. We prove that the estimation errors of the solution sequence decay exponentially to the optimal error bound with high probability, as long as the covariate matrix satisfies a mild regularity condition, which is necessary and sufficient for model identification even in the setting of high-dimensional linear regression. We also propose an adaptive version of AFT-SDAR, called AFT-ASDAR, which determines the support size of the estimated coefficient in a data-driven fashion. We conduct simulation studies to demonstrate the superior performance of the proposed method over the lasso and MCP in terms of accuracy and speed. We also apply the proposed method to a real data set to illustrate its application.
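The AFT-SDAR iteration can be sketched as follows, assuming no intercept and a fixed support size `T`. Stute's weights are taken as the Kaplan-Meier jumps at the sorted observed (log) times; each step detects a support from the $T$ largest entries of $|β+d|$ (primal plus dual) and then solves the weighted least squares problem restricted to that support. This is our reading of the abstract, with hypothetical names, not the authors' implementation; the data-driven choice of `T` in AFT-ASDAR is omitted.

```python
import numpy as np

def stute_weights(times, events):
    """Kaplan-Meier jump sizes (Stute's weights) at the sorted times, returned in the original order."""
    times = np.asarray(times, dtype=float)
    n = times.size
    order = np.argsort(times, kind="stable")
    delta = np.asarray(events, dtype=float)[order]
    i = np.arange(1, n + 1)
    factors = ((n - i) / (n - i + 1.0)) ** delta               # ((n-j)/(n-j+1))^{delta_(j)}
    prev = np.concatenate(([1.0], np.cumprod(factors)[:-1]))  # product over j < i
    w = np.empty(n)
    w[order] = delta / (n - i + 1.0) * prev
    return w

def aft_sdar(X, log_time, events, T, max_iter=50):
    """SDAR for min 0.5 * sum_i w_i (log_time_i - x_i^T beta)^2 subject to ||beta||_0 <= T."""
    w = stute_weights(log_time, events)
    p = X.shape[1]
    beta = np.zeros(p)
    d = X.T @ (w * log_time)                  # dual variable: negative gradient at beta = 0
    active = None
    for _ in range(max_iter):
        # support detection: the T largest entries of |beta + d|
        new_active = np.sort(np.argsort(-np.abs(beta + d))[:T])
        if active is not None and np.array_equal(new_active, active):
            break
        active = new_active
        # root finding: weighted least squares restricted to the active set
        XA = X[:, active]
        beta = np.zeros(p)
        beta[active] = np.linalg.solve(XA.T @ (w[:, None] * XA), XA.T @ (w * log_time))
        d = X.T @ (w * (log_time - X @ beta))
        d[active] = 0.0
    return beta
```

Censored observations receive zero weight, and their mass is redistributed to later uncensored times through the Kaplan-Meier product; without censoring every weight is $1/n$ and the criterion reduces to ordinary least squares.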

Preprint, 2020, arXiv

A Support Detection and Root Finding Approach for Learning High-dimensional Generalized Linear Models

Feature selection is important for modeling high-dimensional data, where the number of variables can be much larger than the sample size. In this paper, we develop a support detection and root finding procedure to learn high-dimensional sparse generalized linear models, and we call this method GSDAR. Based on the KKT conditions for $\ell_0$-penalized maximum likelihood estimation, GSDAR generates a sequence of estimators iteratively. Under some restricted invertibility conditions on the likelihood function and a sparsity assumption on the target coefficients, the errors of the proposed estimate decay exponentially to the optimal order. Moreover, the oracle estimator can be recovered if the target signal is stronger than the detectable level. We conduct simulations and real data analysis to illustrate the advantages of our proposed method over several existing methods, including Lasso and MCP.

Preprint, 2020, arXiv

Learning Implicit Generative Models with Theoretical Guarantees

We propose a unified framework for implicit generative modeling (UnifiGem) with theoretical guarantees by integrating approaches from optimal transport, numerical ODEs, density-ratio (density-difference) estimation and deep neural networks. First, the problem of implicit generative learning is formulated as that of finding the optimal transport map between the reference distribution and the target distribution, which is characterized by a fully nonlinear Monge-Ampère equation. Interpreting the infinitesimal linearization of the Monge-Ampère equation from the perspective of gradient flows in measure spaces leads to the continuity equation or the McKean-Vlasov equation. We then solve the McKean-Vlasov equation numerically using the forward Euler iteration, where the forward Euler map depends on the density ratio (density difference) between the distribution at the current iteration and the underlying target distribution. We further estimate the density ratio (density difference) via deep density-ratio (density-difference) fitting and derive explicit upper bounds on the estimation error. Experimental results on both synthetic datasets and real benchmark datasets support our theoretical findings and demonstrate the effectiveness of UnifiGem.
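The forward Euler step can be illustrated for the Kullback-Leibler case, where particles move as $x\leftarrow x - s\,\nabla\log(ρ_t/μ)(x)$ with $ρ_t$ the current particle distribution and $μ$ the target. The sketch swaps the deep density-ratio estimator for a much cruder one, a Gaussian fitted to the current particles, and takes a known 1D Gaussian target; the choice of KL, both substitutions and the function name are assumptions made only to keep the example short.

```python
import numpy as np

def forward_euler_particles(x, target_mean, target_var, step=0.1, n_iter=200):
    """Forward Euler particle updates x <- x - step * d/dx log(rho_t / mu)(x).

    Assumption: rho_t is a Gaussian fitted to the particles (not a deep ratio estimator),
    and mu = N(target_mean, target_var) is known."""
    for _ in range(n_iter):
        m, v = x.mean(), x.var()                          # Gaussian fit to the current particles
        # d/dx log N(x; m, v) - d/dx log N(x; target_mean, target_var)
        grad_log_ratio = -(x - m) / v + (x - target_mean) / target_var
        x = x - step * grad_log_ratio
    return x
```

In this Gaussian setting each update is affine in $x$, so the sample mean moves toward the target mean by a factor $1-s/σ_μ^2$ per step (with $σ_μ^2$ the target variance), and the sample variance converges to $σ_μ^2$.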

Preprint, 2020, arXiv

Robust Decoding from Binary Measurements with Cardinality Constraint Least Squares

The main goal of 1-bit compressive sampling is to decode $n$-dimensional signals with sparsity level $s$ from $m$ binary measurements. This is a challenging task due to nonlinearity, noise and sign flips. In this paper, cardinality-constrained least squares is proposed as the decoder. We prove that, up to a constant $c$, with high probability the proposed decoder achieves a minimax estimation error as long as $m \geq \mathcal{O}( s\log n)$. Computationally, we use a generalized Newton algorithm (GNA) to solve the cardinality-constrained minimization problem, at the cost of solving a small least squares problem at each iteration. We prove that, with high probability, the $\ell_{\infty}$ norm of the estimation error between the output of GNA and the underlying target decays to $\mathcal{O}(\sqrt{\frac{\log n }{m}})$ after at most $\mathcal{O}(\log s)$ iterations. Moreover, the underlying support can be recovered with high probability in $\mathcal{O}(\log s)$ steps provided that the target signal is detectable. Extensive numerical simulations and comparisons with state-of-the-art methods are presented to illustrate the robustness of our proposed decoder and the efficiency of the GNA algorithm.

Preprint, 2016, arXiv

Alternating Direction Method of Multipliers for Linear Inverse Problems

In this paper we propose an iterative method using alternating direction method of multipliers (ADMM) strategy to solve linear inverse problems in Hilbert spaces with general convex penalty term. When the data is given exactly, we give a convergence analysis of our ADMM algorithm without assuming the existence of Lagrange multiplier. In case the data contains noise, we show that our method is a regularization method as long as it is terminated by a suitable stopping rule. Various numerical simulations are performed to test the efficiency of the method.
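For illustration, here is a generic scaled-form ADMM on a finite-dimensional problem. It assumes an $\ell_1$ penalty, the splitting $x=z$, and a discrepancy-principle stop $\|Az-y\|\le τδ$ when a noise level $δ$ is supplied. The paper treats general convex penalties in Hilbert spaces, and its splitting and stopping rule may differ, so the penalty, splitting, stopping rule and names here are all assumptions.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal map of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_l1(A, y, lam, rho=1.0, n_iter=500, delta=None, tau=1.1):
    """Scaled-form ADMM for min 0.5*||Ax - y||^2 + lam*||z||_1 subject to x = z.

    If a noise level delta is given, stop once ||A z - y|| <= tau * delta."""
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)                                   # scaled dual variable
    M = A.T @ A + rho * np.eye(n)                     # x-update matrix, fixed across iterations
    Aty = A.T @ y
    for _ in range(n_iter):
        x = np.linalg.solve(M, Aty + rho * (z - u))   # quadratic subproblem
        z = soft_threshold(x + u, lam / rho)          # penalty subproblem
        u = u + x - z                                 # dual update
        if delta is not None and np.linalg.norm(A @ z - y) <= tau * delta:
            break                                     # discrepancy-principle stop
    return z
```

With exact data, a well-conditioned $A$ and a tiny `lam`, the iterates approach the least squares solution; with noisy data, the `delta` stop plays the role of the stopping rule that turns the iteration into a regularization method.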

Preprint, 2016, arXiv

Group Sparse Recovery via the $\ell^0(\ell^2)$ Penalty: Theory and Algorithm

In this work we propose and analyze a novel approach for group sparse recovery. It is based on regularized least squares with an $\ell^0(\ell^2)$ penalty, which penalizes the number of nonzero groups. One distinct feature of the approach is its built-in decorrelation mechanism within each group, which allows it to handle challenging strong within-group correlation. We provide a complete analysis of the regularized model, including existence of a global minimizer, an invariance property, support recovery, and properties of block coordinatewise minimizers. Further, the regularized problem admits an efficient primal dual active set algorithm with provable finite-step global convergence. Each iteration involves solving a least-squares problem on the active set only, and the algorithm exhibits fast local convergence, which makes the method extremely efficient for recovering group sparse signals. Extensive numerical experiments are presented to illustrate salient features of the model and the efficiency and accuracy of the algorithm. A comparative study indicates its competitiveness with existing approaches.
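Penalizing the number of nonzero groups leads to a simple group-wise selection rule. Assuming an orthonormal design, so that the problem decouples by group, minimizing $\tfrac12\|b-β\|^2+λ\,\#\{g:β_g\neq0\}$ keeps group $g$ when $\tfrac12\|b_g\|^2>λ$, that is $\|b_g\|>\sqrt{2λ}$: keeping it costs $λ$, dropping it costs $\tfrac12\|b_g\|^2$. The sketch shows only this building block, with hypothetical names; the paper embeds group selection in a primal dual active set algorithm with within-group decorrelation.

```python
import numpy as np

def group_hard_threshold(b, groups, lam):
    """Minimizer of 0.5*||b - beta||^2 + lam * #{g : beta_g != 0} (orthonormal-design case).

    groups: list of index lists partitioning the coordinates of b."""
    beta = np.zeros_like(b)
    for g in groups:
        if np.linalg.norm(b[g]) > np.sqrt(2.0 * lam):   # keep the whole group or none of it
            beta[g] = b[g]
    return beta
```

Unlike coordinate-wise hard thresholding, a group survives or vanishes as a unit, so a group with one large and one small entry is kept intact.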

Preprint, 2014, arXiv

A Primal Dual Active Set with Continuation Algorithm for the $\ell^0$-Regularized Optimization Problem

We develop a primal dual active set with continuation algorithm for solving the $\ell^0$-regularized least-squares problem that frequently arises in compressed sensing. The algorithm couples the primal dual active set method with a continuation strategy on the regularization parameter. At each inner iteration, it first identifies the active set from both primal and dual variables, and then updates the primal variable by solving a (typically small) least-squares problem defined on the active set, from which the dual variable can be updated explicitly. Under certain conditions on the sensing matrix (the mutual incoherence property or the restricted isometry property) and on the noise level, the finite-step global convergence of the algorithm is established. Extensive numerical examples are presented to illustrate the efficiency and accuracy of the algorithm and the convergence analysis.
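A compact sketch of the primal dual active set method with continuation follows. It assumes unit-norm columns of the sensing matrix, so that the hard-threshold level is $\sqrt{2λ}$, a geometric $λ$ grid that starts where the active set is empty, and a discrepancy-type stop at a user-supplied tolerance `tol`. The grid ratio, iteration counts and stopping choice are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def pdasc_l0(X, y, rho=0.7, n_lambda=60, inner_iter=5, tol=None):
    """Primal dual active set with continuation for min 0.5*||X beta - y||^2 + lam*||beta||_0.

    Assumes unit-norm columns of X. For each lambda on a decreasing grid, the active set is
    {i : |beta_i + d_i| > sqrt(2*lam)} with dual variable d = X^T (y - X beta)."""
    p = X.shape[1]
    beta = np.zeros(p)
    d = X.T @ y
    lam = 0.5 * np.max(np.abs(d)) ** 2          # at this lambda the active set is empty
    active = np.array([], dtype=int)
    for _ in range(n_lambda):
        lam *= rho                               # continuation on the regularization parameter
        for _ in range(inner_iter):              # warm-started inner PDAS iterations
            new_active = np.flatnonzero(np.abs(beta + d) > np.sqrt(2.0 * lam))
            beta = np.zeros(p)
            if new_active.size:
                # primal update: least squares on the active set
                beta[new_active] = np.linalg.lstsq(X[:, new_active], y, rcond=None)[0]
            d = X.T @ (y - X @ beta)             # explicit dual update
            d[new_active] = 0.0
            if np.array_equal(new_active, active):
                break                            # active set settled for this lambda
            active = new_active
        if tol is not None and np.linalg.norm(y - X @ beta) <= tol:
            break                                # discrepancy-type stop, tol ~ noise level
    return beta
```

The continuation makes the active set grow gradually from empty, which is what lets each inner solve stay small and the warm starts stay close to the next solution.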

Preprint, 2013, arXiv

A Primal Dual Active Set Algorithm with Continuation for Compressed Sensing

The success of compressed sensing relies essentially on the ability to efficiently find an approximately sparse solution to an under-determined linear system. In this paper, we develop an efficient algorithm for the sparsity-promoting $\ell_1$-regularized least squares problem by coupling the primal dual active set strategy with a continuation technique on the regularization parameter. In the active set strategy, we first determine the active set from the primal and dual variables, and then update the primal and dual variables by solving a low-dimensional least squares problem on the active set, which makes the algorithm very efficient. The continuation technique globalizes the convergence of the algorithm, with provable global convergence under the restricted isometry property (RIP). Further, we adopt two alternative methods, a modified discrepancy principle and a Bayesian information criterion, to choose the regularization parameter. Numerical experiments indicate that our algorithm is very competitive with state-of-the-art algorithms in terms of accuracy and efficiency.
