Source author record

Wenjing Liao

Wenjing Liao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning math.NA Information Theory math.IT physics.optics Computer Vision math.ST Statistics Theory

Catalog footprint

What is connected

13works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

GemDepth: Geometry-Embedded Features for 3D-Consistent Video Depth

Video depth estimation extends monocular prediction into the temporal domain to ensure coherence. However, existing methods often suffer from spatial blurring in fine-detail regions and temporal inconsistencies. We argue that current approaches, which primarily rely on temporal smoothing via Transformers, struggle to maintain strict 3D geometric consistency-particularly under rotations or drastic view changes. To address this, we propose GemDepth, a framework built on the insight that an explicit awareness of camera motion and global 3D structure is a prerequisite for 3D consistency. Distinctively, GemDepth introduces a Geometry-Embedding Module (GEM) that predicts inter-frame camera poses to generate implicit geometric embeddings. This injection of motion priors equips the network with intrinsic 3D perception and alignment capabilities. Guided by these geometric cues, our Alternating Spatio-Temporal Transformer (ASTT) captures latent point-level correspondences to simultaneously enhance spatial precision for sharp details and enforce rigorous temporal consistency. Furthermore, GemDepth employs a data-efficient training strategy, effectively bridging the gap between high efficiency and robust geometric consistency. As shown in Fig.2, comprehensive evaluations demonstrate that GemDepth achieves state-of-the-art performance across multiple datasets, particularly in complex dynamic scenarios. The code is publicly available at: https://github.com/Yuecheng919/GemDepth.

preprint2026arXiv

Learning Theory of Transformers: Local-to-Global Approximation via Softmax Partition of Unity

This paper investigates the learning theory of Transformer networks for regression tasks on the compact Euclidean domain $[0,1]^d$ and $d$-dimensional compact Riemannian manifolds. We propose a novel constructive approximation framework for Transformers that builds local approximations of the target function and aggregates them into a global approximation via softmax partition of unity. This approach leverages the attention mechanism to achieve spatial localization through affine transformations of the input. The softmax activation plays a crucial role in aggregating local approximations to a global output. From an approximation perspective, we prove that a dense Transformer equipped with only two encoder blocks and standard single-hidden-layer point-wise feed-forward networks can achieve a uniform $\varepsilon$-approximation error for $α$-Hölder continuous functions with $α\in (0,1]$ using $\mathcal{O}(\varepsilon^{-d/α})$ total parameters. Building upon this approximation guarantee, we establish a near minimax-optimal generalization error bound of order $\mathcal{O}\big(n^{-\frac{2α}{2α+d}} \log n\big)$ for the empirical risk minimizer, where $n$ is the training data size. The Transformer architecture studied in this paper is dense, shallow and wide, and employs softmax activation and sinusoidal positional encodings, closely reflecting practical implementations.

preprint2026arXiv

Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic regression tasks.

preprint2022arXiv

Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

Overparameterized neural networks enjoy great representation power on complex data, and more importantly yield sufficiently smooth output, which is crucial to their generalization and robustness. Most existing function approximation theories suggest that with sufficiently many parameters, neural networks can well approximate certain classes of functions in terms of the function value. The neural network themselves, however, can be highly nonsmooth. To bridge this gap, we take convolutional residual networks (ConvResNets) as an example, and prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness. Moreover, we extend our theory to approximating functions supported on a low-dimensional manifold. Our theory partially justifies the benefits of using deep and wide networks in practice. Numerical experiments on adversarial robust image classification are provided to support our theory.

preprint2022arXiv

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

Learning operators between infinitely dimensional spaces is an important learning task arising in wide applications in machine learning, imaging science, mathematical modeling and simulations, etc. This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks. Non-asymptotic upper bounds are derived for the generalization error of the empirical risk minimizer over a properly chosen network class. Under the assumption that the target operator exhibits a low dimensional structure, our error bounds decay as the training sample size increases, with an attractive fast rate depending on the intrinsic dimension in our estimation. Our assumptions cover most scenarios in real applications and our results give rise to fast rates by exploiting low dimensional structures of data in operator estimation. We also investigate the influence of network structures (e.g., network width, depth, and sparsity) on the generalization error of the neural network estimator and propose a general suggestion on the choice of network structures to maximize the learning efficiency quantitatively.

preprint2022arXiv

Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks

Generative Adversarial Networks (GANs) have achieved a great success in unsupervised learning. Despite its remarkable empirical performance, there are limited theoretical studies on the statistical properties of GANs. This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions that have densities in a Hölder space. Our main result shows that, if the generator and discriminator network architectures are properly chosen, GANs are consistent estimators of data distributions under strong discrepancy metrics, such as the Wasserstein-1 distance. Furthermore, when the data distribution exhibits low-dimensional structures, we show that GANs are capable of capturing the unknown low-dimensional structures in data and enjoy a fast statistical convergence, which is free of curse of the ambient dimensionality. Our analysis for low-dimensional data builds upon a universal approximation theory of neural networks with Lipschitz continuity guarantees, which may be of independent interest.

preprint2022arXiv

Nonparametric Regression on Low-Dimensional Manifolds using Deep ReLU Networks : Function Approximation and Statistical Recovery

Real world data often exhibit low-dimensional geometric structures, and can be viewed as samples near a low-dimensional manifold. This paper studies nonparametric regression of Hölder functions on low-dimensional manifolds using deep ReLU networks. Suppose $n$ training data are sampled from a Hölder function in $\mathcal{H}^{s,α}$ supported on a $d$-dimensional Riemannian manifold isometrically embedded in $\mathbb{R}^D$, with sub-gaussian noise. A deep ReLU network architecture is designed to estimate the underlying function from the training data. The mean squared error of the empirical estimator is proved to converge in the order of $n^{-\frac{2(s+α)}{2(s+α) + d}}\log^3 n$. This result shows that deep ReLU networks give rise to a fast convergence rate depending on the data intrinsic dimension $d$, which is usually much smaller than the ambient dimension $D$. It therefore demonstrates the adaptivity of deep ReLU networks to low-dimensional geometric structures of data, and partially explains the power of deep ReLU networks in tackling high-dimensional data with low-dimensional geometric structures.

preprint2021arXiv

Multiscale regression on unknown manifolds

We consider the regression problem of estimating functions on $\mathbb{R}^D$ but supported on a $d$-dimensional manifold $ \mathcal{M} \subset \mathbb{R}^D $ with $ d \ll D $. Drawing ideas from multi-resolution analysis and nonlinear approximation, we construct low-dimensional coordinates on $\mathcal{M}$ at multiple scales, and perform multiscale regression by local polynomial fitting. We propose a data-driven wavelet thresholding scheme that automatically adapts to the unknown regularity of the function, allowing for efficient estimation of functions exhibiting nonuniform regularity at different locations and scales. We analyze the generalization error of our method by proving finite sample bounds in high probability on rich classes of priors. Our estimator attains optimal learning rates (up to logarithmic factors) as if the function was defined on a known Euclidean domain of dimension $d$, instead of an unknown manifold embedded in $\mathbb{R}^D$. The implemented algorithm has quasilinear complexity in the sample size, with constants linear in $D$ and exponential in $d$. Our work therefore establishes a new framework for regression on low-dimensional sets embedded in high dimensions, with fast implementation and strong theoretical guarantees.

preprint2015arXiv

MUSIC for multidimensional spectral estimation: stability and super-resolution

This paper presents a performance analysis of the MUltiple SIgnal Classification (MUSIC) algorithm applied on $D$ dimensional single-snapshot spectral estimation while $s$ true frequencies are located on the continuum of a bounded domain. Inspired by the matrix pencil form, we construct a D-fold Hankel matrix from the measurements and exploit its Vandermonde decomposition in the noiseless case. MUSIC amounts to identifying a noise subspace, evaluating a noise-space correlation function, and localizing frequencies by searching the $s$ smallest local minima of the noise-space correlation function. In the noiseless case, $(2s)^D$ measurements guarantee an exact reconstruction by MUSIC as the noise-space correlation function vanishes exactly at true frequencies. When noise exists, we provide an explicit estimate on the perturbation of the noise-space correlation function in terms of noise level, dimension $D$, the minimum separation among frequencies, the maximum and minimum amplitudes while frequencies are separated by two Rayleigh Length (RL) at each direction. As a by-product the maximum and minimum non-zero singular values of the multidimensional Vandermonde matrix whose nodes are on the unit sphere are estimated under a gap condition of the nodes. Under the 2-RL separation condition, if noise is i.i.d. gaussian, we show that perturbation of the noise-space correlation function decays like $\sqrt{\log(\#(\mathbf{N}))/\#(\mathbf{N})}$ as the sample size $\#(\mathbf{N})$ increases. When the separation among frequencies drops below 2 RL, our numerical experiments show that the noise tolerance of MUSIC obeys a power law with the minimum separation of frequencies.

preprint2014arXiv

MUSIC for Single-Snapshot Spectral Estimation: Stability and Super-resolution

This paper studies the problem of line spectral estimation in the continuum of a bounded interval with one snapshot of array measurement. The single-snapshot measurement data is turned into a Hankel data matrix which admits the Vandermonde decomposition and is suitable for the MUSIC algorithm. The MUSIC algorithm amounts to finding the null space (the noise space) of the Hankel matrix, forming the noise-space correlation function and identifying the s smallest local minima of the noise-space correlation as the frequency set. In the noise-free case exact reconstruction is guaranteed for any arbitrary set of frequencies as long as the number of measurements is at least twice the number of distinct frequencies to be recovered. In the presence of noise the stability analysis shows that the perturbation of the noise-space correlation is proportional to the spectral norm of the noise matrix as long as the latter is smaller than the smallest (nonzero) singular value of the noiseless Hankel data matrix. Under the assumption that frequencies are separated by at least twice the Rayleigh Length (RL), the stability of the noise-space correlation is proved by means of novel discrete Ingham inequalities which provide bounds on nonzero singular values of the noiseless Hankel data matrix. The numerical performance of MUSIC is tested in comparison with other algorithms such as BLO-OMP and SDP (TV-min). While BLO-OMP is the stablest algorithm for frequencies separated above 4 RL, MUSIC becomes the best performing one for frequencies separated between 2 RL and 3 RL. Also, MUSIC is more efficient than other methods. MUSIC truly shines when the frequency separation drops to 1 RL or below when all other methods fail. Indeed, the resolution length of MUSIC decreases to zero as noise decreases to zero as a power law with an exponent much smaller than an upper bound established by Donoho.

preprint2013arXiv

Fourier phasing with phase-uncertain mask

Fourier phasing is the problem of retrieving Fourier phase information from Fourier intensity data. The standard Fourier phase retrieval (without a mask) is known to have many solutions which cause the standard phasing algorithms to stagnate and produce wrong or inaccurate solutions. In this paper Fourier phase retrieval is carried out with the introduction of a randomly fabricated mask in measurement and reconstruction. Highly probable uniqueness of solution, up to a global phase, was previously proved with exact knowledge of the mask. Here the uniqueness result is extended to the case where only rough information about the mask's phases is assumed. The exponential probability bound for uniqueness is given in terms of the uncertainty-to-diversity ratio (UDR) of the unknown mask. New phasing algorithms alternating between the object update and the mask update are systematically tested and demonstrated to have the capability of recovering both the object and the mask (within the object support) simultaneously, consistent with the uniqueness result. Phasing with a phase-uncertain mask is shown to be robust with respect to the correlation in the mask as well as the Gaussian and Poisson noises.

preprint2012arXiv

Phase Retrieval with Random Phase Illumination

This paper presents a detailed, numerical study on the performance of the standard phasing algorithms with random phase illumination (RPI). Phasing with high resolution RPI and the oversampling ratio $σ=4$ determines a unique phasing solution up to a global phase factor. Under this condition, the standard phasing algorithms converge rapidly to the true solution without stagnation. Excellent approximation is achieved after a small number of iterations, not just with high resolution but also low resolution RPI in the presence of additive as well multiplicative noises. It is shown that RPI with $σ=2$ is sufficient for phasing complex-valued images under a sector condition and $σ=1$ for phasing nonnegative images. The Error Reduction algorithm with RPI is proved to converge to the true solution under proper conditions.

preprint2011arXiv

Mismatch and resolution in compressive imaging

Highly coherent sensing matrices arise in discretization of continuum problems such as radar and medical imaging when the grid spacing is below the Rayleigh threshold as well as in using highly coherent, redundant dictionaries as sparsifying operators. Algorithms (BOMP, BLOOMP) based on techniques of band exclusion and local optimization are proposed to enhance Orthogonal Matching Pursuit (OMP) and deal with such coherent sensing matrices. BOMP and BLOOMP have provably performance guarantee of reconstructing sparse, widely separated objects {\em independent} of the redundancy and have a sparsity constraint and computational cost similar to OMP's. Numerical study demonstrates the effectiveness of BLOOMP for compressed sensing with highly coherent, redundant sensing matrices.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint