Source author record

Yuejie Chi

Yuejie Chi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Information Theory math.IT math.OC eess.SP math.ST Statistics Theory math.NA Computer Science and Game Theory Applications Artificial Intelligence Methodology Systems and Control

Catalog footprint

What is connected

34works

13topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Beyond Expectations: Learning with Stochastic Dominance Made Practical

Stochastic dominance serves as a general framework for modeling a broad spectrum of decision preferences under uncertainty, with risk aversion as one notable example, as it naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations. Despite theoretical appeal, the application of stochastic dominance in machine learning has been scarce, due to the following challenges: $\textbf{i)}$, the original concept of stochastic dominance only provides a $\textit{partial order}$, and therefore, is not amenable to serve as a general optimality criterion; and $\textbf{ii)}$, an efficient computational recipe remains lacking due to the continuum nature of evaluating stochastic dominance. In this work, we make the first attempt towards establishing a general framework of learning with stochastic dominance. We first generalize the stochastic dominance concept to enable feasible comparisons between any arbitrary pair of random variables. We next develop a simple and computationally efficient approach for finding the optimal solution in terms of stochastic dominance, which can be seamlessly plugged into many learning tasks. Numerical experiments demonstrate that the proposed method achieves comparable performance as standard risk-neutral strategies and obtains better trade-offs against risk across a variety of applications including supervised learning, reinforcement learning, and portfolio optimization.

preprint2026arXiv

Polynomial Convergence of Riemannian Diffusion Models

Diffusion models have demonstrated remarkable empirical success in the recent years and are considered one of the state-of-the-art generative models in modern AI. These models consist of a forward process, which gradually diffuses the data distribution to a noise distribution spanning the whole space, and a backward process, which inverts this transformation to recover the data distribution from noise. Most of the existing literature assumes that the underlying space is Euclidean. However, in many practical applications, the data are constrained to lie on a submanifold of Euclidean space. Addressing this setting, De Bortoli et al. (2022) introduced Riemannian diffusion models and proved that using an exponentially small step size yields a small sampling error in the Wasserstein distance, provided the data distribution is smooth and strictly positive, and the score estimate is $L_\infty$-accurate. In this paper, we greatly strengthen this theory by establishing that, under $L_2$-accurate score estimate, a {\em polynomially small stepsize} suffices to guarantee small sampling error in the total variation distance, without requiring smoothness or positivity of the data distribution. Our analysis only requires mild and standard curvature assumptions on the underlying manifold. The main ingredients in our analysis are Li-Yau estimate for the log-gradient of heat kernel, and Minakshisundaram-Pleijel parametrix expansion of the perturbed heat equation. Our approach opens the door to a sharper analysis of diffusion models on non-Euclidean spaces.

preprint2025arXiv

The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing

We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overprameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.

preprint2023arXiv

Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity

This paper concerns the central issues of model robustness and sample efficiency in offline reinforcement learning (RL), which aims to learn to perform decision making from history data without active exploration. Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy -- with as few samples as possible -- that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset. We consider a distributionally robust formulation of offline RL, focusing on tabular robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings. To combat with sample scarcity, a model-based algorithm that combines distributionally robust value iteration with the principle of pessimism in the face of uncertainty is proposed, by penalizing the robust value estimates with a carefully designed data-driven penalty term. Under a mild and tailored assumption of the history dataset that measures distribution shift without requiring full coverage of the state-action space, we establish the finite-sample complexity of the proposed algorithms. We further develop an information-theoretic lower bound, which suggests that learning RMDPs is at least as hard as the standard MDPs when the uncertainty level is sufficient small, and corroborates the tightness of our upper bound up to polynomial factors of the (effective) horizon length for a range of uncertainty levels. To the best our knowledge, this provides the first provably near-optimal robust offline RL algorithm that learns under model uncertainty and partial coverage.

preprint2023arXiv

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

This paper investigates the problem of computing the equilibrium of competitive games, which is often modeled as a constrained saddle-point optimization problem with probability simplex constraints. Despite recent efforts in understanding the last-iterate convergence of extragradient methods in the unconstrained setting, the theoretical underpinnings of these methods in the constrained settings, especially those using multiplicative updates, remain highly inadequate, even when the objective function is bilinear. Motivated by the algorithmic role of entropy regularization in single-agent reinforcement learning and game theory, we develop provably efficient extragradient methods to find the quantal response equilibrium (QRE) -- which are solutions to zero-sum two-player matrix games with entropy regularization -- at a linear rate. The proposed algorithms can be implemented in a decentralized manner, where each player executes symmetric and multiplicative updates iteratively using its own payoff without observing the opponent's actions directly. In addition, by controlling the knob of entropy regularization, the proposed algorithms can locate an approximate Nash equilibrium of the unregularized matrix game at a sublinear rate without assuming the Nash equilibrium to be unique. Our methods also lead to efficient policy extragradient algorithms for solving (entropy-regularized) zero-sum Markov games at similar rates. All of our convergence rates are nearly dimension-free, which are independent of the size of the state and action spaces up to logarithm factors, highlighting the positive role of entropy regularization for accelerating convergence.

preprint2023arXiv

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

Policy optimization, which finds the desired policy by maximizing value functions via optimization techniques, lies at the heart of reinforcement learning (RL). In addition to value maximization, other practical considerations arise as well, including the need of encouraging exploration, and that of ensuring certain structural properties of the learned policy due to safety, resource and operational constraints. These can often be accounted for via regularized RL, which augments the target value function with a structure-promoting regularizer. Focusing on discounted infinite-horizon Markov decision processes, we propose a generalized policy mirror descent (GPMD) algorithm for solving regularized RL. As a generalization of policy mirror descent (arXiv:2102.00135), our algorithm accommodates a general class of convex regularizers and promotes the use of Bregman divergence in cognizant of the regularizer in use. We demonstrate that our algorithm converges linearly to the global solution over an entire range of learning rates, in a dimension-free fashion, even when the regularizer lacks strong convexity and smoothness. In addition, this linear convergence feature is provably stable in the face of inexact policy evaluation and imperfect policy updates. Numerical experiments are provided to corroborate the appealing performance of GPMD.

preprint2022arXiv

A Large Collection of Real-world Pediatric Sleep Studies

Despite being crucial to health and quality of life, sleep -- especially pediatric sleep -- is not yet well understood. This is exacerbated by lack of access to sufficient pediatric sleep data with clinical annotation. In order to accelerate research on pediatric sleep and its connection to health, we create the Nationwide Children's Hospital (NCH) Sleep DataBank and publish it at Physionet and the National Sleep Research Resource (NSRR), which is a large sleep data common with physiological data, clinical data, and tools for analyses. The NCH Sleep DataBank consists of 3,984 polysomnography studies and over 5.6 million clinical observations on 3,673 unique patients between 2017 and 2019 at NCH. The novelties of this dataset include: 1) large-scale sleep dataset suitable for discovering new insights via data mining, 2) explicit focus on pediatric patients, 3) gathered in a real-world clinical setting, and 4) the accompanying rich set of clinical data. The NCH Sleep DataBank is a valuable resource for advancing automatic sleep scoring and real-time sleep disorder prediction, among many other potential scientific discoveries.

preprint2022arXiv

Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization

A major challenge in multi-agent systems is that the system complexity grows dramatically with the number of agents as well as the size of their action spaces, which is typical in real world scenarios such as autonomous vehicles, robotic teams, network routing, etc. It is hence in imminent need to design decentralized or independent algorithms where the update of each agent is only based on their local observations without the need of introducing complex communication/coordination mechanisms. In this work, we study the finite-time convergence of independent entropy-regularized natural policy gradient (NPG) methods for potential games, where the difference in an agent's utility function due to unilateral deviation matches exactly that of a common potential function. The proposed entropy-regularized NPG method enables each agent to deploy symmetric, decentralized, and multiplicative updates according to its own payoff. We show that the proposed method converges to the quantal response equilibrium (QRE) -- the equilibrium to the entropy-regularized game -- at a sublinear rate, which is independent of the size of the action space and grows at most sublinearly with the number of agents. Appealingly, the convergence rate further becomes independent with the number of agents for the important special case of identical-interest games, leading to the first method that converges at a dimension-free rate. Our approach can be used as a smoothing technique to find an approximate Nash equilibrium (NE) of the unregularized problem without assuming that stationary policies are isolated.

preprint2022arXiv

Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity

Offline or batch reinforcement learning seeks to learn a near-optimal policy using history data without active exploration of the environment. To counter the insufficient coverage and sample scarcity of many offline datasets, the principle of pessimism has been recently introduced to mitigate high bias of the estimated values. While pessimistic variants of model-based algorithms (e.g., value iteration with lower confidence bounds) have been theoretically investigated, their model-free counterparts -- which do not require explicit model estimation -- have not been adequately studied, especially in terms of sample efficiency. To address this inadequacy, we study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes, and characterize its sample complexity under the single-policy concentrability assumption which does not require the full coverage of the state-action space. In addition, a variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity. Altogether, this work highlights the efficiency of model-free algorithms in offline RL when used in conjunction with pessimism and variance reduction.

preprint2022arXiv

Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements

Tensors, which provide a powerful and flexible model for representing multi-attribute data and multi-way interactions, play an indispensable role in modern data science across various fields in science and engineering. A fundamental task is to faithfully recover the tensor from highly incomplete measurements in a statistically and computationally efficient manner. Harnessing the low-rank structure of tensors in the Tucker decomposition, this paper develops a scaled gradient descent (ScaledGD) algorithm to directly recover the tensor factors with tailored spectral initializations, and shows that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems -- tensor completion and tensor regression -- as soon as the sample size is above the order of $n^{3/2}$ ignoring other parameter dependencies, where $n$ is the dimension of the tensor. This leads to an extremely scalable approach to low-rank tensor estimation compared with prior art, which suffers from at least one of the following drawbacks: extreme sensitivity to ill-conditioning, high per-iteration costs in terms of memory and computation, or poor sample complexity guarantees. To the best of our knowledge, ScaledGD is the first algorithm that achieves near-optimal statistical and computational complexities simultaneously for low-rank tensor completion with the Tucker decomposition. Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the underlying symmetry in low-rank tensor factorization.

preprint2021arXiv

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization -- an algorithmic scheme that encourages exploration -- and is closely related to soft policy iteration and trust region policy optimization. Despite the empirical success, the theoretical underpinnings for NPG methods remain limited even for the tabular setting. This paper develops $\textit{non-asymptotic}$ convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on discounted Markov decision processes (MDPs). Assuming access to exact policy evaluation, we demonstrate that the algorithm converges linearly -- or even quadratically once it enters a local region around the optimal policy -- when computing optimal value functions of the regularized MDP. Moreover, the algorithm is provably stable vis-à-vis inexactness of policy evaluation. Our convergence results accommodate a wide range of learning rates, and shed light upon the role of entropy regularization in enabling fast convergence.

preprint2021arXiv

Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $γ$-discounted MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$, we demonstrate that the $\ell_{\infty}$-based sample complexity of classical asynchronous Q-learning --- namely, the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function --- is at most on the order of $\frac{1}{μ_{\min}(1-γ)^5\varepsilon^2}+ \frac{t_{mix}}{μ_{\min}(1-γ)}$ up to some logarithmic factor, provided that a proper constant learning rate is adopted. Here, $t_{mix}$ and $μ_{\min}$ denote respectively the mixing time and the minimum state-action occupancy probability of the sample trajectory. The first term of this bound matches the sample complexity in the synchronous case with independent samples drawn from the stationary distribution of the trajectory. The second term reflects the cost taken for the empirical distribution of the Markovian trajectory to reach a steady state, which is incurred at the very beginning and becomes amortized as the algorithm runs. Encouragingly, the above bound improves upon the state-of-the-art result \cite{qu2020finite} by a factor of at least $|\mathcal{S}||\mathcal{A}|$ for all scenarios, and by a factor of at least $t_{mix}|\mathcal{S}||\mathcal{A}|$ for any sufficiently small accuracy level $\varepsilon$. Further, we demonstrate that the scaling on the effective horizon $\frac{1}{1-γ}$ can be improved by means of variance reduction.

preprint2020arXiv

Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction

There is growing interest in large-scale machine learning and optimization over decentralized networks, e.g. in the context of multi-agent learning and federated learning. Due to the imminent need to alleviate the communication burden, the investigation of communication-efficient distributed optimization algorithms - particularly for empirical risk minimization - has flourished in recent years. A large fraction of these algorithms have been developed for the master/slave setting, relying on a central parameter server that can communicate with all agents. This paper focuses on distributed optimization over networks, or decentralized optimization, where each agent is only allowed to aggregate information from its neighbors. By properly adjusting the global gradient estimate via local averaging in conjunction with proper correction, we develop a communication-efficient approximate Newton-type method Network-DANE, which generalizes DANE to the decentralized scenarios. Our key ideas can be applied in a systematic manner to obtain decentralized versions of other master/slave distributed algorithms. A notable development is Network-SVRG/SARAH, which employs variance reduction to further accelerate local computation. We establish linear convergence of Network-DANE and Network-SVRG for strongly convex losses, and Network-SARAH for quadratic losses, which shed light on the impacts of data homogeneity, network connectivity, and local averaging upon the rate of convergence. We further extend Network-DANE to composite optimization by allowing a nonsmooth penalty term. Numerical evidence is provided to demonstrate the appealing performance of our algorithms over competitive baselines, in terms of both communication and computation efficiency. Our work suggests that performing a certain amount of local communications and computations per iteration can substantially improve the overall efficiency.

preprint2020arXiv

Compressed Super-Resolution of Positive Sources

Atomic norm minimization is a convex optimization framework to recover point sources from a subset of their low-pass observations, or equivalently the underlying frequencies of a spectrally-sparse signal. When the amplitudes of the sources are positive, a positive atomic norm can be formulated, and exact recovery can be ensured without imposing a separation between the sources, as long as the number of observations is greater than the number of sources. However, the classic formulation of the atomic norm requires to solve a semidefinite program involving a linear matrix inequality of a size on the order of the signal dimension, which can be prohibitive. In this letter, we introduce a novel "compressed" semidefinite program, which involves a linear matrix inequality of a reduced dimension on the order of the number of sources. We guarantee the tightness of this program under certain conditions on the operator involved in the dimensionality reduction. Finally, we apply the proposed method to direction finding over sparse arrays based on second-order statistics and achieve significant computational savings.

preprint2020arXiv

Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data

Stochastic variance reduced methods have gained a lot of interest recently for empirical risk minimization due to its appealing run time complexity. When the data size is large and disjointly stored on different machines, it becomes imperative to distribute the implementation of such variance reduced methods. In this paper, we consider a general framework that directly distributes popular stochastic variance reduced methods in the master/slave model, by assigning outer loops to the parameter server, and inner loops to worker machines. This framework is natural and friendly to implement, but its theoretical convergence is not well understood. We obtain a comprehensive understanding of algorithmic convergence with respect to data homogeneity by measuring the smoothness of the discrepancy between the local and global loss functions. We establish the linear convergence of distributed versions of a family of stochastic variance reduced algorithms, including those using accelerated and recursive gradient updates, for minimizing strongly convex losses. Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary. Furthermore, we show that when the data are less balanced, regularization can be used to ensure convergence at a slower rate. We also demonstrate that our analysis can be further extended to handle nonconvex loss functions.

preprint2020arXiv

Guaranteed Recovery of One-Hidden-Layer Neural Networks via Cross Entropy

We study model recovery for data classification, where the training labels are generated from a one-hidden-layer neural network with sigmoid activations, also known as a single-layer feedforward network, and the goal is to recover the weights of the neural network. We consider two network models, the fully-connected network (FCN) and the non-overlapping convolutional neural network (CNN). We prove that with Gaussian inputs, the empirical risk based on cross entropy exhibits strong convexity and smoothness {\em uniformly} in a local neighborhood of the ground truth, as soon as the sample complexity is sufficiently large. This implies that if initialized in this neighborhood, gradient descent converges linearly to a critical point that is provably close to the ground truth. Furthermore, we show such an initialization can be obtained via the tensor method. This establishes the global convergence guarantee for empirical risk minimization using cross entropy via gradient descent for learning one-hidden-layer neural networks, at the near-optimal sample and computational complexity with respect to the network input dimension without unrealistic assumptions such as requiring a fresh set of samples at each iteration.

preprint2020arXiv

Learning Latent Features with Pairwise Penalties in Low-Rank Matrix Completion

Low-rank matrix completion has achieved great success in many real-world data applications. A matrix factorization model that learns latent features is usually employed and, to improve prediction performance, the similarities between latent variables can be exploited by pairwise learning using the graph regularized matrix factorization (GRMF) method. However, existing GRMF approaches often use the squared loss to measure the pairwise differences, which may be overly influenced by dissimilar pairs and lead to inferior prediction. To fully empower pairwise learning for matrix completion, we propose a general optimization framework that allows a rich class of (non-)convex pairwise penalty functions. A new and efficient algorithm is developed to solve the proposed optimization problem, with a theoretical convergence guarantee under mild assumptions. In an important situation where the latent variables form a small number of subgroups, its statistical guarantee is also fully considered. In particular, we theoretically characterize the performance of the complexity-regularized maximum likelihood estimator, as a special case of our framework, which is shown to have smaller errors when compared to the standard matrix completion framework without pairwise penalties. We conduct extensive experiments on both synthetic and real datasets to demonstrate the superior performance of this general framework.

preprint2020arXiv

Subspace Estimation from Unbalanced and Incomplete Data Matrices: $\ell_{2,\infty}$ Statistical Guarantees

This paper is concerned with estimating the column space of an unknown low-rank matrix $\boldsymbol{A}^{\star}\in\mathbb{R}^{d_{1}\times d_{2}}$, given noisy and partial observations of its entries. There is no shortage of scenarios where the observations -- while being too noisy to support faithful recovery of the entire matrix -- still convey sufficient information to enable reliable estimation of the column space of interest. This is particularly evident and crucial for the highly unbalanced case where the column dimension $d_{2}$ far exceeds the row dimension $d_{1}$, which is the focal point of the current paper. We investigate an efficient spectral method, which operates upon the sample Gram matrix with diagonal deletion. While this algorithmic idea has been studied before, we establish new statistical guarantees for this method in terms of both $\ell_{2}$ and $\ell_{2,\infty}$ estimation accuracy, which improve upon prior results if $d_{2}$ is substantially larger than $d_{1}$. To illustrate the effectiveness of our findings, we derive matching minimax lower bounds with respect to the noise levels, and develop consequences of our general theory for three applications of practical importance: (1) tensor completion from noisy data, (2) covariance estimation / principal component analysis with missing data, and (3) community recovery in bipartite graphs. Our theory leads to improved performance guarantees for all three cases.

preprint2019arXiv

Harnessing Sparsity over the Continuum: Atomic Norm Minimization for Super Resolution

Convex optimization recently emerges as a compelling framework for performing super resolution, garnering significant attention from multiple communities spanning signal processing, applied mathematics, and optimization. This article offers a friendly exposition to atomic norm minimization as a canonical convex approach to solve super resolution problems. The mathematical foundations and performances guarantees of this approach are presented, and its application in super resolution image reconstruction for single-molecule fluorescence microscopy are highlighted.

preprint2019arXiv

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical models. In fact, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on three fundamental statistical estimation problems, i.e. phase retrieval, low-rank matrix completion, and blind deconvolution, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. In particular, by marrying statistical modeling with generic optimization theory, we develop a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal error control --- measured entrywise and by the spectral norm --- which might be of independent interest.

preprint2019arXiv

Vector-Valued Graph Trend Filtering with Non-Convex Penalties

This work studies the denoising of piecewise smooth graph signals that exhibit inhomogeneous levels of smoothness over a graph, where the value at each node can be vector-valued. We extend the graph trend filtering framework to denoising vector-valued graph signals with a family of non-convex regularizers, which exhibit superior recovery performance over existing convex regularizers. Using an oracle inequality, we establish the statistical error rates of first-order stationary points of the proposed non-convex method for generic graphs. Furthermore, we present an ADMM-based algorithm to solve the proposed method and establish its convergence. Numerical experiments are conducted on both synthetic and real-world data for denoising, support recovery, event detection, and semi-supervised classification.

preprint2017arXiv

Subspace Learning From Bits

Networked sensing, where the goal is to perform complex inference using a large number of inexpensive and decentralized sensors, has become an increasingly attractive research topic due to its applications in wireless sensor networks and internet-of-things. To reduce the communication, sensing and storage complexity, this paper proposes a simple sensing and estimation framework to faithfully recover the principal subspace of high-dimensional data streams using a collection of binary measurements from distributed sensors, without transmitting the whole data. The binary measurements are designed to indicate comparison outcomes of aggregated energy projections of the data samples over pairs of randomly selected directions. When the covariance matrix is a low-rank matrix, we propose a spectral estimator that recovers the principal subspace of the covariance matrix as the subspace spanned by the top eigenvectors of a properly designed surrogate matrix, which is provably accurate as soon as the number of binary measurements is sufficiently large. An adaptive rank selection strategy based on soft thresholding is also presented. Furthermore, we propose a tailored spectral estimator when the covariance matrix is additionally Toeplitz, and show reliable estimation can be obtained from a substantially smaller number of binary measurements. Our results hold even when a constant fraction of the binary measurements is randomly flipped. Finally, we develop a low-complexity online algorithm to track the principal subspace when new measurements arrive sequentially. Numerical examples are provided to validate the proposed approach.

preprint2016arXiv

Guaranteed Blind Sparse Spikes Deconvolution via Lifting and Convex Optimization

Neural recordings, returns from radars and sonars, images in astronomy and single-molecule microscopy can be modeled as a linear superposition of a small number of scaled and delayed copies of a band-limited or diffraction-limited point spread function, which is either determined by the nature or designed by the users; in other words, we observe the convolution between a point spread function and a sparse spike signal with unknown amplitudes and delays. While it is of great interest to accurately resolve the spike signal from as few samples as possible, however, when the point spread function is not known a priori, this problem is terribly ill-posed. This paper proposes a convex optimization framework to simultaneously estimate the point spread function as well as the spike signal, by mildly constraining the point spread function to lie in a known low-dimensional subspace. By applying the lifting trick, we obtain an underdetermined linear system of an ensemble of signals with joint spectral sparsity, to which atomic norm minimization is applied. Under mild randomness assumptions of the low-dimensional subspace as well as a separation condition of the spike signal, we prove the proposed algorithm, dubbed as AtomicLift, is guaranteed to recover the spike signal up to a scaling factor as soon as the number of samples is large enough. The extension of AtomicLift to handle noisy measurements is also discussed. Numerical examples are provided to validate the effectiveness of the proposed approaches.

preprint2016arXiv

Low-Rank Positive Semidefinite Matrix Recovery from Corrupted Rank-One Measurements

We study the problem of estimating a low-rank positive semidefinite (PSD) matrix from a set of rank-one measurements using sensing vectors composed of i.i.d. standard Gaussian entries, which are possibly corrupted by arbitrary outliers. This problem arises from applications such as phase retrieval, covariance sketching, quantum space tomography, and power spectrum estimation. We first propose a convex optimization algorithm that seeks the PSD matrix with the minimum $\ell_1$-norm of the observation residual. The advantage of our algorithm is that it is free of parameters, therefore eliminating the need for tuning parameters and allowing easy implementations. We establish that with high probability, a low-rank PSD matrix can be exactly recovered as soon as the number of measurements is large enough, even when a fraction of the measurements are corrupted by outliers with arbitrary magnitudes. Moreover, the recovery is also stable against bounded noise. With the additional information of an upper bound of the rank of the PSD matrix, we propose another non-convex algorithm based on subgradient descent that demonstrates excellent empirical performance in terms of computational efficiency and accuracy.

preprint2016arXiv

Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations

We study the phase retrieval problem, which solves quadratic system of equations, i.e., recovers a vector $\boldsymbol{x}\in \mathbb{R}^n$ from its magnitude measurements $y_i=|\langle \boldsymbol{a}_i, \boldsymbol{x}\rangle|, i=1,..., m$. We develop a gradient-like algorithm (referred to as RWF representing reshaped Wirtinger flow) by minimizing a nonconvex nonsmooth loss function. In comparison with existing nonconvex Wirtinger flow (WF) algorithm \cite{candes2015phase}, although the loss function becomes nonsmooth, it involves only the second power of variable and hence reduces the complexity. We show that for random Gaussian measurements, RWF enjoys geometric convergence to a global optimal point as long as the number $m$ of measurements is on the order of $n$, the dimension of the unknown $\boldsymbol{x}$. This improves the sample complexity of WF, and achieves the same sample complexity as truncated Wirtinger flow (TWF) \cite{chen2015solving}, but without truncation in gradient loop. Furthermore, RWF costs less computationally than WF, and runs faster numerically than both WF and TWF. We further develop the incremental (stochastic) reshaped Wirtinger flow (IRWF) and show that IRWF converges linearly to the true signal. We further establish performance guarantee of an existing Kaczmarz method for the phase retrieval problem based on its connection to IRWF. We also empirically demonstrate that IRWF outperforms existing ITWF algorithm (stochastic version of TWF) as well as other batch algorithms.

preprint2015arXiv

Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming

Statistical inference and information processing of high-dimensional data often require efficient and accurate estimation of their second-order statistics. With rapidly changing data, limited processing power and storage at the acquisition devices, it is desirable to extract the covariance structure from a single pass over the data and a small number of stored measurements. In this paper, we explore a quadratic (or rank-one) measurement model which imposes minimal memory requirements and low computational complexity during the sampling process, and is shown to be optimal in preserving various low-dimensional covariance structures. Specifically, four popular structural assumptions of covariance matrices, namely low rank, Toeplitz low rank, sparsity, jointly rank-one and sparse structure, are investigated, while recovery is achieved via convex relaxation paradigms for the respective structure. The proposed quadratic sampling framework has a variety of potential applications including streaming data processing, high-frequency wireless communication, phase space tomography and phase retrieval in optics, and non-coherent subspace detection. Our method admits universally accurate covariance estimation in the absence of noise, as soon as the number of measurements exceeds the information theoretic limits. We also demonstrate the robustness of this approach against noise and imperfect structural assumptions. Our analysis is established upon a novel notion called the mixed-norm restricted isometry property (RIP-$\ell_{2}/\ell_{1}$), as well as the conventional RIP-$\ell_{2}/\ell_{2}$ for near-isotropic and bounded measurements. In addition, our results improve upon the best-known phase retrieval (including both dense and sparse signals) guarantees using PhaseLift with a significantly simpler approach.

preprint2015arXiv

Off-the-Grid Line Spectrum Denoising and Estimation with Multiple Measurement Vectors

Compressed Sensing suggests that the required number of samples for reconstructing a signal can be greatly reduced if it is sparse in a known discrete basis, yet many real-world signals are sparse in a continuous dictionary. One example is the spectrally-sparse signal, which is composed of a small number of spectral atoms with arbitrary frequencies on the unit interval. In this paper we study the problem of line spectrum denoising and estimation with an ensemble of spectrally-sparse signals composed of the same set of continuous-valued frequencies from their partial and noisy observations. Two approaches are developed based on atomic norm minimization and structured covariance estimation, both of which can be solved efficiently via semidefinite programming. The first approach aims to estimate and denoise the set of signals from their partial and noisy observations via atomic norm minimization, and recover the frequencies via examining the dual polynomial of the convex program. We characterize the optimality condition of the proposed algorithm and derive the expected convergence rate for denoising, demonstrating the benefit of including multiple measurement vectors. The second approach aims to recover the population covariance matrix from the partially observed sample covariance matrix by motivating its low-rank Toeplitz structure without recovering the signal ensemble. Performance guarantee is derived with a finite number of measurement vectors. The frequencies can be recovered via conventional spectrum estimation methods such as MUSIC from the estimated covariance matrix. Finally, numerical examples are provided to validate the favorable performance of the proposed algorithms, with comparisons against several existing approaches.

preprint2015arXiv

Super-Resolution of Mutually Interfering Signals

We consider simultaneously identifying the membership and locations of point sources that are convolved with different low-pass point spread functions, from the observation of their superpositions. This problem arises in three-dimensional super-resolution single-molecule imaging, neural spike sorting, multi-user channel identification, among others. We propose a novel algorithm, based on convex programming, and establish its near-optimal performance guarantee for exact recovery by exploiting the sparsity of the point source model as well as incoherence between the point spread functions. Numerical examples are provided to demonstrate the effectiveness of the proposed approach.

preprint2014arXiv

Joint Sparsity Recovery for Spectral Compressed Sensing

Compressed Sensing (CS) is an effective approach to reduce the required number of samples for reconstructing a sparse signal in an a priori basis, but may suffer severely from the issue of basis mismatch. In this paper we study the problem of simultaneously recovering multiple spectrally-sparse signals that are supported on the same frequencies lying arbitrarily on the unit circle. We propose an atomic norm minimization problem, which can be regarded as a continuous counterpart of the discrete CS formulation and be solved efficiently via semidefinite programming. Through numerical experiments, we show that the number of samples per signal may be further reduced by harnessing the joint sparsity pattern of multiple signals.

preprint2014arXiv

Robust Spectral Compressed Sensing via Structured Matrix Completion

The paper explores the problem of \emph{spectral compressed sensing}, which aims to recover a spectrally sparse signal from a small random subset of its $n$ time domain samples. The signal of interest is assumed to be a superposition of $r$ multi-dimensional complex sinusoids, while the underlying frequencies can assume any \emph{continuous} values in the normalized frequency domain. Conventional compressed sensing paradigms suffer from the basis mismatch issue when imposing a discrete dictionary on the Fourier representation. To address this issue, we develop a novel algorithm, called \emph{Enhanced Matrix Completion (EMaC)}, based on structured matrix completion that does not require prior knowledge of the model order. The algorithm starts by arranging the data into a low-rank enhanced form exhibiting multi-fold Hankel structure, and then attempts recovery via nuclear norm minimization. Under mild incoherence conditions, EMaC allows perfect recovery as soon as the number of samples exceeds the order of $r\log^{4}n$, and is stable against bounded noise. Even if a constant portion of samples are corrupted with arbitrary magnitude, EMaC still allows exact recovery, provided that the sample complexity exceeds the order of $r^{2}\log^{3}n$. Along the way, our results demonstrate the power of convex relaxation in completing a low-rank multi-fold Hankel or Toeplitz matrix from minimal observed entries. The performance of our algorithm and its applicability to super resolution are further validated by numerical experiments.

preprint2013arXiv

Compressive Demodulation of Mutually Interfering Signals

Multi-User Detection is fundamental not only to cellular wireless communication but also to Radio-Frequency Identification (RFID) technology that supports supply chain management. The challenge of Multi-user Detection (MUD) is that of demodulating mutually interfering signals, and the two biggest impediments are the asynchronous character of random access and the lack of channel state information. Given that at any time instant the number of active users is typically small, the promise of Compressive Sensing (CS) is the demodulation of sparse superpositions of signature waveforms from very few measurements. This paper begins by unifying two front-end architectures proposed for MUD by showing that both lead to the same discrete signal model. Algorithms are presented for coherent and noncoherent detection that are based on iterative matching pursuit. Noncoherent detection is all that is needed in the application to RFID technology where it is only the identity of the active users that is required. The coherent detector is also able to recover the transmitted symbols. It is shown that compressive demodulation requires $\mathcal{O}(K\log N(τ+1))$ samples to recover $K$ active users whereas standard MUD requires $N(τ+1)$ samples to process $N$ total users with a maximal delay $τ$. Performance guarantees are derived for both coherent and noncoherent detection that are identical in the way they scale with number of active users. The power profile of the active users is shown to be less important than the SNR of the weakest user. Gabor frames and Kerdock codes are proposed as signature waveforms and numerical examples demonstrate the superior performance of Kerdock codes - the same probability of error with less than half the samples.

preprint2013arXiv

PETRELS: Parallel Subspace Estimation and Tracking by Recursive Least Squares from Partial Observations

Many real world data sets exhibit an embedding of low-dimensional structure in a high-dimensional manifold. Examples include images, videos and internet traffic data. It is of great significance to reduce the storage requirements and computational complexity when the data dimension is high. Therefore we consider the problem of reconstructing a data stream from a small subset of its entries, where the data is assumed to lie in a low-dimensional linear subspace, possibly corrupted by noise. We further consider tracking the change of the underlying subspace, which can be applied to applications such as video denoising, network monitoring and anomaly detection. Our problem can be viewed as a sequential low-rank matrix completion problem in which the subspace is learned in an on-line fashion. The proposed algorithm, dubbed Parallel Estimation and Tracking by REcursive Least Squares (PETRELS), first identifies the underlying low-dimensional subspace via a recursive procedure for each row of the subspace matrix in parallel with discounting for previous observations, and then reconstructs the missing entries via least-squares estimation if required. Numerical examples are provided for direction-of-arrival estimation and matrix completion, comparing PETRELS with state of the art batch algorithms.

preprint2013arXiv

Spectral Compressed Sensing via Structured Matrix Completion

The paper studies the problem of recovering a spectrally sparse object from a small number of time domain samples. Specifically, the object of interest with ambient dimension $n$ is assumed to be a mixture of $r$ complex multi-dimensional sinusoids, while the underlying frequencies can assume any value in the unit disk. Conventional compressed sensing paradigms suffer from the {\em basis mismatch} issue when imposing a discrete dictionary on the Fourier representation. To address this problem, we develop a novel nonparametric algorithm, called enhanced matrix completion (EMaC), based on structured matrix completion. The algorithm starts by arranging the data into a low-rank enhanced form with multi-fold Hankel structure, then attempts recovery via nuclear norm minimization. Under mild incoherence conditions, EMaC allows perfect recovery as soon as the number of samples exceeds the order of $\mathcal{O}(r\log^{2} n)$. We also show that, in many instances, accurate completion of a low-rank multi-fold Hankel matrix is possible when the number of observed entries is proportional to the information theoretical limits (except for a logarithmic gap). The robustness of EMaC against bounded noise and its applicability to super resolution are further demonstrated by numerical experiments.

preprint2012arXiv

Coherence-Based Performance Guarantees of Orthogonal Matching Pursuit

In this paper, we present coherence-based performance guarantees of Orthogonal Matching Pursuit (OMP) for both support recovery and signal reconstruction of sparse signals when the measurements are corrupted by noise. In particular, two variants of OMP either with known sparsity level or with a stopping rule are analyzed. It is shown that if the measurement matrix $X\in\mathbb{C}^{n\times p}$ satisfies the strong coherence property, then with $n\gtrsim\mathcal{O}(k\log p)$, OMP will recover a $k$-sparse signal with high probability. In particular, the performance guarantees obtained here separate the properties required of the measurement matrix from the properties required of the signal, which depends critically on the minimum signal to noise ratio rather than the power profiles of the signal. We also provide performance guarantees for partial support recovery. Comparisons are given with other performance guarantees for OMP using worst-case analysis and the sorted one step thresholding algorithm.

Yuejie Chi

What is connected

Connect this record

See the researcher in context

Building this map preview

34 published item(s)

Beyond Expectations: Learning with Stochastic Dominance Made Practical

Polynomial Convergence of Riemannian Diffusion Models

The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing

Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

A Large Collection of Real-world Pediatric Sleep Studies

Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization

Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity

Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction

Compressed Super-Resolution of Positive Sources

Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data

Guaranteed Recovery of One-Hidden-Layer Neural Networks via Cross Entropy

Learning Latent Features with Pairwise Penalties in Low-Rank Matrix Completion

Subspace Estimation from Unbalanced and Incomplete Data Matrices: $\ell_{2,\infty}$ Statistical Guarantees

Harnessing Sparsity over the Continuum: Atomic Norm Minimization for Super Resolution

Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Vector-Valued Graph Trend Filtering with Non-Convex Penalties

Subspace Learning From Bits

Guaranteed Blind Sparse Spikes Deconvolution via Lifting and Convex Optimization

Low-Rank Positive Semidefinite Matrix Recovery from Corrupted Rank-One Measurements

Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations

Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming

Off-the-Grid Line Spectrum Denoising and Estimation with Multiple Measurement Vectors

Super-Resolution of Mutually Interfering Signals

Joint Sparsity Recovery for Spectral Compressed Sensing

Robust Spectral Compressed Sensing via Structured Matrix Completion

Compressive Demodulation of Mutually Interfering Signals

PETRELS: Parallel Subspace Estimation and Tracking by Recursive Least Squares from Partial Observations

Spectral Compressed Sensing via Structured Matrix Completion

Coherence-Based Performance Guarantees of Orthogonal Matching Pursuit