Source author record

Yi Ma

Yi Ma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

54works

24topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

Evaluating embodied systems on real dexterous hardware requires more than isolated primitive skills: an agent must perceive a changing tabletop scene, choose a context-appropriate action, execute it with a dexterous hand, and leave the scene usable for later decisions. We introduce DexHoldem, a real-world system-level benchmark built around Texas Hold'em dexterous manipulation with a ShadowHand. DexHoldem provides 1,470 teleoperated demonstrations across 14 Texas Hold'em manipulation primitives, a standardized physical policy benchmark, and an agentic perception benchmark that tests whether agents can recover the structured game state needed for embodied decision making. On primitive execution, $π_{0.5}$ obtains the highest task completion rate ($61.2\%$), while $π_{0.5}$ and $π_0$ tie on scene-preserving success rate ($47.5\%$). On agentic perception, Opus 4.7 obtains the best strict problem-level accuracy ($34.3\%$), while GPT 5.5 obtains the best average field-wise accuracy ($66.8\%$), exposing a gap between isolated visual sub-capabilities and complete routing-relevant state recovery. Finally, we instantiate the full embodied-agent loop in three case studies, where waiting, recovery dispatches, human-help requests, and repeated primitive execution reveal how perception and policy errors accumulate during closed-loop deployment. DexHoldem therefore evaluates dexterous tabletop execution, agentic perception, and embodied decision routing in a shared physical setting. Project page: https://dexholdem.github.io/Dexholdem/.

preprint2025arXiv

Coordinated Humanoid Manipulation with Choice Policies

Humanoid robots hold great promise for operating in human-centric environments, yet achieving robust whole-body coordination across the head, hands, and legs remains a major challenge. We present a system that combines a modular teleoperation interface with a scalable learning framework to address this problem. Our teleoperation design decomposes humanoid control into intuitive submodules, which include hand-eye coordination, grasp primitives, arm end-effector tracking, and locomotion. This modularity allows us to collect high-quality demonstrations efficiently. Building on this, we introduce Choice Policy, an imitation learning approach that generates multiple candidate actions and learns to score them. This architecture enables both fast inference and effective modeling of multimodal behaviors. We validate our approach on two real-world tasks: dishwasher loading and whole-body loco-manipulation for whiteboard wiping. Experiments show that Choice Policy significantly outperforms diffusion policies and standard behavior cloning. Furthermore, our results indicate that hand-eye coordination is critical for success in long-horizon tasks. Our work demonstrates a practical path toward scalable data collection and learning for coordinated humanoid manipulation in unstructured environments.

preprint2024arXiv

Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio

Speaker verification is hampered by background noise, particularly at extremely low Signal-to-Noise Ratio (SNR) under 0 dB. It is difficult to suppress noise without introducing unwanted artifacts, which adversely affects speaker verification. We proposed the mechanism called Gradient Weighting (Grad-W), which dynamically identifies and reduces artifact noise during prediction. The mechanism is based on the property that the gradient indicates which parts of the input the model is paying attention to. Specifically, when the speaker network focuses on a region in the denoised utterance but not on the clean counterpart, we consider it artifact noise and assign higher weights for this region during optimization of enhancement. We validate it by training an enhancement model and testing the enhanced utterance on speaker verification. The experimental results show that our approach effectively reduces artifact noise, improving speaker verification across various SNR levels.

preprint2022arXiv

A Design of Low-Projection SCMA Codebooks for Ultra-Low Decoding Complexity in Downlink IoT Networks

This paper conceives a novel sparse code multiple access (SCMA) codebook design which is motivated by the strong need for providing ultra-low decoding complexity and good error performance in downlink Internet-of-things (IoT) networks, in which a massive number of low-end and low-cost IoT communication devices are served. By focusing on the typical Rician fading channels, we analyze the pair-wise probability of superimposed SCMA codewords and then deduce the design metrics for multi-dimensional constellation construction and sparse codebook optimization. For significant reduction of the decoding complexity, we advocate the key idea of projecting the multi-dimensional constellation elements to a few overlapped complex numbers in each dimension, called low projection (LP). An emerging modulation scheme, called golden angle modulation (GAM), is considered for multi-stage LP optimization, where the resultant multi-dimensional constellation is called LP-GAM. Our analysis and simulation results show the superiority of the proposed LP codebooks (LPCBs) including one-shot decoding convergence and excellent error rate performance. In particular, the proposed LPCBs lead to decoding complexity reduction by at least $97\%$ compared to that of the conventional codebooks, whilst owning large minimum Euclidean distance. Some examples of the proposed LPCBs are available at \url{https://github.com/ethanlq/SCMA-codebook}.

preprint2022arXiv

A Novel Multi-Task Learning Empowered Codebook Design for Downlink SCMA Networks

Sparse code multiple access (SCMA) is a promising code-domain non-orthogonal multiple access (NOMA) scheme for the enabling of massive machine-type communication. In SCMA, the design of good sparse codebooks and efficient multiuser decoding have attracted tremendous research attention in the past few years. This paper aims to leverage deep learning to jointly design the downlink SCMA encoder and decoder with the aid of autoencoder. We introduce a novel end-to-end learning based SCMA (E2E-SCMA) design framework, under which improved sparse codebooks and low-complexity decoder are obtained. Compared to conventional SCMA schemes, our numerical results show that the proposed E2E-SCMA leads to significant improvements in terms of error rate and computational complexity.

preprint2022arXiv

Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction

This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a closed-loop transcription between a multi-class multi-dimensional data distribution and a linear discriminative representation (LDR) in the feature space that consists of multiple independent multi-dimensional linear subspaces. In particular, we argue that the optimal encoding and decoding mappings sought can be formulated as the equilibrium point of a two-player minimax game between the encoder and decoder. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure for distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback from control systems and avoids expensive evaluating and minimizing approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of Auto-Encoding and GAN and naturally extends them to the settings of learning a both discriminative and generative representation for multi-class and multi-dimensional real-world data. Our extensive experiments on many benchmark imagery datasets demonstrate tremendous potential of this new closed-loop formulation: under fair comparison, visual quality of the learned decoder and classification performance of the encoder is competitive and often better than existing methods based on GAN, VAE, or a combination of both. Unlike existing generative models, the so learned features of the multiple classes are structured: different classes are explicitly mapped onto corresponding independent principal subspaces in the feature space. Source code can be found at https://github.com/Delay-Xili/LDR.

preprint2022arXiv

Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning

Many goal-reaching reinforcement learning (RL) tasks have empirically verified that rewarding the agent on subgoals improves convergence speed and practical performance. We attempt to provide a theoretical framework to quantify the computational benefits of rewarding the completion of subgoals, in terms of the number of synchronous value iterations. In particular, we consider subgoals as one-way {\em intermediate states}, which can only be visited once per episode and propose two settings that consider these one-way intermediate states: the one-way single-path (OWSP) and the one-way multi-path (OWMP) settings. In both OWSP and OWMP settings, we demonstrate that adding {\em intermediate rewards} to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state. We also reveal a trade-off between computational complexity and the pursuit of the shortest path in the OWMP setting: adding intermediate rewards significantly reduces the computational complexity of reaching the goal but the agent may not find the shortest path, whereas with sparse terminal rewards, the agent finds the shortest path at a significantly higher computational cost. We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and some popular deep RL algorithms.

preprint2022arXiv

DeepFGS: Fine-Grained Scalable Coding for Learned Image Compression

Scalable coding, which can adapt to channel bandwidth variation, performs well in today's complex network environment. However, the existing scalable compression methods face two challenges: reduced compression performance and insufficient scalability. In this paper, we propose the first learned fine-grained scalable image compression model (DeepFGS) to overcome the above two shortcomings. Specifically, we introduce a feature separation backbone to divide the image information into basic and scalable features, then redistribute the features channel by channel through an information rearrangement strategy. In this way, we can generate a continuously scalable bitstream via one-pass encoding. In addition, we reuse the decoder to reduce the parameters and computational complexity of DeepFGS. Experiments demonstrate that our DeepFGS outperforms all learning-based scalable image compression models and conventional scalable image codecs in PSNR and MS-SSIM metrics. To the best of our knowledge, our DeepFGS is the first exploration of learned fine-grained scalable coding, which achieves the finest scalability compared with learning-based methods.

preprint2022arXiv

Efficient Maximal Coding Rate Reduction by Variational Forms

The principle of Maximal Coding Rate Reduction (MCR$^2$) has recently been proposed as a training objective for learning discriminative low-dimensional structures intrinsic to high-dimensional data to allow for more robust training than standard approaches, such as cross-entropy minimization. However, despite the advantages that have been shown for MCR$^2$ training, MCR$^2$ suffers from a significant computational cost due to the need to evaluate and differentiate a significant number of log-determinant terms that grows linearly with the number of classes. By taking advantage of variational forms of spectral functions of a matrix, we reformulate the MCR$^2$ objective to a form that can scale significantly without compromising training accuracy. Experiments in image classification demonstrate that our proposed formulation results in a significant speed up over optimizing the original MCR$^2$ objective directly and often results in higher quality learned representations. Further, our approach may be of independent interest in other models that require computation of log-determinant forms, such as in system identification or normalizing flow models.

preprint2022arXiv

Fully Convolutional Line Parsing

We present a one-stage Fully Convolutional Line Parsing network (F-Clip) that detects line segments from images. The proposed network is very simple and flexible with variations that gracefully trade off between speed and accuracy for different applications. F-Clip detects line segments in an end-to-end fashion by predicting each line's center position, length, and angle. We further customize the design of convolution kernels of our fully convolutional network to effectively exploit the statistical priors of the distribution of line angles in real image datasets. We conduct extensive experiments and show that our method achieves a significantly better trade-off between efficiency and accuracy, resulting in a real-time line detector at up to 73 FPS on a single GPU. Such inference speed makes our method readily applicable to real-time tasks without compromising any accuracy of previous methods. Moreover, when equipped with a performance-improving backbone network, F-Clip is able to significantly outperform all state-of-the-art line detectors on accuracy at a similar or even higher frame rate. In other word, under same inference speed, F-Clip always achieving best accuracy compare with other methods. Source code https://github.com/Delay-Xili/F-Clip.

preprint2022arXiv

Network-ELAA Beamforming and Coverage Analysis for eMBB/URLLC in Spatially Non-Stationary Rician Channels

In vehicle-to-infrastructure (V2I) networks, a cluster of multi-antenna access points (APs) can collaboratively conduct transmitter beamforming to provide data services (e.g., eMBB or URLLC). The collaboration between APs effectively forms a networked linear antenna-array with extra-large aperture (i.e., network-ELAA), where the wireless channel exhibits spatial nonstationarity. Major contribution of this work lies in the analysis of beamforming gain and radio coverage for network-ELAA non-stationary Rician channels considering the AP clustering. Assuming that: 1) the total transmit-power is fixed and evenly distributed over APs, 2) the beam is formed only based on the line-of-sight (LoS) path, it is found that the beamforming gain is concave to the cluster size. The optimum size of the AP cluster varies with respect to the user's location, channel uncertainty as well as data services. A user located farther from the ELAA requires a larger cluster size. URLLC is more sensitive to the channel uncertainty when comparing to eMBB, thus requiring a larger cluster size to mitigate the channel fading effect and extend the coverage. Finally, it is shown that the network-ELAA can offer significant coverage extension (50% or more in most of cases) when comparing with the single-AP scenario.

preprint2022arXiv

On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging

We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrasts with the basic SEG method whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.

preprint2022arXiv

On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence

Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.

preprint2022arXiv

PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations

Deep Reinforcement Learning (DRL) has been a promising solution to many complex decision-making problems. Nevertheless, the notorious weakness in generalization among environments prevent widespread application of DRL agents in real-world scenarios. Although advances have been made recently, most prior works assume sufficient online interaction on training environments, which can be costly in practical cases. To this end, we focus on an offline-training-online-adaptation setting, in which the agent first learns from offline experiences collected in environments with different dynamics and then performs online policy adaptation in environments with new dynamics. In this paper, we propose Policy Adaptation with Decoupled Representations (PAnDR) for fast policy adaptation. In offline training phase, the environment representation and policy representation are learned through contrastive learning and policy recovery, respectively. The representations are further refined by mutual information optimization to make them more decoupled and complete. With learned representations, a Policy-Dynamics Value Function (PDVF) [Raileanu et al., 2020] network is trained to approximate the values for different combinations of policies and environments from offline experiences. In online adaptation phase, with the environment context inferred from few experiences collected in new environments, the policy is optimized by gradient ascent with respect to the PDVF. Our experiments show that PAnDR outperforms existing algorithms in several representative policy adaptation problems.

preprint2022arXiv

Polarisation of SKT Calabi-Yau $\partial\bar\partial$-manifolds by Aeppli classes

Given a $\partial\bar\partial$-manifold $X$ with trivial canonical bundle and carrying a metric $ω$ such that $\partial\bar\partialω=0$, we introduce the concept of small deformations of $X$ polarised by the Aeppli cohomology class $[ω]_A$ of an SKT metric $ω$. There is a correspondence between the manifolds polarised by $[ω]_A$ in the Kuranishi family of $X$ and the Bott-Chern classes that are primitive in a sense that we define. We also investigate the existence of a primitive element in an arbitrary Bott-Chern primitive class and compare the metrics on the base space of the subfamily of manifolds polarised by $[ω]_A$ within the Kuranishi family.

preprint2022arXiv

Power Allocation for FDMA-URLLC Downlink with Random Channel Assignment

Concerning ultra-reliable low-latency communication (URLLC) for the downlink operating in the frequency-division multiple-access with random channel assignment, a lightweight power allocation approach is proposed to maximize the number of URLLC users subject to transmit-power and individual user-reliability constraints. Provided perfect channel-state-information at the transmitter (CSIT), the proposed approach is proven to ensure maximized URLLC users. Assuming imperfect CSIT, the proposed approach still aims to maximize the URLLC users without compromising the individual user reliability by using a pessimistic evaluation of the channel gain. It is demonstrated, through numerical results, that the proposed approach can significantly improve the user capacity and the transmit-power efficiency in Rayleigh fading channels. With imperfect CSIT, the proposed approach can still provide remarkable user capacity at limited cost of transmit-power efficiency.

preprint2022arXiv

Predicting Out-of-Distribution Error with the Projection Norm

We propose a metric -- Projection Norm -- to predict a model's performance on out-of-distribution (OOD) data without access to ground truth labels. Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels. The more the new model's parameters differ from an in-distribution model, the greater the predicted OOD error. Empirically, our approach outperforms existing methods on both image and text classification tasks and across different network architectures. Theoretically, we connect our approach to a bound on the test error for overparameterized linear models. Furthermore, we find that Projection Norm is the only approach that achieves non-trivial detection performance on adversarial examples. Our code is available at https://github.com/yaodongyu/ProjNorm.

preprint2022arXiv

Robust Calibration with Multi-domain Temperature Scaling

Uncertainty quantification is essential for the reliable deployment of machine learning models to high-stakes application domains. Uncertainty quantification is all the more challenging when training distribution and test distribution are different, even the distribution shifts are mild. Despite the ubiquity of distribution shifts in real-world applications, existing uncertainty quantification approaches mainly study the in-distribution setting where the train and test distributions are the same. In this paper, we develop a systematic calibration model to handle distribution shifts by leveraging data from multiple domains. Our proposed method -- multi-domain temperature scaling -- uses the heterogeneity in the domains to improve calibration robustness under distribution shift. Through experiments on three benchmark data sets, we find our proposed method outperforms existing methods as measured on both in-distribution and out-of-distribution test sets.

preprint2021arXiv

Optimistic Dual Extrapolation for Coherent Non-monotone Variational Inequalities

The optimization problems associated with training generative adversarial neural networks can be largely reduced to certain {\em non-monotone} variational inequality problems (VIPs), whereas existing convergence results are mostly based on monotone or strongly monotone assumptions. In this paper, we propose {\em optimistic dual extrapolation (OptDE)}, a method that only performs {\em one} gradient evaluation per iteration. We show that OptDE is provably convergent to {\em a strong solution} under different coherent non-monotone assumptions. In particular, when a {\em weak solution} exists, the convergence rate of our method is $O(1/{ε^{2}})$, which matches the best existing result of the methods with two gradient evaluations. Further, when a {\em $σ$-weak solution} exists, the convergence guarantee is improved to the linear rate $O(\log\frac{1}ε)$. Along the way--as a byproduct of our inquiries into non-monotone variational inequalities--we provide the near-optimal $O\big(\frac{1}ε\log \frac{1}ε\big)$ convergence guarantee in terms of restricted strong merit function for monotone variational inequalities. We also show how our results can be naturally generalized to the stochastic setting, and obtain corresponding new convergence results. Taken together, our results contribute to the broad landscape of variational inequality--both non-monotone and monotone alike--by providing a novel and more practical algorithm with the state-of-the-art convergence guarantees.

preprint2021arXiv

Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named \emph{Variance Reduction via Accelerated Dual Averaging (VRADA)}. In both general convex and strongly convex settings, VRADA can attain an $O\big(\frac{1}{n}\big)$-accurate solution in $O(n\log\log n)$ number of stochastic gradient evaluations which improves the best-known result $O(n\log n)$, where $n$ is the number of samples. Meanwhile, VRADA matches the lower bound of the general convex setting up to a $\log\log n$ factor and matches the lower bounds in both regimes $n\le Θ(κ)$ and $n\gg κ$ of the strongly convex setting, where $κ$ denotes the condition number. Besides improving the best-known results and matching all the above lower bounds simultaneously, VRADA has more unified and simplified algorithmic implementation and convergence analysis for both the general convex and strongly convex settings. The underlying novel approaches such as the novel initialization strategy in VRADA may be of independent interest. Through experiments on real datasets, we show the good performance of VRADA over existing methods for large-scale machine learning problems.

preprint2020arXiv

A Modular Neural Network Based Deep Learning Approach for MIMO Signal Detection

In this paper, we reveal that artificial neural network (ANN) assisted multiple-input multiple-output (MIMO) signal detection can be modeled as ANN-assisted lossy vector quantization (VQ), named MIMO-VQ, which is basically a joint statistical channel quantization and signal quantization procedure. It is found that the quantization loss increases linearly with the number of transmit antennas, and thus MIMO-VQ scales poorly with the size of MIMO. Motivated by this finding, we propose a novel modular neural network based approach, termed MNNet, where the whole network is formed by a set of pre-defined ANN modules. The key of ANN module design lies in the integration of parallel interference cancellation in the MNNet, which linearly reduces the interference (or equivalently the number of transmit-antennas) along the feed-forward propagation; and so as the quantization loss. Our simulation results show that the MNNet approach largely improves the deep-learning capacity with near-optimal performance in various cases. Provided that MNNet is well modularized, the learning procedure does not need to be applied on the entire network as a whole, but rather at the modular level. Due to this reason, MNNet has the advantage of much lower learning complexity than other deep-learning based MIMO detection approaches.

preprint2020arXiv

An Actor-Critic-Based UAV-BSs Deployment Method for Dynamic Environments

In this paper, the real-time deployment of unmanned aerial vehicles (UAVs) as flying base stations (BSs) for optimizing the throughput of mobile users is investigated for UAV networks. This problem is formulated as a time-varying mixed-integer non-convex programming (MINP) problem, which is challenging to find an optimal solution in a short time with conventional optimization techniques. Hence, we propose an actor-critic-based (AC-based) deep reinforcement learning (DRL) method to find near-optimal UAV positions at every moment. In the proposed method, the process searching for the solution iteratively at a particular moment is modeled as a Markov decision process (MDP). To handle infinite state and action spaces and improve the robustness of the decision process, two powerful neural networks (NNs) are configured to evaluate the UAV position adjustments and make decisions, respectively. Compared with the heuristic algorithm, sequential least-squares programming and fixed UAVs methods, simulation results have shown that the proposed method outperforms these three benchmarks in terms of the throughput at every moment in UAV networks.

preprint2020arXiv

An Orthogonal-SGD based Learning Approach for MIMO Detection under Multiple Channel Models

In this paper, an orthogonal stochastic gradient descent (O-SGD) based learning approach is proposed to tackle the wireless channel over-training problem inherent in artificial neural network (ANN)-assisted MIMO signal detection. Our basic idea lies in the discovery and exploitation of the training-sample orthogonality between the current training epoch and past training epochs. Unlike the conventional SGD that updates the neural network simply based upon current training samples, O-SGD discovers the correlation between current training samples and historical training data, and then updates the neural network with those uncorrelated components. The network updating occurs only in those identified null subspaces. By such means, the neural network can understand and memorize uncorrelated components between different wireless channels, and thus is more robust to wireless channel variations. This hypothesis is confirmed through our extensive computer simulations as well as performance comparison with the conventional SGD approach.

preprint2020arXiv

Breaking the $O(1/ε)$ Optimal Rate for a Class of Minimax Problems

It is known that for convex optimization $\min_{\mathbf{w}\in\mathcal{W}}f(\mathbf{w})$, the best possible rate of first order accelerated methods is $O(1/\sqrtε)$. However, for the bilinear minimax problem: $\min_{\mathbf{w}\in\mathcal{W}}\max_{\mathbf{v}\in\mathcal{V}}$ $f(\mathbf{w})$ $+\langle\mathbf{w}, \boldsymbol{A}\mathbf{v}\rangle$ $-h(\mathbf{v})$ where both $f(\mathbf{w})$ and $h(\mathbf{v})$ are convex, the best known rate of first order methods slows down to $O(1/ε)$. It is not known whether one can achieve the accelerated rate $O(1/\sqrtε)$ for the bilinear minimax problem without assuming $f(\mathbf{w})$ and $h(\mathbf{v})$ being strongly convex. In this paper, we fill this theoretical gap by proposing a bilinear accelerated extragradient (BAXG) method. We show that when $\mathcal{W}=\mathbb{R}^d$, $f(\mathbf{w})$ and $h(\mathbf{v})$ are convex and smooth, and $\boldsymbol{A}$ has full column rank, then the BAXG method achieves an accelerated rate $O(1/\sqrtε\log \frac{1}ε)$, within a logarithmic factor to the likely optimal rate $O(1/\sqrtε)$. As result, a large class of bilinear convex concave minimax problems, including a few problems of practical importance, can be solved much faster than previously known methods.

preprint2020arXiv

Deep Isometric Learning for Visual Recognition

Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance. This paper shows that deep vanilla ConvNets without normalization nor skip connections can also be trained to achieve surprisingly good performance on standard image recognition benchmarks. This is achieved by enforcing the convolution kernels to be near isometric during initialization and training, as well as by using a variant of ReLU that is shifted towards being isometric. Further experiments show that if combined with skip connections, such near isometric networks can achieve performances on par with (for ImageNet) and better than (for COCO) the standard ResNet, even without normalization at all. Our code is available at https://github.com/HaozhiQi/ISONet.

preprint2020arXiv

Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising

In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contributes revenue (e.g., places an order). However, existing advertising systems mainly focus on the immediate revenue with single ad exposures, ignoring the contribution of each exposure to the final conversion, thus usually falls into suboptimal solutions. In this paper, we formulate the sequential advertising strategy optimization as a dynamic knapsack problem. We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization space while ensuring the solution quality. To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach. Extensive offline and online experiments show the superior performance of our approaches over state-of-the-art baselines in terms of cumulative revenue.

preprint2020arXiv

KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of human. When faced with a new task, human naturally have the common sense and use the prior knowledge to derive an initial policy and guide the learning process afterwards. Although the prior knowledge may be not fully applicable to the new task, the learning process is significantly sped up since the initial policy ensures a quick-start of learning and intermediate guidance allows to avoid unnecessary exploration. Taking this inspiration, we propose knowledge guided policy network (KoGuN), a novel framework that combines human prior suboptimal knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithm. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines human suboptimal knowledge and RL, achieves significant improvement on learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.

preprint2020arXiv

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class. We clarify its relationships with most existing frameworks such as cross-entropy, information bottleneck, information gain, contractive and contrastive learning, and provide theoretical guarantees for learning diverse and discriminative features. The coding rate can be accurately computed from finite samples of degenerate subspace-like distributions and can learn intrinsic representations in supervised, self-supervised, and unsupervised settings in a unified manner. Empirically, the representations learned using this principle alone are significantly more robust to label corruptions in classification than those using cross-entropy, and can lead to state-of-the-art results in clustering mixed data from self-learned invariant features.

preprint2020arXiv

Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising

Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc. These practical problems usually exhibit two distinct features: large-scale and dynamic, which requires the matching algorithm to be repeatedly executed at regular intervals. However, existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource. To address this issue, we propose \texttt{NeuSearcher} which leverages the knowledge learned from previously instances to solve new problem instances. Specifically, we design a multichannel graph neural network to predict the threshold of the matched edges weights, by which the search region could be significantly reduced. We further propose a parallel heuristic search algorithm to iteratively improve the solution quality until convergence. Experiments on both open and industrial datasets demonstrate that \texttt{NeuSearcher} can speed up 2 to 3 times while achieving exactly the same matching solution compared with the state-of-the-art approximation approaches.

preprint2020arXiv

Learning to Detect 3D Reflection Symmetry for Single-View Reconstruction

3D reconstruction from a single RGB image is a challenging problem in computer vision. Previous methods are usually solely data-driven, which lead to inaccurate 3D shape recovery and limited generalization capability. In this work, we focus on object-level 3D reconstruction and present a geometry-based end-to-end deep learning framework that first detects the mirror plane of reflection symmetry that commonly exists in man-made objects and then predicts depth maps by finding the intra-image pixel-wise correspondence of the symmetry. Our method fully utilizes the geometric cues from symmetry during the test time by building plane-sweep cost volumes, a powerful tool that has been used in multi-view stereopsis. To our knowledge, this is the first work that uses the concept of cost volumes in the setting of single-image 3D reconstruction. We conduct extensive experiments on the ShapeNet dataset and find that our reconstruction method significantly outperforms the previous state-of-the-art single-view 3D reconstruction networks in term of the accuracy of camera poses and depth maps, without requiring objects being completely symmetric. Code is available at https://github.com/zhou13/symmetrynet.

preprint2020arXiv

Learning to Parse Wireframes in Images of Man-Made Environments

In this paper, we propose a learning-based approach to the task of automatically extracting a "wireframe" representation for images of cluttered man-made environments. The wireframe (see Fig. 1) contains all salient straight lines and their junctions of the scene that encode efficiently and accurately large-scale geometry and object shapes. To this end, we have built a very large new dataset of over 5,000 images with wireframes thoroughly labelled by humans. We have proposed two convolutional neural networks that are suitable for extracting junctions and lines with large spatial support, respectively. The networks trained on our dataset have achieved significantly better performance than state-of-the-art methods for junction detection and line segment detection, respectively. We have conducted extensive experiments to evaluate quantitatively and qualitatively the wireframes obtained by our method, and have convincingly shown that effectively and efficiently parsing wireframes for images of man-made environments is a feasible goal within reach. Such wireframes could benefit many important visual tasks such as feature correspondence, 3D reconstruction, vision-based mapping, localization, and navigation. The data and source code are available at https://github.com/huangkuns/wireframe.

preprint2020arXiv

On Deep Learning Solutions for Joint Transmitter and Noncoherent Receiver Design in MU-MIMO Systems

This paper aims to handle the joint transmitter and noncoherent receiver design for multiuser multiple-input multiple-output (MU-MIMO) systems through deep learning. Given the deep neural network (DNN) based noncoherent receiver, the novelty of this work mainly lies in the multiuser waveform design at the transmitter side. According to the signal format, the proposed deep learning solutions can be divided into two groups. One group is called pilot-aided waveform, where the information-bearing symbols are time-multiplexed with the pilot symbols. The other is called learning-based waveform, where the multiuser waveform is partially or even completely designed by deep learning algorithms. Specifically, if the information-bearing symbols are directly embedded in the waveform, it is called systematic waveform. Otherwise, it is called non-systematic waveform, where no artificial design is involved. Simulation results show that the pilot-aided waveform design outperforms the conventional zero forcing receiver with least squares (LS) channel estimation on small-size MU-MIMO systems. By exploiting the time-domain degrees of freedom (DoF), the learning-based waveform design further improves the detection performance by at least 5 dB at high signal-to-noise ratio (SNR) range. Moreover, it is found that the traditional weight initialization method might cause a training imbalance among different users in the learning-based waveform design. To tackle this issue, a novel weight initialization method is proposed which provides a balanced convergence performance with no complexity penalty.

preprint2020arXiv

Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization

Recent advances have shown that implicit bias of gradient descent on over-parameterized models enables the recovery of low-rank matrices from linear measurements, even with no prior knowledge on the intrinsic rank. In contrast, for robust low-rank matrix recovery from grossly corrupted measurements, over-parameterization leads to overfitting without prior knowledge on both the intrinsic rank and sparsity of corruption. This paper shows that with a double over-parameterization for both the low-rank matrix and sparse corruption, gradient descent with discrepant learning rates provably recovers the underlying matrix even without prior knowledge on neither rank of the matrix nor sparsity of the corruption. We further extend our approach for the robust recovery of natural images by over-parameterizing images with deep convolutional networks. Experiments show that our method handles different test images and varying corruption levels with a single learning pipeline where the network width and termination conditions do not need to be adjusted on a case-by-case basis. Underlying the success is again the implicit bias with discrepant learning rates on different over-parameterized parameters, which may bear on broader applications.

preprint2018arXiv

TARM: A Turbo-type Algorithm for Affine Rank Minimization

The affine rank minimization (ARM) problem arises in many real-world applications. The goal is to recover a low-rank matrix from a small amount of noisy affine measurements. The original problem is NP-hard, and so directly solving the problem is computationally prohibitive. Approximate low-complexity solutions for ARM have recently attracted much research interest. In this paper, we design an iterative algorithm for ARM based on message passing principles. The proposed algorithm is termed turbo-type ARM (TARM), as inspired by the recently developed turbo compressed sensing algorithm for sparse signal recovery. We show that, when the linear operator for measurement is right-orthogonally invariant (ROIL), a scalar function called state evolution can be established to accurately predict the behaviour of the TARM algorithm. We also show that TARM converges much faster than the counterpart algorithms for low-rank matrix recovery. We further extend the TARM algorithm for matrix completion, where the measurement operator corresponds to a random selection matrix. We show that, although the state evolution is not accurate for matrix completion, the TARM algorithm with carefully tuned parameters still significantly outperforms its counterparts.

preprint2015arXiv

Automatic Layer Separation using Light Field Imaging

We propose a novel approach that jointly removes reflection or translucent layer from a scene and estimates scene depth. The input data are captured via light field imaging. The problem is couched as minimizing the rank of the transmitted scene layer via Robust Principle Component Analysis (RPCA). We also impose regularization based on piecewise smoothness, gradient sparsity, and layer independence to simultaneously recover 3D geometry of the transmitted layer. Experimental results on synthetic and real data show that our technique is robust and reliable, and can handle a broad range of layer separation problems.

preprint2015arXiv

Robust Face Recognition by Constrained Part-based Alignment

Developing a reliable and practical face recognition system is a long-standing goal in computer vision research. Existing literature suggests that pixel-wise face alignment is the key to achieve high-accuracy face recognition. By assuming a human face as piece-wise planar surfaces, where each surface corresponds to a facial part, we develop in this paper a Constrained Part-based Alignment (CPA) algorithm for face recognition across pose and/or expression. Our proposed algorithm is based on a trainable CPA model, which learns appearance evidence of individual parts and a tree-structured shape configuration among different parts. Given a probe face, CPA simultaneously aligns all its parts by fitting them to the appearance evidence with consideration of the constraint from the tree-structured shape configuration. This objective is formulated as a norm minimization problem regularized by graph likelihoods. CPA can be easily integrated with many existing classifiers to perform part-based face recognition. Extensive experiments on benchmark face datasets show that CPA outperforms or is on par with existing methods for robust face recognition across pose, expression, and/or illumination changes.

preprint2015arXiv

ROML: A Robust Feature Correspondence Approach for Matching Objects in A Set of Images

Feature-based object matching is a fundamental problem for many applications in computer vision, such as object recognition, 3D reconstruction, tracking, and motion segmentation. In this work, we consider simultaneously matching object instances in a set of images, where both inlier and outlier features are extracted. The task is to identify the inlier features and establish their consistent correspondences across the image set. This is a challenging combinatorial problem, and the problem complexity grows exponentially with the image number. To this end, we propose a novel framework, termed ROML, to address this problem. ROML optimizes simultaneously a partial permutation matrix (PPM) for each image, and feature correspondences are established by the obtained PPMs. Two of our key contributions are summarized as follows. (1) We formulate the problem as rank and sparsity minimization for PPM optimization, and treat simultaneous optimization of multiple PPMs as a regularized consensus problem in the context of distributed optimization. (2) We use the ADMM method to solve the thus formulated ROML problem, in which a subproblem associated with a single PPM optimization appears to be a difficult integer quadratic program (IQP). We prove that under wildly applicable conditions, this IQP is equivalent to a linear sum assignment problem (LSAP), which can be efficiently solved to an exact solution. Extensive experiments on rigid/non-rigid object matching, matching instances of a common object category, and common object localization show the efficacy of our proposed method.

preprint2015arXiv

Simulation of Implied Volatility Surfaces via Tangent Levy Models

In this paper, we implement and test two types of market-based models for European-type options, based on the tangent Levy models proposed recently by R. Carmona and S. Nadtochiy. As a result, we obtain a method for generating Monte Carlo samples of future paths of implied volatility surfaces. These paths and the surfaces themselves are free of arbitrage, and are constructed in a way that is consistent with the past and present values of implied volatility. We use a real market data to estimate the parameters of these models and conduct an empirical study, to compare the performance of market-based models with the performance of classical stochastic volatility models. We choose the problem of minimal-variance portfolio choice as a measure of model performance and compare the two tangent Levy models to SABR model. Our study demonstrates that the tangent Levy models do a much better job at finding a portfolio with smallest variance, their predictions for the variance are more reliable, and the portfolio weights are more stable. To the best of our knowledge, this is the first example of empirical analysis that provides a convincing evidence of the superior performance of the market-based models for European options using real market data.

preprint2014arXiv

Blind Image Deblurring by Spectral Properties of Convolution Operators

In this paper, we study the problem of recovering a sharp version of a given blurry image when the blur kernel is unknown. Previous methods often introduce an image-independent regularizer (such as Gaussian or sparse priors) on the desired blur kernel. We shall show that the blurry image itself encodes rich information about the blur kernel. Such information can be found through analyzing and comparing how the spectrum of an image as a convolution operator changes before and after blurring. Our analysis leads to an effective convex regularizer on the blur kernel which depends only on the given blurry image. We show that the minimizer of this regularizer guarantees to give good approximation to the blur kernel if the original image is sharp enough. By combining this powerful regularizer with conventional image deblurring techniques, we show how we could significantly improve the deblurring results through simulations and experiments on real images. In addition, our analysis and experiments help explaining a widely accepted doctrine; that is, the edges are good features for deblurring.

preprint2014arXiv

Collaborative Representation based Classification for Face Recognition

By coding a query sample as a sparse linear combination of all training samples and then classifying it by evaluating which class leads to the minimal coding residual, sparse representation based classification (SRC) leads to interesting results for robust face recognition. It is widely believed that the l1- norm sparsity constraint on coding coefficients plays a key role in the success of SRC, while its use of all training samples to collaboratively represent the query sample is rather ignored. In this paper we discuss how SRC works, and show that the collaborative representation mechanism used in SRC is much more crucial to its success of face classification. The SRC is a special case of collaborative representation based classification (CRC), which has various instantiations by applying different norms to the coding residual and coding coefficient. More specifically, the l1 or l2 norm characterization of coding residual is related to the robustness of CRC to outlier facial pixels, while the l1 or l2 norm characterization of coding coefficient is related to the degree of discrimination of facial features. Extensive experiments were conducted to verify the face recognition accuracy and efficiency of CRC with different instantiations.

preprint2014arXiv

PCANet: A Simple Deep Learning Baseline for Image Classification?

In this work, we propose a very simple deep learning network for image classification which comprises only the very basic data processing components: cascaded principal component analysis (PCA), binary hashing, and block-wise histograms. In the proposed architecture, PCA is employed to learn multistage filter banks. It is followed by simple binary hashing and block histograms for indexing and pooling. This architecture is thus named as a PCA network (PCANet) and can be designed and learned extremely easily and efficiently. For comparison and better understanding, we also introduce and study two simple variations to the PCANet, namely the RandNet and LDANet. They share the same topology of PCANet but their cascaded filters are either selected randomly or learned from LDA. We have tested these basic networks extensively on many benchmark visual datasets for different tasks, such as LFW for face verification, MultiPIE, Extended Yale B, AR, FERET datasets for face recognition, as well as MNIST for hand-written digits recognition. Surprisingly, for all tasks, such a seemingly naive PCANet model is on par with the state of the art features, either prefixed, highly hand-crafted or carefully learned (by DNNs). Even more surprisingly, it sets new records for many classification tasks in Extended Yale B, AR, FERET datasets, and MNIST variations. Additional experiments on other public datasets also demonstrate the potential of the PCANet serving as a simple but highly competitive baseline for texture classification and object recognition.

preprint2014arXiv

Sparse Illumination Learning and Transfer for Single-Sample Face Recognition with Image Corruption and Misalignment

Single-sample face recognition is one of the most challenging problems in face recognition. We propose a novel algorithm to address this problem based on a sparse representation based classification (SRC) framework. The new algorithm is robust to image misalignment and pixel corruption, and is able to reduce required gallery images to one sample per class. To compensate for the missing illumination information traditionally provided by multiple gallery images, a sparse illumination learning and transfer (SILT) technique is introduced. The illumination in SILT is learned by fitting illumination examples of auxiliary face images from one or more additional subjects with a sparsely-used illumination dictionary. By enforcing a sparse representation of the query image in the illumination dictionary, the SILT can effectively recover and transfer the illumination and pose information from the alignment stage to the recognition stage. Our extensive experiments have demonstrated that the new algorithms significantly outperform the state of the art in the single-sample regime and with less restrictions. In particular, the single-sample face alignment accuracy is comparable to that of the well-known Deformable SRC algorithm using multiple gallery images per class. Furthermore, the face recognition accuracy exceeds those of the SRC and Extended SRC algorithms using hand labeled alignment initialization.

preprint2013arXiv

The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices

This paper proposes scalable and fast algorithms for solving the Robust PCA problem, namely recovering a low-rank matrix with an unknown fraction of its entries being arbitrarily corrupted. This problem arises in many applications, such as image processing, web data ranking, and bioinformatic data analysis. It was recently shown that under surprisingly broad conditions, the Robust PCA problem can be exactly solved via convex optimization that minimizes a combination of the nuclear norm and the $\ell^1$-norm . In this paper, we apply the method of augmented Lagrange multipliers (ALM) to solve this convex program. As the objective function is non-smooth, we show how to extend the classical analysis of ALM to such new objective functions and prove the optimality of the proposed algorithms and characterize their convergence rate. Empirically, the proposed new algorithms can be more than five times faster than the previous state-of-the-art algorithms for Robust PCA, such as the accelerated proximal gradient (APG) algorithm. Moreover, the new algorithms achieve higher precision, yet being less storage/memory demanding. We also show that the ALM technique can be used to solve the (related but somewhat simpler) matrix completion problem and obtain rather promising results too. We further prove the necessary and sufficient condition for the inexact ALM to converge globally. Matlab code of all algorithms discussed are available at http://perception.csl.illinois.edu/matrix-rank/home.html

preprint2012arXiv

Compressive Principal Component Pursuit

We consider the problem of recovering a target matrix that is a superposition of low-rank and sparse components, from a small set of linear measurements. This problem arises in compressed sensing of structured high-dimensional signals such as videos and hyperspectral images, as well as in the analysis of transformation invariant low-rank recovery. We analyze the performance of the natural convex heuristic for solving this problem, under the assumption that measurements are chosen uniformly at random. We prove that this heuristic exactly recovers low-rank and sparse terms, provided the number of observations exceeds the number of intrinsic degrees of freedom of the component signals by a polylogarithmic factor. Our analysis introduces several ideas that may be of independent interest for the more general problem of compressed sensing and decomposing superpositions of multiple structured signals.

preprint2012arXiv

Fast L1-Minimization Algorithms For Robust Face Recognition

L1-minimization refers to finding the minimum L1-norm solution to an underdetermined linear system b=Ax. Under certain conditions as described in compressive sensing theory, the minimum L1-norm solution is also the sparsest solution. In this paper, our study addresses the speed and scalability of its algorithms. In particular, we focus on the numerical implementation of a sparsity-based classification framework in robust face recognition, where sparse representation is sought to recover human identities from very high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation. Although the underlying numerical problem is a linear program, traditional algorithms are known to suffer poor scalability for large-scale applications. We investigate a new solution based on a classical convex optimization framework, known as Augmented Lagrangian Methods (ALM). The new convex solvers provide a viable solution to real-world, time-critical applications such as face recognition. We conduct extensive experiments to validate and compare the performance of the ALM algorithms against several popular L1-minimization solvers, including interior-point method, Homotopy, FISTA, SESOP-PCD, approximate message passing (AMP) and TFOCS. To aid peer evaluation, the code for all the algorithms has been made publicly available.

preprint2012arXiv

Generalized Principal Component Analysis (GPCA)

This paper presents an algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a data point give normal vectors to the subspace passing through the point. When the number of subspaces is known, we show that these polynomials can be estimated linearly from data; hence, subspace segmentation is reduced to classifying one point per subspace. We select these points optimally from the data set by minimizing certain distance function, thus dealing automatically with moderate noise in the data. A basis for the complement of each subspace is then recovered by applying standard PCA to the collection of derivatives (normal vectors). Extensions of GPCA that deal with data in a high- dimensional space and with an unknown number of subspaces are also presented. Our experiments on low-dimensional data show that GPCA outperforms existing algebraic algorithms based on polynomial factorization and provides a good initialization to iterative techniques such as K-subspaces and Expectation Maximization. We also present applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views.

preprint2012arXiv

Principal Component Pursuit with Reduced Linear Measurements

In this paper, we study the problem of decomposing a superposition of a low-rank matrix and a sparse matrix when a relatively few linear measurements are available. This problem arises in many data processing tasks such as aligning multiple images or rectifying regular texture, where the goal is to recover a low-rank matrix with a large fraction of corrupted entries in the presence of nonlinear domain transformation. We consider a natural convex heuristic to this problem which is a variant to the recently proposed Principal Component Pursuit. We prove that under suitable conditions, this convex program guarantees to recover the correct low-rank and sparse components despite reduced measurements. Our analysis covers both random and deterministic measurement models.

preprint2012arXiv

Robust Recovery of Subspace Structures by Low-Rank Representation

In this work we address the subspace recovery problem. Given a set of data samples (vectors) approximately drawn from a union of multiple subspaces, our goal is to segment the samples into their respective subspaces and correct the possible errors as well. To this end, we propose a novel method termed Low-Rank Representation (LRR), which seeks the lowest-rank representation among all the candidates that can represent the data samples as linear combinations of the bases in a given dictionary. It is shown that LRR well solves the subspace recovery problem: when the data is clean, we prove that LRR exactly captures the true subspace structures; for the data contaminated by outliers, we prove that under certain conditions LRR can exactly recover the row space of the original data and detect the outlier as well; for the data corrupted by arbitrary errors, LRR can also approximately recover the row space with theoretical guarantees. Since the subspace membership is provably determined by the row space, these further imply that LRR can perform robust subspace segmentation and error correction, in an efficient way.

preprint2011arXiv

Sparsity and Robustness in Face Recognition

This report concerns the use of techniques for sparse signal representation and sparse error correction for automatic face recognition. Much of the recent interest in these techniques comes from the paper "Robust Face Recognition via Sparse Representation" by Wright et al. (2009), which showed how, under certain technical conditions, one could cast the face recognition problem as one of seeking a sparse representation of a given input face image in terms of a "dictionary" of training images and images of individual pixels. In this report, we have attempted to clarify some frequently encountered questions about this work and particularly, on the validity of using sparse representation techniques for face recognition.

preprint2010arXiv

Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit

We consider the problem of recovering a low-rank matrix when some of its entries, whose locations are not known a priori, are corrupted by errors of arbitrarily large magnitude. It has recently been shown that this problem can be solved efficiently and effectively by a convex program named Principal Component Pursuit (PCP), provided that the fraction of corrupted entries and the rank of the matrix are both sufficiently small. In this paper, we extend that result to show that the same convex program, with a slightly improved weighting parameter, exactly recovers the low-rank matrix even if "almost all" of its entries are arbitrarily corrupted, provided the signs of the errors are random. We corroborate our result with simulations on randomly generated matrices and errors.

preprint2010arXiv

Segmentation of Natural Images by Texture and Boundary Compression

We present a novel algorithm for segmentation of natural images that harnesses the principle of minimum description length (MDL). Our method is based on observations that a homogeneously textured region of a natural image can be well modeled by a Gaussian distribution and the region boundary can be effectively coded by an adaptive chain code. The optimal segmentation of an image is the one that gives the shortest coding length for encoding all textures and boundaries in the image, and is obtained via an agglomerative clustering process applied to a hierarchy of decreasing window sizes as multi-scale texture features. The optimal segmentation also provides an accurate estimate of the overall coding length and hence the true entropy of the image. We test our algorithm on the publicly available Berkeley Segmentation Dataset. It achieves state-of-the-art segmentation results compared to other existing methods.

preprint2010arXiv

Stable Principal Component Pursuit

In this paper, we study the problem of recovering a low-rank matrix (the principal components) from a high-dimensional data matrix despite both small entry-wise noise and gross sparse errors. Recently, it has been shown that a convex program, named Principal Component Pursuit (PCP), can recover the low-rank matrix when the data matrix is corrupted by gross sparse errors. We further prove that the solution to a related convex program (a relaxed PCP) gives an estimate of the low-rank matrix that is simultaneously stable to small entrywise noise and robust to gross sparse errors. More precisely, our result shows that the proposed convex program recovers the low-rank matrix even though a positive fraction of its entries are arbitrarily corrupted, with an error bound proportional to the noise level. We present simulation results to support our result and demonstrate that the new convex program accurately recovers the principal components (the low-rank matrix) under quite broad conditions. To our knowledge, this is the first result that shows the classical Principal Component Analysis (PCA), optimal for small i.i.d. noise, can be made robust to gross sparse errors; or the first that shows the newly proposed PCP can be made stable to small entry-wise perturbations.

preprint2010arXiv

TILT: Transform Invariant Low-rank Textures

In this paper, we show how to efficiently and effectively extract a class of "low-rank textures" in a 3D scene from 2D images despite significant corruptions and warping. The low-rank textures capture geometrically meaningful structures in an image, which encompass conventional local features such as edges and corners as well as all kinds of regular, symmetric patterns ubiquitous in urban environments and man-made objects. Our approach to finding these low-rank textures leverages the recent breakthroughs in convex optimization that enable robust recovery of a high-dimensional low-rank matrix despite gross sparse errors. In the case of planar regions with significant affine or projective deformation, our method can accurately recover both the intrinsic low-rank texture and the precise domain transformation, and hence the 3D geometry and appearance of the planar regions. Extensive experimental results demonstrate that this new technique works effectively for many regular and near-regular patterns or objects that are approximately low-rank, such as symmetrical patterns, building facades, printed texts, and human faces.

preprint2010arXiv

Underlay Cognitive Radio with Full or Partial Channel Quality Information

Underlay cognitive radios (UCRs) allow a secondary user to enter a primary user's spectrum through intelligent utilization of multiuser channel quality information (CQI) and sharing of codebook. The aim of this work is to study two-user Gaussian UCR systems by assuming the full or partial knowledge of multiuser CQI. Key contribution of this work is motivated by the fact that the full knowledge of multiuser CQI is not always available. We first establish a location-aided UCR model where the secondary user is assumed to have partial CQI about the secondary-transmitter to primary-receiver link as well as full CQI about the other links. Then, new UCR approaches are proposed and carefully analyzed in terms of the secondary user's achievable rate, denoted by $C_2$, the capacity penalty to primary user, denoted by $ΔC_1$, and capacity outage probability. Numerical examples are provided to visually compare the performance of UCRs with full knowledge of multiuser CQI and the proposed approaches with partial knowledge of multiuser CQI.

Yi Ma

What is connected

Connect this record

See the researcher in context

Building this map preview

54 published item(s)

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

Coordinated Humanoid Manipulation with Choice Policies

Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio

A Design of Low-Projection SCMA Codebooks for Ultra-Low Decoding Complexity in Downlink IoT Networks

A Novel Multi-Task Learning Empowered Codebook Design for Downlink SCMA Networks

Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction

Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning

DeepFGS: Fine-Grained Scalable Coding for Learned Image Compression

Efficient Maximal Coding Rate Reduction by Variational Forms

Fully Convolutional Line Parsing

Network-ELAA Beamforming and Coverage Analysis for eMBB/URLLC in Spatially Non-Stationary Rician Channels

On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging

On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence

PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations

Polarisation of SKT Calabi-Yau $\partial\bar\partial$-manifolds by Aeppli classes

Power Allocation for FDMA-URLLC Downlink with Random Channel Assignment

Predicting Out-of-Distribution Error with the Projection Norm

Robust Calibration with Multi-domain Temperature Scaling

Optimistic Dual Extrapolation for Coherent Non-monotone Variational Inequalities

Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

A Modular Neural Network Based Deep Learning Approach for MIMO Signal Detection

An Actor-Critic-Based UAV-BSs Deployment Method for Dynamic Environments

An Orthogonal-SGD based Learning Approach for MIMO Detection under Multiple Channel Models

Breaking the $O(1/ε)$ Optimal Rate for a Class of Minimax Problems

Deep Isometric Learning for Visual Recognition

Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising

KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising

Learning to Detect 3D Reflection Symmetry for Single-View Reconstruction

Learning to Parse Wireframes in Images of Man-Made Environments

On Deep Learning Solutions for Joint Transmitter and Noncoherent Receiver Design in MU-MIMO Systems

Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization

TARM: A Turbo-type Algorithm for Affine Rank Minimization

Automatic Layer Separation using Light Field Imaging

Robust Face Recognition by Constrained Part-based Alignment

ROML: A Robust Feature Correspondence Approach for Matching Objects in A Set of Images

Simulation of Implied Volatility Surfaces via Tangent Levy Models

Blind Image Deblurring by Spectral Properties of Convolution Operators

Collaborative Representation based Classification for Face Recognition

PCANet: A Simple Deep Learning Baseline for Image Classification?

Sparse Illumination Learning and Transfer for Single-Sample Face Recognition with Image Corruption and Misalignment

The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices

Compressive Principal Component Pursuit

Fast L1-Minimization Algorithms For Robust Face Recognition

Generalized Principal Component Analysis (GPCA)

Principal Component Pursuit with Reduced Linear Measurements

Robust Recovery of Subspace Structures by Low-Rank Representation

Sparsity and Robustness in Face Recognition

Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit

Segmentation of Natural Images by Texture and Boundary Compression

Stable Principal Component Pursuit

TILT: Transform Invariant Low-rank Textures

Underlay Cognitive Radio with Full or Partial Channel Quality Information