Source author record

Xiao Fu

Xiao Fu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning eess.SP Computer Vision Artificial Intelligence Computation and Language Information Theory math.IT math.OC eess.AS eess.IV Information Retrieval physics.app-ph physics.optics Populations and Evolution Social and Information Networks Sound

Catalog footprint

What is connected

29works

16topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Content-Style Identification via Differential Independence

Generative analysis often models multi-domain observations as nonlinear mixtures of domain-invariant content variables and domain-specific style variables. Identifying both factors from unpaired domains enables tasks such as domain transfer and counterfactual data generation. Prior work establishes identifiability under (block-wise) statistical independence between content and style, or via sparse Jacobian assumptions on the nonlinear mixing function, but such conditions can be restrictive in practice. In this work, we introduce content-style differential independence (CSDI), an alternative structural condition requiring that infinitesimal variations in content and style induce orthogonal directions on the data manifold, thereby enabling identifiability even when content and style are dependent and the Jacobian is dense. We operationalize this condition through a blockwise orthogonality constraint on the Jacobian subspaces associated with content and style. To support high-dimensional generative models, we design a stochastic regularizer based on numerical Jacobian approximation, enabling scalable training in settings such as high-resolution image generation. Experiments across multiple datasets corroborate the identifiability analysis and demonstrate practical benefits on counterfactual generation and domain translation.

preprint2026arXiv

Domain Transfer Becomes Identifiable via a Single Alignment

Domain transfer (DT) maps source to target distributions and supports tasks such as unsupervised image-to-image translation, single-cell analysis, and cross-platform medical imaging. However, DT is fundamentally ill-posed: push-forward mappings are generally non-identifiable, as measure-preserving automorphisms (MPAs) preserve marginals while altering cross-domain correspondences, leading to content-misaligned translation. Recent work shows that MPAs can be eliminated by jointly transferring multiple corresponding source/target conditional distributions, but supervision signals labeling such conditionals are not always available in practice. We develop an alternative route to DT identifiability. Under a structural sparsity condition on the Jacobian support pattern, we show that distribution matching together with a single paired anchor sample suffices to identify the ground-truth transfer -- requiring substantially less supervision than prior approaches. To enable practical high-dimensional learning, we further propose an efficient Jacobian sparsity regularizer based on randomized masked finite differences, yielding a scalable surrogate without explicit Jacobian evaluation. Empirical results on synthetic and real-world DT tasks validate the theory.

preprint2026arXiv

Plenoptic Video Generation

Camera-controlled generative video re-rendering methods, such as ReCamMaster, have achieved remarkable progress. However, despite their success in single-view setting, these works often struggle to maintain consistency across multi-view scenarios. Ensuring spatio-temporal coherence in hallucinated regions remains challenging due to the inherent stochasticity of generative models. To address it, we introduce PlenopticDreamer, a framework that synchronizes generative hallucinations to maintain spatio-temporal memory. The core idea is to train a multi-in-single-out video-conditioned model in an autoregressive manner, aided by a camera-guided video retrieval strategy that adaptively selects salient videos from previous generations as conditional inputs. In addition, Our training incorporates progressive context-scaling to improve convergence, self-conditioning to enhance robustness against long-range visual degradation caused by error accumulation, and a long-video conditioning mechanism to support extended video generation. Extensive experiments on the Basic and Agibot benchmarks demonstrate that PlenopticDreamer achieves state-of-the-art video re-rendering, delivering superior view synchronization, high-fidelity visuals, accurate camera control, and diverse view transformations (e.g., third-person to third-person, and head-view to gripper-view in robotic manipulation). Project page: https://research.nvidia.com/labs/dir/plenopticdreamer/

preprint2026arXiv

StreamPhy: Streaming Inference of High-Dimensional Physical Dynamics via State Space Models

Inferring the evolution of high-dimensional and multi-modal (e.g., spatio-temporal) physical fields from irregular sparse measurements in real time is a fundamental challenge in science and engineering. Existing approaches, including diffusion-based generative models and functional tensor methods, typically operate in offline settings, depend on full temporal observations, or incur substantial inference cost. We propose StreamPhy, an end-to-end framework that enables efficient and accurate streaming inference of full-field physical dynamics from incoming irregular sparse measurements. The framework integrates a data-adaptive observation encoder that is robust to arbitrary observation patterns, a structured state-space model that supports memory-efficient online updates across irregular time intervals, and an expressive Functional Tensor Feature-wise Linear Modulation (FT-FiLM) decoder for continuous-field generation. We prove that FT-FiLM is more expressive than the functional Tucker model, admitting a richer function class for handling complex dynamics. Experiments on three representative physical systems under challenging sampling patterns show that StreamPhy consistently outperforms state-of-the-art baselines, with at least 48\% improvement in accuracy and up to 20--100X faster inference than diffusion-based methods.

preprint2022arXiv

A Noise-Robust Loss for Unlabeled Entity Problem in Named Entity Recognition

Named Entity Recognition (NER) is an important task in natural language processing. However, traditional supervised NER requires large-scale annotated datasets. Distantly supervision is proposed to alleviate the massive demand for datasets, but datasets constructed in this way are extremely noisy and have a serious unlabeled entity problem. The cross entropy (CE) loss function is highly sensitive to unlabeled data, leading to severe performance degradation. As an alternative, we propose a new loss function called NRCES to cope with this problem. A sigmoid term is used to mitigate the negative impact of noise. In addition, we balance the convergence and noise tolerance of the model according to samples and the training process. Experiments on synthetic and real-world datasets demonstrate that our approach shows strong robustness in the case of severe unlabeled entity problem, achieving new state-of-the-art on real-world datasets.

preprint2022arXiv

Deep Spectrum Cartography: Completing Radio Map Tensors Using Learned Neural Models

The spectrum cartography (SC) technique constructs multi-domain (e.g., frequency, space, and time) radio frequency (RF) maps from limited measurements, which can be viewed as an ill-posed tensor completion problem. Model-based cartography techniques often rely on handcrafted priors (e.g., sparsity, smoothness and low-rank structures) for the completion task. Such priors may be inadequate to capture the essence of complex wireless environments -- especially when severe shadowing happens. To circumvent such challenges, offline-trained deep neural models of radio maps were considered for SC, as deep neural networks (DNNs) are able to "learn" intricate underlying structures from data. However, such deep learning (DL)-based SC approaches encounter serious challenges in both off-line model learning (training) and completion (generalization), possibly because the latent state space for generating the radio maps is prohibitively large. In this work, an emitter radio map disaggregation-based approach is proposed, under which only individual emitters' radio maps are modeled by DNNs. This way, the learning and generalization challenges can both be substantially alleviated. Using the learned DNNs, a fast nonnegative matrix factorization-based two-stage SC method and a performance-enhanced iterative optimization algorithm are proposed. Theoretical aspects -- such as recoverability of the radio tensor, sample complexity, and noise robustness -- under the proposed framework are characterized, and such theoretical properties have been elusive in the context of DL-based radio tensor completion. Experiments using synthetic and real-data from indoor and heavily shadowed environments are employed to showcase the effectiveness of the proposed methods.

preprint2022arXiv

Fast and Structured Block-Term Tensor Decomposition For Hyperspectral Unmixing

The block-term tensor decomposition model with multilinear rank-$(L_r,L_r,1)$ terms (or, the "LL1 tensor decomposition" in short) offers a valuable alternative for hyperspectral unmixing (HU) under the linear mixture model. Particularly, the LL1 decomposition ensures the endmember/abundance identifiability in scenarios where such guarantees are not supported by the classic matrix factorization (MF) approaches. However, existing LL1-based HU algorithms use a three-factor parameterization of the tensor (i.e., the hyperspectral image cube), which leads to a number of challenges including high per-iteration complexity, slow convergence, and difficulties in incorporating structural prior information. This work puts forth an LL1 tensor decomposition-based HU algorithm that uses a constrained two-factor re-parameterization of the tensor data. As a consequence, a two-block alternating gradient projection (GP)-based LL1 algorithm is proposed for HU. With carefully designed projection solvers, the GP algorithm enjoys a relatively low per-iteration complexity. Like in MF-based HU, the factors under our parameterization correspond to the endmembers and abundances. Thus, the proposed framework is natural to incorporate physics-motivated priors that arise in HU. The proposed algorithm often attains orders-of-magnitude speedup and substantial HU performance gains compared to the existing three-factor parameterization-based HU algorithms.

preprint2022arXiv

Memory-Efficient Convex Optimization for Self-Dictionary Separable Nonnegative Matrix Factorization: A Frank-Wolfe Approach

Nonnegative matrix factorization (NMF) often relies on the separability condition for tractable algorithm design. Separability-based NMF is mainly handled by two types of approaches, namely, greedy pursuit and convex programming. A notable convex NMF formulation is the so-called self-dictionary multiple measurement vectors (SD-MMV), which can work without knowing the matrix rank a priori, and is arguably more resilient to error propagation relative to greedy pursuit. However, convex SD-MMV renders a large memory cost that scales quadratically with the problem size. This memory challenge has been around for a decade, and a major obstacle for applying convex SD-MMV to big data analytics. This work proposes a memory-efficient algorithm for convex SD-MMV. Our algorithm capitalizes on the special update rules of a classic algorithm from the 1950s, namely, the Frank-Wolfe (FW) algorithm. It is shown that, under reasonable conditions, the FW algorithm solves the noisy SD-MMV problem with a memory cost that grows linearly with the amount of data. To handle noisier scenarios, a smoothed group sparsity regularizer is proposed to improve robustness while maintaining the low memory footprint with guarantees. The proposed approach presents the first linear memory complexity algorithmic framework for convex SD-MMV based NMF. The method is tested over a couple of unsupervised learning tasks, i.e., text mining and community detection, to showcase its effectiveness and memory efficiency.

preprint2022arXiv

Metalenses with polarization-independent adaptive nano-antennas

Metalens research has made major advances in recent years. These advances rely on the simple design principle of arranging meta-atoms in regular arrays to create an arbitrary phase and polarization profile. Unfortunately, the concept of equally spaced meta-atoms reaches its limit for high deflection angles where the deflection efficiency decreases. The efficiency can be increased using nano-antennas with multiple elements, but their polarization sensitivity hinders their application in metalenses. Here, we show that by designing polarization-insensitive dimer nano-antennas and abandoning the principle of equally spaced unit cells, polarization-independent ultrahigh numerical aperture (NA=1.48) oil-immersion operation with an efficiency of 43% can be demonstrated. This represents a significant improvement on other polarization-independent designs at visible wavelength. We also use this single layer metalens to replace a conventional objective lens and demonstrate the confocal scanning microscopic imaging of a grating with a period of 300 nm at 532 nm operating wavelength. Overall, our results experimentally demonstrate a novel design concept that further improves metalens performance.

preprint2022arXiv

On Local Linear Convergence of Projected Gradient Descent for Unit-Modulus Least Squares

The unit-modulus least squares (UMLS) problem has a wide spectrum of applications in signal processing, e.g., phase-only beamforming, phase retrieval, radar code design, and sensor network localization. Scalable first-order methods such as projected gradient descent (PGD) have recently been studied as a simple yet efficient approach to solving the UMLS problem. Existing results on the convergence of PGD for UMLS often focus on global convergence to stationary points. As a non-convex problem, only a sublinear convergence rate has been established. However, these results do not explain the fast convergence of PGD frequently observed in practice. This manuscript presents a novel analysis of convergence of PGD for UMLS, justifying the linear convergence behavior of the algorithm near the solution. By exploiting the local structure of the objective function and the constraint set, we establish an exact expression for the convergence rate and characterize the conditions for linear convergence. Simulations show that our theoretical analysis corroborates numerical examples. Furthermore, variants of PGD with adaptive step sizes are proposed based on the new insight revealed in our convergence analysis. The variants show substantial acceleration in practice.

preprint2022arXiv

On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR

End-to-end automatic speech recognition (ASR) has achieved promising results. However, most existing end-to-end ASR methods neglect the use of specific language characteristics. For Mandarin Chinese ASR tasks, there exist mutual promotion relationship between Pinyin and Character where Chinese characters can be romanized by Pinyin. Based on the above intuition, we first investigate types of end-to-end encoder-decoder based models in the single-input dual-output (SIDO) multi-task framework, after which a novel asynchronous decoding with fuzzy Pinyin sampling method is proposed according to the one-to-one correspondence characteristics between Pinyin and Character. Furthermore, we proposed a two-stage training strategy to make training more stable and converge faster. The results on the test sets of AISHELL-1 dataset show that the proposed enhanced dual-decoder model without a language model is improved by a big margin compared to strong baseline models.

preprint2022arXiv

Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation

Large-scale training data with high-quality annotations is critical for training semantic and instance segmentation models. Unfortunately, pixel-wise annotation is labor-intensive and costly, raising the demand for more efficient labeling strategies. In this work, we present a novel 3D-to-2D label transfer method, Panoptic NeRF, which aims for obtaining per-pixel 2D semantic and instance labels from easy-to-obtain coarse 3D bounding primitives. Our method utilizes NeRF as a differentiable tool to unify coarse 3D annotations and 2D semantic cues transferred from existing datasets. We demonstrate that this combination allows for improved geometry guided by semantic information, enabling rendering of accurate semantic maps across multiple views. Furthermore, this fusion process resolves label ambiguity of the coarse 3D annotations and filters noise in the 2D predictions. By inferring in 3D space and rendering to 2D labels, our 2D semantic and instance labels are multi-view consistent by design. Experimental results show that Panoptic NeRF outperforms existing label transfer methods in terms of accuracy and multi-view consistency on challenging urban scenes of the KITTI-360 dataset.

preprint2022arXiv

Understanding Latent Correlation-Based Multiview Learning and Self-Supervision: An Identifiability Perspective

Multiple views of data, both naturally acquired (e.g., image and audio) and artificially produced (e.g., via adding different noise to data samples), have proven useful in enhancing representation learning. Natural views are often handled by multiview analysis tools, e.g., (deep) canonical correlation analysis [(D)CCA], while the artificial ones are frequently used in self-supervised learning (SSL) paradigms, e.g., BYOL and Barlow Twins. Both types of approaches often involve learning neural feature extractors such that the embeddings of data exhibit high cross-view correlations. Although intuitive, the effectiveness of correlation-based neural embedding is mostly empirically validated. This work aims to understand latent correlation maximization-based deep multiview learning from a latent component identification viewpoint. An intuitive generative model of multiview data is adopted, where the views are different nonlinear mixtures of shared and private components. Since the shared components are view/distortion-invariant, representing the data using such components is believed to reveal the identity of the samples effectively and robustly. Under this model, latent correlation maximization is shown to guarantee the extraction of the shared components across views (up to certain ambiguities). In addition, it is further shown that the private information in each view can be provably disentangled from the shared using proper regularization design. A finite sample analysis, which has been rare in nonlinear mixture identifiability study, is also presented. The theoretical results and newly designed regularization are tested on a series of tasks.

preprint2021arXiv

A More Efficient Chinese Named Entity Recognition base on BERT and Syntactic Analysis

We propose a new Named entity recognition (NER) method to effectively make use of the results of Part-of-speech (POS) tagging, Chinese word segmentation (CWS) and parsing while avoiding NER error caused by POS tagging error. This paper first uses Stanford natural language process (NLP) tool to annotate large-scale untagged data so as to reduce the dependence on the tagged data; then a new NLP model, g-BERT model, is designed to compress Bidirectional Encoder Representations from Transformers (BERT) model in order to reduce calculation quantity; finally, the model is evaluated based on Chinese NER dataset. The experimental results show that the calculation quantity in g-BERT model is reduced by 60% and performance improves by 2% with Test F1 to 96.5 compared with that in BERT model.

preprint2021arXiv

Hyperspectral Denoising Using Unsupervised Disentangled Spatio-Spectral Deep Priors

Image denoising is often empowered by accurate prior information. In recent years, data-driven neural network priors have shown promising performance for RGB natural image denoising. Compared to classic handcrafted priors (e.g., sparsity and total variation), the "deep priors" are learned using a large number of training samples -- which can accurately model the complex image generating process. However, data-driven priors are hard to acquire for hyperspectral images (HSIs) due to the lack of training data. A remedy is to use the so-called unsupervised deep image prior (DIP). Under the unsupervised DIP framework, it is hypothesized and empirically demonstrated that proper neural network structures are reasonable priors of certain types of images, and the network weights can be learned without training data. Nonetheless, the most effective unsupervised DIP structures were proposed for natural images instead of HSIs. The performance of unsupervised DIP-based HSI denoising is limited by a couple of serious challenges, namely, network structure design and network complexity. This work puts forth an unsupervised DIP framework that is based on the classic spatio-spectral decomposition of HSIs. Utilizing the so-called linear mixture model of HSIs, two types of unsupervised DIPs, i.e., U-Net-like network and fully-connected networks, are employed to model the abundance maps and endmembers contained in the HSIs, respectively. This way, empirically validated unsupervised DIP structures for natural images can be easily incorporated for HSI denoising. Besides, the decomposition also substantially reduces network complexity. An efficient alternating optimization algorithm is proposed to handle the formulated denoising problem. Semi-real and real data experiments are employed to showcase the effectiveness of the proposed approach.

preprint2021arXiv

Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective

There has been a growing interest in developing data-driven, and in particular deep neural network (DNN) based methods for modern communication tasks. For a few popular tasks such as power control, beamforming, and MIMO detection, these methods achieve state-of-the-art performance while requiring less computational efforts, less resources for acquiring channel state information (CSI), etc. However, it is often challenging for these approaches to learn in a dynamic environment. This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment. Specifically, we consider an ``episodically dynamic" setting where the environment statistics change in ``episodes", and in each episode the environment is stationary. We propose to build the notion of continual learning (CL) into wireless system design, so that the learning model can incrementally adapt to the new episodes, {\it without forgetting} knowledge learned from the previous episodes. Our design is based on a novel bilevel optimization formulation which ensures certain ``fairness" across different data samples. We demonstrate the effectiveness of the CL approach by integrating it with two popular DNN based models for power control and beamforming, respectively, and testing using both synthetic and ray-tracing based data sets. These numerical results show that the proposed CL approach is not only able to adapt to the new scenarios quickly and seamlessly, but importantly, it also maintains high performance over the previously encountered scenarios as well.

preprint2021arXiv

StatEcoNet: Statistical Ecology Neural Networks for Species Distribution Modeling

This paper focuses on a core task in computational sustainability and statistical ecology: species distribution modeling (SDM). In SDM, the occurrence pattern of a species on a landscape is predicted by environmental features based on observations at a set of locations. At first, SDM may appear to be a binary classification problem, and one might be inclined to employ classic tools (e.g., logistic regression, support vector machines, neural networks) to tackle it. However, wildlife surveys introduce structured noise (especially under-counting) in the species observations. If unaccounted for, these observation errors systematically bias SDMs. To address the unique challenges of SDM, this paper proposes a framework called StatEcoNet. Specifically, this work employs a graphical generative model in statistical ecology to serve as the skeleton of the proposed computational framework and carefully integrates neural networks under the framework. The advantages of StatEcoNet over related approaches are demonstrated on simulated datasets as well as bird species data. Since SDMs are critical tools for ecological science and natural resource management, StatEcoNet may offer boosted computational and analytical powers to a wide range of applications that have significant social impacts, e.g., the study and conservation of threatened species.

preprint2021arXiv

Stochastic Mirror Descent for Low-Rank Tensor Decomposition Under Non-Euclidean Losses

This work considers low-rank canonical polyadic decomposition (CPD) under a class of non-Euclidean loss functions that frequently arise in statistical machine learning and signal processing. These loss functions are often used for certain types of tensor data, e.g., count and binary tensors, where the least squares loss is considered unnatural.Compared to the least squares loss, the non-Euclidean losses are generally more challenging to handle. Non-Euclidean CPD has attracted considerable interests and a number of prior works exist. However, pressing computational and theoretical challenges, such as scalability and convergence issues, still remain. This work offers a unified stochastic algorithmic framework for large-scale CPD decomposition under a variety of non-Euclidean loss functions. Our key contribution lies in a tensor fiber sampling strategy-based flexible stochastic mirror descent framework. Leveraging the sampling scheme and the multilinear algebraic structure of low-rank tensors, the proposed lightweight algorithm ensures global convergence to a stationary point under reasonable conditions. Numerical results show that our framework attains promising non-Euclidean CPD performance. The proposed framework also exhibits substantial computational savings compared to state-of-the-art methods.

preprint2020arXiv

Block-Randomized Stochastic Proximal Gradient for Low-Rank Tensor Factorization

This work considers the problem of computing the canonical polyadic decomposition (CPD) of large tensors. Prior works mostly leverage data sparsity to handle this problem, which is not suitable for handling dense tensors that often arise in applications such as medical imaging, computer vision, and remote sensing. Stochastic optimization is known for its low memory cost and per-iteration complexity when handling dense data. However, exisiting stochastic CPD algorithms are not flexible enough to incorporate a variety of constraints/regularizations that are of interest in signal and data analytics. Convergence properties of many such algorithms are also unclear. In this work, we propose a stochastic optimization framework for large-scale CPD with constraints/regularizations. The framework works under a doubly randomized fashion, and can be regarded as a judicious combination of randomized block coordinate descent (BCD) and stochastic proximal gradient (SPG). The algorithm enjoys lightweight updates and small memory footprint. In addition, this framework entails considerable flexibility---many frequently used regularizers and constraints can be readily handled under the proposed scheme. The approach is also supported by convergence analysis. Numerical results on large-scale dense tensors are employed to showcase the effectiveness of the proposed approach.

preprint2020arXiv

Nonlinear Multiview Analysis: Identifiability and Neural Network-assisted Implementation

Multiview analysis aims at extracting shared latent components from data samples that are acquired in different domains, e.g., image, text, and audio. Classic multiview analysis, e.g., canonical correlation analysis (CCA), tackles this problem via matching the linearly transformed views in a certain latent domain. More recently, powerful nonlinear learning tools such as kernel methods and neural networks are utilized for enhancing the classic CCA. However, unlike linear CCA whose theoretical aspects are clearly understood, nonlinear CCA approaches are largely intuition-driven. In particular, it is unclear under what conditions the shared latent components across the views can be identified---while identifiability plays an essential role in many applications. In this work, we revisit nonlinear multiview analysis and address both the theoretical and computational aspects. Our work leverages a useful nonlinear model, namely, the post-nonlinear model, from the nonlinear mixture separation literature. Combining with multiview data, we take a nonlinear multiview mixture learning viewpoint, which is a natural extension of the classic generative models for linear CCA. From there, we derive a learning criterion. We show that minimizing this criterion leads to identification of the latent shared components up to certain ambiguities, under reasonable conditions. Our derivation and formulation also offer new insights and interpretations to existing deep neural network-based CCA formulations. On the computation side, we propose an effective algorithm with simple and scalable update rules. A series of simulations and real-data experiments corroborate our theoretical analysis.

preprint2020arXiv

On Recoverability of Randomly Compressed Tensors with Low CP Rank

Our interest lies in the recoverability properties of compressed tensors under the \textit{canonical polyadic decomposition} (CPD) model. The considered problem is well-motivated in many applications, e.g., hyperspectral image and video compression. Prior work studied this problem under somewhat special assumptions---e.g., the latent factors of the tensor are sparse or drawn from absolutely continuous distributions. We offer an alternative result: We show that if the tensor is compressed by a subgaussian linear mapping, then the tensor is recoverable if the number of measurements is on the same order of magnitude as that of the model parameters---without strong assumptions on the latent factors. Our proof is based on deriving a \textit{restricted isometry property} (R.I.P.) under the CPD model via set covering techniques, and thus exhibits a flavor of classic compressive sensing. The new recoverability result enriches the understanding to the compressed CP tensor recovery problem; it offers theoretical guarantees for recovering tensors whose elements are not necessarily continuous or sparse.

preprint2019arXiv

Amplitude Retrieval for Channel Estimation of MIMO Systems with One-Bit ADCs

This letter revisits the channel estimation problem for MIMO systems with one-bit analog-to-digital converters (ADCs) through a novel algorithm--Amplitude Retrieval (AR). Unlike the state-of-the-art methods such as those based on one-bit compressive sensing, AR takes a different approach. It accounts for the lost amplitudes of the one-bit quantized measurements, and performs channel estimation and amplitude completion jointly. This way, the direction information of the propagation paths can be estimated via accurate direction finding algorithms in array processing, e.g., maximum likelihood. The upsot is that AR is able to handle off-grid angles and provide more accurate channel estimates. Simulation results are included to showcase the advantages of AR.

preprint2019arXiv

Learning Nonlinear Mixtures: Identifiability and Algorithm

Linear mixture models have proven very useful in a plethora of applications, e.g., topic modeling, clustering, and source separation. As a critical aspect of the linear mixture models, identifiability of the model parameters is well-studied, under frameworks such as independent component analysis and constrained matrix factorization. Nevertheless, when the linear mixtures are distorted by an unknown nonlinear functions -- which is well-motivated and more realistic in many cases -- the identifiability issues are much less studied. This work proposes an identification criterion for a nonlinear mixture model that is well grounded in many real-world applications, and offers identifiability guarantees. A practical implementation based on a judiciously designed neural network is proposed to realize the criterion, and an effective learning algorithm is proposed. Numerical results on synthetic and real-data corroborate effectiveness of the proposed method.

preprint2019arXiv

Tensor Completion from Regular Sub-Nyquist Samples

Signal sampling and reconstruction is a fundamental engineering task at the heart of signal processing. The celebrated Shannon-Nyquist theorem guarantees perfect signal reconstruction from uniform samples, obtained at a rate twice the maximum frequency present in the signal. Unfortunately a large number of signals of interest are far from being band-limited. This motivated research on reconstruction from sub-Nyquist samples, which mainly hinges on the use of random / incoherent sampling procedures. However, uniform or regular sampling is more appealing in practice and from the system design point of view, as it is far simpler to implement, and often necessary due to system constraints. In this work, we study regular sampling and reconstruction of three- or higher-dimensional signals (tensors). We show that reconstructing a tensor signal from regular samples is feasible. Under the proposed framework, the sample complexity is determined by the tensor rank---rather than the signal bandwidth. This result offers new perspectives for designing practical regular sampling patterns and systems for signals that are naturally tensors, e.g., images and video. For a concrete application, we show that functional magnetic resonance imaging (fMRI) acceleration is a tensor sampling problem, and design practical sampling schemes and an algorithmic framework to handle it. Numerical results show that our tensor sampling strategy accelerates the fMRI sampling process significantly without sacrificing reconstruction accuracy.

preprint2016arXiv

Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm

In topic modeling, many algorithms that guarantee identifiability of the topics have been developed under the premise that there exist anchor words -- i.e., words that only appear (with positive probability) in one topic. Follow-up work has resorted to three or higher-order statistics of the data corpus to relax the anchor word assumption. Reliable estimates of higher-order statistics are hard to obtain, however, and the identification of topics under those models hinges on uncorrelatedness of the topics, which can be unrealistic. This paper revisits topic modeling based on second-order moments, and proposes an anchor-free topic mining framework. The proposed approach guarantees the identification of the topics under a much milder condition compared to the anchor-word assumption, thereby exhibiting much better robustness in practice. The associated algorithm only involves one eigen-decomposition and a few small linear programs. This makes it easy to implement and scale up to very large problem instances. Experiments using the TDT2 and Reuters-21578 corpus demonstrate that the proposed anchor-free approach exhibits very favorable performance (measured using coherence, similarity count, and clustering accuracy metrics) compared to the prior art.

preprint2016arXiv

Learning From Hidden Traits: Joint Factor Analysis and Latent Clustering

Dimensionality reduction techniques play an essential role in data analytics, signal processing and machine learning. Dimensionality reduction is usually performed in a preprocessing stage that is separate from subsequent data analysis, such as clustering or classification. Finding reduced-dimension representations that are well-suited for the intended task is more appealing. This paper proposes a joint factor analysis and latent clustering framework, which aims at learning cluster-aware low-dimensional representations of matrix and tensor data. The proposed approach leverages matrix and tensor factorization models that produce essentially unique latent representations of the data to unravel latent cluster structure -- which is otherwise obscured because of the freedom to apply an oblique transformation in latent space. At the same time, latent cluster structure is used as prior information to enhance the performance of factorization. Specific contributions include several custom-built problem formulations, corresponding algorithms, and discussion of associated convergence properties. Besides extensive simulations, real-world datasets such as Reuters document data and MNIST image data are also employed to showcase the effectiveness of the proposed approaches.

preprint2016arXiv

Robust Volume Minimization-Based Matrix Factorization for Remote Sensing and Document Clustering

This paper considers \emph{volume minimization} (VolMin)-based structured matrix factorization (SMF). VolMin is a factorization criterion that decomposes a given data matrix into a basis matrix times a structured coefficient matrix via finding the minimum-volume simplex that encloses all the columns of the data matrix. Recent work showed that VolMin guarantees the identifiability of the factor matrices under mild conditions that are realistic in a wide variety of applications. This paper focuses on both theoretical and practical aspects of VolMin. On the theory side, exact equivalence of two independently developed sufficient conditions for VolMin identifiability is proven here, thereby providing a more comprehensive understanding of this aspect of VolMin. On the algorithm side, computational complexity and sensitivity to outliers are two key challenges associated with real-world applications of VolMin. These are addressed here via a new VolMin algorithm that handles volume regularization in a computationally simple way, and automatically detects and {iteratively downweights} outliers, simultaneously. Simulations and real-data experiments using a remotely sensed hyperspectral image and the Reuters document corpus are employed to showcase the effectiveness of the proposed algorithm.

preprint2015arXiv

Self-Dictionary Sparse Regression for Hyperspectral Unmixing: Greedy Pursuit and Pure Pixel Search are Related

This paper considers a recently emerged hyperspectral unmixing formulation based on sparse regression of a self-dictionary multiple measurement vector (SD-MMV) model, wherein the measured hyperspectral pixels are used as the dictionary. Operating under the pure pixel assumption, this SD-MMV formalism is special in that it allows simultaneous identification of the endmember spectral signatures and the number of endmembers. Previous SD-MMV studies mainly focus on convex relaxations. In this study, we explore the alternative of greedy pursuit, which generally provides efficient and simple algorithms. In particular, we design a greedy SD-MMV algorithm using simultaneous orthogonal matching pursuit. Intriguingly, the proposed greedy algorithm is shown to be closely related to some existing pure pixel search algorithms, especially, the successive projection algorithm (SPA). Thus, a link between SD-MMV and pure pixel search is revealed. We then perform exact recovery analyses, and prove that the proposed greedy algorithm is robust to noise---including its identification of the (unknown) number of endmembers---under a sufficiently low noise level. The identification performance of the proposed greedy algorithm is demonstrated through both synthetic and real-data experiments.

preprint2015arXiv

Semiblind Hyperspectral Unmixing in the Presence of Spectral Library Mismatches

The dictionary-aided sparse regression (SR) approach has recently emerged as a promising alternative to hyperspectral unmixing (HU) in remote sensing. By using an available spectral library as a dictionary, the SR approach identifies the underlying materials in a given hyperspectral image by selecting a small subset of spectral samples in the dictionary to represent the whole image. A drawback with the current SR developments is that an actual spectral signature in the scene is often assumed to have zero mismatch with its corresponding dictionary sample, and such an assumption is considered too ideal in practice. In this paper, we tackle the spectral signature mismatch problem by proposing a dictionary-adjusted nonconvex sparsity-encouraging regression (DANSER) framework. The main idea is to incorporate dictionary correcting variables in an SR formulation. A simple and low per-iteration complexity algorithm is tailor-designed for practical realization of DANSER. Using the same dictionary correcting idea, we also propose a robust subspace solution for dictionary pruning. Extensive simulations and real-data experiments show that the proposed method is effective in mitigating the undesirable spectral signature mismatch effects.

Xiao Fu

What is connected

Connect this record

See the researcher in context

Building this map preview

29 published item(s)

Content-Style Identification via Differential Independence

Domain Transfer Becomes Identifiable via a Single Alignment

Plenoptic Video Generation

StreamPhy: Streaming Inference of High-Dimensional Physical Dynamics via State Space Models

A Noise-Robust Loss for Unlabeled Entity Problem in Named Entity Recognition

Deep Spectrum Cartography: Completing Radio Map Tensors Using Learned Neural Models

Fast and Structured Block-Term Tensor Decomposition For Hyperspectral Unmixing

Memory-Efficient Convex Optimization for Self-Dictionary Separable Nonnegative Matrix Factorization: A Frank-Wolfe Approach

Metalenses with polarization-independent adaptive nano-antennas

On Local Linear Convergence of Projected Gradient Descent for Unit-Modulus Least Squares

On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR

Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation

Understanding Latent Correlation-Based Multiview Learning and Self-Supervision: An Identifiability Perspective

A More Efficient Chinese Named Entity Recognition base on BERT and Syntactic Analysis

Hyperspectral Denoising Using Unsupervised Disentangled Spatio-Spectral Deep Priors

Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective

StatEcoNet: Statistical Ecology Neural Networks for Species Distribution Modeling

Stochastic Mirror Descent for Low-Rank Tensor Decomposition Under Non-Euclidean Losses

Block-Randomized Stochastic Proximal Gradient for Low-Rank Tensor Factorization

Nonlinear Multiview Analysis: Identifiability and Neural Network-assisted Implementation

On Recoverability of Randomly Compressed Tensors with Low CP Rank

Amplitude Retrieval for Channel Estimation of MIMO Systems with One-Bit ADCs

Learning Nonlinear Mixtures: Identifiability and Algorithm

Tensor Completion from Regular Sub-Nyquist Samples

Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm

Learning From Hidden Traits: Joint Factor Analysis and Latent Clustering

Robust Volume Minimization-Based Matrix Factorization for Remote Sensing and Document Clustering

Self-Dictionary Sparse Regression for Hyperspectral Unmixing: Greedy Pursuit and Pure Pixel Search are Related

Semiblind Hyperspectral Unmixing in the Presence of Spectral Library Mismatches