Source author record

Xin Yuan

Xin Yuan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

55 works

25 topics

4 close collaborators


preprint · 2026 · arXiv

Deep Probabilistic Unfolding for Quantized Compressive Sensing

We propose a deep probabilistic unfolding model for the classical quantized compressive sensing problem that leverages an unfolding framework to improve reconstruction accuracy and efficiency. Unlike previous unfolding methods that apply an L2 projection to the measurements, we derive a closed-form, numerically stable likelihood gradient projection that respects the true quantization physics, turning the hard quantization constraint into soft probabilistic guidance. Furthermore, we design an efficient dual-domain Mamba module that dynamically captures and fuses multi-scale local and global features, enabling interactions between distant but correlated regions. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance compared with previous works and could help bring quantized compressive sensing into real-world applications.
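A minimal sketch of what a likelihood-gradient step can look like, assuming a scalar quantizer with known bin edges [lo, hi) and Gaussian noise of standard deviation sigma added before quantization. Under that assumption the likelihood of a measurement is P = Φ(b) − Φ(a) with a = (lo − z)/σ and b = (hi − z)/σ, and the gradient of −log P with respect to z is (φ(b) − φ(a)) / (σP). The noise model and the function names are illustrative assumptions; this is not the paper's closed-form, numerically stable derivation, only a hand-rolled version with basic tail handling.

```python
import math

def _std_normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x); erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def quantized_nll_grad(z: float, lo: float, hi: float, sigma: float) -> float:
    """Derivative in z of -log P(lo <= z + n < hi), with n ~ N(0, sigma^2)."""
    a = (lo - z) / sigma
    b = (hi - z) / sigma
    if a > 0.0:
        # Both bin edges lie in the upper tail: use Phi(b) - Phi(a)
        # = Phi(-a) - Phi(-b) to avoid the cancellation 1 - 1.
        p = _std_normal_cdf(-a) - _std_normal_cdf(-b)
    else:
        p = _std_normal_cdf(b) - _std_normal_cdf(a)
    p = max(p, 1e-300)  # crude floor; values extremely far from the bin can still underflow
    return (_std_normal_pdf(b) - _std_normal_pdf(a)) / (sigma * p)

def likelihood_step(z, bins, sigma=0.1, step=0.005):
    """One gradient step on each pre-quantization value z_i toward its bin.

    A step on the order of sigma**2 keeps this toy update from overshooting.
    """
    return [zi - step * quantized_nll_grad(zi, lo, hi, sigma)
            for zi, (lo, hi) in zip(z, bins)]
```

Inside the bin the gradient is near zero, so a value there is left almost untouched; outside the bin the gradient pushes the value back toward it. That graded behaviour is the "soft probabilistic guidance" the abstract contrasts with a hard L2 projection.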

preprint · 2023 · arXiv

AdaVocoder: Adaptive Vocoder for Custom Voice

Custom voice aims to build a personal speech synthesis system by adapting a source speech synthesis model to the target speaker using only a few recordings from that speaker. One solution for building a custom voice is to combine an adaptive acoustic model with a robust vocoder. However, training a robust vocoder usually requires a multi-speaker dataset covering various age groups and timbres, so that the trained vocoder can be used for unseen speakers. Collecting such a multi-speaker dataset is difficult, and its distribution always mismatches that of the target speaker dataset. This paper proposes an adaptive vocoder for custom voice from a novel perspective to solve these problems. The adaptive vocoder mainly uses a cross-domain consistency loss to address the overfitting that GAN-based neural vocoders encounter in few-shot transfer learning. We construct two adaptive vocoders, AdaMelGAN and AdaHiFi-GAN. First, we pre-train the source vocoder model on the AISHELL3 and CSMSC datasets, respectively. Then, we fine-tune it on the internal dataset VXI-children with little adaptation data. The empirical results show that a high-quality custom voice system can be built by combining an adaptive acoustic model with an adaptive vocoder.

preprint · 2023 · arXiv

Large-scale Global Low-rank Optimization for Computational Compressed Imaging

Computational reconstruction plays a vital role in computer vision and computational photography. Most conventional optimization and deep learning techniques exploit local information for reconstruction. Recently, nonlocal low-rank (NLR) reconstruction has achieved remarkable success in improving accuracy and generalization. However, its computational cost has prevented NLR from exploiting global structural similarity, which consequently keeps it trapped in the tradeoff between accuracy and efficiency and keeps it out of high-dimensional large-scale tasks. To address this challenge, we report the global low-rank (GLR) optimization technique, which realizes highly efficient large-scale reconstruction with global self-similarity. Inspired by the self-attention mechanism in deep learning, GLR extracts exemplar image patches by feature detection instead of conventional uniform selection. This directly produces key patches from structural features and avoids burdensome computational redundancy. Further, it performs patch matching across the entire image via neural-network-based convolution, which produces the global similarity heat map in parallel rather than through conventional sequential block-wise matching. As a result, GLR improves patch-grouping efficiency by more than one order of magnitude. We experimentally demonstrate GLR's effectiveness on the temporal, frequency, and spectral dimensions, including the computational imaging modalities of compressive temporal imaging, magnetic resonance imaging, and multispectral filter array demosaicing. This work demonstrates the advantage of inherently fusing deep learning strategies with iterative optimization, and breaks the persistent accuracy-efficiency dilemma in various large-scale reconstruction tasks.
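The global similarity heat map can be illustrated with the identity ‖x − p‖² = ‖x‖² − 2⟨x, p⟩ + ‖p‖²: the squared distance between an exemplar patch p and every same-size window x of the image follows from two sliding correlations, which is the kind of operation a convolution layer evaluates in parallel. The pure-Python sketch below uses hypothetical function names and only shows the arithmetic; it does not reproduce GLR's feature-based exemplar selection or its GPU speed-up.

```python
from typing import List

Image = List[List[float]]

def correlate_valid(img: Image, ker: Image) -> Image:
    """Sliding inner product <window, ker> at every valid position."""
    H, W = len(img), len(img[0])
    k, l = len(ker), len(ker[0])
    return [[sum(img[i + u][j + v] * ker[u][v]
                 for u in range(k) for v in range(l))
             for j in range(W - l + 1)]
            for i in range(H - k + 1)]

def similarity_heat_map(img: Image, patch: Image) -> Image:
    """Squared distance between `patch` and every same-size window of `img`.

    Expands ||x - p||^2 = ||x||^2 - 2<x, p> + ||p||^2, so the whole map comes
    from two correlations: img^2 with a box of ones, and img with the patch.
    """
    k, l = len(patch), len(patch[0])
    squared = [[v * v for v in row] for row in img]
    window_energy = correlate_valid(squared, [[1.0] * l for _ in range(k)])
    cross = correlate_valid(img, patch)
    patch_energy = sum(v * v for row in patch for v in row)
    return [[e - 2.0 * c + patch_energy for e, c in zip(er, cr)]
            for er, cr in zip(window_energy, cross)]
```

Low values in the returned map mark windows similar to the exemplar, which is what a patch-grouping step would then collect.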

preprint · 2023 · arXiv

Learning-based Intelligent Surface Configuration, User Selection, Channel Allocation, and Modulation Adaptation for Jamming-resisting Multiuser OFDMA Systems

Reconfigurable intelligent surfaces (RISs) can potentially combat jamming attacks by diffusing jamming signals. This paper jointly optimizes user selection, channel allocation, modulation-coding, and RIS configuration in a multiuser OFDMA system under a jamming attack. This problem is non-trivial and has not been addressed before, owing to its mixed-integer programming nature and the difficulty of acquiring channel state information (CSI) involving the RIS and the jammer. We propose a new deep reinforcement learning (DRL)-based approach that learns only from changes in the users' received data rates to reject the jamming signals and maximize the sum rate of the system. The key idea is to decouple the discrete selection of users, channels, and modulation-coding from the continuous RIS configuration, so that the RIS configuration can be learned with the recent twin delayed deep deterministic policy gradient (TD3) model. We also show that, given a learned RIS configuration, a winner-takes-all strategy is almost surely optimal for selecting the users, channels, and modulation-coding. Simulations show that the new approach converges fast and realizes the benefit of the RIS, owing to its substantially smaller state and action spaces. Because it does not need CSI, the approach is promising and offers practical value.
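Given a fixed RIS configuration, a winner-takes-all rule can be sketched as a per-channel argmax, assuming the achievable rate of every (user, channel, modulation-coding) combination is already known. In the paper these quantities come from learning on observed data rates; here they are a plain dictionary, and the per-channel reading of the rule, the function names, and the toy rates are all assumptions for illustration.

```python
from typing import Dict, Tuple

RateTable = Dict[Tuple[str, int, str], float]  # (user, channel, mcs) -> rate
Plan = Dict[int, Tuple[str, str, float]]        # channel -> (user, mcs, rate)

def winner_takes_all(rates: RateTable) -> Plan:
    """Give each channel to the (user, MCS) pair with the highest rate."""
    best: Plan = {}
    for (user, channel, mcs), rate in rates.items():
        if channel not in best or rate > best[channel][2]:
            best[channel] = (user, mcs, rate)
    return best

def sum_rate(plan: Plan) -> float:
    """Total system rate of a channel assignment."""
    return sum(rate for _, _, rate in plan.values())
```

Because the discrete choice reduces to a lookup once rates are fixed, only the continuous RIS configuration is left for the learning agent, which is the decoupling the abstract describes.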

preprint · 2023 · arXiv

Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation

While deep learning-based text-to-speech (TTS) models such as VITS have shown excellent results, they typically require a sizable set of high-quality <text, audio> pairs for training, which is expensive to collect. So far, most languages in the world still lack the training data needed to develop TTS systems. This paper proposes two improvements for two problems faced by low-resource Mongolian speech synthesis: a) Because high-quality <text, audio> pairs are scarce, it is difficult to model the mapping from linguistic features to acoustic features; we address this with a pre-trained VITS model and transfer learning. b) Because little labeled information is available, we use an automatic prosody annotation method to label the prosodic information of the text and the corresponding speech, thereby improving the naturalness and intelligibility of low-resource Mongolian speech synthesis. In our experiments, the proposed method achieves an N-MOS of 4.195 and an I-MOS of 4.228.

preprint · 2022 · arXiv

A Simple and Efficient Reconstruction Backbone for Snapshot Compressive Imaging

The emerging technology of snapshot compressive imaging (SCI) enables efficient capture of high-dimensional (HD) data. It is generally implemented with two components: an optical encoder that compresses HD signals into a 2D measurement, and an algorithmic decoder that retrieves the HD data from the hardware-encoded measurement. Among the broad range of SCI applications, hyperspectral imaging (HSI) and video compressive sensing have received significant research attention in recent years. Among existing SCI reconstruction algorithms, deep learning-based methods stand out for their promising performance and efficient inference. However, deep reconstruction networks may suffer from excessive model size and highly specialized network design, which inevitably lead to costly training time, high memory usage, and limited flexibility, thus discouraging the deployment of SCI systems in practical scenarios. In this paper, we tackle these challenges by proposing a simple yet highly efficient reconstruction method, the stacked residual network (SRN), which revisits the residual learning strategy with nested structures and a spatial-invariant property. The proposed SRN enables high-fidelity data retrieval with fewer computation operations and negligible model size compared with existing networks, and also serves as a versatile backbone for both hyperspectral and video data. Based on this backbone, we first develop the channel-attention-enhanced SRN (CAE-SRN) to explore the spectral inter-dependencies for fine-grained spatial estimation in HSI. We then employ SRN as a deep denoiser and incorporate it into a generalized alternating projection (GAP) framework, resulting in GAP-SRN, to handle the video compressive sensing task. Experimental results demonstrate the state-of-the-art performance and high computational efficiency of the proposed SRN on two SCI applications.

preprint · 2022 · arXiv

Adaptive Deep PnP Algorithm for Video Snapshot Compressive Imaging

Video snapshot compressive imaging (SCI) is a promising technique for capturing high-speed videos: it shifts the burden of imaging speed from the detector to mask modulation and needs only a single measurement to capture multiple frames. The algorithm that reconstructs the high-speed frames from the measurement plays a vital role in SCI. In this paper, we consider a promising reconstruction framework, plug-and-play (PnP), which, compared with other deep learning networks, is flexible with respect to the encoding process. One drawback of existing PnP algorithms is that they use a pre-trained denoising network as a plugged-in prior, while the network's training data may differ from the data in real applications. To this end, we propose an online PnP algorithm that adaptively updates the network's parameters within the PnP iterations, which makes the denoising network better suited to the desired data in SCI reconstruction. Furthermore, for color video imaging, RGB frames need to be recovered from the Bayer pattern, a step known as demosaicing in the camera pipeline. To address this challenge, we design a two-stage reconstruction framework that optimizes these two coupled ill-posed problems, and introduce a deep demosaicing prior designed specifically for video demosaicing, which has received little prior attention, instead of using single-image demosaicing networks. Extensive results on both simulated and real datasets verify the superiority of our adaptive deep PnP algorithm.

preprint · 2022 · arXiv

Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction

Many algorithms have been developed to solve the inverse problem of coded aperture snapshot spectral imaging (CASSI), i.e., recovering 3D hyperspectral images (HSIs) from a 2D compressive measurement. In recent years, learning-based methods have demonstrated promising performance and dominated the mainstream research direction. However, existing CNN-based methods show limitations in capturing long-range dependencies and non-local self-similarity. Previous Transformer-based methods densely sample tokens, some of which are uninformative, and calculate multi-head self-attention (MSA) between tokens that are unrelated in content. This does not fit the spatially sparse nature of HSI signals and limits model scalability. In this paper, we propose a novel Transformer-based method, the coarse-to-fine sparse Transformer (CST), which is the first to embed HSI sparsity into deep learning for HSI reconstruction. In particular, CST uses our proposed spectra-aware screening mechanism (SASM) for coarse patch selection. The selected patches are then fed into our customized spectra-aggregation hashing multi-head self-attention (SAH-MSA) for fine pixel clustering and self-similarity capturing. Comprehensive experiments show that our CST significantly outperforms state-of-the-art methods while requiring lower computational cost. The code and models will be released at https://github.com/caiyuanhao1998/MST

preprint · 2022 · arXiv

CoNet: Borderless and decentralized server cooperation in edge computing

In edge computing (EC), system performance can be greatly improved by offloading tasks to edge servers or the remote cloud. However, since the traffic distribution in EC is heterogeneous and dynamic, it is difficult for an individual edge server to provide satisfactory computation service anytime and anywhere. This issue has motivated researchers to study cooperation between edge servers. Previous server cooperation algorithms are limited because the cooperation region is restricted to one hop, whereas the performance of EC can be further improved by lifting this restriction. Although some works have extended the cooperation region to multiple hops, they fail to support task offloading, which is one of the core issues of edge computing. Therefore, we propose CoNet, a new decentralized and borderless server cooperation algorithm for edge computing that takes the task offloading strategy into account. In CoNet, the cooperation region is not limited. Each server forms its own basic cooperation unit (BCU) and calculates its announced capability based on the BCU. The server's capability, the processing delay, and the task and calculation-result forwarding delays are considered in this calculation. The task division strategy is based on the real capability of the host server and the announced capabilities of the cooperation servers. This cooperation process is recursive and terminates once the termination condition is satisfied. Simulation results demonstrate the advantages of CoNet over previous works.
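A hypothetical sketch of a capability-based task split: the workload is divided in proportion to the host server's real capability and each cooperating server's announced capability. The proportional rule and the names are assumptions for illustration; the abstract does not state CoNet's exact division formula, and the processing and forwarding delays it mentions are assumed here to be already folded into the announced capabilities.

```python
from typing import Dict

def divide_task(workload: float, host_capability: float,
                announced: Dict[str, float]) -> Dict[str, float]:
    """Split a task in proportion to capability (hypothetical rule).

    `host_capability` is the host server's real capability; `announced`
    maps each cooperating server to the capability it announced.
    """
    if "host" in announced:
        raise ValueError("'host' is reserved for the host server")
    caps = {"host": host_capability, **announced}
    total = sum(caps.values())
    if total <= 0:
        raise ValueError("total capability must be positive")
    return {name: workload * cap / total for name, cap in caps.items()}
```

In a recursive scheme, a cooperating server that receives a share could apply the same split within its own BCU; that recursion and its termination condition are not shown here.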

preprint · 2022 · arXiv

Dispersed Pixel Perturbation-based Imperceptible Backdoor Trigger for Image Classifier Models

Typical deep neural network (DNN) backdoor attacks are based on triggers embedded in inputs. Existing imperceptible triggers are either computationally expensive or have low attack success rates. In this paper, we propose a new backdoor trigger that is easy to generate, imperceptible, and highly effective. The new trigger is a uniformly randomly generated three-dimensional (3D) binary pattern that can be repeated and mirrored horizontally and/or vertically and superposed onto three-channel images to train a backdoored DNN model. Dispersed throughout an image, the new trigger produces only a weak perturbation on individual pixels, but collectively forms a strong recognizable pattern that trains and activates the backdoor of the DNN. We also show analytically that the trigger becomes increasingly effective as image resolution improves. Experiments are conducted with ResNet-18 and MLP models on the MNIST, CIFAR-10, and BTSR datasets. In terms of imperceptibility, the new trigger outperforms existing triggers, such as BadNets, Trojaned NN, and Hidden Backdoor, by over an order of magnitude. The new trigger achieves an almost 100% attack success rate, reduces classification accuracy by less than 0.7%-2.4%, and invalidates state-of-the-art defense techniques.
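A minimal sketch of generating and applying a trigger of this kind, assuming pixel values in [0, 1], a bit-to-sign mapping (0 → −a, 1 → +a), mirroring on every other tile in both directions, and a perturbation amplitude of 2/255. All of these, and the function names, are illustrative choices rather than the paper's settings.

```python
import random
from typing import List

Pattern = List[List[List[int]]]   # h x w x 3 binary values
Image = List[List[List[float]]]   # H x W x 3, values in [0, 1]

def make_pattern(h: int, w: int, seed: int = 0) -> Pattern:
    """Uniformly random 3D binary pattern of shape h x w x 3."""
    rng = random.Random(seed)
    return [[[rng.randint(0, 1) for _ in range(3)] for _ in range(w)]
            for _ in range(h)]

def tile_mirrored(pattern: Pattern, H: int, W: int) -> Pattern:
    """Repeat the pattern over an H x W canvas, mirroring alternate tiles."""
    h, w = len(pattern), len(pattern[0])
    out = []
    for i in range(H):
        ti, ri = divmod(i, h)
        r = ri if ti % 2 == 0 else h - 1 - ri  # vertical mirror on odd tiles
        row = []
        for j in range(W):
            tj, cj = divmod(j, w)
            c = cj if tj % 2 == 0 else w - 1 - cj  # horizontal mirror on odd tiles
            row.append(pattern[r][c])
        out.append(row)
    return out

def apply_trigger(img: Image, pattern: Pattern,
                  amplitude: float = 2.0 / 255.0) -> Image:
    """Superpose the tiled pattern as a +/- amplitude perturbation, then clip."""
    H, W = len(img), len(img[0])
    tiled = tile_mirrored(pattern, H, W)
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            pixel = []
            for k in range(3):
                sign = 2 * tiled[i][j][k] - 1  # bit 0 -> -1, bit 1 -> +1
                value = img[i][j][k] + amplitude * sign
                pixel.append(min(1.0, max(0.0, value)))
            row.append(pixel)
        out.append(row)
    return out
```

Every pixel moves by at most the small amplitude, yet the same pattern covers the whole image, which mirrors the abstract's point that the trigger is weak per pixel but strong collectively.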

preprint · 2022 · arXiv

Ensemble learning priors unfolding for scalable Snapshot Compressive Sensing

Snapshot compressive imaging (SCI) records 3D information in a 2D measurement and then recovers the original 3D information from this measurement with a reconstruction algorithm. The reconstruction algorithm therefore plays a vital role in SCI. Recently, deep learning algorithms have shown outstanding ability, outperforming traditional algorithms, so improving the reconstruction accuracy of deep learning algorithms is an inevitable topic for SCI. In addition, deep learning algorithms usually have limited scalability: a well-trained model generally cannot be applied to a new system without a new training process. To address these problems, we develop ensemble learning priors to further improve reconstruction accuracy, and propose scalable learning to give deep learning the same scalability as traditional algorithms. Our algorithm achieves state-of-the-art results, outperforming existing algorithms. Extensive results on both simulated and real datasets demonstrate the superiority of our proposed algorithm. The code and models will be released to the public.

preprint · 2022 · arXiv

HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging

The rapid development of deep learning provides a better solution for the end-to-end reconstruction of hyperspectral images (HSI). However, existing learning-based methods have two major defects. First, networks with self-attention usually sacrifice internal resolution to balance model performance against complexity, losing fine-grained high-resolution (HR) features. Second, even if the optimization focusing on spatial-spectral domain learning (SDL) converges to the ideal solution, there is still a significant visual difference between the reconstructed HSI and the ground truth. Therefore, we propose a high-resolution dual-domain learning network (HDNet) for HSI reconstruction. On the one hand, the proposed HR spatial-spectral attention module with its efficient feature fusion provides continuous and fine pixel-level features. On the other hand, frequency domain learning (FDL) is introduced for HSI reconstruction to narrow the frequency domain discrepancy. Dynamic FDL supervision forces the model to reconstruct fine-grained frequencies and compensates for excessive smoothing and distortion caused by pixel-level losses. The HR pixel-level attention and frequency-level refinement in our HDNet mutually promote HSI perceptual quality. Extensive quantitative and qualitative evaluation experiments show that our method achieves SOTA performance on simulated and real HSI datasets. Code and models will be released at https://github.com/caiyuanhao1998/MST

preprint · 2022 · arXiv

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement in the coded aperture snapshot spectral imaging (CASSI) system. The HSI representations are highly similar and correlated across the spectral dimension. Modeling the inter-spectra interactions is beneficial for HSI reconstruction. However, existing CNN-based methods show limitations in capturing spectral-wise similarity and long-range dependencies. Besides, the HSI information is modulated by a coded aperture (physical mask) in CASSI. Nonetheless, current algorithms have not fully explored the guidance effect of the mask for HSI restoration. In this paper, we propose a novel framework, Mask-guided Spectral-wise Transformer (MST), for HSI reconstruction. Specifically, we present a Spectral-wise Multi-head Self-Attention (S-MSA) that treats each spectral feature as a token and calculates self-attention along the spectral dimension. In addition, we customize a Mask-guided Mechanism (MM) that directs S-MSA to pay attention to spatial regions with high-fidelity spectral representations. Extensive experiments show that our MST significantly outperforms state-of-the-art (SOTA) methods on simulation and real HSI datasets while requiring dramatically cheaper computational and memory costs. Code and pre-trained models are available at https://github.com/caiyuanhao1998/MST/
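The core idea of spectral-wise attention, with each spectral channel treated as a token so that the attention map is C × C rather than (HW) × (HW), can be sketched in pure Python. This is a simplified single-head version with identity Q/K/V projections, L2-normalized queries and keys, and no mask guidance; the function names and the `scale` parameter are assumptions, not MST's actual implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # row-major

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def l2_normalize(v: List[float]) -> List[float]:
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v] if n > 0 else list(v)

def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        mx = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - mx) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def spectral_self_attention(x: Matrix, scale: float = 1.0) -> Tuple[Matrix, Matrix]:
    """Single-head spectral-wise attention on x of shape (HW, C).

    Each of the C spectral channels is one token (a length-HW vector), so
    the attention map is C x C. Learned projections, multiple heads and
    mask guidance are omitted.
    """
    tokens = transpose(x)                    # (C, HW)
    qk = [l2_normalize(t) for t in tokens]   # normalized queries and keys
    logits = matmul(qk, transpose(qk))       # (C, C) band similarities
    attn = softmax_rows([[scale * s for s in row] for row in logits])
    mixed = matmul(attn, tokens)             # each band mixes all bands
    return transpose(mixed), attn            # output back to (HW, C)
```

Because the attention map depends only on the number of bands, its cost does not grow quadratically with image size, which is one reason spectral-wise attention can be cheaper than spatial attention for HSIs.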

preprint · 2022 · arXiv

Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Video snapshot compressive imaging (SCI) captures multiple sequential video frames in a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames with different masks and sum the modulated frames into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder); algorithms (dubbed the software decoder) are then employed to reconstruct the desired high-speed frames if needed. In this paper, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both the spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, which are connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, whose outputs are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.git
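The encoding principle described above, often written y = Σ_t M_t ⊙ x_t with ⊙ the element-wise product, can be sketched in a few lines of pure Python. Frames and masks are plain nested lists, and the function name `sci_forward` is hypothetical; STFormer itself addresses the inverse problem of recovering the frames x_t from y.

```python
from typing import List

Frame = List[List[float]]

def sci_forward(frames: List[Frame], masks: List[Frame]) -> Frame:
    """Modulate each high-speed frame by its mask and sum into one snapshot."""
    if len(frames) != len(masks):
        raise ValueError("need one mask per frame")
    H, W = len(frames[0]), len(frames[0][0])
    y = [[0.0] * W for _ in range(H)]
    for x_t, m_t in zip(frames, masks):
        for i in range(H):
            for j in range(W):
                y[i][j] += m_t[i][j] * x_t[i][j]  # element-wise modulation, then sum
    return y
```

For example, two 2 × 2 frames [[1, 2], [3, 4]] and [[10, 20], [30, 40]] with binary masks [[1, 0], [0, 1]] and [[0, 1], [1, 1]] collapse into the single snapshot [[1, 20], [30, 44]].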

preprint · 2022 · arXiv

Text-to-Image Generation via Implicit Visual Guidance and Hypernetwork

We develop an approach for text-to-image generation that incorporates additional retrieved images, driven by a combination of an implicit visual guidance loss and generative objectives. Unlike most existing text-to-image generation methods, which take only the text as input, our method dynamically feeds cross-modal search results into a unified training stage, improving the quality, controllability, and diversity of the generated results. We propose a novel hypernetwork-modulated visual-text encoding scheme that predicts the weight update of the encoding layer, enabling effective transfer of visual information (e.g., layout, content) into the corresponding latent domain. Experimental results show that our model, guided by additional retrieved visual data, outperforms existing GAN-based models. On the COCO dataset, we achieve a better FID of 9.13 with up to 3.5× fewer generator parameters than the state-of-the-art method.

preprint · 2022 · arXiv

Trajectory Planning of Cellular-Connected UAV for Communication-assisted Radar Sensing

As a key technology for beyond-fifth-generation wireless systems, joint communication and radar sensing (JCAS) uses the reflections of communication signals to detect foreign objects and deliver situational awareness. A cellular-connected unmanned aerial vehicle (UAV) is uniquely suited to form a mobile bistatic synthetic aperture radar (SAR) with its serving base station (BS), sensing large areas with superb resolution without requiring additional spectrum. This paper designs this novel BS-UAV bistatic SAR platform and optimizes the flight path of the UAV to minimize its propulsion energy while guaranteeing the required sensing resolutions on a series of landmarks of interest. A new trajectory planning algorithm is developed that convexifies the propulsion energy and resolution requirements using successive convex approximation and block coordinate descent. Effective trajectories are obtained with polynomial complexity. Extensive simulations reveal that the proposed trajectory planning algorithm significantly outperforms an alternative that minimizes the flight distance of cellular-aided sensing missions, in terms of energy efficiency and effective consumption fluctuation. The energy saving offered by the proposed algorithm can be as high as 55%.

preprint · 2022 · arXiv

Two-Stage is Enough: A Concise Deep Unfolding Reconstruction Network for Flexible Video Compressive Sensing

We consider the reconstruction problem of video compressive sensing (VCS) under the deep unfolding/rolling structure. In particular, we aim to build a flexible and concise model with a minimal number of stages. Unlike existing deep unfolding networks for inverse problems, which use more stages for higher performance but lack flexibility to different masks and scales, we show that a 2-stage deep unfolding network can achieve state-of-the-art (SOTA) results in VCS (with a 1.7 dB gain in PSNR over the single-stage model, RevSCI). Thanks to the advantages of deep unfolding, the proposed method adapts to new masks and readily scales to large data without any additional training. Furthermore, we extend the proposed model to color VCS to perform joint reconstruction and demosaicing. Experimental results demonstrate that our 2-stage model also achieves SOTA on color VCS reconstruction, with a >2.3 dB gain in PSNR over the previous SOTA algorithm based on the plug-and-play framework, while speeding up reconstruction by more than 17 times. In addition, we have found that our network is also flexible with respect to mask modulation and scale size for color VCS reconstruction, so a single trained network can be applied to different hardware systems. The code and models will be released to the public.

preprint · 2022 · arXiv

When Internet of Things meets Metaverse: Convergence of Physical and Cyber Worlds

In recent years, the Internet of Things (IoT) has been studied in the context of the Metaverse to provide users with immersive cyber-virtual experiences in mixed reality environments. This survey introduces six typical IoT applications in the Metaverse: collaborative healthcare, education, smart city, entertainment, real estate, and socialization. In the IoT-inspired Metaverse, we also comprehensively survey four pillar technologies that enable augmented reality (AR) and virtual reality (VR), namely responsible artificial intelligence (AI), high-speed data communications, cost-effective mobile edge computing (MEC), and digital twins. Based on physical-world demands, we outline current industrial efforts and seven key requirements for building the IoT-inspired Metaverse: immersion, variety, economy, civility, interactivity, authenticity, and independence. In addition, this survey describes the open issues in the IoT-inspired Metaverse that need to be addressed to eventually achieve the convergence of the physical and cyber worlds.

preprint · 2021 · arXiv

Memory-Efficient Network for Large-scale Video Compressive Sensing

Video snapshot compressive imaging (SCI) captures a sequence of video frames in a single shot using a 2D detector. The underlying principle is that during one exposure time, different masks are imposed on the high-speed scene to form a compressed measurement. With knowledge of the masks, optimization algorithms or deep learning methods are employed to reconstruct the desired high-speed video frames from this snapshot measurement. Although these methods can achieve decent results, the long running time of optimization algorithms and the huge training memory footprint of deep networks still preclude their use in practical applications. In this paper, we develop a memory-efficient network for large-scale video SCI based on multi-group reversible 3D convolutional neural networks. In addition to the basic model for the grayscale SCI system, we go one step further and combine demosaicing and SCI reconstruction to directly recover color video from Bayer measurements. Extensive results on both simulated data and real data captured by SCI cameras demonstrate that our proposed model outperforms the previous state of the art with less memory and thus can be used in large-scale problems. The code is at https://github.com/BoChenGroup/RevSCI-net.
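The memory saving of reversible networks comes from being able to rebuild a block's inputs from its outputs. A minimal sketch, assuming a RevNet-style additive coupling with two groups; the functions `f` and `g` stand in for the 3D convolutional sub-networks, and the names are hypothetical rather than taken from the RevSCI code.

```python
from typing import Callable, List, Tuple

Vec = List[float]
Fn = Callable[[Vec], Vec]

def _add(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]

def _sub(a: Vec, b: Vec) -> Vec:
    return [x - y for x, y in zip(a, b)]

def rev_forward(x1: Vec, x2: Vec, f: Fn, g: Fn) -> Tuple[Vec, Vec]:
    """Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = _add(x1, f(x2))
    y2 = _add(x2, g(y1))
    return y1, y2

def rev_inverse(y1: Vec, y2: Vec, f: Fn, g: Fn) -> Tuple[Vec, Vec]:
    """Rebuild the inputs from the outputs: x2 = y2 - G(y1), x1 = y1 - F(x2)."""
    x2 = _sub(y2, g(y1))
    x1 = _sub(y1, f(x2))
    return x1, x2
```

The inverse works for any `f` and `g`, even non-invertible ones, so during backpropagation each block's inputs can be recomputed instead of stored, trading extra computation for memory.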

preprint · 2021 · arXiv

MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing

To capture high-speed videos using a two-dimensional detector, video snapshot compressive imaging (SCI) is a promising system, in which the video frames are coded by different masks and then compressed into a snapshot measurement. Efficient algorithms are then needed to reconstruct the high-speed frames, and the state-of-the-art results are achieved by deep learning networks. However, these networks are usually trained for specific small-scale masks and often demand long training times and large GPU memory, so they are not flexible to (i) a new mask of the same size and (ii) a larger-scale mask. We address these challenges by developing a Meta Modulated Convolutional Network for SCI reconstruction, dubbed MetaSCI. MetaSCI is composed of a shared backbone for different masks and lightweight meta-modulation parameters that evolve into different modulation parameters for each mask, giving it fast adaptation to new masks (or systems) and the ability to scale to large data. Extensive simulation and real-data results demonstrate the superior performance of our proposed approach. Our code is available at https://github.com/xyvirtualgroup/MetaSCI-CVPR2021.

preprint · 2021 · arXiv

Plug-and-Play Algorithms for Video Snapshot Compressive Imaging

We consider the reconstruction problem of video snapshot compressive imaging (SCI), which captures high-speed videos using a low-speed 2D sensor (detector). The underlying principle of SCI is to modulate sequential high-speed frames with different masks and then integrate these encoded frames into a snapshot on the sensor, so the sensor can be low-speed. On one hand, video SCI enjoys the advantages of low bandwidth, low power, and low cost. On the other hand, applying SCI to large-scale problems (HD or UHD videos) in daily life is still challenging, and one of the bottlenecks lies in the reconstruction algorithm. Existing algorithms are either too slow (iterative optimization algorithms) or not flexible to the encoding process (deep learning-based end-to-end networks). In this paper, we develop fast and flexible algorithms for SCI based on the plug-and-play (PnP) framework. In addition to the PnP-ADMM method, we further propose the PnP-GAP (generalized alternating projection) algorithm with a lower computational workload. We first employ deep image denoising priors to show that PnP can recover a UHD color video with 30 frames from a snapshot measurement. Since videos have strong temporal correlation, employing deep video denoising priors achieves a significant improvement in the results. Furthermore, we extend the proposed PnP algorithms to color SCI systems using mosaic sensors, where each pixel captures only the red, green, or blue channel. A joint reconstruction and demosaicing paradigm is developed for flexible and high-quality reconstruction of color video SCI systems. Extensive results on both simulated and real datasets verify the superiority of our proposed algorithm.
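A plain GAP loop with a plug-in denoiser can be sketched for the video SCI model y = Σ_t M_t ⊙ x_t. For this model ΦΦᵀ is diagonal, with entries R(i, j) = Σ_t M_t(i, j)², so the Euclidean projection onto {x : Φx = y} needs only an element-wise division. The `denoise` argument is any callable standing in for the deep image or video denoising priors; the function names are hypothetical, PnP-ADMM is not shown, and this is a textbook loop rather than the paper's tuned implementation.

```python
from typing import Callable, List

Frame = List[List[float]]
Video = List[Frame]                # T frames of H x W
Denoiser = Callable[[Video], Video]

def sci_measure(x: Video, masks: Video) -> Frame:
    """Phi x: mask every frame and sum into one H x W snapshot."""
    H, W = len(masks[0]), len(masks[0][0])
    return [[sum(m[i][j] * f[i][j] for f, m in zip(x, masks))
             for j in range(W)] for i in range(H)]

def pnp_gap(y: Frame, masks: Video, denoise: Denoiser, iters: int = 10) -> Video:
    """Alternate a projection onto {x : Phi x = y} with a plug-in denoiser."""
    T, H, W = len(masks), len(y), len(y[0])
    # For this sensing model Phi Phi^T is diagonal: R(i, j) = sum_t M_t(i, j)^2.
    R = [[sum(m[i][j] ** 2 for m in masks) for j in range(W)] for i in range(H)]
    v = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for _ in range(iters):
        r = sci_measure(v, masks)
        # (Phi Phi^T)^{-1} (y - Phi v); pixels that no mask covers are skipped.
        res = [[(y[i][j] - r[i][j]) / R[i][j] if R[i][j] > 0 else 0.0
                for j in range(W)] for i in range(H)]
        # x = v + Phi^T res: each frame receives the residual weighted by its mask.
        x = [[[v[t][i][j] + masks[t][i][j] * res[i][j] for j in range(W)]
              for i in range(H)] for t in range(T)]
        v = denoise(x)  # prior step: any image or video denoiser can be plugged in
    return v
```

With an identity "denoiser", a single iteration already returns a video that reproduces the measurement exactly; the denoiser is what pulls the estimate toward natural-looking frames among the many videos consistent with y.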

preprint · 2021 · arXiv

Snapshot Compressive Imaging: Principle, Implementation, Theory, Algorithms and Applications

Capturing high-dimensional (HD) data is a long-term challenge in signal processing and related fields. Snapshot compressive imaging (SCI) uses a two-dimensional (2D) detector to capture HD (≥3D) data in a snapshot measurement. Via novel optical designs, the 2D detector samples the HD data in a compressive manner; following this, algorithms are employed to reconstruct the desired HD data-cube. SCI has been used in hyperspectral imaging, video, holography, tomography, focal depth imaging, polarization imaging, microscopy, etc. Though the hardware has been investigated for more than a decade, the theoretical guarantees have only recently been derived. Inspired by deep learning, various deep neural networks have also been developed to reconstruct the HD data-cube in spectral SCI and video SCI. This article reviews recent advances in SCI hardware, theory and algorithms, including both optimization-based and deep-learning-based algorithms. Diverse applications and the outlook of SCI are also discussed.

preprint2020arXiv

A Benchmark for Sparse Coding: When Group Sparsity Meets Rank Minimization

Sparse coding has achieved great success in various image processing tasks. However, a benchmark to measure the sparsity of an image patch or group is missing, since sparse coding is essentially an NP-hard problem. This work attempts to fill the gap from the perspective of rank minimization. Please see the manuscript for more details.

preprint2020arXiv

A realistic phase screen model for forward multiple-scattering media

Existing random phase screen (RPS) models for forward multiple-scattering media fail to incorporate ballistic light. In this letter, we redesign the angular spectrum of the screen by means of Monte Carlo simulation, based on the assumption that a single screen should represent all the scattering events a photon experiences between two adjacent screens. Three examples demonstrate that the proposed model exhibits more realistic optical properties than conventional RPS models in terms of the attenuation of ballistic light, the evolution of the beam profile and the angular memory effect. The proposed model also provides the flexibility to balance computing accuracy, speed and memory usage by tuning the screen spacing.

preprint2020arXiv

Fast Hyperspectral Image Recovery via Non-iterative Fusion of Dual-Camera Compressive Hyperspectral Imaging

Coded aperture snapshot spectral imaging (CASSI) is a promising technique to capture the three-dimensional hyperspectral image (HSI) using a single coded two-dimensional (2D) measurement, after which algorithms are used to solve the inverse problem. Due to its ill-posed nature, various regularizers have been exploited to reconstruct the 3D data from the 2D measurement. Unfortunately, both the accuracy and the computational complexity remain unsatisfactory. One feasible solution is to utilize additional information, such as an RGB measurement, alongside CASSI. Considering the combined CASSI and RGB measurements, in this paper we propose a new fusion model for HSI reconstruction. We investigate the spectral low-rank property of HSI, which is composed of a spectral basis and spatial coefficients. Specifically, the RGB measurement is utilized to estimate the coefficients, while the CASSI measurement is adopted to provide the orthogonal spectral basis. We further propose a patch processing strategy to enhance the spectral low-rank property of HSI. The proposed model requires neither non-local processing nor iteration, nor the spectral sensing matrix of the RGB detector. Extensive experiments on both simulated and real HSI datasets demonstrate that our proposed method outperforms the previous state of the art in quality while speeding up reconstruction by more than 5000 times.
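The spectral low-rank property (spatial coefficients times an orthogonal spectral basis) can be illustrated with a truncated SVD, assuming a full (H, W, L) cube is available. In the paper the coefficients come from the RGB measurement and the basis from the CASSI measurement, which this sketch does not model; the function name, the 31-band size and the synthetic rank-3 cube are assumptions for illustration only.

```python
import numpy as np

def spectral_factorize(cube, rank):
    """Factor an (H, W, L) cube into spatial coefficients (H*W, rank) and an
    orthonormal spectral basis (rank, L) using a truncated SVD."""
    H, W, L = cube.shape
    X = cube.reshape(H * W, L)                       # one spectrum per row
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:rank]                                # orthonormal rows
    coeffs = X @ basis.T                             # project spectra onto the basis
    return coeffs, basis

rng = np.random.default_rng(1)
H, W, L, r = 6, 7, 31, 3
# Synthetic cube that is exactly rank r along the spectral dimension
cube = (rng.random((H * W, r)) @ rng.random((r, L))).reshape(H, W, L)

coeffs, basis = spectral_factorize(cube, r)
approx = (coeffs @ basis).reshape(H, W, L)
```

On real HSI data the spectra are only approximately low-rank, so `approx` would be a close fit rather than an exact reconstruction.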

preprint2020arXiv

From Rank Estimation to Rank Approximation: Rank Residual Constraint for Image Restoration

In this paper, we propose a novel approach to the rank minimization problem, termed the rank residual constraint (RRC) model. Unlike existing low-rank-based approaches, such as the well-known nuclear norm minimization (NNM) and weighted nuclear norm minimization (WNNM), which estimate the underlying low-rank matrix directly from the corrupted observations, we progressively approximate the underlying low-rank matrix via minimizing the rank residual. By integrating the image nonlocal self-similarity (NSS) prior with the proposed RRC model, we apply it to image restoration tasks, including image denoising and image compression artifact reduction. Towards this end, we first obtain a good reference of the original image groups by using the image NSS prior, and then the rank residual of the image groups between this reference and the degraded image is minimized to achieve a better estimate of the desired image. In this manner, both the reference and the estimated image are updated gradually and jointly in each iteration. Based on the group-based sparse representation model, we further provide a theoretical analysis of the feasibility of the proposed RRC model. Experimental results demonstrate that the proposed RRC model outperforms many state-of-the-art schemes in both objective and perceptual quality.

preprint2020arXiv

Image Compression Based on Compressive Sensing: End-to-End Comparison with JPEG

We present an end-to-end image compression system based on compressive sensing. The presented system integrates the conventional scheme of compressive sampling and reconstruction with quantization and entropy coding. The compression performance, in terms of decoded image quality versus data rate, is shown to be comparable with JPEG and significantly better in the low-rate range. We study the parameters that influence the system performance, including (i) the choice of sensing matrix, (ii) the trade-off between quantization and compression ratio, and (iii) the reconstruction algorithms. We propose an effective method to jointly control the quantization step and compression ratio in order to achieve near-optimal quality at any given bit rate. Furthermore, our proposed image compression system can be directly used in a compressive sensing camera, e.g., the single-pixel camera, to construct a hardware compressive sampling system.
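A minimal sketch of the two knobs this abstract trades off, the sampling ratio M/N and the quantizer step, assuming a Gaussian sensing matrix and a uniform scalar quantizer. Entropy coding, which the described system also applies, is not modeled, so the rate computed here is only the raw, pre-entropy-coding rate; all names and sizes are hypothetical.

```python
import numpy as np

def quantize(y, step):
    """Uniform mid-tread scalar quantizer: integer indices and dequantized values."""
    idx = np.round(y / step).astype(int)
    return idx, idx * step

def raw_bits_per_pixel(num_pixels, num_measurements, bits_per_measurement):
    """Rate before entropy coding: (M / N) measurements per pixel times b bits each."""
    return num_measurements * bits_per_measurement / num_pixels

rng = np.random.default_rng(2)
N, M = 64, 16                                     # sampling ratio M/N = 0.25
A = rng.standard_normal((M, N)) / np.sqrt(M)      # Gaussian sensing matrix (assumption)
x = rng.random(N)                                 # vectorized image block
y = A @ x                                         # compressive measurements
step = 0.05
idx, y_hat = quantize(y, step)
bpp = raw_bits_per_pixel(N, M, bits_per_measurement=8)
```

At a fixed bit budget, halving M leaves room for twice as many bits per measurement (a finer step), which is the kind of trade-off the joint control of quantization step and compression ratio has to navigate.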

preprint2020arXiv

Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging

Snapshot compressive imaging (SCI) aims to capture high-dimensional (usually 3D) images using a 2D sensor (detector) in a single snapshot. Though enjoying the advantages of low bandwidth, low power and low cost, applying SCI to large-scale problems (HD or UHD videos) in daily life is still challenging. The bottleneck lies in the reconstruction algorithms; they are either too slow (iterative optimization algorithms) or not flexible to the encoding process (deep-learning-based end-to-end networks). In this paper, we develop fast and flexible algorithms for SCI based on the plug-and-play (PnP) framework. In addition to the widely used PnP-ADMM method, we further propose the PnP-GAP (generalized alternating projection) algorithm with a lower computational workload and prove the convergence of PnP-GAP under the SCI hardware constraints. By employing deep denoising priors, we show for the first time that PnP can recover a UHD color video ($3840\times 1644\times 48$ with PSNR above 30 dB) from a snapshot 2D measurement. Extensive results on both simulation and real datasets verify the superiority of our proposed algorithm. The code is available at https://github.com/liuyang12/PnP-SCI.
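A minimal sketch of a plain (unaccelerated) PnP-GAP loop for video SCI: a projection onto the set of videos consistent with the snapshot, using the diagonal $\Phi\Phi^{\top}$, alternated with a denoising step. The 3x3 box filter below is a hypothetical stand-in for the deep denoisers the abstract refers to, and the function names and iteration count are assumptions; this is not the released PnP-SCI code.

```python
import numpy as np

def forward(x, masks):
    """SCI measurement: sum over time of mask-modulated frames."""
    return np.sum(masks * x, axis=2)

def adjoint(y, masks):
    """Adjoint of forward: replicate the snapshot and modulate by each mask."""
    return masks * y[:, :, None]

def box_denoise(x):
    """Hypothetical stand-in denoiser: 3x3 spatial mean filter on each frame.
    A PnP system would plug a trained deep denoiser in here instead."""
    H, W = x.shape[:2]
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for di in range(3):
        for dj in range(3):
            out += p[di:di + H, dj:dj + W, :]
    return out / 9.0

def pnp_gap(y, masks, denoise, iters=20):
    phi_sum = np.sum(masks ** 2, axis=2)              # diagonal of Phi Phi^T
    phi_sum = np.where(phi_sum == 0, 1.0, phi_sum)    # guard pixels where every mask is closed
    v = adjoint(y / phi_sum, masks)                   # simple initialization
    for _ in range(iters):
        # Projection step: closest x to v satisfying forward(x) == y
        x = v + adjoint((y - forward(v, masks)) / phi_sum, masks)
        # Prior step: denoise the projected estimate
        v = denoise(x)
    return x

rng = np.random.default_rng(3)
H, W, B = 16, 16, 4
masks = rng.integers(0, 2, size=(H, W, B)).astype(float)
truth = rng.random((H, W, B))
y = forward(truth, masks)
x_hat = pnp_gap(y, masks, box_denoise)
```

Returning the projected iterate guarantees that the estimate reproduces the measured snapshot exactly; the denoiser only decides how the null-space component is filled in.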

preprint2020arXiv

Snapshot Interferometric 3D Imaging by Compressive Sensing and Deep Learning

We demonstrate single-shot compressive three-dimensional (3D) $(x, y, z)$ imaging based on interference coding. The depth dimension of the object is encoded into the interferometric spectra of the light field, resulting in an $(x, y, λ)$ datacube which is subsequently measured by a single-shot spectrometer. With a compression ratio of up to $400$, we are able to reconstruct $1G$ voxels from a 2D measurement. Both an optimization-based compressive sensing algorithm and a deep learning network are developed for 3D reconstruction from a single 2D coded measurement. Due to the fast acquisition speed, our approach is able to capture volumetric activities at native camera frame rates, enabling 4D (volumetric-temporal) visualization of dynamic scenes.

preprint2020arXiv

Spatial--spectral FFPNet: Attention-Based Pyramid Network for Segmentation and Classification of Remote Sensing Images

We consider the problem of segmentation and classification of high-resolution and hyperspectral remote sensing images. Unlike conventional natural (RGB) images, the inherently large scale and complex structures of remote sensing images pose major challenges, such as the diversity of spatial object distributions and spectral information extraction, when existing models are directly applied to image classification. In this study, we develop an attention-based pyramid network for segmentation and classification of remote sensing datasets. Attention mechanisms are used to develop the following modules: i) a novel and robust attention-based multi-scale fusion method that effectively fuses useful spatial or spectral information at different and same scales; ii) a region pyramid attention mechanism that uses region-based attention to address the diversity of target geometric sizes in large-scale remote sensing images; and iii) cross-scale attention in our adaptive atrous spatial pyramid pooling network, which adapts to varied contents in a feature-embedded space. Different forms of feature fusion pyramid frameworks are established by combining these attention-based modules. First, a novel segmentation framework, called the heavy-weight spatial feature fusion pyramid network (FFPNet), is proposed to address the spatial problem of high-resolution remote sensing images. Second, an end-to-end spatial--spectral FFPNet is presented for classifying hyperspectral images. Experiments conducted on the ISPRS Vaihingen and ISPRS Potsdam high-resolution datasets demonstrate the competitive segmentation accuracy achieved by the proposed heavy-weight spatial FFPNet. Furthermore, experiments on the Indian Pines and University of Pavia hyperspectral datasets indicate that the proposed spatial--spectral FFPNet outperforms the current state-of-the-art methods in hyperspectral image classification.

preprint2020arXiv

The Power of Triply Complementary Priors for Image Compressive Sensing

Recent works that utilize deep models have achieved superior results in various image restoration applications. Such approaches are typically supervised and require a corpus of training images with a distribution similar to that of the images to be recovered. On the other hand, shallow methods, which are usually unsupervised, still achieve promising performance in many inverse problems, e.g., image compressive sensing (CS), as they can effectively leverage non-local self-similarity priors of natural images. However, most such methods are patch-based, and naive patch aggregation leaves various ringing artifacts in the restored images. Using either approach alone usually limits performance and generalizability in image restoration tasks. In this paper, we propose a joint low-rank and deep (LRD) image model, which contains a pair of triply complementary priors, namely external and internal, deep and shallow, and local and non-local priors. We then propose a novel hybrid plug-and-play (H-PnP) framework based on the LRD model for image CS. To make the optimization tractable, a simple yet effective algorithm is proposed to solve the resulting H-PnP-based image CS problem. Extensive experimental results demonstrate that the proposed H-PnP algorithm significantly outperforms state-of-the-art techniques for image CS recovery such as SCSNet and WNNM.

preprint2020arXiv

Various Total Variation for Snapshot Video Compressive Imaging

Sampling high-dimensional images is challenging due to the limited availability of sensors; scanning is usually necessary in these cases. To mitigate this challenge, snapshot compressive imaging (SCI) was proposed to capture high-dimensional (usually 3D) images using a 2D sensor (detector). Via a novel optical design, the measurement captured by the sensor is an encoded image of multiple frames of the desired 3D signal. Reconstruction algorithms are then employed to retrieve the high-dimensional data. Though various algorithms have been proposed, the total variation (TV) based method is still the most efficient one owing to its good trade-off between computational time and performance. This paper aims to answer the question: which TV penalty (anisotropic TV, isotropic TV or vectorized TV) works best for video SCI reconstruction? Various TV denoising and projection algorithms are developed and tested for video SCI reconstruction on both simulation and real datasets.
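A small sketch of the TV penalties compared above, assuming forward differences with a zero gradient at the last row and column. Anisotropic TV sums the absolute horizontal and vertical differences, while isotropic TV sums the per-pixel gradient magnitude. The vectorial form, which couples the gradients of all frames under one square root, is one common definition and is only an assumption about what the paper calls vectorized TV.

```python
import numpy as np

def gradients(u):
    """Forward differences of a 2D frame, zero at the last column/row."""
    dx = np.diff(u, axis=1, append=u[:, -1:])
    dy = np.diff(u, axis=0, append=u[-1:, :])
    return dx, dy

def tv_anisotropic(u):
    dx, dy = gradients(u)
    return np.sum(np.abs(dx)) + np.sum(np.abs(dy))

def tv_isotropic(u):
    dx, dy = gradients(u)
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

def tv_vectorial(frames):
    """Assumed vectorial TV for an (H, W, T) stack: one square root per pixel over all frames."""
    dx = np.diff(frames, axis=1, append=frames[:, -1:, :])
    dy = np.diff(frames, axis=0, append=frames[-1:, :, :])
    return np.sum(np.sqrt(np.sum(dx ** 2 + dy ** 2, axis=2)))

u = np.array([[0.0, 1.0],
              [1.0, 1.0]])
frames = np.stack([u, u], axis=2)
```

On this 2x2 example the only nonzero gradient sits at the top-left pixel, with both components equal to 1, so the anisotropic penalty is 2 while the isotropic one is $\sqrt{2}$: the isotropic form penalizes diagonal edges less.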

preprint2020arXiv

Waveform Optimization for MIMO Joint Communication and Radio Sensing Systems with Training Overhead

In this paper, we study optimal waveform design to maximize mutual information (MI) for a joint communication and (radio) sensing (JCAS, a.k.a., radar-communication) multi-input multi-output (MIMO) downlink system. We consider a typical packet-based signal structure which includes training and data symbols. We first derive the conditional MI for both sensing and communication under correlated channels by considering the training overhead and channel estimation error (CEE). Then, we derive a lower bound for the channel estimation error and optimize the power allocation between the training and data symbols to minimize the CEE. Based on the optimal power allocation, we provide optimal waveform design methods for three scenarios, including maximizing MI for communication only and for sensing only, and maximizing a weighted sum MI for both communication and sensing. We also present extensive simulation results that provide insights on waveform design and validate the effectiveness of the proposed designs.

preprint2016arXiv

$k$-core percolation on complex networks: Comparing random, localized and targeted attacks

The type of malicious attack inflicted on networks greatly influences their stability under ordinary percolation, in which a node fails when it becomes disconnected from the giant component. Here we study its generalization, $k$-core percolation, in which a node fails when it is left with fewer than $k$ neighbors. Using both analysis and numerical simulations of $k$-core percolation, we study and compare the stability of networks under random attacks (RA), localized attacks (LA) and targeted attacks (TA). By mapping a network under LA or TA into an equivalent network under RA, we find that in both single and interdependent networks, TA inflicts the greatest damage on the core structure of a network. We also find that for Erdős-Rényi (ER) networks, LA and RA inflict equal damage on the core structure, whereas for scale-free (SF) networks, LA inflicts much more damage than RA does.
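The failure cascade in $k$-core percolation can be sketched as iterative pruning: after the attacked nodes are removed, any node left with fewer than $k$ surviving neighbors fails, which may push its neighbors below $k$ in turn. The adjacency-dict representation, the function name and the toy graph are assumptions for illustration. How the `attacked` set is chosen is what distinguishes RA, LA and TA, and that selection is not modeled here.

```python
def k_core_after_attack(adj, k, attacked=()):
    """Return the surviving k-core after removing `attacked` nodes (hypothetical helper).
    adj: dict mapping each node to a list of its neighbors (undirected graph)."""
    removed = set(attacked)
    # Degree counted only over neighbors that survived the initial attack
    degree = {v: sum(1 for u in adj[v] if u not in removed)
              for v in adj if v not in removed}
    queue = [v for v, d in degree.items() if d < k]
    while queue:
        v = queue.pop()
        if v in removed:
            continue
        removed.add(v)                 # v fails: fewer than k surviving neighbors
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1         # the failure propagates to v's neighbors
                if degree[u] < k:
                    queue.append(u)
    return {v for v in adj if v not in removed}

# Toy graph: triangle 0-1-2 with a tail 0-3-4
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
```

Without an attack, the 2-core of the toy graph is the triangle; removing node 0 first makes every remaining node fall below degree 2, so the whole 2-core collapses.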

preprint2016arXiv

Classification and Reconstruction of High-Dimensional Signals from Low-Dimensional Features in the Presence of Side Information

This paper offers a characterization of fundamental limits on the classification and reconstruction of high-dimensional signals from low-dimensional features, in the presence of side information. We consider a scenario where a decoder has access both to linear features of the signal of interest and to linear features of the side information signal; while the side information may be in a compressed form, the objective is recovery or classification of the primary signal, not the side information. The signal of interest and the side information are each assumed to have (distinct) latent discrete labels; conditioned on these two labels, the signal of interest and side information are drawn from a multivariate Gaussian distribution. With joint probabilities on the latent labels, the overall signal-(side information) representation is defined by a Gaussian mixture model. We then provide sharp sufficient and/or necessary conditions for the classification and reconstruction errors to approach zero when the covariance matrices of the Gaussians are nearly low-rank. These conditions, which are reminiscent of the well-known Slepian-Wolf and Wyner-Ziv conditions, are a function of the number of linear features extracted from the signal of interest, the number of linear features extracted from the side information signal, and the geometry of these signals and their interplay. Moreover, assuming that the signal of interest and the side information obey such an approximately low-rank model, we derive expansions of the reconstruction error as a function of the deviation from an exactly low-rank model; such expansions also allow identification of operational regimes where the impact of side information on signal reconstruction is most relevant. Our framework, which offers a principled mechanism to integrate side information in high-dimensional data problems, is also tested in the context of imaging applications.

preprint2016arXiv

Percolation of networks with directed dependency links

The self-consistent probabilistic approach has proven powerful in studying the percolation behavior of interdependent or multiplex networks without tracking the percolation process through each cascading step. In order to understand how directed dependency links impact criticality, we employ this approach to study the percolation properties of networks with both undirected connectivity links and directed dependency links. We find that when a random network with a given degree distribution undergoes a second-order phase transition, the critical point and the unstable regime surrounding the second-order phase transition regime are determined by the proportion of nodes that do not depend on any other nodes. Moreover, we find that the triple point and the boundary between first- and second-order transitions are determined by the proportion of nodes that depend on no more than one node. This suggests that, in multiplex network systems more generally, some important properties of phase transitions may be determined by only a few parameters. We illustrate our findings using Erdős-Rényi (ER) networks.

preprint2016arXiv

Variational Autoencoder for Deep Learning of Images, Labels and Captions

A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to generative models for labels (Bayesian support vector machine) or captions (recurrent neural network). When predicting a label/caption for a new image at test time, averaging is performed across the distribution of latent codes; this is computationally efficient as a consequence of the learned CNN-based encoder. Since the framework is capable of modeling the image in the presence or absence of associated labels/captions, a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone.

preprint2015arXiv

A Generative Model for Deep Convolutional Learning

A generative model is developed for deep (multi-layered) convolutional dictionary learning. A novel probabilistic pooling operation is integrated into the deep model, yielding efficient bottom-up (pretraining) and top-down (refinement) probabilistic learning. Experimental results demonstrate powerful capabilities of the model to learn multi-layer features from images, and excellent classification results are obtained on the MNIST and Caltech 101 datasets.

preprint2015arXiv

Compressive Hyperspectral Imaging with Side Information

A blind compressive sensing algorithm is proposed to reconstruct hyperspectral images from spectrally-compressed measurements. The wavelength-dependent data are coded and then superposed, mapping the three-dimensional hyperspectral datacube to a two-dimensional image. The inversion algorithm learns a dictionary in situ from the measurements via global-local shrinkage priors. By using RGB images as side information of the compressive sensing system, the proposed approach is extended to learn a coupled dictionary from the joint dataset of the compressed measurements and the corresponding RGB images, to improve reconstruction quality. A prototype camera is built using a liquid-crystal-on-silicon modulator. Experimental reconstructions of hyperspectral datacubes from both simulated and real compressed measurements demonstrate the efficacy of the proposed inversion algorithm, the feasibility of the camera and the benefit of side information.

preprint2015arXiv

Compressive Sensing via Low-Rank Gaussian Mixture Models

We develop a new compressive sensing (CS) inversion algorithm by utilizing the Gaussian mixture model (GMM). While compressive sensing is performed globally on the entire image, as implemented in our lensless camera, a low-rank GMM is imposed on the local image patches. This low-rank GMM is derived via eigenvalue thresholding of the GMM trained on the projection of the measurement data, and is thus learned in situ. The GMM and the projection of the measurement data are updated iteratively during reconstruction. Our GMM algorithm degrades to the piecewise linear estimator (PLE) if each patch is represented by a single Gaussian model. Inspired by this, a low-rank PLE algorithm is also developed for CS inversion, constituting an additional contribution of this paper. Extensive results on both simulation data and real data captured by the lensless camera demonstrate the efficacy of the proposed algorithm. Furthermore, we compare the CS reconstruction results of our algorithm with JPEG compression. Simulation results demonstrate that when limited bandwidth is available (a small number of measurements), our algorithm can achieve results comparable to JPEG.

preprint2015arXiv

Convergence of the Generalized Alternating Projection Algorithm for Compressive Sensing

The convergence of the generalized alternating projection (GAP) algorithm for solving the compressive sensing problem $\mathbf{y} = \mathbf{A}\mathbf{x} + \boldsymbol{\epsilon}$ is studied in this paper. Assuming that $\mathbf{A}\mathbf{A}^{\top}$ is invertible, we prove that GAP converges linearly within a certain range of step sizes when the sensing matrix $\mathbf{A}$ satisfies the restricted isometry property (RIP) with constant $\delta_{2K}$, where $K$ is the sparsity of $\mathbf{x}$. The theoretical analysis is extended to adaptively iterative thresholding (AIT) algorithms, for which the convergence rate is also derived based on $\delta_{2K}$ of the sensing matrix. We further prove that, under the same conditions, GAP converges faster than AIT. Extensive simulation results confirm the theoretical assertions.
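A minimal sketch of a GAP-style iteration for sparse recovery: each step takes the Euclidean projection of the current sparse estimate onto $\{\mathbf{x} : \mathbf{A}\mathbf{x} = \mathbf{y}\}$, $\mathbf{x} = \boldsymbol{\theta} + \mathbf{A}^{\top}(\mathbf{A}\mathbf{A}^{\top})^{-1}(\mathbf{y} - \mathbf{A}\boldsymbol{\theta})$, and then shrinks it. Keeping the $K$ largest entries is a simplifying stand-in swapped in for the paper's shrinkage step, the unit step size is an assumption, and the names and problem sizes are hypothetical.

```python
import numpy as np

def keep_k_largest(v, K):
    """Hard thresholding: keep the K largest-magnitude entries (stand-in shrinkage)."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-K:]
    out[keep] = v[keep]
    return out

def gap_sparse(y, A, K, iters=200):
    theta = np.zeros(A.shape[1])
    for _ in range(iters):
        # Projection onto {x : A x = y}; requires A A^T to be invertible
        x = theta + A.T @ np.linalg.solve(A @ A.T, y - A @ theta)
        # Shrink the projected estimate toward the set of K-sparse vectors
        theta = keep_k_largest(x, K)
    return x, theta

rng = np.random.default_rng(4)
N, M, K = 128, 64, 5
A = rng.standard_normal((M, N)) / np.sqrt(M)   # Gaussian sensing matrix (assumption)
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = A @ x_true                                 # noise-free measurements
x_hat, theta_hat = gap_sparse(y, A, K)
```

The returned `x_hat` always satisfies the measurements exactly, while `theta_hat` is the sparse iterate; whether the two coincide with `x_true` depends on conditions such as the RIP constant discussed above.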

preprint2015arXiv

Generalized Alternating Projection Based Total Variation Minimization for Compressive Sensing

We consider the total variation (TV) minimization problem used in compressive sensing and solve it using the generalized alternating projection (GAP) algorithm. Extensive results demonstrate the high performance of the proposed algorithm on compressive sensing of two-dimensional images, hyperspectral images and videos. We further derive the Alternating Direction Method of Multipliers (ADMM) framework with TV minimization for video and hyperspectral image compressive sensing under the CACTI and CASSI frameworks, respectively. Connections between GAP and ADMM are also provided.

preprint2015arXiv

Generative Deep Deconvolutional Learning

A generative Bayesian model is developed for deep (multi-layer) convolutional dictionary learning. A novel probabilistic pooling operation is integrated into the deep model, yielding efficient bottom-up and top-down probabilistic learning. After learning the deep convolutional dictionary, testing is implemented via deconvolutional inference. To speed up this inference, a new statistical approach is proposed to project the top-layer dictionary elements to the data level. Following this, only one layer of deconvolution is required during testing. Experimental results demonstrate powerful capabilities of the model to learn multi-layer features from images. Excellent classification results are obtained on both the MNIST and Caltech 101 datasets.

preprint2015arXiv

Lensless Compressive Imaging

We develop a lensless compressive imaging architecture, which consists of an aperture assembly and a single sensor, without using any lens. An anytime algorithm is proposed to reconstruct images from the compressive measurements; the algorithm produces a sequence of solutions that monotonically converge to the true signal (thus, anytime). The algorithm is developed based on the sparsity of local overlapping patches (in the transformation domain), and state-of-the-art results have been obtained. Experiments on real data demonstrate that encouraging results are obtained with a number of compressive measurements equal to about 10% of the image pixels. The reconstruction results of the proposed algorithm are compared with JPEG compression (based on file size), and the reconstructed image quality is close to that of JPEG compression, in particular at high compression rates.

preprint2015arXiv

Non-Gaussian Discriminative Factor Models via the Max-Margin Rank-Likelihood

We consider the problem of discriminative factor analysis for data that are in general non-Gaussian. A Bayesian model based on the ranks of the data is proposed. We first introduce a new max-margin version of the rank-likelihood. A discriminative factor model is then developed, integrating the max-margin rank-likelihood and (linear) Bayesian support vector machines, which are also built on the max-margin principle. The discriminative factor model is further extended to the nonlinear case through mixtures of local linear classifiers, via Dirichlet processes. Fully local conjugacy of the model yields efficient inference with both Markov Chain Monte Carlo and variational Bayes approaches. Extensive experiments on benchmark and real data demonstrate superior performance of the proposed model and its potential for applications in computational biology.

preprint2015arXiv

The influence of the broadness of the degree distribution on network's robustness: comparing localized attack and random attack

The stability of networks is greatly influenced by their degree distributions, and in particular by their broadness. Networks with broader degree distributions are usually more robust to random failures but less robust to localized attacks. To better understand the effect of the broadness of the degree distribution, we study here two models in which the broadness is controlled and compare their robustness against localized attacks (LA) and random attacks (RA). We study analytically and by numerical simulations the cases where the degrees in the networks follow a Bi-Poisson distribution $P(k)=αe^{-λ_1}\frac{λ_1^k}{k!}+(1-α)e^{-λ_2}\frac{λ_2^k}{k!}$, $α\in[0,1]$, and a Gaussian distribution $P(k)=A\exp\left(-\frac{(k-μ)^2}{2σ^2}\right)$ with a normalization constant $A$, where $k\geq 0$. In the Bi-Poisson distribution the broadness is controlled by the values of $α$, $λ_1$ and $λ_2$, while in the Gaussian distribution it is controlled by the standard deviation $σ$. We find that LA and RA have the same effect only for $α=0$ or $α=1$, i.e., when the degrees obey a pure Poisson distribution; in all other cases, networks are more vulnerable under LA than under RA. For the Gaussian distribution with a fixed average degree $μ$, we find that when $σ^2$ is smaller than $μ$ the network is more vulnerable to random attack, whereas when $σ^2$ is larger than $μ$ it becomes more vulnerable to localized attack. Similar qualitative results are also shown for interdependent networks.
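A small sketch of the two degree distributions above, assuming degrees are truncated at a hypothetical cutoff kmax = 100 and the Gaussian normalization constant $A$ is computed over integer $k$ from 0 to kmax. For the Bi-Poisson mixture, the variance is $αλ_1+(1-α)λ_2+α(1-α)(λ_1-λ_2)^2$, so it exceeds the pure-Poisson value (variance equal to the mean) whenever $α\in(0,1)$ and $λ_1\neq λ_2$; the Gaussian case is characterized by comparing $σ^2$ with the mean $μ$ in the same spirit. Function names and parameter values are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability computed in log space to avoid overflow for large k."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def bi_poisson_pmf(k, alpha, lam1, lam2):
    return alpha * poisson_pmf(k, lam1) + (1 - alpha) * poisson_pmf(k, lam2)

def gaussian_degree_pmf(mu, sigma, kmax):
    """Discrete Gaussian over k = 0..kmax; A normalizes the weights to sum to 1."""
    weights = [math.exp(-(k - mu) ** 2 / (2 * sigma ** 2)) for k in range(kmax + 1)]
    A = 1.0 / sum(weights)
    return [A * w for w in weights]

def mean_var(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var

alpha, lam1, lam2, kmax = 0.5, 2.0, 8.0, 100
bp = [bi_poisson_pmf(k, alpha, lam1, lam2) for k in range(kmax + 1)]
m, v = mean_var(bp)   # mean 5; variance 5 + 0.25 * 36 = 14, broader than Poisson(5)
```

With the same mean of 5, a pure Poisson distribution would have variance 5, so this mixture is markedly broader.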

preprint2014arXiv

Coherent Sources Direction Finding and Polarization Estimation with Various Compositions of Spatially Spread Polarized Antenna Arrays

Various compositions of sparsely polarized antenna arrays are proposed in this paper to estimate the directions-of-arrival (DOAs) and polarizations of multiple coherent sources. These polarized antenna arrays are composed of one of the following five sparsely-spread sub-array geometries: 1) four spatially-spread dipoles with three orthogonal orientations, 2) four spatially-spread loops with three orthogonal orientations, 3) three spatially-spread dipoles and three spatially-spread loops with orthogonal orientations, 4) three collocated dipole-loop pairs with orthogonal orientations, and 5) a collocated dipole-triad and a collocated loop-triad. All the dipoles/loops/pairs/triads in each sub-array can also be sparsely spaced, with inter-antenna spacing far larger than a half-wavelength. Only one-dimensional spatial smoothing is used in the proposed algorithm to derive the two-dimensional DOAs and polarizations of multiple cross-correlated signals. Based on the simulation results, the sparse array composed of dipole-triads and loop-triads is recommended for constructing a large-aperture array, while the sparse arrays composed of only dipoles or only loops are recommended for efficiently reducing the mutual coupling across the antennas. Practical applications include distributed arrays and passive radar systems.

preprint2014arXiv

Low-Cost Compressive Sensing for Color Video and Depth

A simple and inexpensive (low-power and low-bandwidth) modification is made to a conventional off-the-shelf color video camera, from which we recover multiple color frames for each of the original measured frames, and each of the recovered frames can be focused at a different depth. The recovery of multiple frames for each measured frame is made possible via high-speed coding, manifested via translation of a single coded aperture; the inexpensive translation is achieved by mounting the binary code on a piezoelectric device. To simultaneously recover depth information, a liquid lens is modulated at high speed via a variable voltage. Consequently, during the aforementioned coding process, the liquid lens allows the camera to sweep the focus through multiple depths. In addition to designing and implementing the camera, fast recovery is achieved by an anytime algorithm exploiting the group sparsity of wavelet/DCT coefficients.

preprint2014arXiv

Multiscale Shrinkage and Lévy Processes

A new shrinkage-based construction is developed for a compressible vector $\boldsymbol{x}\in\mathbb{R}^n$, for cases in which the components of $\boldsymbol{x}$ are naturally associated with a tree structure. Important examples are when $\boldsymbol{x}$ corresponds to the coefficients of a wavelet or block-DCT representation of data. The method we consider in detail, and for which numerical results are presented, is based on increments of a gamma process. However, we demonstrate that the general framework is appropriate for many other types of shrinkage priors, all within the Lévy process family, with the gamma process a special case. Bayesian inference is carried out by approximating the posterior with samples from an MCMC algorithm, as well as by constructing a heuristic variational approximation to the posterior. We also consider expectation-maximization (EM) for a MAP (point) solution. State-of-the-art results are obtained for compressive sensing and denoising applications, the latter with spiky (non-Gaussian) noise.

preprint2014arXiv

Tree-Structure Bayesian Compressive Sensing for Video

A Bayesian compressive sensing framework is developed for video reconstruction based on the color coded aperture compressive temporal imaging (CACTI) system. By exploiting the three-dimensional (3D) tree structure of the wavelet and discrete cosine transform (DCT) coefficients, a Bayesian compressive sensing inversion algorithm is derived to reconstruct up to 22 color video frames from a single monochromatic compressive measurement. Both simulated and real datasets are used to verify the performance of the proposed algorithm.

preprint2013arXiv

Adaptive Temporal Compressive Sensing for Video

This paper introduces the concept of adaptive temporal compressive sensing (CS) for video. We propose a CS algorithm to adapt the compression ratio based on the scene's temporal complexity, computed from the compressed data, without compromising the quality of the reconstructed video. The temporal adaptivity is manifested by manipulating the integration time of the camera, opening the possibility to real-time implementation. The proposed algorithm is a generalized temporal CS approach that can be incorporated with a diverse set of existing hardware systems.

preprint · 2013 · arXiv

Coded aperture compressive temporal imaging

We use mechanical translation of a coded aperture for code division multiple access compression of video. We present experimental results for reconstruction at 148 frames per coded snapshot.
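
The measurement model behind coded aperture temporal imaging can be sketched as follows: each frame in a block of T frames is modulated by a shifted copy of one binary code, and the detector sums the modulated frames into a single snapshot. A minimal sketch, assuming a one-row shift per frame with wrap-around (a simplification of the physical translation) and a toy T = 4 rather than the 148 frames per snapshot reported above; the function name is hypothetical.

```python
import numpy as np


def cacti_snapshot(frames, mask):
    """Simulate one coded snapshot from a (T, H, W) block of video frames.

    Frame t is modulated by the mask shifted down by t rows (a wrap-around
    stand-in for the mechanical translation of the aperture), and the
    modulated frames are summed on the detector into one (H, W) image.
    """
    T = frames.shape[0]
    codes = np.stack([np.roll(mask, shift=t, axis=0) for t in range(T)])
    return (codes * frames).sum(axis=0), codes


rng = np.random.default_rng(0)
mask = (rng.random((8, 8)) < 0.5).astype(float)   # random binary aperture code
frames = rng.random((4, 8, 8))                     # T = 4 toy frames
y, codes = cacti_snapshot(frames, mask)
print(y.shape, codes.shape)                        # (8, 8) (4, 8, 8)
```

Recovering the T frames from `y` and `codes` is the underdetermined inverse problem that the reconstruction algorithms address; it is not sketched here.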

preprint · 2013 · arXiv

Joint DOA and Polarization Estimation with Sparsely Distributed and Spatially Non-Collocating Dipole/Loop Triads

This paper introduces an ESPRIT-based algorithm to estimate the directions of arrival and polarizations of multiple sources. The investigated algorithm is based on new sparse array geometries composed of three non-collocating dipole triads or three non-collocating loop triads. Both the inter-triad spacings and the inter-sensor spacings within a triad can be far larger than half a wavelength of the incident sources. With the ESPRIT algorithm, the eigenvalues of the data-correlation matrix give fine but ambiguous estimates of the direction cosines of each source, and the eigenvectors give estimates of each source's steering vector. Exploiting the constrained array geometries, fine and unambiguous estimates of the directions of arrival and polarizations are then obtained. Simulation results verify the efficacy of the investigated approach and the aperture-extension property of the proposed array geometries.
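
The "fine but ambiguous" step can be illustrated with a generic scalar-sensor version of ESPRIT's shift invariance, not the dipole/loop-triad algorithm of the paper: two identical subarrays displaced by D wavelengths give eigenvalues exp(j2πDu), so the direction cosine u is only known modulo 1/D when D > 1/2. A minimal sketch, assuming narrowband sources, a four-sensor subarray and the simulated values below (all names and numbers are hypothetical).

```python
import numpy as np


def esprit_direction_cosines(Z, num_sources, spacing):
    """Least-squares ESPRIT for two identical subarrays displaced by `spacing`
    wavelengths along one axis.

    Z stacks the snapshots of subarray 1 (top half of the rows) on those of
    subarray 2 (bottom half). Returns sorted direction-cosine estimates, which
    are wrapped (ambiguous) when spacing > 1/2.
    """
    M = Z.shape[0] // 2
    R = Z @ Z.conj().T / Z.shape[1]             # sample data-correlation matrix
    _, vecs = np.linalg.eigh(R)                 # eigenvalues in ascending order
    Es = vecs[:, -num_sources:]                 # signal subspace
    Psi, *_ = np.linalg.lstsq(Es[:M], Es[M:], rcond=None)
    phases = np.angle(np.linalg.eigvals(Psi))   # 2*pi*spacing*u, wrapped to [-pi, pi]
    return np.sort(phases / (2 * np.pi * spacing))


def unwrap_candidates(u_fine, spacing):
    """All direction cosines in [-1, 1] consistent with one wrapped estimate."""
    n = np.arange(-int(np.ceil(spacing)) - 1, int(np.ceil(spacing)) + 2)
    candidates = u_fine + n / spacing
    return candidates[np.abs(candidates) <= 1.0]


rng = np.random.default_rng(1)
positions = 0.5 * np.arange(4)      # subarray-1 sensor positions in wavelengths
spacing = 2.0                       # subarray displacement, far above 1/2 wavelength
u_true = np.array([-0.35, 0.1])     # direction cosines of two sources
A = np.exp(2j * np.pi * np.outer(positions, u_true))
Phi = np.diag(np.exp(2j * np.pi * spacing * u_true))
N = 500
S = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)
noise = 0.01 * (rng.standard_normal((8, N)) + 1j * rng.standard_normal((8, N)))
Z = np.vstack([A @ S, A @ Phi @ S]) + noise

u_fine = esprit_direction_cosines(Z, num_sources=2, spacing=spacing)
print(u_fine)                                  # close to [0.10, 0.15]; -0.35 wraps to 0.15
print(unwrap_candidates(u_fine[1], spacing))   # includes about -0.35
```

A coarse but unambiguous estimate, which the paper obtains from its array geometry, would then select one candidate from each list.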

preprint · 2013 · arXiv

Polynomial-Phase Signal Direction-Finding & Source-Tracking with an Acoustic Vector Sensor

A new ESPRIT-based algorithm is proposed to estimate the direction of arrival of a polynomial-phase signal of arbitrary degree with a single acoustic vector sensor. The proposed approach requires a priori knowledge of neither the polynomial-phase signal's coefficients nor its frequency spectrum. A pre-processing technique is also proposed that allows the single-forgetting-factor and multiple-forgetting-factor adaptive tracking algorithms to track a polynomial-phase signal with one acoustic vector sensor. Simulation results verify the efficacy of the proposed direction-finding and source-tracking algorithms.
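
Why the polynomial-phase coefficients are not needed can be seen from the covariance: a unit-modulus signal s(t) contributes a·aᴴ regardless of its phase law, so the steering vector a of a single acoustic vector sensor can be read off the principal eigenvector. A minimal sketch, swapping in this simpler principal-eigenvector estimate for the paper's ESPRIT-based method and assuming one source, the (u, v, w, pressure) channel ordering, and an exponentially weighted covariance as the single forgetting factor; all names and values are hypothetical.

```python
import numpy as np


def avs_steering(theta, phi):
    """Single acoustic vector sensor response: (u, v, w) velocity channels, then pressure."""
    u = np.sin(theta) * np.cos(phi)
    v = np.sin(theta) * np.sin(phi)
    w = np.cos(theta)
    return np.array([u, v, w, 1.0])


def polynomial_phase_signal(coeffs, t):
    """s(t) = exp(j * sum_k coeffs[k] * t**k); np.polyval wants highest degree first."""
    return np.exp(1j * np.polyval(coeffs[::-1], t))


def direction_cosines(R):
    """Single-source estimate: principal eigenvector divided by its pressure entry."""
    _, vecs = np.linalg.eigh(R)
    e = vecs[:, -1] / vecs[3, -1]     # removes the unknown complex scale factor
    return e[:3].real


def track(Z, beta):
    """Single-forgetting-factor tracking: R_t = beta * R_{t-1} + z_t z_t^H."""
    R = np.zeros((Z.shape[0], Z.shape[0]), dtype=complex)
    estimates = []
    for z in Z.T:
        R = beta * R + np.outer(z, z.conj())
        estimates.append(direction_cosines(R))
    return np.array(estimates)


rng = np.random.default_rng(2)
t = np.arange(400) / 400.0
s = polynomial_phase_signal([0.0, 30.0, -10.0, 25.0], t)   # arbitrary cubic phase
a = avs_steering(np.deg2rad(50.0), np.deg2rad(120.0))
noise = 0.05 * (rng.standard_normal((4, t.size)) + 1j * rng.standard_normal((4, t.size)))
Z = np.outer(a, s) + noise

estimates = track(Z, beta=0.98)
print(a[:3])            # true (u, v, w)
print(estimates[-1])    # tracked estimate after the last snapshot
```

A forgetting factor closer to 1 averages more snapshots and lowers noise, at the cost of slower response to a moving source; the multiple-forgetting-factor variant is not sketched here.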

preprint · 2012 · arXiv

Phonon-induced dephasing of chromium colour centres in diamond

We report on the coherence properties of single photons from chromium-based colour centres in diamond. We use field-correlation and spectral lineshape measurements to reveal the interplay between slow spectral wandering and fast dephasing mechanisms as a function of temperature. We show that the zero-phonon transition frequency and its linewidth follow a power-law dependence on temperature, indicating that the dominant fast dephasing mechanisms for these centres are direct electron-phonon coupling and phonon-modulated Coulomb coupling to nearby impurities. Further, the observed reduction in the quantum yield for photon emission with increasing temperature is consistent with the opening of additional nonradiative channels, predominantly through thermal activation to higher energy states, and indicates a near-unity quantum efficiency at 4 K.
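
A power-law temperature dependence of the form Γ(T) = Γ₀ + a·Tⁿ can be fitted without a nonlinear solver: for a fixed exponent n the model is linear in (Γ₀, a), so one can scan n and keep the best linear least-squares fit. A minimal sketch on synthetic, noiseless data with arbitrary parameters (not measurements from the paper); the function name is hypothetical.

```python
import numpy as np


def fit_power_law(T, linewidth, exponents):
    """Fit linewidth(T) = gamma0 + a * T**n.

    For each candidate exponent n the model is linear in (gamma0, a), so it is
    solved by linear least squares; the n with the smallest residual wins.
    """
    best = None
    for n in exponents:
        X = np.column_stack([np.ones_like(T), T ** n])
        coef, *_ = np.linalg.lstsq(X, linewidth, rcond=None)
        sse = float(np.sum((X @ coef - linewidth) ** 2))
        if best is None or sse < best[0]:
            best = (sse, n, coef[0], coef[1])
    _, n, gamma0, a = best
    return n, gamma0, a


# Synthetic, noiseless data with arbitrary parameters (not measurements from the paper).
T = np.linspace(5.0, 60.0, 12)            # temperature in K
linewidth = 0.2 + 1e-4 * T ** 3           # arbitrary units
n, gamma0, a = fit_power_law(T, linewidth, exponents=np.linspace(1.0, 7.0, 61))
print(n, gamma0, a)   # approximately 3.0, 0.2, 1e-4
```

With real, noisy data the grid resolution and the temperature range both limit how sharply the exponent is determined.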


55 published items

Deep Probabilistic Unfolding for Quantized Compressive Sensing

AdaVocoder: Adaptive Vocoder for Custom Voice

Large-scale Global Low-rank Optimization for Computational Compressed Imaging

Learning-based Intelligent Surface Configuration, User Selection, Channel Allocation, and Modulation Adaptation for Jamming-resisting Multiuser OFDMA Systems

Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation

A Simple and Efficient Reconstruction Backbone for Snapshot Compressive Imaging

Adaptive Deep PnP Algorithm for Video Snapshot Compressive Imaging

Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction

CoNet: Borderless and decentralized server cooperation in edge computing

Dispersed Pixel Perturbation-based Imperceptible Backdoor Trigger for Image Classifier Models

Ensemble learning priors unfolding for scalable Snapshot Compressive Sensing

HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Text-to-Image Generation via Implicit Visual Guidance and Hypernetwork

Trajectory Planning of Cellular-Connected UAV for Communication-assisted Radar Sensing

Two-Stage is Enough: A Concise Deep Unfolding Reconstruction Network for Flexible Video Compressive Sensing

When Internet of Things meets Metaverse: Convergence of Physical and Cyber Worlds

Memory-Efficient Network for Large-scale Video Compressive Sensing

MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing

Plug-and-Play Algorithms for Video Snapshot Compressive Imaging

Snapshot Compressive Imaging: Principle, Implementation, Theory, Algorithms and Applications

A Benchmark for Sparse Coding: When Group Sparsity Meets Rank Minimization

A realistic phase screen model for forward multiple-scattering media

Fast Hyperspectral Image Recovery via Non-iterative Fusion of Dual-Camera Compressive Hyperspectral Imaging

From Rank Estimation to Rank Approximation: Rank Residual Constraint for Image Restoration

Image Compression Based on Compressive Sensing: End-to-End Comparison with JPEG

Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging

Snapshot Interferometric 3D Imaging by Compressive Sensing and Deep Learning

Spatial-spectral FFPNet: Attention-Based Pyramid Network for Segmentation and Classification of Remote Sensing Images

The Power of Triply Complementary Priors for Image Compressive Sensing

Various Total Variation for Snapshot Video Compressive Imaging

Waveform Optimization for MIMO Joint Communication and Radio Sensing Systems with Training Overhead

$k$-core percolation on complex networks: Comparing random, localized and targeted attacks

Classification and Reconstruction of High-Dimensional Signals from Low-Dimensional Features in the Presence of Side Information

Percolation of networks with directed dependency links

Variational Autoencoder for Deep Learning of Images, Labels and Captions

A Generative Model for Deep Convolutional Learning

Compressive Hyperspectral Imaging with Side Information

Compressive Sensing via Low-Rank Gaussian Mixture Models

Convergence of the Generalized Alternating Projection Algorithm for Compressive Sensing

Generalized Alternating Projection Based Total Variation Minimization for Compressive Sensing

Generative Deep Deconvolutional Learning

Lensless Compressive Imaging

Non-Gaussian Discriminative Factor Models via the Max-Margin Rank-Likelihood

The influence of the broadness of the degree distribution on network's robustness: comparing localized attack and random attack

Coherent Sources Direction Finding and Polarization Estimation with Various Compositions of Spatially Spread Polarized Antenna Arrays

Low-Cost Compressive Sensing for Color Video and Depth

Multiscale Shrinkage and Lévy Processes

Tree-Structure Bayesian Compressive Sensing for Video

Adaptive Temporal Compressive Sensing for Video

Coded aperture compressive temporal imaging

Joint DOA and Polarization Estimation with Sparsely Distributed and Spatially Non-Collocating Dipole/Loop Triads

Polynomial-Phase Signal Direction-Finding & Source-Tracking with an Acoustic Vector Sensor

Phonon-induced dephasing of chromium colour centres in diamond