Source author record

Xiao-Ping Zhang

Xiao-Ping Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Computer Vision; eess.SP; Artificial Intelligence; Machine Learning; eess.IV; Applications; astro-ph.EP; Distributed, Parallel, and Cluster Computing; eess.AS; Information Theory; math.IT; math.OC; math.ST; Methodology; physics.med-ph; physics.optics; Sound; Statistics Theory

Catalog footprint

What is connected

27 works

18 topics

4 close collaborators

preprint 2026 arXiv

Parameter Convergence Radar Detector Based on VAMP Deep Unfolding

Compared with the sparse recovery process in the traditional compressed sensing (CS) radar detector CAMP, vector AMP deep unfolding (VAMP-DU) can achieve sparse recovery over a broader range of observation matrices, with faster convergence and higher recovery accuracy. However, the distribution of the error term in VAMP-DU remains unknown, which leaves the distribution of the test statistic in CS radar detection undetermined and thus hinders threshold setting under a given false alarm rate when VAMP-DU is applied to CS radar detection. In this work, we theoretically prove that the error term in VAMP-DU follows a Gaussian distribution by leveraging a general state evolution (SE). Based on this Gaussianity, we propose a new parameter convergence radar detector (PCRD) as the CS detector to calculate the distribution parameter of the test statistic and realize target detection under a given false alarm rate. Specifically, PCRD exploits the Gaussian property of the error term in VAMP-DU to exhibit superior false alarm control capability, while leveraging the improved recovery accuracy of VAMP-DU to further enhance target detection performance. Numerical simulations validate the Gaussianity of the error term in VAMP-DU and show the superiority of the VAMP-DU-based PCRD over existing approaches in both false alarm control accuracy and target detection performance.
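To illustrate why a known Gaussian distribution makes threshold setting straightforward, the sketch below computes a detection threshold for a target false alarm rate. It is not the PCRD from the abstract: it assumes a real-valued, zero-mean Gaussian test statistic under the no-target hypothesis with a known standard deviation `sigma`, and a one-sided test. The function names are hypothetical.

```python
from statistics import NormalDist


def gaussian_threshold(sigma: float, pfa: float) -> float:
    """Threshold t such that P(X > t) = pfa for X ~ N(0, sigma^2).

    Assumption: the test statistic is real-valued and zero-mean Gaussian
    when no target is present, and sigma is already known or estimated.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if not 0.0 < pfa < 1.0:
        raise ValueError("pfa must lie strictly between 0 and 1")
    # Inverse of the standard normal CDF, scaled by sigma.
    return sigma * NormalDist().inv_cdf(1.0 - pfa)


def false_alarm_rate(sigma: float, threshold: float) -> float:
    """Probability that a N(0, sigma^2) statistic exceeds the threshold."""
    return 1.0 - NormalDist(0.0, sigma).cdf(threshold)


if __name__ == "__main__":
    t = gaussian_threshold(sigma=2.0, pfa=1e-3)
    print(t, false_alarm_rate(2.0, t))
```

A lower false alarm rate pushes the threshold up, which is the trade-off against detection probability that any such detector has to manage.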

preprint 2026 arXiv

Step-by-Step Causality: Transparent Causal Discovery with Multi-Agent Tree-Query and Adversarial Confidence Estimation

Causal discovery aims to recover "what causes what", but classical constraint-based methods (e.g., PC, FCI) suffer from error propagation, and recent LLM-based causal oracles often behave as opaque, confidence-free black boxes. This paper introduces Tree-Query, a tree-structured, multi-expert LLM framework that reduces pairwise causal discovery to a short sequence of queries about backdoor paths, (in)dependence, latent confounding, and causal direction, yielding interpretable judgments with robustness-aware confidence scores. Theoretical guarantees are provided for asymptotic identifiability of four pairwise relations. On data-free benchmarks derived from Mooij et al. and UCI causal graphs, Tree-Query improves structural metrics over direct LLM baselines, and a diet-weight case study illustrates confounder screening and stable, high-confidence causal conclusions. Tree-Query thus offers a principled way to obtain data-free causal priors from LLMs that can complement downstream data-driven causal discovery. Code is available at https://anonymous.4open.science/r/Repo-9B3E-4F96.

preprint 2026 arXiv

Verifiable Process Rewards for Agentic Reasoning

Reinforcement learning from verifiable rewards (RLVR) has improved the reasoning abilities of large language models (LLMs), but most existing approaches rely on sparse outcome-level feedback. This sparsity creates a credit assignment challenge in long-horizon agentic reasoning: a trajectory may fail despite containing many correct intermediate decisions, or succeed despite containing flawed ones. In this work, we study a class of densely-verifiable agentic reasoning problems, where intermediate actions can be objectively checked by symbolic or algorithmic oracles. We propose Verifiable Process Rewards (VPR), a framework that converts such oracles into dense turn-level supervision for reinforcement learning, and instantiate it in three representative settings: search-based verification for dynamic deduction, constraint-based verification for logical reasoning, and posterior-based verification for probabilistic inference. We further provide a theoretical analysis showing that dense verifier-grounded rewards can improve long-horizon credit assignment by providing more localized learning signals, with the benefit depending on the reliability of the verifier. Empirically, VPR outperforms outcome-level reward and rollout-based process reward baselines across controlled environments, and more importantly, transfers to both general and agentic reasoning benchmarks, suggesting that verifiable process supervision can foster general reasoning skills applicable beyond the training environments. Our results indicate that VPR is a promising approach for enhancing LLM agents whenever reliable intermediate verification is available, while also highlighting its dependence on oracle quality and the open challenge of extending VPR to less structured, open-ended environments.

preprint 2024 arXiv

Language-free Compositional Action Generation via Decoupling Refinement

Composing simple elements into complex concepts is crucial yet challenging, especially for 3D action generation. Existing methods largely rely on extensive neural language annotations to discern composable latent semantics, a process that is often costly and labor-intensive. In this study, we introduce a novel framework to generate compositional actions without reliance on language auxiliaries. Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement. Action Coupling utilizes an energy model to extract the attention masks of each sub-action, subsequently integrating two actions using these attentions to generate pseudo-training examples. Then, we employ a conditional generative model, CVAE, to learn a latent space, facilitating the diverse generation. Finally, we propose Decoupling Refinement, which leverages a self-supervised pre-trained model MAE to ensure semantic consistency between the sub-actions and compositional actions. This refinement process involves rendering generated 3D actions into 2D space, decoupling these images into two sub-segments, using the MAE model to restore the complete image from sub-segments, and constraining the recovered images to match images rendered from raw sub-actions. Due to the lack of existing datasets containing both sub-actions and compositional actions, we created two new datasets, named HumanAct-C and UESTC-C, and present a corresponding evaluation metric. Both qualitative and quantitative assessments are conducted to show our efficacy.

preprint 2023 arXiv

Finding the Most Transferable Tasks for Brain Image Segmentation

Although many studies have successfully applied transfer learning to medical image segmentation, very few of them have investigated the selection strategy when multiple source tasks are available for transfer. In this paper, we propose a prior-knowledge-guided and transferability-based framework to select the best source tasks among a collection of brain image segmentation tasks, to improve the transfer learning performance on the given target task. The framework consists of modality analysis, RoI (region of interest) analysis, and transferability estimation, such that the source task selection can be refined step by step. Specifically, we adapt the state-of-the-art analytical transferability estimation metrics to medical image segmentation tasks and further show that their performance can be significantly boosted by filtering candidate source tasks based on modality and RoI characteristics. Our experiments on brain matter, brain tumor, and white matter hyperintensities segmentation datasets reveal that transferring from different tasks under the same modality is often more successful than transferring from the same task under different modalities. Furthermore, within the same modality, transferring from the source task that has stronger RoI shape similarity with the target task can significantly improve the final transfer performance. Such similarity can be captured using the Structural Similarity index in the label space.
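The idea of measuring RoI shape similarity with the Structural Similarity index in the label space can be sketched as follows. This is a simplified illustration, not the paper's procedure: it computes a single global SSIM over two binary label masks (standard SSIM implementations use local windows), uses the conventional constants 0.01 and 0.03, and the function names and the `rank_sources` helper are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def _cov(x, y):
    """Population covariance of two equally long sequences."""
    mx, my = _mean(x), _mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def label_ssim(mask_a, mask_b, dynamic_range=1.0):
    """Global SSIM between two 2D label masks given as lists of rows.

    Assumption: one window covering the whole mask, rather than the
    sliding local windows used by standard SSIM implementations.
    """
    x = [float(p) for row in mask_a for p in row]
    y = [float(p) for row in mask_b for p in row]
    if len(x) != len(y) or not x:
        raise ValueError("masks must be non-empty and the same size")
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mx, my = _mean(x), _mean(y)
    numerator = (2 * mx * my + c1) * (2 * _cov(x, y) + c2)
    denominator = (mx * mx + my * my + c1) * (_cov(x, x) + _cov(y, y) + c2)
    return numerator / denominator


def rank_sources(target_mask, source_masks):
    """Order candidate source tasks by label-space SSIM to the target."""
    return sorted(
        source_masks,
        key=lambda name: label_ssim(target_mask, source_masks[name]),
        reverse=True,
    )


if __name__ == "__main__":
    target = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    candidates = {
        "near": [[0, 1, 1], [0, 1, 1], [0, 0, 1]],
        "far": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    }
    print(rank_sources(target, candidates))
```

In practice the masks would be representative label maps from each task, and the ranking would only be applied after the modality filtering described in the abstract.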

preprint 2022 arXiv

Boosting Video Representation Learning with Multi-Faceted Integration

Video content is multifaceted, consisting of objects, scenes, interactions or actions. Existing datasets mostly label only one of the facets for model training, so the resulting video representation is biased toward a single facet, depending on the training dataset. There is no study yet on how to learn a video representation from multifaceted labels, and whether multifaceted information is helpful for video representation learning. In this paper, we propose a new learning framework, MUlti-Faceted Integration (MUFI), to aggregate facets from different datasets for learning a representation that could reflect the full spectrum of video content. Technically, MUFI formulates the problem as visual-semantic embedding learning, which explicitly maps video representation into a rich semantic embedding space, and jointly optimizes video representation from two perspectives. The first capitalizes on the intra-facet supervision between each video and its own label descriptions, and the second predicts the "semantic representation" of each video from the facets of other datasets as the inter-facet supervision. Extensive experiments demonstrate that learning 3D CNN via our MUFI framework on a union of four large-scale video datasets plus two image datasets leads to superior capability of video representation. The pre-learnt 3D CNN with MUFI also shows clear improvements over other approaches on several downstream video applications. More remarkably, MUFI achieves 98.1%/80.9% on UCF101/HMDB51 for action recognition and 101.5% in terms of CIDEr-D score on MSVD for video captioning.

preprint 2022 arXiv

Distributed Optimal Power Flow for VSC-MTDC Meshed AC/DC Grids Using ALADIN

The increasing application of voltage source converter (VSC) high voltage direct current (VSC-HVDC) technology in power grids has raised the importance of incorporating DC grids and converters into the existing transmission network. This poses significant challenges in dealing with the resulting optimal power flow (OPF) problem. In this paper, a recently proposed nonconvex distributed optimization algorithm, the Augmented Lagrangian based Alternating Direction Inexact Newton method (ALADIN), is tailored to solve the nonconvex AC/DC OPF problem for emerging VSC-based multiterminal high voltage direct current (VSC-MTDC) meshed AC/DC hybrid systems. The proposed scheme decomposes this AC/DC hybrid OPF problem and handles it in a fully distributed way. Compared to the existing state-of-the-art Alternating Direction Method of Multipliers (ADMM), which is, in general, not applicable to nonconvex problems, ALADIN has a theoretical convergence guarantee. Applying these two approaches to a VSC-MTDC system coupled with an IEEE benchmark AC power system illustrates that the tailored ALADIN outperforms ADMM in convergence speed and numerical robustness.

preprint 2022 arXiv

Dual Vision Transformer

Prior works have proposed several strategies to reduce the computational cost of the self-attention mechanism. Many of these works consider decomposing the self-attention procedure into regional and local feature extraction procedures that each incur a much smaller computational complexity. However, regional information is typically only achieved at the expense of undesirable information loss owing to down-sampling. In this paper, we propose a novel Transformer architecture that aims to mitigate the cost issue, named Dual Vision Transformer (Dual-ViT). The new architecture incorporates a critical semantic pathway that can more efficiently compress token vectors into global semantics with reduced order of complexity. Such compressed global semantics then serve as useful prior information in learning finer pixel-level details, through another constructed pixel pathway. The semantic pathway and pixel pathway are then integrated and jointly trained, spreading the enhanced self-attention information in parallel through both pathways. Dual-ViT is thus able to reduce the computational complexity without compromising much accuracy. We empirically demonstrate that Dual-ViT provides higher accuracy than SOTA Transformer architectures with reduced training complexity. Source code is available at https://github.com/YehLi/ImageNetModel.

preprint 2022 arXiv

New Closed-form Joint Localization and Synchronization using Sequential One-way TOAs

It is an essential technique for the moving user nodes (UNs) with clock offset and clock skew to resolve the joint localization and synchronization (JLAS) problem. Existing iterative maximum likelihood methods using sequential one-way time-of-arrival (TOA) measurements from the anchor nodes' (AN) broadcast signals require a good initial guess and have a computational complexity that grows with the number of iterations, given the size of the problem. In this paper, we propose a new closed-form JLAS approach, namely CFJLAS, which achieves the asymptotically optimal solution in one shot without initialization when the noise is small, and has a low computational complexity. After squaring and differencing the sequential TOA measurement equations, we devise two intermediate variables to reparameterize the non-linear problem. In this way, we convert the problem to a simpler one of solving two simultaneous quadratic equations. We then solve the equations analytically to obtain a raw closed-form JLAS estimation. Finally, we apply a weighted least squares (WLS) step to optimize the estimation. We derive the Cramer-Rao lower bound (CRLB), analyze the estimation error, and show that the estimation accuracy of the CFJLAS reaches the CRLB under the small noise condition. The complexity of the new CFJLAS is only determined by the size of the problem, unlike the conventional iterative method, whose complexity is additionally multiplied by the number of iterations. Simulations in a 2D scene verify that the estimation accuracies of the new CFJLAS method in position, velocity, clock offset, and clock skew all reach the CRLB under the small noise condition. Compared with the conventional iterative method, the proposed new CFJLAS method does not require initialization, obtains the optimal solution under the small noise condition, and has a low computational complexity.

preprint 2022 arXiv

Parameterized Image Quality Score Distribution Prediction

Recently, image quality has been generally described by a mean opinion score (MOS). However, we observe that the quality scores of an image given by a group of subjects are very subjective and diverse. Thus it is not enough to use a MOS to describe the image quality. In this paper, we propose to describe image quality using a parameterized distribution rather than a MOS, and an objective method is also proposed to predict the image quality score distribution (IQSD). At first, the LIVE database is re-recorded. Specifically, we have invited a large group of subjects to evaluate the quality of all images in the LIVE database, and each image is evaluated by a large number of subjects (187 valid subjects), whose scores can form a reliable IQSD. By analyzing the obtained subjective quality scores, we find that the IQSD can be well modeled by an alpha stable model, and it can reflect much more information than a single MOS, such as the skewness of opinion score, the subject diversity and the maximum probability score for an image. Therefore, we propose to model the IQSD using the alpha stable model. Moreover, we propose a framework and an algorithm to predict the alpha stable model based IQSD, where quality features are extracted from each image based on structural information and statistical information, and support vector regressors are trained to predict the alpha stable model parameters. Experimental results verify the feasibility of using the alpha stable model to describe the IQSD, and prove the effectiveness of the objective alpha stable model based IQSD prediction method.

preprint 2022 arXiv

Refine-Net: Normal Refinement Neural Network for Noisy Point Clouds

Point normal, as an intrinsic geometric property of 3D objects, not only serves conventional geometric tasks such as surface consolidation and reconstruction, but also facilitates cutting-edge learning-based techniques for shape analysis and generation. In this paper, we propose a normal refinement network, called Refine-Net, to predict accurate normals for noisy point clouds. Traditional normal estimation wisdom heavily depends on priors such as surface shapes or noise distributions, while learning-based solutions settle for single types of hand-crafted features. In contrast, our network is designed to refine the initial normal of each point by extracting additional information from multiple feature representations. To this end, several feature modules are developed and incorporated into Refine-Net by a novel connection module. Besides the overall network architecture of Refine-Net, we propose a new multi-scale fitting patch selection scheme for the initial normal estimation, by absorbing geometry domain knowledge. Also, Refine-Net is a generic normal estimation framework: 1) point normals obtained from other methods can be further refined, and 2) any feature module related to the surface geometric structures can be potentially integrated into the framework. Qualitative and quantitative evaluations demonstrate the clear superiority of Refine-Net over the state of the art on both synthetic and real-scanned datasets. Our code is available at https://github.com/hrzhou2/refinenet.

preprint 2022 arXiv

SAFARI: Sparsity enabled Federated Learning with Limited and Unreliable Communications

Federated learning (FL) enables edge devices to collaboratively learn a model in a distributed fashion. Much existing research has focused on improving the communication efficiency of high-dimensional models and addressing bias caused by local updates. However, most FL algorithms either rely on reliable communications or assume fixed and known unreliability characteristics. In practice, networks could suffer from dynamic channel conditions and non-deterministic disruptions, with time-varying and unknown characteristics. To this end, in this paper we propose a sparsity-enabled FL framework with both communication efficiency and bias reduction, termed SAFARI. It makes novel use of the similarity among client models to rectify and compensate for bias resulting from unreliable communications. More precisely, sparse learning is implemented on local clients to mitigate communication overhead, while to cope with unreliable communications, a similarity-based compensation method is proposed to provide surrogates for missing model updates. We analyze SAFARI under bounded dissimilarity and with respect to sparse models. It is demonstrated that SAFARI under unreliable communications is guaranteed to converge at the same rate as the standard FedAvg with perfect communications. Implementations and evaluations on the CIFAR-10 dataset validate the effectiveness of SAFARI by showing that it can achieve the same convergence speed and accuracy as FedAvg with perfect communications, with up to 80% of the model weights being pruned and a high percentage of client updates missing in each round.
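The two ingredients named in the abstract, sparse client updates and similarity-based surrogates for missing updates, can be sketched roughly as below. This is not SAFARI itself: the abstract does not specify the sparsification rule, the similarity measure, or the compensation step, so magnitude-based top-k pruning, cosine similarity over previously known client models, and plain averaging are all assumptions, and every function name is hypothetical.

```python
import math


def sparsify_top_k(weights, keep_ratio):
    """Keep the largest-magnitude weights and zero the rest (assumed rule)."""
    k = max(1, int(round(len(weights) * keep_ratio)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def aggregate_with_surrogates(updates, previous):
    """Average client updates, filling each missing one with a surrogate.

    updates:  client id -> weight list, or None if the update was lost.
    previous: client id -> last known model, used only to pick the most
              similar client whose update did arrive (assumed heuristic).
    """
    received = {c: u for c, u in updates.items() if u is not None}
    if not received:
        raise ValueError("no client update arrived this round")
    filled = []
    for client, update in updates.items():
        if update is None:
            donor = max(received, key=lambda d: cosine(previous[client], previous[d]))
            update = received[donor]
        filled.append(update)
    return [sum(column) / len(filled) for column in zip(*filled)]


if __name__ == "__main__":
    print(sparsify_top_k([0.1, -3.0, 0.5, 2.0, -0.2], keep_ratio=0.4))
    previous = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
    updates = {"a": [2.0, 0.0], "b": [0.0, 4.0], "c": None}
    print(aggregate_with_surrogates(updates, previous))
```

Here client `c` lost its update, so the update from `a`, whose previous model is most similar, stands in for it before averaging.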

preprint 2022 arXiv

Segmented Learning for Class-of-Service Network Traffic Classification

Class-of-service (CoS) network traffic classification (NTC) classifies a group of similar traffic applications. The CoS classification is advantageous in resource scheduling for Internet service providers and avoids the necessity of remodelling. Our goal is to find a robust, lightweight, and fast-converging CoS classifier that uses fewer data in modelling and does not require specialized tools in feature extraction. The commonality of statistical features among the network flow segments motivates us to propose novel segmented learning that includes essential vector representation and a simple-segment method of classification. We represent the segmented traffic in the vector form using the EVR. Then, the segmented traffic is modelled for classification using random forest. Our solution's success relies on finding the optimal segment size and a minimum number of segments required in modelling. The solution is validated on multiple datasets for various CoS services, including virtual reality (VR). Significant findings of the research work are i) Synchronous services that require acknowledgment and request to continue communication are classified with 99% accuracy, ii) Initial 1,000 packets in any session are good enough to model a CoS traffic for promising results, and we therefore can quickly deploy a CoS classifier, and iii) Test results remain consistent even when trained on one dataset and tested on a different dataset. In summary, our solution is the first to propose segmentation learning NTC that uses fewer features to classify most CoS traffic with an accuracy of 99%. The implementation of our solution is available on GitHub.

preprint 2022 arXiv

Sequential Doppler Shift based Optimal Localization and Synchronization with TOA

Doppler shift is an important measurement for localization and synchronization (LAS), and is available in various practical systems. Existing studies on LAS techniques in a time division broadcast LAS system (TDBS) only use sequential time-of-arrival (TOA) measurements from the broadcast signals. In this paper, we develop a new optimal LAS method in the TDBS, namely LAS-SDT, by taking advantage of the sequential Doppler shift and TOA measurements. It achieves higher accuracy compared with the conventional TOA-only method for user devices (UDs) with motion and clock drift. Another two variant methods, LAS-SDT-v for the case with UD velocity aiding, and LAS-SDT-k for the case with UD clock drift aiding, are developed. We derive the Cramer-Rao lower bound (CRLB) for these different cases. We show analytically that the accuracies of the estimated UD position, clock offset, velocity and clock drift are all significantly higher than those of the conventional LAS method using TOAs only. Numerical results corroborate the theoretical analysis and show the optimal estimation performance of the LAS-SDT.

preprint 2021 arXiv

Non-local Channel Aggregation Network for Single Image Rain Removal

Rain streaks in images or videos can severely degrade the performance of computer vision applications. Thus, it is of vital importance to remove rain streaks and facilitate our vision systems. While recent convolutional neural network based methods have shown promising results in single image rain removal (SIRR), they fail to effectively capture long-range location dependencies or aggregate convolutional channel information simultaneously. However, as SIRR is a highly ill-posed problem, this spatial and channel information provides very important clues for solving SIRR. First, spatial information could help our model to understand the image context by gathering long-range dependency location information hidden in the image. Second, aggregating channels could help our model to concentrate on channels more related to image background instead of rain streaks. In this paper, we propose a non-local channel aggregation network (NCANet) to address the SIRR problem. NCANet models 2D rainy images as sequences of vectors in three directions, namely vertical direction, transverse direction and channel direction. Recurrently aggregating information from all three directions enables our model to capture the long-range dependencies in both channels and spatial locations. Extensive experiments on both heavy and light rain image data sets demonstrate the effectiveness of the proposed NCANet model.

preprint 2021 arXiv

Optimal Localization with Sequential Pseudorange Measurements for Moving Users in a Time Division Broadcast Positioning System

In a time division broadcast positioning system (TDBPS), a user device (UD) determines its position by obtaining sequential time-of-arrival (TOA) or pseudorange measurements from signals broadcast by multiple synchronized base stations (BSs). The existing localization method using sequential pseudorange measurements and a linear clock drift model for the TDBPS, namely LSPM-D, does not compensate for the position displacement caused by the UD movement and will result in position error. In this paper, depending on the knowledge of the UD velocity, we develop a set of optimal localization methods for different cases. First, for known UD velocity, we develop the optimal localization method, namely LSPM-KVD, to compensate for the movement-caused position error. We show that the LSPM-D is a special case of the LSPM-KVD when the UD is stationary with zero velocity. Second, for the case with unknown UD velocity, we develop a maximum likelihood (ML) method to jointly estimate the UD position and velocity, namely LSPM-UVD. Third, in the case that we have prior distribution information of the UD velocity, we present a maximum a posteriori (MAP) estimator for localization, namely LSPM-PVD. We derive the Cramer-Rao lower bound (CRLB) for all three estimators and analyze their localization error performance. We show that the position error of the LSPM-KVD increases as the assumed known velocity deviates from the true value. As expected, the LSPM-KVD has the smallest position error while the LSPM-PVD and the LSPM-UVD are more robust when the prior knowledge of the UD velocity is limited. Numerical results verify the theoretical analysis on the optimality and the positioning accuracy of the proposed methods.

preprint 2020 arXiv

Change Detection in Heterogeneous Optical and SAR Remote Sensing Images via Deep Homogeneous Feature Fusion

Change detection in heterogeneous remote sensing images is crucial for disaster damage assessment. Recent methods use homogeneous transformation, which transforms the heterogeneous optical and SAR remote sensing images into the same feature space, to achieve change detection. Such transformations mainly operate on the low-level feature space and may corrupt the semantic content, deteriorating the performance of change detection. To solve this problem, this paper presents a new homogeneous transformation model termed deep homogeneous feature fusion (DHFF) based on image style transfer (IST). Unlike the existing methods, the DHFF method segregates the semantic content and the style features in the heterogeneous images to perform homogeneous transformation. The separation of the semantic content and the style in homogeneous transformation prevents the corruption of image semantic content, especially in the regions of change. In this way, the detection performance is improved with accurate homogeneous transformation. Furthermore, we present a new iterative IST (IIST) strategy, where the cost function in each IST iteration measures and thus maximizes the feature homogeneity in additional new feature subspaces for change detection. After that, change detection is accomplished accurately on the original and the transformed images that are in the same feature space. Real remote sensing images acquired by SAR and optical satellites are utilized to evaluate the performance of the proposed method. The experiments demonstrate that the proposed DHFF method achieves significant improvement for change detection in heterogeneous optical and SAR remote sensing images, in terms of both accuracy rate and Kappa index.

preprint 2020 arXiv

Dynamic Spatiotemporal Graph Neural Network with Tensor Network

Dynamic spatial graph construction is a challenge in graph neural networks (GNNs) for time series data problems. Although some adaptive graphs are conceivable, only a 2D graph is embedded in the network to reflect the current spatial relation, regardless of all the previous situations. In this work, we generate a spatial tensor graph (STG) to collect all the dynamic spatial relations, as well as a temporal tensor graph (TTG) to find the latent pattern along time at each node. These two tensor graphs share the same nodes and edges, which leads us to explore their entangled correlations by Projected Entangled Pair States (PEPS) to optimize the two graphs. We experimentally compare the accuracy and time cost with the state-of-the-art GNN based methods on public traffic datasets.

preprint 2020 arXiv

Re-synchronization using the Hand Preceding Model for Multi-modal Fusion in Automatic Continuous Cued Speech Recognition

Cued Speech (CS) is an augmented lip reading complemented by hand coding, and it is very helpful to deaf people. Automatic CS recognition can help communications between deaf people and others. Due to the asynchronous nature of lips and hand movements, fusion of them in automatic CS recognition is a challenging problem. In this work, we propose a novel re-synchronization procedure for multi-modal fusion, which aligns the hand features with the lips feature. It is realized by delaying hand position and hand shape by their optimal hand preceding time, which is derived by investigating the temporal organizations of hand position and hand shape movements in CS. This re-synchronization procedure is incorporated into a practical continuous CS recognition system that combines a convolutional neural network (CNN) with a multi-stream hidden Markov model (MSHMM). The system reaches 76.6% CS phoneme recognition correctness, a significant improvement of about 4.6% over the state-of-the-art architecture (72.04%), which did not take into account the asynchrony of multi-modal fusion in CS. To our knowledge, this is the first work to tackle the asynchronous multi-modal fusion in automatic continuous CS recognition.
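The re-synchronization step described above, delaying the hand streams so they line up with the lips stream, can be sketched on frame sequences as follows. The delays here are arbitrary integer frame counts chosen for illustration; the abstract derives optimal hand preceding times but does not state them, and padding the start of a delayed stream by repeating its first frame is an assumption. The function names are hypothetical.

```python
def delay_stream(frames, delay):
    """Shift a frame sequence later in time by `delay` frames.

    The output keeps the original length; the first frame is repeated to
    fill the gap at the start (an assumed padding choice).
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    frames = list(frames)
    if delay == 0 or not frames:
        return frames
    return ([frames[0]] * delay + frames)[: len(frames)]


def resynchronize(lips, hand_position, hand_shape, position_delay, shape_delay):
    """Align both hand streams with the lips stream, frame by frame.

    Hand position and hand shape get separate delays because each may
    precede the lips by a different amount.
    """
    position = delay_stream(hand_position, position_delay)
    shape = delay_stream(hand_shape, shape_delay)
    return list(zip(lips, position, shape))


if __name__ == "__main__":
    lips = ["l0", "l1", "l2", "l3"]
    position = ["p0", "p1", "p2", "p3"]
    shape = ["s0", "s1", "s2", "s3"]
    for fused in resynchronize(lips, position, shape, position_delay=2, shape_delay=1):
        print(fused)
```

The fused tuples would then feed whatever multi-stream model performs recognition; that part is outside this sketch.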

preprint 2016 arXiv

Near-Infrared Coloring via a Contrast-Preserving Mapping Model

Near-infrared gray images captured together with corresponding visible color images have recently proven useful for image restoration and classification. This paper introduces a new coloring method to add colors to near-infrared gray images based on a contrast-preserving mapping model. A naive coloring method directly adds the colors from the visible color image to the near-infrared gray image; however, this method results in an unrealistic image because of the discrepancies in brightness and image structure between the captured near-infrared gray image and the visible color image. To solve the discrepancy problem, we first present a new contrast-preserving mapping model to create a new near-infrared gray image with a similar appearance in the luminance plane to the visible color image, while preserving the contrast and details of the captured near-infrared gray image. Then, based on the proposed contrast-preserving mapping model, we develop a method to derive realistic colors that can be added to the newly created near-infrared gray image. Experimental results show that the proposed method can not only preserve the local contrasts and details of the captured near-infrared gray image, but also transfer the realistic colors from the visible color image to the newly created near-infrared gray image. Experimental results also show that the proposed approach can be applied to near-infrared denoising.

preprint 2016 arXiv

Near-Infrared Image Dehazing Via Color Regularization

Near-infrared imaging can capture haze-free near-infrared gray images and visible color images, according to physical scattering models, e.g., Rayleigh or Mie models. However, there exist serious discrepancies in brightness and image structures between the near-infrared gray images and the visible color images. The direct use of the near-infrared gray images brings about another color distortion problem in the dehazed images. Therefore, the color distortion should also be considered for near-infrared dehazing. To reflect this point, this paper presents an approach that adds a new color regularization to the conventional dehazing framework. The proposed color regularization can model the color prior for unknown haze-free images from two captured images. Thus, natural-looking colors and fine details can be induced on the dehazed images. The experimental results show that the proposed color regularization model can help remove the color distortion and the haze at the same time. Also, the effectiveness of the proposed color regularization is verified by comparing it with other conventional regularizations. It is also shown that the proposed color regularization can remove the edge artifacts which arise from the use of the conventional dark prior model.

preprint2016arXiv

Rain structure transfer using an exemplar rain image for synthetic rain image generation

This letter proposes a simple method of transferring rain structures of a given exemplar rain image into a target image. Given the exemplar rain image and its corresponding masked rain image, rain patches including rain structures are extracted randomly, and then residual rain patches are obtained by subtracting those rain patches from their mean patches. Next, residual rain patches are selected randomly, and then added to the given target image along a raster scanning direction. To decrease boundary artifacts around the added patches on the target image, minimum error boundary cuts are found using dynamic programming, and then blending is conducted between overlapping patches. Our experiment shows that the proposed method can generate realistic rain images with rain structures similar to those in the exemplar images. Moreover, it is expected that the proposed method can be used for rain removal. More specifically, natural images and synthetic rain images generated via the proposed method can be used to learn classifiers, for example, deep neural networks, in a supervised manner.
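
The minimum error boundary cut step can be sketched with the classic dynamic-programming seam search used in texture quilting. This is an assumed formulation, since the abstract gives no details. The squared-difference cost, the vertical seam orientation, and the hard stitch at the end (the abstract mentions blending instead) are illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def min_error_boundary_cut(overlap_a, overlap_b):
    """Return, per row, the column of the minimum-cost vertical seam.

    overlap_a, overlap_b: overlapping regions of two patches, (H, W) or (H, W, C).
    """
    diff = (np.asarray(overlap_a, float) - np.asarray(overlap_b, float)) ** 2
    err = diff.sum(axis=2) if diff.ndim == 3 else diff
    h, w = err.shape
    cost = err.copy()
    # Forward pass: cheapest cumulative cost to reach each pixel from the top row.
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    # Backward pass: trace the seam from the cheapest bottom pixel upward.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam


def stitch(overlap_a, overlap_b, seam):
    """Hard cut: patch A left of the seam, patch B from the seam onward."""
    a, b = np.asarray(overlap_a), np.asarray(overlap_b)
    mask = np.arange(a.shape[1])[None, :] < np.asarray(seam)[:, None]
    if a.ndim == 3:
        mask = mask[..., None]
    return np.where(mask, a, b)
```

The seam follows pixels where the two patches already agree, so the transition is least visible there; blending across the seam would soften it further.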

preprint2015arXiv

Performance Limits and Geometric Properties of Array Localization

Location-aware networks are of great importance and interest in both civil and military applications. This paper determines the localization accuracy of an agent, which is equipped with an antenna array and localizes itself using wireless measurements with anchor nodes, in a far-field environment. In view of the Cramér-Rao bound, we first derive the localization information for static scenarios and demonstrate that such information is a weighted sum of Fisher information matrices from each anchor-antenna measurement pair. Each matrix can be further decomposed into two parts: a distance part with intensity proportional to the squared baseband effective bandwidth of the transmitted signal and a direction part with intensity associated with the normalized anchor-antenna visual angle. Moreover, in dynamic scenarios, we show that the Doppler shift contributes additional direction information, with intensity determined by the agent velocity and the root mean squared time duration of the transmitted signal. In addition, two measures are proposed to evaluate the localization performance of wireless networks with different anchor-agent and array-antenna geometries, and both formulae and simulations are provided for typical anchor deployments and antenna arrays.
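
A simplified 2-D sketch shows how a Fisher information matrix built as a weighted sum yields a position error bound. It assumes that each anchor contributes a rank-one distance term along the radial direction and a rank-one direction term along the tangential direction. The intensities `range_intensity` and `direction_intensity` are hypothetical inputs, not the paper's closed-form expressions, and the per-antenna structure is not modeled.

```python
import numpy as np


def position_fim(agent, anchors, range_intensity, direction_intensity=None):
    """Sum per-anchor 2x2 information matrices for the agent position."""
    agent = np.asarray(agent, float)
    anchors = np.asarray(anchors, float)
    if direction_intensity is None:
        direction_intensity = np.zeros(len(anchors))
    fim = np.zeros((2, 2))
    for p, lam, mu in zip(anchors, range_intensity, direction_intensity):
        d = p - agent
        u = d / np.linalg.norm(d)       # radial unit vector (distance information)
        v = np.array([-u[1], u[0]])     # tangential unit vector (direction information)
        fim += lam * np.outer(u, u) + mu * np.outer(v, v)
    return fim


def squared_position_error_bound(fim):
    """Cramer-Rao lower bound on the mean squared position error: tr(J^-1)."""
    return float(np.trace(np.linalg.inv(fim)))
```

For example, two anchors at right angles with equal range intensity 4 give J = 4I and a bound of 0.5, while two collinear anchors with range information only give a singular J, which illustrates why geometry matters.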

preprint2014arXiv

Bayesian Nonparametric Dictionary Learning for Compressed Sensing MRI

We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRI) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and the patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov Chain Monte Carlo (MCMC) for the Bayesian model, and use the alternating direction method of multipliers (ADMM) for efficiently performing total variation minimization. We present empirical results on several MRI, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
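
The measurement setting (undersampled k-space) can be illustrated with a small sketch of the forward model and the zero-filled baseline reconstruction. It does not implement the beta-process dictionary learning, MCMC, or ADMM steps; the binary sampling mask and the function names are assumptions for illustration.

```python
import numpy as np


def undersample_kspace(image, mask):
    """Forward model: 2-D Fourier transform followed by a binary sampling mask."""
    kspace = np.fft.fft2(image, norm="ortho")
    return kspace * mask


def zero_filled_reconstruction(kspace_us):
    """Baseline: inverse transform with unsampled k-space locations left at zero."""
    return np.fft.ifft2(kspace_us, norm="ortho")
```

With a full mask the image is recovered exactly. With heavy undersampling the zero-filled image shows aliasing artifacts, which is what priors such as learned dictionaries or total variation are meant to suppress.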

preprint2013arXiv

Cosmogenic Nuclei Production Rate on the Lunar Surface

A physical model of Geant4-based simulation of galactic cosmic ray (GCR) particles interaction with the lunar surface matter has been developed to investigate the production rate of cosmogenic nuclei. In this model the GCRs, mainly very high energy protons and $\alpha$ particles, bombard the surface of the Moon and produce many secondary particles such as protons and neutrons. The energies of protons and neutrons at different depths are recorded and saved into ROOT files, and the analytical expressions for the differential proton and neutron fluxes are obtained through the best-fit procedure under the ROOT software. To test the validity of this model, we calculate the production rates of long-lived nuclei $^{10}$Be and $^{26}$Al in the Apollo 15 long drill core by combining the above differential fluxes and the newly evaluated spallation reaction cross sections. Numerical results show that the theoretical production rates agree quite well with the measured data, which indicates that the model works well. Therefore, it can be expected that this model can be used to investigate the cosmogenic nuclei in lunar samples returned by the Chinese lunar exploration program and can be extended to study other objects, such as meteorites and the Earth's atmosphere.
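
Combining differential fluxes with reaction cross sections typically means evaluating an integral of the form P(d) = N * ∫ σ(E) φ(E, d) dE at each depth d. The sketch below assumes that standard form with a hand-rolled trapezoid rule. The function names, the single particle species and target element, and the unit conventions are illustrative assumptions rather than details from the paper.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def production_rate(energies, flux, cross_section, target_density=1.0):
    """P = N * integral of sigma(E) * phi(E) dE at one depth.

    energies: energy grid; flux: differential flux phi(E) at the chosen depth;
    cross_section: sigma(E) on the same grid; target_density: N, the number
    of target atoms per unit mass (hypothetical units throughout).
    """
    integrand = np.asarray(flux, float) * np.asarray(cross_section, float)
    return target_density * trapezoid(integrand, energies)
```

In a fuller calculation the result would be summed over particle types (protons and neutrons) and target elements, and repeated for each depth to build a depth profile.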

preprint2013arXiv

Far-Field Tunable Nano-focusing Based on Metallic Slits Surrounded with Nonlinear-Variant Widths and Linear-Variant Depths of Circular Dielectric Grating

In this work, we design a new tunable nanofocusing lens based on a circular grating with linear-variant depths and nonlinear-variant widths for practical far-field applications. The constructive interference of cylindrical surface plasmons launched by the subwavelength metallic structure can form a subdiffraction-limited focus, and the focal length of this structure can be adjusted if the depth and width of each groove of the circular grating are arranged in a traced profile. According to the numerical calculation, the range of the focal-point shift is much larger than that of other plasmonic lenses, and the relative phase of the emitted light scattered by the surface-plasmon-coupling circular grating can be modulated by the nonlinear-variant widths and linear-variant depths. The simulation results indicate that different relative phases of the emitted light lead to different focal lengths. We show for the first time a unique phenomenon for circular gratings with linear-variant depths and nonlinear-variant widths: positive and negative changes in the groove depths and widths result in different variation trends between the relative phases and the focal lengths. These results pave the way for utilizing the plasmonic lens in high-density optical storage, nanolithography, superresolution optical microscopic imaging, optical trapping, and sensing.

preprint2010arXiv

Noise Invalidation Denoising

A denoising technique based on noise invalidation is proposed. The adaptive approach derives a noise signature from the noise order statistics and utilizes the signature to denoise the data. The novelty of this approach is that it provides general-purpose denoising, in the sense that it does not need any particular assumption on the structure of the noise-free signal, such as data smoothness or sparsity of the coefficients. An advantage of the method is that it can denoise the corrupted data in any complete basis transformation (orthogonal or non-orthogonal). Experimental results show that the proposed method, called Noise Invalidation Denoising (NIDe), outperforms existing denoising approaches in terms of Mean Square Error (MSE).

27 published item(s)

Parameter Convergence Radar Detector Based on VAMP Deep Unfolding

Step-by-Step Causality: Transparent Causal Discovery with Multi-Agent Tree-Query and Adversarial Confidence Estimation

Verifiable Process Rewards for Agentic Reasoning

Language-free Compositional Action Generation via Decoupling Refinement

Finding the Most Transferable Tasks for Brain Image Segmentation

Boosting Video Representation Learning with Multi-Faceted Integration

Distributed Optimal Power Flow for VSC-MTDC Meshed AC/DC Grids Using ALADIN

Dual Vision Transformer

New Closed-form Joint Localization and Synchronization using Sequential One-way TOAs

Parameterized Image Quality Score Distribution Prediction

Refine-Net: Normal Refinement Neural Network for Noisy Point Clouds

SAFARI: Sparsity enabled Federated Learning with Limited and Unreliable Communications

Segmented Learning for Class-of-Service Network Traffic Classification

Sequential Doppler Shift based Optimal Localization and Synchronization with TOA

Non-local Channel Aggregation Network for Single Image Rain Removal

Optimal Localization with Sequential Pseudorange Measurements for Moving Users in a Time Division Broadcast Positioning System

Change Detection in Heterogeneous Optical and SAR Remote Sensing Images via Deep Homogeneous Feature Fusion

Dynamic Spatiotemporal Graph Neural Network with Tensor Network

Re-synchronization using the Hand Preceding Model for Multi-modal Fusion in Automatic Continuous Cued Speech Recognition

Near-Infrared Coloring via a Contrast-Preserving Mapping Model

Near-Infrared Image Dehazing Via Color Regularization

Rain structure transfer using an exemplar rain image for synthetic rain image generation

Performance Limits and Geometric Properties of Array Localization

Bayesian Nonparametric Dictionary Learning for Compressed Sensing MRI

Cosmogenic Nuclei Production Rate on the Lunar Surface

Far-Field Tunable Nano-focusing Based on Metallic Slits Surrounded with Nonlinear-Variant Widths and Linear-Variant Depths of Circular Dielectric Grating

Noise Invalidation Denoising