Source author record

Haifeng Li

Haifeng Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

38 works

21 topics

4 close collaborators

preprint, 2026, arXiv

iGSP: Implicit Gradient Subspace Projection for Efficient Continual Learning of Vision-Language Models

Vision-Language Models require efficient adaptation to continually emerging downstream tasks. While Parameter-Efficient Fine-Tuning mitigates catastrophic forgetting, assigning isolated modules per task leads to parameter explosion. Conversely, recent similarity-driven sharing mechanisms falsely equate superficial visual similarity with underlying alignment consistency. This fundamental mismatch triggers severe negative transfer between visually similar but logically distinct tasks and fails to exploit alignment reuse across visually diverse ones. We argue that alignment sharing is fundamentally a geometric problem of overlapping optimization trajectories within shared low-rank subspaces. Grounded in this insight, we propose iGSP, a novel framework that achieves efficient adaptation via implicit gradient subspace projection. Leveraging the early convergence of MoE routers to establish the subspace basis, iGSP bifurcates the adaptation process into two phases. First, the Subspace Identification phase introduces candidate experts via basis pre-expansion, applies a novel subspace-constrained regularization to implicitly project new task gradients onto the historical subspace, and precisely prunes redundant dimensions by treating routing probabilities as gradient flow indicators, ultimately to maximize knowledge reuse. Second, the Orthogonal Subspace Fine-Tuning phase fixes this structural basis and removes the regularization to rapidly fit the task-specific residual loss. Extensive experiments on the MTIL benchmark demonstrate that iGSP achieves state-of-the-art accuracy while significantly improving training efficiency, reducing the average trainable parameters by 42.7% compared to current SOTA methods, and decreasing the final total parameters by 86.9% relative to counterparts. The source code is available at https://github.com/GeoX-Lab/iGSP.

preprint, 2026, arXiv

RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents

The rise of multi-modal large language models (MLLMs) is shifting remote sensing (RS) intelligence from "see" to "action", as OpenClaw-style frameworks enable agents to autonomously operate massive RS image-processing tools for complex tasks. Existing RS agents adopt a passive selection paradigm for tool invocation, relying on either full tool registration (Flat) or retrieval-augmented generation (RAG). However, in the massive and multi-source heterogeneous RS tool ecosystem, such passive mechanisms struggle to dynamically balance "context load" and "toolset completeness" throughout task reasoning, thus exhibiting inherent limitations: full tool registration triggers context space deficits during long-horizon tasks, whereas RAG retrieval may omit critical tools in essential steps. To overcome these bottlenecks, this paper redefines tool selection by arguing that the agent should act as an active explorer within the tool space. Based on this perspective, we propose RS-Claw, a novel RS agent architecture. By leveraging Skill encapsulation technology at the tool end, this architecture hierarchically structures tool descriptions, enabling the agent to execute on-demand sequential decision-making: initially selecting relevant skill branches by reading only tool summaries, then dynamically loading detailed descriptions, and ultimately achieving precise invocation. This active paradigm not only significantly liberates the agent's context space but also effectively ensures the accurate hit rate of critical tools during long-horizon reasoning. Systematic experiments on the Earth-Bench benchmark demonstrate that RS-Claw's active exploration mechanism effectively filters semantic noise and substantially frees up reasoning space, achieving an input token compression ratio of up to 86%, and comprehensively outperforming existing Flat and RAG baselines across complex reasoning evaluations.
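The progressive loading idea in this abstract (read branch summaries first, then tool summaries, then one full description) can be illustrated with a small sketch. This is not the RS-Claw implementation: the class names, the three-level menu functions, and the toy tool catalog are all hypothetical, and context size is approximated by character count rather than tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    summary: str       # short text shown when the branch is opened
    description: str   # full text, loaded only on demand


@dataclass
class SkillBranch:
    name: str
    summary: str
    tools: list[Tool] = field(default_factory=list)


def branch_menu(branches):
    """Level 1: expose only branch summaries to the agent."""
    return [f"{b.name}: {b.summary}" for b in branches]


def tool_menu(branch):
    """Level 2: expose the tool summaries of one chosen branch."""
    return [f"{t.name}: {t.summary}" for t in branch.tools]


def load_description(branch, tool_name):
    """Level 3: load the full description of a single tool."""
    for tool in branch.tools:
        if tool.name == tool_name:
            return tool.description
    raise KeyError(tool_name)


def context_chars(lines):
    """Crude stand-in for context load: total characters placed in the prompt."""
    return sum(len(line) for line in lines)


# Hypothetical toy catalog; real tool descriptions would be far longer.
branches = [
    SkillBranch("spectral", "Band math and indices", [
        Tool("ndvi", "Compute NDVI", "Compute NDVI from red and near-infrared bands. " * 5),
        Tool("ndwi", "Compute NDWI", "Compute NDWI from green and near-infrared bands. " * 5),
    ]),
    SkillBranch("geometry", "Reprojection and clipping", [
        Tool("clip", "Clip raster", "Clip a raster to a polygon boundary. " * 5),
    ]),
]

# Flat registration: every full description is placed in context up front.
flat = [t.description for b in branches for t in b.tools]

# Progressive exploration: summaries first, then one full description.
chosen = branches[0]
progressive = branch_menu(branches) + tool_menu(chosen) + [load_description(chosen, "ndvi")]

print(context_chars(flat), context_chars(progressive))
```

In this toy setting the progressive path places less text in context than flat registration, and the gap grows with the number of tools that are never opened.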

preprint, 2026, arXiv

The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?

Understanding why independently trained neural networks from different modalities converge toward shared representations, and where this convergence leads, remains an open question in representation learning. All existing evidence relies on symmetric similarity measures, which can detect convergence but are structurally blind to its direction. We introduce directional convergence analysis using cycle-kNN, an asymmetric alignment measure, applied across dozens of independently trained unimodal models spanning point clouds, vision, and language. We uncover a consistent directional asymmetry: non-language modalities move toward the neighborhood structure of language significantly more than the reverse, and this pattern holds across all model families and scales, yet it is entirely invisible to symmetric measures. Mechanistic analysis traces the directionality to feature density asymmetry, whereby language representations occupy the most compact regions of representational space. The Information Bottleneck framework provides a principled interpretation: optimization under compression drives representations toward discrete, compositional structures characteristic of language. We formalize this as the Wittgensteinian Representation Hypothesis: the semantic structure of language is the asymptotic attractor of multimodal representation convergence.

preprint, 2026, arXiv

UAVFF3D: A Geometry-Aware Benchmark for Feed-Forward UAV 3D Reconstruction

Feed-forward 3D reconstruction has advanced rapidly, but current models remain unreliable in UAV photogrammetric acquisition. We argue that this failure is caused not only by appearance-domain shift, but also by UAV-specific camera-geometry variations, especially oblique views and HFOV-height ambiguity. Existing UAV datasets mainly emphasize scene diversity and provide limited coverage of camera configurations, which restricts robustness evaluation and UAV-domain adaptation. To address this gap, we introduce UAVFF3D, a geometry-aware real-synthetic benchmark for feed-forward UAV 3D reconstruction. UAVFF3D contains more than 170k real UAV images and more than 370k synthetic images rendered from high-quality textured 3D models, covering diverse HFOVs, flight altitudes, viewing directions, and acquisition patterns. It also includes a controlled HFOV-height test subset for diagnosing projection-geometry ambiguity. We further propose an evaluation protocol that jointly assesses camera-geometry estimation and dense scene reconstruction under a shared global alignment, avoiding the bias caused by separate camera and geometry alignments. Experiments on representative feed-forward reconstruction models show that UAVFF3D-based domain adaptation consistently improves camera and geometry estimation, reducing Ray Error by up to 84.2%, Pose ATE by up to 76.0%, and Chamfer Distance by up to 41.1%. In oblique scenes, adaptation reduces the oblique-nadir rotation gap by up to 90.7%. Under HFOV-height ambiguity, it improves robustness across HFOV-height configurations and yields more stable performance across HFOV settings. Incorporating camera priors further improves reconstruction under UAV-specific acquisition geometries. The dataset and evaluation code are available at https://github.com/yanxian-ll/UAVFF3D.

preprint, 2022, arXiv

A Data-driven Adversarial Examples Recognition Framework via Adversarial Feature Genome

Adversarial examples pose many security threats to convolutional neural networks (CNNs). Most defense algorithms prevent these threats by finding differences between the original images and adversarial examples. However, the differences found do not contain features about the classes, so these defense algorithms can only detect adversarial examples without recovering the correct labels. In this regard, we propose the Adversarial Feature Genome (AFG), a novel type of data that contains both the differences and features about classes. This method is inspired by an observed phenomenon, namely Adversarial Feature Separability (AFS), where the difference between the feature maps of the original images and adversarial examples becomes larger with deeper layers. On top of that, we further develop an adversarial example recognition framework that detects adversarial examples and can recover the correct labels. In the experiments, the detection and classification of adversarial examples by AFGs has an accuracy of more than 90.01% in various attack scenarios. To the best of our knowledge, our method is the first that focuses on both attack detection and label recovery. AFG gives a new data-driven perspective to improve the robustness of CNNs. The source code is available at https://github.com/GeoX-Lab/Adv_Fea_Genome.

preprint, 2022, arXiv

Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds

Point cloud scene flow estimation is of practical importance for dynamic scene navigation in autonomous driving. Since scene flow labels are hard to obtain, current methods train their models on synthetic data and transfer them to real scenes. However, large disparities between existing synthetic datasets and real scenes lead to poor model transfer. We make two major contributions to address this. First, we develop a point cloud collector and scene flow annotator for the GTA-V engine to automatically obtain diverse realistic training samples without human intervention. With that, we develop a large-scale synthetic scene flow dataset, GTA-SF. Second, we propose a mean-teacher-based domain adaptation framework that leverages self-generated pseudo-labels of the target domain. It also explicitly incorporates shape deformation regularization and surface correspondence refinement to address distortions and misalignments in domain transfer. Through extensive experiments, we show that our GTA-SF dataset leads to a consistent boost in model generalization to three real datasets (i.e., Waymo, Lyft and KITTI) as compared to the most widely used FT3D dataset. Moreover, our framework achieves superior adaptation performance on six source-target dataset pairs, remarkably closing the average domain gap by 60%. Data and code are available at https://github.com/leolyj/DCA-SRSFE.

preprint, 2022, arXiv

Efficient ILC analysis on polarization maps after EB leakage correction

The Internal Linear Combination (ILC) is widely used to extract the cosmic microwave background (CMB) signal from multi-frequency observation maps, especially for satellite experiments with quasi-full sky coverage. We extend the ILC method to CMB polarization map analysis on a small sky patch, which is typical for ground-based experiments, by combining ILC with a template cleaning method that gives a pure $B$ map free from the $EB$ leakage caused by partial sky coverage. The key feature of our method is that we perform the ILC analysis on pseudo-scalar $B$ maps; the advantage is that this entirely avoids the impact of $EB$ leakage on ILC, so it can improve the efficiency of component separation dramatically. We demonstrate our method with mock data of a future ground-based experiment with a deep survey on a clean patch in the northern sky. The results show that the level of foreground residual can be well controlled: it biases the tensor-to-scalar ratio ($r$) at the order of $10^{-3}$, which is comparable to the statistical error from noise.
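For readers unfamiliar with ILC, the standard minimum-variance form is sketched below. This is the textbook construction, not necessarily the exact estimator used in the paper. The symbols $d_\nu$, $w_\nu$, $\mathbf{C}$ and $\mathbf{1}$ are introduced here for illustration, and the maps are assumed to be in CMB thermodynamic units so that the CMB contributes equally to every frequency channel.

```latex
% Cleaned map: a weighted sum of the N_nu frequency maps (here, B maps)
\hat{B}(p) = \sum_{\nu=1}^{N_\nu} w_\nu \, d_\nu(p),
\qquad \sum_{\nu=1}^{N_\nu} w_\nu = 1 .

% Minimizing the variance of \hat{B} under the unit-sum constraint gives
\mathbf{w} = \frac{\mathbf{C}^{-1}\mathbf{1}}{\mathbf{1}^{\mathsf T}\mathbf{C}^{-1}\mathbf{1}},
\qquad
C_{\nu\nu'} = \big\langle d_\nu(p)\, d_{\nu'}(p) \big\rangle_p
            - \big\langle d_\nu(p) \big\rangle_p \big\langle d_{\nu'}(p) \big\rangle_p .
```

The unit-sum constraint preserves the CMB signal, while variance minimization suppresses foregrounds and noise, whose amplitudes differ between channels.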

preprint, 2022, arXiv

Global and Local Contrastive Self-Supervised Learning for Semantic Segmentation of HR Remote Sensing Images

Supervised learning for semantic segmentation requires a large number of labeled samples, which are difficult to obtain in the field of remote sensing. Self-supervised learning (SSL) can be used to solve such problems by pre-training a general model with a large number of unlabeled images and then fine-tuning it on a downstream task with very few labeled samples. Contrastive learning is a typical method of SSL that can learn general invariant features. However, most existing contrastive learning methods are designed for classification tasks to obtain an image-level representation, which may be suboptimal for semantic segmentation tasks requiring pixel-level discrimination. Therefore, we propose a global style and local matching contrastive learning network (GLCNet) for remote sensing image semantic segmentation. Specifically, 1) the global style contrastive learning module is used to better learn an image-level representation, as we consider that style features can better represent the overall image features. 2) The local features matching contrastive learning module is designed to learn representations of local regions, which is beneficial for semantic segmentation. The experimental results show that our method mostly outperforms SOTA self-supervised methods and the ImageNet pre-training method. Specifically, with 1% annotation from the original dataset, our approach improves Kappa by 6% on the ISPRS Potsdam dataset relative to the existing baseline. Moreover, our method outperforms supervised learning methods when there are some differences between the datasets of upstream tasks and downstream tasks. Since SSL can directly learn the essential characteristics of data from unlabeled data, which is easy to obtain in the remote sensing field, this may be of great significance for tasks such as global mapping. The source code is available at https://github.com/GeoX-Lab/G-RSIM.

preprint, 2022, arXiv

Image Segmentation with Adaptive Spatial Priors from Joint Registration

Image segmentation is a crucial but challenging task that has many applications. In medical imaging, for instance, intensity inhomogeneity and noise are common. In thigh muscle images, different muscles are closely packed together and there are often no clear boundaries between them. Intensity-based segmentation models cannot separate one muscle from another. To solve such problems, in this work we present a segmentation model with adaptive spatial priors from joint registration. This model combines segmentation and registration in a unified framework to leverage their positive mutual influence. The segmentation is based on a modified Gaussian mixture model (GMM), which integrates intensity inhomogeneity and spatial smoothness. The registration plays the role of providing a shape prior. We adopt a modified sum of squared difference (SSD) fidelity term and a Tikhonov regularity term for registration, and also utilize a Gaussian pyramid and a parametric method for robustness. The connection between segmentation and registration is guaranteed by the cross entropy metric, which aims to make the segmentation map (from segmentation) and the deformed atlas (from registration) as similar as possible. This joint framework is implemented within a constrained optimization framework, which leads to an efficient algorithm. We evaluate our proposed model on synthetic and thigh muscle MR images. Numerical results show the improvement as compared to segmentation and registration performed separately and to other joint models.

preprint, 2022, arXiv

KST-GCN: A Knowledge-Driven Spatial-Temporal Graph Convolutional Network for Traffic Forecasting

While considering the spatial and temporal features of traffic, capturing the impacts of various external factors on travel is an essential step towards achieving accurate traffic forecasting. However, existing studies seldom consider external factors or neglect the effect of the complex correlations among external factors on traffic. Intuitively, knowledge graphs can naturally describe these correlations. Since knowledge graphs and traffic networks are essentially heterogeneous networks, it is challenging to integrate the information in both networks. Against this background, this study presents a knowledge representation-driven traffic forecasting method based on spatial-temporal graph convolutional networks. We first construct a knowledge graph for traffic forecasting and derive knowledge representations by a knowledge representation learning method named KR-EAR. Then, we propose the Knowledge Fusion Cell (KF-Cell) to combine the knowledge and traffic features as the input of a spatial-temporal graph convolutional backbone network. Experimental results on a real-world dataset show that our strategy enhances the forecasting performance of backbones at various prediction horizons. The ablation and perturbation analyses further verify the effectiveness and robustness of the proposed method. To the best of our knowledge, this is the first study that constructs and utilizes a knowledge graph to facilitate traffic forecasting; it also offers a promising direction to integrate external information and spatial-temporal information for traffic forecasting. The source code is available at https://github.com/lehaifeng/T-GCN/tree/master/KST-GCN.

preprint, 2022, arXiv

TOV: The Original Vision Model for Optical Remote Sensing Image Understanding via Self-supervised Learning

Are we on the right path for remote sensing image understanding (RSIU) by training models in a supervised, data-dependent and task-dependent way, instead of in the label-free and task-independent way of human vision? We argue that a more desirable RSIU model should be trained with intrinsic structure from data rather than extrinsic human labels to realize generalizability across a wide range of RSIU tasks. According to this hypothesis, we propose The Original Vision model (TOV) in the remote sensing field. Trained on massive unlabeled optical data along a human-like self-supervised learning (SSL) path that runs from general knowledge to specialized knowledge, the TOV model can be easily adapted to various RSIU tasks, including scene classification, object detection, and semantic segmentation, and outperforms the dominant ImageNet supervised pretrained method as well as two recently proposed SSL pretrained methods on the majority of 12 publicly available benchmarks. Moreover, we analyze the influence of two key factors on the performance of building the TOV model for RSIU: the use of different data sampling methods and the selection of learning paths during self-supervised optimization. We believe that a general model trained in a label-free and task-independent way may be the next paradigm for RSIU, and we hope the insights distilled from this study can help to foster the development of an original vision model for RSIU.

preprint, 2022, arXiv

Weighted Simultaneous Algebra Reconstruction Technique (wSART) for Additive Light Field Synthesis

We apply an iterative weighting scheme for additive light field synthesis. Unlike previous work optimizing additive light field evenly over viewpoints, we constrain the optimization to deliver a reconstructed light field of high image quality for viewpoints of large weight.

preprint, 2021, arXiv

Crystal field effects in the zig-zag chain compound SrTm$_2$O$_4$

The single ion properties of the zig-zag chain compound SrTm$_2$O$_4$ have been investigated using heat capacity, magnetic susceptibility, magnetization, inelastic neutron scattering, and polarized muon spectroscopy. Two crystal field models are employed to estimate the single ion properties: a Density Functional Theory-based model and an effective charge model based on the Hutchings point charge model. The latter describes our experimental results well. This model estimates an easy-axis anisotropy for one of the Tm$^{3+}$ sites and an easy-plane anisotropy for the second site. It also predicts a mixed ground state with dominating $J = 0$ characteristics for both sites. Additionally, muon spin rotation/relaxation ($\mu^+$SR) spectra reveal oscillations, typically a sign of long-range magnetic order. However, the temperature dependence of the precession frequency and the relaxation rates indicates that the system is in an extended critical regime and the observed relaxation is actually dynamic.

preprint, 2021, arXiv

Depth-Enhanced Feature Pyramid Network for Occlusion-Aware Verification of Buildings from Oblique Images

Detecting the changes of buildings in urban environments is essential. Existing methods that use only nadir images suffer from severe problems of ambiguous features and occlusions between buildings and other regions. Furthermore, buildings in urban environments vary significantly in scale, which leads to performance issues when using single-scale features. To solve these issues, this paper proposes a fused feature pyramid network, which utilizes both color and depth data for the 3D verification of existing buildings' 2D footprints from oblique images. First, the color data of oblique images are enriched with the depth information rendered from 3D mesh models. Second, multiscale features are fused in the feature pyramid network to convolve both the color and depth data. Finally, multi-view information from both the nadir and oblique images is used in a robust voting procedure to label changes in existing buildings. Experimental evaluations using both the ISPRS benchmark datasets and Shenzhen datasets reveal that the proposed method outperforms the ResNet and EfficientNet networks by 5% and 2%, respectively, in terms of recall rate and precision. We demonstrate that the proposed method can successfully detect all changed buildings; therefore, only those marked as changed need to be manually checked during the pipeline updating procedure, which significantly reduces the manual quality control requirements. Moreover, ablation studies indicate that using depth data, feature pyramid modules, and multi-view voting strategies can lead to clear and progressive improvements.

preprint, 2021, arXiv

Generating Multi-scale Maps from Remote Sensing Images via Series Generative Adversarial Networks

Considering the success of generative adversarial networks (GANs) for image-to-image translation, researchers have attempted to translate remote sensing images (RSIs) to maps (rs2map) through GAN for cartography. However, these studies involved limited scales, which hinders multi-scale map creation. By extending their method, multi-scale RSIs can be trivially translated to multi-scale maps (multi-scale rs2map translation) through scale-wise rs2map models trained for certain scales (parallel strategy). However, this strategy has two theoretical limitations. First, inconsistency between various spatial resolutions of multi-scale RSIs and object generalization on multi-scale maps (RS-m inconsistency) increasingly complicate the extraction of geographical information from RSIs for rs2map models with decreasing scale. Second, as rs2map translation is cross-domain, generators incur high computation costs to transform the RSI pixel distribution to that on maps. Thus, we designed a series strategy of generators for multi-scale rs2map translation to address these limitations. In this strategy, high-resolution RSIs are inputted to an rs2map model to output large-scale maps, which are translated to multi-scale maps through series multi-scale map translation models. The series strategy avoids RS-m inconsistency as inputs are high-resolution large-scale RSIs, and reduces the distribution gap in multi-scale map generation through similar pixel distributions among multi-scale maps. Our experimental results showed better quality multi-scale map generation with the series strategy, as shown by average increases of 11.69%, 53.78%, 55.42%, and 72.34% in the structural similarity index, edge structural similarity index, intersection over union (road), and intersection over union (water) for data from Mexico City and Tokyo at zoom level 17-13.

preprint, 2021, arXiv

Graph Information Vanishing Phenomenon in Implicit Graph Neural Networks

One of the key problems of GNNs is how to describe the importance of neighbor nodes in the aggregation process for learning node representations. A class of GNNs solves this problem by learning implicit weights to represent the importance of neighbor nodes; we call these implicit GNNs, of which the Graph Attention Network is an example. The basic idea of implicit GNNs is to introduce graph information with special properties, followed by Learnable Transformation Structures (LTS) that encode the importance of neighbor nodes in a data-driven way. In this paper, we argue that LTS makes the special properties of graph information disappear during the learning process, rendering graph information unhelpful for learning node representations. We call this phenomenon Graph Information Vanishing (GIV). We also find that LTS maps different graph information into highly similar results. To validate these two points, we design two sets of 70 random experiments on five implicit GNN methods and seven benchmark datasets, using a random permutation operator to randomly disrupt the order of graph information and replacing graph information with random values. We find that randomization does not affect the model performance in 93% of the cases, with the remaining 7% or so causing an average 0.5% accuracy loss. The cosine similarity of the outputs that LTS produces from different graph information exceeds 99% in 81% of the cases. The experimental results provide evidence to support the existence of GIV in implicit GNNs and imply that the existing implicit GNN methods do not make good use of graph information. The relationship between graph information and LTS should be rethought to ensure that graph information is used in node representation.

preprint, 2021, arXiv

Overcoming Long-term Catastrophic Forgetting through Adversarial Neural Pruning and Synaptic Consolidation

Artificial neural networks face the well-known problem of catastrophic forgetting. Worse, the degradation of previously learned skills becomes more severe as the task sequence grows, a problem known as long-term catastrophic forgetting. This is due to two factors: first, as the model learns more tasks, the intersection of the low-error parameter subspaces satisfying these tasks becomes smaller or even ceases to exist; second, when the model learns a new task, the cumulative error keeps increasing as the model tries to protect the parameter configuration of previous tasks from interference. Inspired by the memory consolidation mechanism in mammalian brains with synaptic plasticity, we propose a confrontation mechanism in which Adversarial Neural Pruning and synaptic Consolidation (ANPyC) is used to overcome the long-term catastrophic forgetting issue. The neural pruning acts as long-term depression to prune task-irrelevant parameters, while the novel synaptic consolidation acts as long-term potentiation to strengthen task-relevant parameters. During training, this confrontation achieves a balance in that only crucial parameters remain, and non-significant parameters are freed to learn subsequent tasks. ANPyC avoids forgetting important information and lets the model efficiently learn a large number of tasks. Specifically, the neural pruning iteratively relaxes the current task's parameter conditions to expand the common parameter subspace of the tasks; the synaptic consolidation strategy, which consists of a structure-aware parameter-importance measurement and an element-wise parameter updating strategy, decreases the cumulative error when learning new tasks. The full source code is available at https://github.com/GeoX-Lab/ANPyC.

preprint, 2020, arXiv

A3T-GCN: Attention Temporal Graph Convolutional Network for Traffic Forecasting

Accurate real-time traffic forecasting is a core technological problem for the implementation of intelligent transportation systems. However, it remains challenging given the complex spatial and temporal dependencies among traffic flows. In the spatial dimension, due to the connectivity of the road network, the traffic flows between linked roads are closely related. In the temporal dimension, although adjacent time points generally follow a common trend, the importance of distant past points is not necessarily smaller than that of recent past points, since traffic flows are also affected by external factors. In this study, an attention temporal graph convolutional network (A3T-GCN) traffic forecasting method is proposed to simultaneously capture global temporal dynamics and spatial correlations. The A3T-GCN model learns the short-time trend in time series by using gated recurrent units and learns the spatial dependence based on the topology of the road network through the graph convolutional network. Moreover, the attention mechanism is introduced to adjust the importance of different time points and assemble global temporal information to improve prediction accuracy. Experimental results on real-world datasets demonstrate the effectiveness and robustness of the proposed A3T-GCN. The source code is available at https://github.com/lehaifeng/T-GCN/A3T.
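The role of the attention mechanism described in this abstract (re-weighting time points and assembling them into one global summary) can be sketched in a few lines of plain Python. This is only an illustration: the linear scoring function and the `score_weights` vector are assumptions made here, and the scoring network in the A3T-GCN code may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, score_weights):
    """Weight per-time-step hidden states and sum them into one context vector.

    hidden_states: list of T vectors (e.g. recurrent-unit outputs), each of length D.
    score_weights: hypothetical length-D vector used to score each time step.
    """
    # One scalar score per time step (assumed linear scoring).
    scores = [sum(w * h for w, h in zip(score_weights, state))
              for state in hidden_states]
    alphas = softmax(scores)  # importance of each time point, sums to 1
    dim = len(hidden_states[0])
    context = [sum(a * state[i] for a, state in zip(alphas, hidden_states))
               for i in range(dim)]
    return alphas, context


# Three time steps with two hidden features each (toy numbers).
hidden_states = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.3]]
alphas, context = temporal_attention(hidden_states, score_weights=[1.0, 1.0])
print(alphas, context)
```

Because the weights come from a softmax rather than from recency, a distant time step can receive more weight than the most recent one, which is the behaviour the abstract motivates.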

preprint, 2020, arXiv

Adversarial Example in Remote Sensing Image Recognition

With the wide application of remote sensing technology in various fields, the accuracy and security requirements for remote sensing image (RSI) recognition are also increasing. In recent years, due to the rapid development of deep learning in the field of image recognition, RSI recognition models based on deep convolutional neural networks (CNNs) have outperformed traditional handcrafted feature techniques. However, CNNs also raise security issues even as they show their capability for accurate classification. By adding a very small adversarial perturbation to the input image, the CNN model can be made to produce erroneous results with extremely high confidence, while the modification of the image is not perceived by the human eye. An image with this added adversarial perturbation is called an adversarial example, which poses a serious security problem for systems that rely on CNN recognition results. This paper, for the first time, analyzes the adversarial example problem of RSI recognition under CNN models. In the experiments, we used different attack algorithms to fool multiple high-accuracy RSI recognition models trained on multiple RSI datasets. The results show that RSI recognition models are also vulnerable to adversarial examples, and that models with different structures trained on the same RSI dataset have different vulnerabilities. For each RSI dataset, the number of features also affects the vulnerability of the model: a larger number of features helps defend against adversarial examples. Further, we find that the attacked class of RSI has an attack selectivity property. The misclassification of adversarial examples of the RSIs is related to the similarity of the original classes in the CNN feature space. In addition, adversarial examples in RSI recognition are of great significance for the security of remote sensing applications, showing a huge potential for future research.

preprint, 2020, arXiv

Convolution Neural Network Architecture Learning for Remote Sensing Scene Classification

Remote sensing image scene classification is a fundamental but challenging task in understanding remote sensing images. Recently, deep learning-based methods, especially convolutional neural network-based (CNN-based) methods, have shown enormous potential to understand remote sensing images. CNN-based methods succeed by utilizing features learned from data rather than features designed manually. The feature-learning procedure of a CNN largely depends on its architecture. However, most CNN architectures used for remote sensing scene classification are still designed by hand, which demands a considerable amount of architecture engineering skill and domain knowledge, and may not realize the CNN's full potential on a specific dataset. In this paper, we propose an automatic architecture learning procedure for remote sensing scene classification. We design a parameter space in which every set of parameters represents a certain CNN architecture (i.e., some parameters represent the type of operators used in the architecture, such as convolution, pooling, no connection or identity, and the others represent how these operators connect). To discover the optimal set of parameters for a given dataset, we introduce a learning strategy that allows efficient search in the architecture space by means of gradient descent. An architecture generator finally maps the set of parameters into the CNN used in our experiments.

preprint 2020 arXiv

Deep Fusion of Local and Non-Local Features for Precision Landslide Recognition

Precision mapping of landslide inventory is crucial for hazard mitigation. Most landslides co-exist with other confusing geological features, and the presence of such areas can only be inferred unambiguously at a large scale. In addition, local information is also important for the preservation of object boundaries. To solve this problem, this paper proposes an effective approach that fuses both local and non-local features to surmount the contextual problem. Building upon the U-Net architecture that is widely adopted in the remote sensing community, we utilize two additional modules. The first uses dilated convolution and the corresponding atrous spatial pyramid pooling, which enlarge the receptive field without sacrificing spatial resolution or increasing memory usage. The second uses a scale attention mechanism to guide the up-sampling of features from the coarse level by a learned weight map. In implementation, the computational overhead relative to the original U-Net was only a few convolutional layers. Experimental evaluations revealed that the proposed method outperformed state-of-the-art general-purpose semantic segmentation approaches. Furthermore, ablation studies have shown that the two modules afforded extensive enhancements in landslide-recognition performance.

preprint 2020 arXiv

RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data

In recent years, deep convolutional neural networks (DCNNs) have made breakthrough progress in natural image recognition for three reasons: the universal approximation ability of DCNNs, large-scale databases (such as ImageNet), and supercomputing ability powered by GPUs. The remote sensing field still lacks a large-scale benchmark comparable to ImageNet and Place2. In this paper, we propose a remote sensing image classification benchmark (RSI-CB) based on massive, scalable, and diverse crowdsource data. Using crowdsource data, such as OpenStreetMap (OSM) data, ground objects in remote sensing images can be annotated effectively by points of interest, vector data from OSM, or other crowdsource data. The annotated images can be used in remote sensing image classification tasks. Based on this method, we construct a worldwide large-scale benchmark for remote sensing image classification. This benchmark has two sub-datasets with image sizes of 256 by 256 and 128 by 128 because different DCNNs require different image sizes. The former contains 6 categories with 35 subclasses and more than 24,000 images. The latter contains 6 categories with 45 subclasses and more than 36,000 images. This classification system of ground objects is defined according to the national standard of land-use classification in China and is inspired by the hierarchy mechanism of ImageNet. Finally, we conduct many experiments to compare RSI-CB with the SAT-4, SAT-6, and UC-Merced datasets on handcrafted features, such as scale-invariant feature transform, color histogram, local binary patterns, and GIST, and on classical DCNN models, such as AlexNet, VGGNet, GoogLeNet, and ResNet.

preprint 2020 arXiv

SCAttNet: Semantic Segmentation Network with Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images

High-resolution remote sensing images (HRRSIs) contain substantial ground object information, such as texture, shape, and spatial location. Semantic segmentation, which is an important task for element extraction, has been widely used in processing massive volumes of HRRSIs. However, HRRSIs often exhibit large intraclass variance and small interclass variance due to the diversity and complexity of ground objects, which poses great challenges for semantic segmentation. In this paper, we propose a new end-to-end semantic segmentation network, which integrates lightweight spatial and channel attention modules that can refine features adaptively. We compare our method with several classic methods on the ISPRS Vaihingen and Potsdam datasets. Experimental results show that our method can achieve better semantic segmentation results. The source code is available at https://github.com/lehaifeng/SCAttNet.

preprint 2020 arXiv

Urban Traffic Flow Forecast Based on FastGCRNN

Traffic forecasting is an important prerequisite for the application of intelligent transportation systems in urban traffic networks. Existing works adopt RNNs and CNNs/GCNs, among which GCRN is the state of the art, to characterize the temporal and spatial correlation of traffic flows. However, it is hard to apply GCRN to large-scale road networks because of its high computational complexity. To address this problem, we propose to abstract the road network into a geometric graph and build a Fast Graph Convolution Recurrent Neural Network (FastGCRNN) to model the spatial-temporal dependencies of traffic flow. Specifically, we use a FastGCN unit to efficiently capture the topological relationship between each road and its surrounding roads in the graph while reducing the computational complexity through importance sampling, combine it with a GRU unit to capture the temporal dependency of traffic flow, and embed the spatiotemporal features into Seq2Seq based on the Encoder-Decoder framework. Experiments on large-scale traffic data sets illustrate that the proposed method can greatly reduce computational complexity and memory consumption while maintaining relatively high accuracy.

preprint 2020 arXiv

Volume Preserving Image Segmentation with Entropic Regularization Optimal Transport and Its Applications in Deep Learning

Image segmentation with a volume constraint is an important prior for many real applications. In this work, we present a novel volume preserving image segmentation algorithm, which is based on the framework of entropic regularized optimal transport theory. The classical Total Variation (TV) regularizer and volume preservation are integrated into a regularized optimal transport model, and the volume and classification constraints can be regarded as two measure-preserving constraints in the optimal transport problem. By studying the dual problem, we develop a simple and efficient dual algorithm for our model. Moreover, unlike many variational image segmentation algorithms, the proposed algorithm can be directly unrolled into a new Volume Preserving and TV regularized softmax (VPTV-softmax) layer for semantic segmentation in the popular Deep Convolution Neural Network (DCNN). The experimental results show that our proposed model is very competitive and can improve the performance of many semantic segmentation nets such as the popular U-net.

preprint 2016 arXiv

On the Neuron Response Features of Convolutional Neural Networks for Remote Sensing Image

In this paper, some patterns of the Neuron Response of deep Convolutional Neural Networks were observed.

preprint 2016 arXiv

Principal eigenvector and spectral radius of uniform hypergraphs

In this paper, we give some bounds for the principal eigenvector and spectral radius of connected uniform hypergraphs in terms of vertex degrees, the diameter, and the number of vertices and edges.

preprint 2016 arXiv

Queue Theory based Response Time Analyses for Geo-Information Processing Chain

Concurrent tasks, such as those found in rapid disaster response, are typical of remote sensing applications. The existing approach to composing geographical information processing service chains searches for an optimal solution for each task in what can be deemed a "selfish" way. This leads to conflicts amongst concurrent tasks and decreases the performance of all service chains. In this study, a non-cooperative game-based mathematical model is proposed to analyse the competitive relationships between tasks. A best response function is used to ensure that each task maintains utility optimisation by considering the composition strategies of other tasks and quantifying conflicts between tasks. Based on this, an iterative algorithm that converges to a Nash equilibrium is presented, the aim being to provide good convergence and maximise the utilisation of all tasks under concurrent task conditions. Theoretical analyses and experiments showed that the newly proposed method, when compared to existing service composition methods, has better practical utility in all tasks.

preprint 2015 arXiv

Urban spatial-temporal activity structures: a New Approach to Inferring the Intra-urban Functional Regions via Social Media Check-In Data

Most existing literature focuses on the exterior temporal rhythm of human movement to infer the functional regions in a city, but it neglects the underlying interdependence between the functional regions and human activities, which uncovers more detailed characteristics of regions. In this research, we propose a novel model based on low rank approximation (LRA) to detect the functional regions, using about 15 million check-in records collected over a yearlong period in Shanghai, China. We find a series of latent structures, called urban spatial-temporal activity structures (USTAS). In interpreting these structures, a series of notable underlying associations between the spatial and temporal activity patterns can be found. Moreover, we can not only reproduce the observed data with a lower dimensional representation but also simultaneously project both the spatial and temporal activity patterns in the same coordinate system. By utilizing the K-means clustering algorithm, five significant types of clusters, each directly annotated with a corresponding combination of temporal activities, can be obtained. This provides a clear picture of how the groups of regions are associated with different activities at different times of day. Besides the commercial and transportation dominant areas, we also detect two kinds of residential areas: the developed residential areas and the developing residential areas. We further verify the spatial distribution of these clusters from the viewpoint of urban form analysis. The results show a high consistency with the government planning from the same period, indicating that our model is applicable for inferring the functional regions via social media check-in data and can benefit a wide range of fields, such as urban planning, public services, and location-based recommender systems.
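
The low-rank idea in this abstract can be illustrated with a truncated singular value decomposition, which is one common way to compute a low rank approximation. The abstract does not state which factorization the authors use, so the choice of SVD here is an assumption, and the "regions by hours" matrix below is synthetic rather than real check-in data.

```python
import numpy as np

def low_rank_approx(X, k):
    """Best rank-k approximation of X in the Frobenius norm via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    spatial = U[:, :k] * s[:k]     # region coordinates in the k-dimensional latent space
    temporal = Vt[:k, :]           # k latent temporal activity patterns
    return spatial, temporal, spatial @ temporal

rng = np.random.default_rng(0)
# Synthetic "regions x hours" counts built from 2 latent patterns plus small noise.
patterns = rng.random((2, 24))
loadings = rng.random((50, 2))
X = loadings @ patterns + 0.01 * rng.standard_normal((50, 24))

spatial, temporal, X_hat = low_rank_approx(X, k=2)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(spatial.shape, temporal.shape, rel_err)
```

The rows of `spatial` could then be passed to a clustering algorithm such as K-means to group regions, which mirrors the clustering step mentioned in the abstract.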

preprint 2014 arXiv

A second-order spin-flop transition in collinear two-sublattice antiferromagnets

Identifying the nature of a spin-flop (SFO) transition, first- or second-order (FO or SO), remains a major challenge in condensed-matter physics because the effect of misalignment between the applied-field direction and the relevant antiferromagnetic (AFM) easy axis is technically indistinguishable. A classical SFO transition is believed to be FO in character. Here a mean-field theoretical calculation endowed with AFM exchange interaction (\emph{J}), easy axis anisotropy ($γ$), uniaxial single-ion exchange anisotropy (\emph{D}), and Zeeman coupling to a magnetic field parallel to the easy axis unambiguously reveals that a SO SFO transition indeed exists by virtue of its relatively lower free energy. Their equilibrium phase conditions are found to be: $D \geq 0$ (FO); $-\frac{1}{2} γ < D < 0$ (SO). Compared numerically to the associated AFM and spin-flip phases, the deduced SO SFO transition results from a negative single-ion anisotropy which is restricted to a certain range by the anisotropic exchange interaction.

preprint 2014 arXiv

An adaptive Simulated Annealing-based satellite observation scheduling method combined with a dynamic task clustering strategy

Efficient scheduling is of great significance for the rational use of scarce satellite resources. Task clustering has been demonstrated to be an effective strategy for improving the efficiency of satellite scheduling. However, the previous task clustering strategy is static. That is, it is integrated into the scheduling in a two-phase manner rather than in a dynamic fashion, and so it does not express its full potential for improving satellite scheduling performance. In this study, we present an adaptive Simulated Annealing based scheduling algorithm combined with a dynamic task clustering strategy (or ASA-DTC for short) for satellite observation scheduling problems (SOSPs). First, we develop a formal model for the scheduling of Earth observing satellites. Second, we analyze the related constraints involved in the observation task clustering process. Third, we detail an implementation of the dynamic task clustering strategy and the adaptive Simulated Annealing algorithm. The adaptive Simulated Annealing algorithm is efficient because it is endowed with some sophisticated mechanisms, i.e., adaptive temperature control, a tabu-list based revisiting avoidance mechanism, and an intelligent combination of neighborhood structures. Finally, we report on experimental simulation studies to demonstrate the competitive performance of ASA-DTC. Moreover, we show that ASA-DTC is especially effective when SOSPs contain a large number of targets or when these targets are densely distributed in a certain area.
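
For readers unfamiliar with the base technique, the following is a minimal sketch of plain simulated annealing with geometric cooling. It is not ASA-DTC: it has no adaptive temperature control, tabu list, neighborhood combination, or task clustering, and the cost function, neighbor move, and parameter values are hypothetical.

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Minimize cost starting from x0; returns the best state seen and its cost."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        cand = neighbor(x, rng)
        fc = cost(cand)
        # Always accept improvements; accept worse moves with probability exp(-(fc - fx) / t).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx
        t = max(t * cooling, 1e-9)  # geometric cooling; the floor avoids division by zero
    return best, fbest

# Hypothetical toy problem: minimize (x - 3)^2 over the integers.
def toy_cost(x):
    return (x - 3) ** 2

def toy_neighbor(x, rng):
    return x + rng.choice([-1, 1])

best, fbest = simulated_annealing(toy_cost, toy_neighbor, x0=20)
print(best, fbest)
```

Tracking the best state separately from the current state means the returned cost is never worse than the starting cost, even when the search accepts uphill moves along the way.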

preprint 2014 arXiv

Nonmagnetic ordering state of single-crystal SrTm$_2$O$_4$: A polarized and unpolarized neutron-scattering study

Our single-crystal polarized neutron scattering at 65 mK and powder unpolarized neutron diffraction at 0.5 K show no evidence for a long-range magnetic order and detect no sign of diffuse magnetic neutron scattering in single-crystal SrTm2O4. The data refinements reveal that the two TmO6 octahedral distortion modes are the same as those of the TbO6 octahedra in SrTb2O4, i.e., one distortion is stronger than the other, especially at low temperatures, which is probably attributable to different crystal electric fields for the two inequivalent octahedra. Consequently, we conclude that SrTm2O4 has no magnetic order, neither long-ranged nor short-ranged, even down to 65 mK. Therefore, SrTm2O4 is a different compound from its brethren in the new family of frustrated SrRE2O4 (RE = Gd, Tb, Dy, Ho, Er, and Yb) magnets. We propose that crystal field anisotropy may dominate over weak dipolar spin interactions in SrTm2O4, leading to a virtually nonmagnetic ordering state.

preprint 2013 arXiv

Magnetic structures and the Ce-Fe coupling induced Fe spin reorientation in CeFeAsO single crystal

Neutron and synchrotron resonant X-ray magnetic scattering (RXMS) complemented by heat capacity and resistivity measurements reveal the evolution of the magnetic structures of Fe and Ce sublattices in single crystal CeFeAsO. The RXMS of magnetic reflections at the Ce $L_{\rm II}$-edge shows a magnetic transition that is specific to the Ce antiferromagnetic long-range ordering at $T_\texttt{Ce}\approx$ 4 K with short-range Ce ordering above $T_\texttt{Ce}$, whereas neutron diffraction measurements of a few magnetic reflections indicate a transition at $T^{*}\approx$ 12 K with unusual order parameter. Detailed order parameter measurements on several magnetic reflections by neutrons show a weak anomaly at 4 K which we associate with the Ce ordering. The successive transitions at $T_\texttt{Ce}$ and $T^{*}$ can also be clearly identified by two anomalies in heat capacity and resistivity measurements. The higher transition temperature at $T^{*}\approx$ 12 K is mainly ascribed to Fe spin reorientation transition, below which Fe spins rotate uniformly and gradually in the \textit{ab} plane. The Fe spin reorientation transition and short-range Ce ordering above $T_\texttt{Ce}$ reflect the strong Fe-Ce couplings prior to long-range ordering of the Ce. The evolution of the intricate magnetic structures in CeFeAsO going through $T^{*}$ and $T_\texttt{Ce}$ is proposed.

preprint 2012 arXiv

Derivation of exact master equation with stochastic description: Models in quantum optics

The methodology of stochastic description for dissipation, a generic scheme to decouple the interaction between two subsystems, is applied to the study of dissipative dynamics in quantum optics. It is shown that the influence of the coupled thermal or vacuum field on the quantum mode can be exactly represented by the induced stochastic fields. The quantum mode thereby satisfies a stochastic differential equation and dissipation effect due to the coupling with the environment is obtained through statistical averaging. Within the framework of stochastic description, it is demonstrated how to derive the master equation for a single optical mode interacting with the bosonic bath. A numerical algorithm for solving the master equation in which the coefficients are determined by a set of integral equations is discussed and a comparison with the known results is displayed. The derivation of the master equation for the spontaneous decay of two-state atoms in the vacuum is also presented.

preprint 2011 arXiv

Derivation of exact master equation with stochastic description: Dissipative harmonic oscillator

A systematic procedure for deriving the master equation of a dissipative system is reported in the framework of stochastic description. For the Caldeira-Leggett model of the harmonic-oscillator bath, a detailed and elementary derivation of the bath-induced stochastic field is presented. The dynamics of the system is thereby fully described by a stochastic differential equation and the desired master equation would be acquired with statistical averaging. It is shown that the existence of a closed-form master equation depends on the specificity of the system as well as the feature of the dissipation characterized by the spectral density function. For a dissipative harmonic oscillator it is observed that the correlation between the stochastic field due to the bath and the system can be decoupled and the master equation naturally comes out. Such an equation possesses the Lindblad form in which time dependent coefficients are determined by a set of integral equations. It is proved that the obtained master equation is equivalent to the well-known Hu-Paz-Zhang equation based on the path integral technique. The procedure is also used to obtain the master equation of a dissipative harmonic oscillator in time-dependent fields.

preprint 2010 arXiv

Magnetic and lattice coupling in single-crystal SrFe$_2$As$_2$: A neutron scattering study

A detailed elastic neutron scattering study of the structural and magnetic phase transitions in single-crystal SrFe$_2$As$_2$ reveals that the orthorhombic (O)-tetragonal (T) and the antiferromagnetic transitions coincide at $T_\texttt{O}$ = $T_\texttt{N}$ = (201.5 $\pm$ 0.25) K. The observation of coexisting O-T phases over a finite temperature range at the transition and the sudden onset of the O distortion provide strong evidence that the structural transition is first order. The simultaneous appearance and disappearance within 0.5 K upon cooling and within 0.25 K upon warming, respectively, indicate that the magnetic and structural transitions are intimately coupled. We find that the hysteresis in the transition temperature extends over a 1-2 K range. Based on the observation of a remnant orthorhombic phase at temperatures higher than \emph{T}$_\texttt{O}$, we suggest that the T-O transition may be an order-disorder transition.

preprint 2010 arXiv

The magnetic form factor of iron in SrFe2As2

The iron magnetic form factor in SrFe2As2 has been determined by neutron diffraction and by density functional theory (DFT). As noted previously, the magnitude of the calculated moment using DFT is sensitive to the Fe-As distance. However, the shape of the calculated form factor is practically insensitive to the Fe-As distance, and further we show that the form factor closely resembles that of bcc iron, and agrees well with experiment. The spin density exhibits some anisotropy due to geometry and As hybridization.

preprint 2008 arXiv

Three-Dimensional Grain Boundary Spectroscopy in Transparent High Power Ceramic Laser Materials

Using confocal Raman and fluorescence spectroscopic imaging in three dimensions, we show direct evidence for Nd3+-Nd3+ interactions across grain boundaries (GBs) in Nd3+:YAG laser ceramics. It is clearly shown that Nd3+ segregation takes place at GBs, leading to self-fluorescence quenching which affects a volume fraction as high as 20%. In addition, we show a clear trend of increasing spatial inhomogeneities in Nd3+ concentration when the doping level exceeds 3 at%, which is not detected by standard spectrometry techniques. These results could point the way to further improvements in what is already an impressive class of ceramic laser materials.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv confidence 95%

external id: arxiv:2605.09352:author:7:haifeng-li

Imported May 20, 2026 Synced May 20, 2026

arxiv confidence 95%

external id: arxiv:2605.13391:author:7:haifeng-li

Imported May 20, 2026 Synced May 20, 2026

arxiv confidence 95%

external id: arxiv:2605.19301:author:11:haifeng-li

Imported May 20, 2026 Synced May 20, 2026

arxiv confidence 95%

external id: arxiv:2605.17942:author:3:haifeng-li

Imported May 20, 2026 Synced May 20, 2026

7 works

Chao Tao

Researcher

Chao Tao contributes to research discovery and scholarly infrastructure.

Open to collaborate

5 works

Qing Zhu

Researcher

Qing Zhu contributes to research discovery and scholarly infrastructure.

Open to collaborate

4 works

Jian Peng

Researcher

Jian Peng contributes to research discovery and scholarly infrastructure.

Open to collaborate

4 works

Jiawei Zhu

Researcher

Jiawei Zhu contributes to research discovery and scholarly infrastructure.

Open to collaborate

38 published item(s)

iGSP: Implicit Gradient Subspace Projection for Efficient Continual Learning of Vision-Language Models

RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents

The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?

UAVFF3D: A Geometry-Aware Benchmark for Feed-Forward UAV 3D Reconstruction

A Data-driven Adversarial Examples Recognition Framework via Adversarial Feature Genome

Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds

Efficient ILC analysis on polarization maps after EB leakage correction

Global and Local Contrastive Self-Supervised Learning for Semantic Segmentation of HR Remote Sensing Images

Image Segmentation with Adaptive Spatial Priors from Joint Registration

KST-GCN: A Knowledge-Driven Spatial-Temporal Graph Convolutional Network for Traffic Forecasting

TOV: The Original Vision Model for Optical Remote Sensing Image Understanding via Self-supervised Learning

Weighted Simultaneous Algebra Reconstruction Technique (wSART) for Additive Light Field Synthesis

Crystal field effects in the zig-zag chain compound SrTm$_2$O$_4$

Depth-Enhanced Feature Pyramid Network for Occlusion-Aware Verification of Buildings from Oblique Images

Generating Multi-scale Maps from Remote Sensing Images via Series Generative Adversarial Networks

Graph Information Vanishing Phenomenon in Implicit Graph Neural Networks

Overcoming Long-term Catastrophic Forgetting through Adversarial Neural Pruning and Synaptic Consolidation

A3T-GCN: Attention Temporal Graph Convolutional Network for Traffic Forecasting

Adversarial Example in Remote Sensing Image Recognition

Convolution Neural Network Architecture Learning for Remote Sensing Scene Classification

Deep Fusion of Local and Non-Local Features for Precision Landslide Recognition

RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data

SCAttNet: Semantic Segmentation Network with Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images

Urban Traffic Flow Forecast Based on FastGCRNN

Volume Preserving Image Segmentation with Entropic Regularization Optimal Transport and Its Applications in Deep Learning

On the Neuron Response Features of Convolutional Neural Networks for Remote Sensing Image

Principal eigenvector and spectral radius of uniform hypergraphs

Queue Theory based Response Time Analyses for Geo-Information Processing Chain

Urban spatial-temporal activity structures: a New Approach to Inferring the Intra-urban Functional Regions via Social Media Check-In Data

A second-order spin-flop transition in collinear two-sublattice antiferromagnets

An adaptive Simulated Annealing-based satellite observation scheduling method combined with a dynamic task clustering strategy

Nonmagnetic ordering state of single-crystal SrTm$_2$O$_4$: A polarized and unpolarized neutron-scattering study

Magnetic structures and the Ce-Fe coupling induced Fe spin reorientation in CeFeAsO single crystal

Derivation of exact master equation with stochastic description: Models in quantum optics

Derivation of exact master equation with stochastic description: Dissipative harmonic oscillator

Magnetic and lattice coupling in single-crystal SrFe$_2$As$_2$: A neutron scattering study

The magnetic form factor of iron in SrFe2As2

Three-Dimensional Grain Boundary Spectroscopy in Transparent High Power Ceramic Laser Materials