Researcher profile

Tingfa Xu

Tingfa Xu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2026arXiv

Efficient Hyperspectral Image Reconstruction Using Lightweight Separate Spectral Transformers

Hyperspectral imaging (HSI) is essential across various disciplines for its capacity to capture rich spectral information. However, efficiently reconstructing hyperspectral images from compressive sensing measurements presents significant challenges. To tackle these, we adopt a divide-and-conquer strategy that capitalizes on the unique spectral and spatial characteristics of hyperspectral images. We introduce the Lightweight Separate Spectral Transformer (LSST), an innovative architecture tailored for efficient hyperspectral image reconstruction. This architecture consists of Separate Spectral Transformer Blocks (SSTB) for modeling spectral relationships and Lightweight Spatial Convolution Blocks (LSCB) for spatial processing. The SSTB employs Grouped Spectral Self-attention and a Spectrum Shuffle operation to effectively manage both local and non-local spectral relationships. Simultaneously, the LSCB utilizes depth-wise separable convolutions and strategic ordering to enhance spatial information processing. Furthermore, we implement the Focal Spectrum Loss, a novel loss weighting mechanism that dynamically adjusts during training to improve reconstruction across spectrally complex bands. Extensive testing demonstrates that our LSST achieves superior performance while requiring fewer FLOPs and parameters, underscoring its efficiency and effectiveness. The source code is available at: https://github.com/wcz1124/LSST.

preprint2026arXiv

HyperCOD: The First Challenging Benchmark and Baseline for Hyperspectral Camouflaged Object Detection

RGB-based camouflaged object detection struggles in real-world scenarios where color and texture cues are ambiguous. While hyperspectral image offers a powerful alternative by capturing fine-grained spectral signatures, progress in hyperspectral camouflaged object detection (HCOD) has been critically hampered by the absence of a dedicated, large-scale benchmark. To spur innovation, we introduce HyperCOD, the first challenging benchmark for HCOD. Comprising 350 high-resolution hyperspectral images, It features complex real-world scenarios with minimal objects, intricate shapes, severe occlusions, and dynamic lighting to challenge current models. The advent of foundation models like the Segment Anything Model (SAM) presents a compelling opportunity. To adapt the Segment Anything Model (SAM) for HCOD, we propose HyperSpectral Camouflage-aware SAM (HSC-SAM). HSC-SAM ingeniously reformulates the hyperspectral image by decoupling it into a spatial map fed to SAM's image encoder and a spectral saliency map that serves as an adaptive prompt. This translation effectively bridges the modality gap. Extensive experiments show that HSC-SAM sets a new state-of-the-art on HyperCOD and generalizes robustly to other public HSI datasets. The HyperCOD dataset and our HSC-SAM baseline provide a robust foundation to foster future research in this emerging area.

preprint2026arXiv

Hyperspectral Image Classification via Efficient Global Spectral Supertoken Clustering

Hyperspectral image classification demands spatially coherent predictions and precise boundary delineation. Yet prevailing superpixel-based methods face an inherent contradiction: clustering aggregates similar pixels into regions, but the subsequent classifier operates pixel-wise, undermining regional consistency. Consequently, existing approaches do not guarantee region-level, boundary-aligned classification. To address this limitation, we propose the Dual-stage Spectrum-Constrained Clustering-based Classifier (DSCC), an end-to-end framework that explicitly decouples clustering from classification by first grouping spectral similar and spatially proximate pixels into spectral supertokens and then performing token-level prediction. At its core, DSCC computes an image-level multi-criteria feature distance between pixels and centers, followed by a locality-aware assignment regularization, enabling the generation of boundary-preserving spectral supertokens. A density-isolation based center selection further yields representative, well-separated centers, reducing redundancy and improving robustness to scale variation. To accommodate mixed land-cover compositions within each token, we introduce a soft-label scheme that encodes class proportions and improves robustness for mixed-class tokens. DSCC attains a CF1 of 0.728 at 197.75 FPS on the WHU-OHS dataset, offering a superior accuracy-efficiency trade-off compared with state-of-the-art methods. Extensive experiments further validate the effectiveness and generality of the proposed dual-stage paradigm for hyperspectral image classification. The source code is available at https://github.com/laprf/DSCC.

preprint2022arXiv

RTNet: Relation Transformer Network for Diabetic Retinopathy Multi-lesion Segmentation

Automatic diabetic retinopathy (DR) lesions segmentation makes great sense of assisting ophthalmologists in diagnosis. Although many researches have been conducted on this task, most prior works paid too much attention to the designs of networks instead of considering the pathological association for lesions. Through investigating the pathogenic causes of DR lesions in advance, we found that certain lesions are closed to specific vessels and present relative patterns to each other. Motivated by the observation, we propose a relation transformer block (RTB) to incorporate attention mechanisms at two main levels: a self-attention transformer exploits global dependencies among lesion features, while a cross-attention transformer allows interactions between lesion and vessel features by integrating valuable vascular information to alleviate ambiguity in lesion detection caused by complex fundus structures. In addition, to capture the small lesion patterns first, we propose a global transformer block (GTB) which preserves detailed information in deep network. By integrating the above blocks of dual-branches, our network segments the four kinds of lesions simultaneously. Comprehensive experiments on IDRiD and DDR datasets well demonstrate the superiority of our approach, which achieves competitive performance compared to state-of-the-arts.

preprint2021arXiv

Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data

Corrupted labels and class imbalance are commonly encountered in practically collected training data, which easily leads to over-fitting of deep neural networks (DNNs). Existing approaches alleviate these issues by adopting a sample re-weighting strategy, which is to re-weight sample by designing weighting function. However, it is only applicable for training data containing only either one type of data biases. In practice, however, biased samples with corrupted labels and of tailed classes commonly co-exist in training data. How to handle them simultaneously is a key but under-explored problem. In this paper, we find that these two types of biased samples, though have similar transient loss, have distinguishable trend and characteristics in loss curves, which could provide valuable priors for sample weight assignment. Motivated by this, we delve into the loss curves and propose a novel probe-and-allocate training strategy: In the probing stage, we train the network on the whole biased training data without intervention, and record the loss curve of each sample as an additional attribute; In the allocating stage, we feed the resulting attribute to a newly designed curve-perception network, named CurveNet, to learn to identify the bias type of each sample and assign proper weights through meta-learning adaptively. The training speed of meta learning also blocks its application. To solve it, we propose a method named skip layer meta optimization (SLMO) to accelerate training speed by skipping the bottom layers. Extensive synthetic and real experiments well validate the proposed method, which achieves state-of-the-art performance on multiple challenging benchmarks.

preprint2020arXiv

Attribute-conditioned Layout GAN for Automatic Graphic Design

Modeling layout is an important first step for graphic design. Recently, methods for generating graphic layouts have progressed, particularly with Generative Adversarial Networks (GANs). However, the problem of specifying the locations and sizes of design elements usually involves constraints with respect to element attributes, such as area, aspect ratio and reading-order. Automating attribute conditional graphic layouts remains a complex and unsolved problem. In this paper, we introduce Attribute-conditioned Layout GAN to incorporate the attributes of design elements for graphic layout generation by forcing both the generator and the discriminator to meet attribute conditions. Due to the complexity of graphic designs, we further propose an element dropout method to make the discriminator look at partial lists of elements and learn their local patterns. In addition, we introduce various loss designs following different design principles for layout optimization. We demonstrate that the proposed method can synthesize graphic layouts conditioned on different element attributes. It can also adjust well-designed layouts to new sizes while retaining elements' original reading-orders. The effectiveness of our method is validated through a user study.

preprint2020arXiv

Exploiting Semantics for Face Image Deblurring

In this paper, we propose an effective and efficient face deblurring algorithm by exploiting semantic cues via deep convolutional neural networks. As the human faces are highly structured and share unified facial components (e.g., eyes and mouths), such semantic information provides a strong prior for restoration. We incorporate face semantic labels as input priors and propose an adaptive structural loss to regularize facial local structures within an end-to-end deep convolutional neural network. Specifically, we first use a coarse deblurring network to reduce the motion blur on the input face image. We then adopt a parsing network to extract the semantic features from the coarse deblurred image. Finally, the fine deblurring network utilizes the semantic information to restore a clear face image. We train the network with perceptual and adversarial losses to generate photo-realistic results. The proposed method restores sharp images with more accurate facial features and details. Quantitative and qualitative evaluations demonstrate that the proposed face deblurring algorithm performs favorably against the state-of-the-art methods in terms of restoration quality, face recognition and execution speed.

preprint2020arXiv

Human-Aware Motion Deblurring

This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG). The proposed model is based on a triple-branch encoder-decoder architecture. The first two branches are learned for sharpening FG humans and BG details, respectively; while the third one produces global, harmonious results by comprehensively fusing multi-scale deblurring information from the two domains. The proposed model is further endowed with a supervised, human-aware attention mechanism in an end-to-end fashion. It learns a soft mask that encodes FG human information and explicitly drives the FG/BG decoder-branches to focus on their specific domains. To further benefit the research towards Human-aware Image Deblurring, we introduce a large-scale dataset, named HIDE, which consists of 8,422 blurry and sharp image pairs with 65,784 densely annotated FG human bounding boxes. HIDE is specifically built to span a broad range of scenes, human object sizes, motion patterns, and background complexities. Extensive experiments on public benchmarks and our dataset demonstrate that our model performs favorably against the state-of-the-art motion deblurring methods, especially in capturing semantic details.