Researcher profile

Fumin Shen

Fumin Shen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Multimodal Learning on Low-Quality Data with Conformal Predictive Self-Calibration

Multimodal learning often grapples with the challenge of low-quality data, which predominantly manifests as two facets: modality imbalance and noisy corruption. While these issues are often studied in isolation, we argue that they share a common root in the predictive uncertainty towards the reliability of individual modalities and instances during learning. In this paper, we propose a unified framework, termed Conformal Predictive Self-Calibration (CPSC), which leverages conformal prediction to equip the model with the ability to perform self-guided calibration on-the-fly. The core of our proposed CPSC lies in a novel self-calibrating training loop that seamlessly integrates two key modules: (1) Representation Self-Calibration, which decomposes unimodal features into components, and selectively fuses the most robust ones identified by a conformal predictor to enhance feature resilience. (2) Gradient Self-Calibration, which recalibrates the gradient flow during backpropagation based on instance-wise reliability scores, steering the optimization towards more trustworthy directions. Furthermore, we also devise a self-update strategy for the conformal predictor to ensure the entire system co-evolves consistently throughout the training process. Extensive experiments on six benchmark datasets under both imbalanced and noisy settings demonstrate that our CPSC framework consistently outperforms existing state-of-the-art methods. Our code is available at https://github.com/XunCHN/CPSC.

preprint2022arXiv

Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation

Optical flow is an easily conceived and precious cue for advancing unsupervised video object segmentation (UVOS). Most of the previous methods directly extract and fuse the motion and appearance features for segmenting target objects in the UVOS setting. However, optical flow is intrinsically an instantaneous velocity of all pixels among consecutive frames, thus making the motion features not aligned well with the primary objects among the corresponding frames. To solve the above challenge, we propose a concise, practical, and efficient architecture for appearance and motion feature alignment, dubbed hierarchical feature alignment network (HFAN). Specifically, the key merits in HFAN are the sequential Feature AlignMent (FAM) module and the Feature AdaptaTion (FAT) module, which are leveraged for processing the appearance and motion features hierarchically. FAM is capable of aligning both appearance and motion features with the primary object semantic representations, respectively. Further, FAT is explicitly designed for the adaptive fusion of appearance and motion features to achieve a desirable trade-off between cross-modal features. Extensive experiments demonstrate the effectiveness of the proposed HFAN, which reaches a new state-of-the-art performance on DAVIS-16, achieving 88.7 $\mathcal{J}\&\mathcal{F}$ Mean, i.e., a relative improvement of 3.5% over the best published result.

preprint2022arXiv

Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation

Weakly supervised semantic segmentation with only image-level labels aims to reduce annotation costs for the segmentation task. Existing approaches generally leverage class activation maps (CAMs) to locate the object regions for pseudo label generation. However, CAMs can only discover the most discriminative parts of objects, thus leading to inferior pixel-level pseudo labels. To address this issue, we propose a saliency guided Inter- and Intra-Class Relation Constrained (I$^2$CRC) framework to assist the expansion of the activated object regions in CAMs. Specifically, we propose a saliency guided class-agnostic distance module to pull the intra-category features closer by aligning features to their class prototypes. Further, we propose a class-specific distance module to push the inter-class features apart and encourage the object region to have a higher activation than the background. Besides strengthening the capability of the classification network to activate more integral object regions in CAMs, we also introduce an object guided label refinement module to take a full use of both the segmentation prediction and the initial labels for obtaining superior pseudo-labels. Extensive experiments on PASCAL VOC 2012 and COCO datasets demonstrate well the effectiveness of I$^2$CRC over other state-of-the-art counterparts. The source codes, models, and data have been made available at \url{https://github.com/NUST-Machine-Intelligence-Laboratory/I2CRC}.

preprint2021arXiv

Exploiting Web Images for Fine-Grained Visual Recognition by Eliminating Noisy Samples and Utilizing Hard Ones

Labeling objects at a subordinate level typically requires expert knowledge, which is not always available when using random annotators. As such, learning directly from web images for fine-grained recognition has attracted broad attention. However, the presence of label noise and hard examples in web images are two obstacles for training robust fine-grained recognition models. Therefore, in this paper, we propose a novel approach for removing irrelevant samples from real-world web images during training, while employing useful hard examples to update the network. Thus, our approach can alleviate the harmful effects of irrelevant noisy web images and hard examples to achieve better performance. Extensive experiments on three commonly used fine-grained datasets demonstrate that our approach is far superior to current state-of-the-art web-supervised methods.

preprint2021arXiv

Semantically Meaningful Class Prototype Learning for One-Shot Image Semantic Segmentation

One-shot semantic image segmentation aims to segment the object regions for the novel class with only one annotated image. Recent works adopt the episodic training strategy to mimic the expected situation at testing time. However, these existing approaches simulate the test conditions too strictly during the training process, and thus cannot make full use of the given label information. Besides, these approaches mainly focus on the foreground-background target class segmentation setting. They only utilize binary mask labels for training. In this paper, we propose to leverage the multi-class label information during the episodic training. It will encourage the network to generate more semantically meaningful features for each category. After integrating the target class cues into the query features, we then propose a pyramid feature fusion module to mine the fused features for the final classifier. Furthermore, to take more advantage of the support image-mask pair, we propose a self-prototype guidance branch to support image segmentation. It can constrain the network for generating more compact features and a robust prototype for each semantic class. For inference, we propose a fused prototype guidance branch for the segmentation of the query image. Specifically, we leverage the prediction of the query image to extract the pseudo-prototype and combine it with the initial prototype. Then we utilize the fused prototype to guide the final segmentation of the query image. Extensive experiments demonstrate the superiority of our proposed approach.

preprint2020arXiv

Auto-Encoding Twin-Bottleneck Hashing

Conventional unsupervised hashing methods usually take advantage of similarity graphs, which are either pre-computed in the high-dimensional space or obtained from random anchor points. On the one hand, existing methods uncouple the procedures of hash function learning and graph construction. On the other hand, graphs empirically built upon original data could introduce biased prior knowledge of data relevance, leading to sub-optimal retrieval performance. In this paper, we tackle the above problems by proposing an efficient and adaptive code-driven graph, which is updated by decoding in the context of an auto-encoder. Specifically, we introduce into our framework twin bottlenecks (i.e., latent variables) that exchange crucial information collaboratively. One bottleneck (i.e., binary codes) conveys the high-level intrinsic data structure captured by the code-driven graph to the other (i.e., continuous variables for low-level detail information), which in turn propagates the updated network feedback for the encoder to learn more discriminative binary codes. The auto-encoding learning objective literally rewards the code-driven graph to learn an optimal encoder. Moreover, the proposed model can be simply optimized by gradient descent without violating the binary constraints. Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods. Our source code can be found at https://github.com/ymcidence/TBH.

preprint2020arXiv

Collaborative Learning for Extremely Low Bit Asymmetric Hashing

Hashing techniques are in great demand for a wide range of real-world applications such as image retrieval and network compression. Nevertheless, existing approaches could hardly guarantee a satisfactory performance with the extremely low-bit (e.g., 4-bit) hash codes due to the severe information loss and the shrink of the discrete solution space. In this paper, we propose a novel \textit{Collaborative Learning} strategy that is tailored for generating high-quality low-bit hash codes. The core idea is to jointly distill bit-specific and informative representations for a group of pre-defined code lengths. The learning of short hash codes among the group can benefit from the manifold shared with other long codes, where multiple views from different hash codes provide the supplementary guidance and regularization, making the convergence faster and more stable. To achieve that, an asymmetric hashing framework with two variants of multi-head embedding structures is derived, termed as Multi-head Asymmetric Hashing (MAH), leading to great efficiency of training and querying. Extensive experiments on three benchmark datasets have been conducted to verify the superiority of the proposed MAH, and have shown that the 8-bit hash codes generated by MAH achieve $94.3\%$ of the MAP (Mean Average Precision (MAP)) score on the CIFAR-10 dataset, which significantly surpasses the performance of the 48-bit codes by the state-of-the-arts in image retrieval tasks.