Researcher profile

Da-Han Wang

Da-Han Wang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
4 topics
4 close collaborators

Published work

5 published items

preprint, 2026, arXiv

DAPE: Dynamic Non-uniform Alignment and Progressive Detail Enhancement Techniques for Improving the Performance of Efficient Visual Language Models

In recent years, pre-trained visual-linguistic models have demonstrated tremendous potential, becoming a crucial foundational framework for numerous downstream tasks. However, the information density between text and images is not uniformly distributed. Existing methods often overlook the inherent and dynamic differences in information density and semantic scope between text tags and image blocks. These common uniform alignment strategies result in coarse-grained cross-modal interactions and loss of fine semantic details. Moreover, pursuing finer alignment typically requires substantial computational overhead, limiting practical model deployment. To address this challenge, this paper proposes a novel framework for dynamic cross-modal alignment with continuous detail introduction. First, we design a dynamically adaptive cross-modal matching mechanism that uses a learnable matching function to dynamically assign varying numbers and sizes of image tags to text tags of the same size but different information density, enabling more precise attention interaction. Second, we develop a continuous detail introduction module to progressively incorporate high-resolution visual feature enhancement into the alignment process. Extensive experiments across multiple benchmarks demonstrate significant improvements in the accuracy of various downstream tasks while reducing computational overhead.

preprint, 2025, arXiv

MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark

Continual learning enables AI systems to acquire new knowledge while retaining previously learned information. While traditional unimodal methods have made progress, the rise of Multimodal Large Language Models (MLLMs) brings new challenges in Multimodal Continual Learning (MCL), where models are expected to address both catastrophic forgetting and cross-modal coordination. To advance research in this area, we present MCITlib, a comprehensive library for Multimodal Continual Instruction Tuning. MCITlib currently implements 8 representative algorithms and conducts evaluations on 3 benchmarks under 2 backbone models. The library will be continuously updated to support future developments in MCL. The codebase is released at https://github.com/Ghy0501/MCITlib.

preprint, 2022, arXiv

2020 CATARACTS Semantic Segmentation Challenge

Surgical scene segmentation is essential for anatomy and instrument localization, which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. The solutions' performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.

preprint, 2022, arXiv

Stain-Adaptive Self-Supervised Learning for Histopathology Image Analysis

It is commonly recognized that color variations caused by differences in stains are a critical issue for histopathology image analysis. Existing methods adopt color matching, stain separation, stain transfer, or a combination of these to alleviate the stain variation problem. In this paper, we propose a novel Stain-Adaptive Self-Supervised Learning (SASSL) method for histopathology image analysis. Our SASSL integrates a domain-adversarial training module into the SSL framework to learn distinctive features that are robust to both various transformations and stain variations. The proposed SASSL is regarded as a general method for domain-invariant feature extraction which can be flexibly combined with arbitrary downstream histopathology image analysis modules (e.g. nuclei/tissue segmentation) by fine-tuning the features for specific downstream tasks. We conducted experiments on publicly available pathological image analysis datasets including the PANDA, BreastPathQ, and CAMELYON16 datasets, achieving state-of-the-art performance. Experimental results demonstrate that the proposed method can robustly improve the feature extraction ability of the model and achieve stable performance improvement in downstream tasks.

preprint, 2020, arXiv

Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification

Extreme multi-label text classification (XMTC) is a task for tagging a given text with the most relevant labels from an extremely large label set. We propose a novel deep learning method called APLC-XLNet. Our approach fine-tunes the recently released generalized autoregressive pretrained model (XLNet) to learn a dense representation for the input text. We propose Adaptive Probabilistic Label Clusters (APLC) to approximate the cross entropy loss by exploiting the unbalanced label distribution to form clusters that explicitly reduce the computational time. Our experiments, carried out on five benchmark datasets, show that our approach has achieved new state-of-the-art results on four benchmark datasets. Our source code is available publicly at https://github.com/huiyegit/APLC_XLNet.
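
The abstract above describes forming label clusters from an unbalanced label distribution to reduce computation. The sketch below illustrates that general idea only, and is not taken from the APLC-XLNet code: frequent labels go into a head cluster, rare labels into tail clusters, and a tail cluster's labels are scored only when needed. The function names, the cutoff values, and the toy label counts are all hypothetical.

```python
def partition_labels(label_counts, cutoffs):
    """Split labels into a head cluster and tail clusters by frequency.

    label_counts: dict mapping label -> training frequency (toy data here).
    cutoffs: increasing indices into the frequency-sorted label list;
             labels before cutoffs[0] form the head, the rest form tails.
    """
    # Sort by descending frequency, breaking ties by label name.
    ranked = sorted(label_counts, key=lambda l: (-label_counts[l], l))
    bounds = [0, *cutoffs, len(ranked)]
    return [ranked[a:b] for a, b in zip(bounds, bounds[1:])]


def outputs_scored(clusters, label):
    """Count the outputs evaluated to score one label under clustering."""
    head, tails = clusters[0], clusters[1:]
    # The root scores every head label plus one node per tail cluster.
    root_size = len(head) + len(tails)
    if label in head:
        return root_size
    for tail in tails:
        if label in tail:
            # A tail label also needs the scores inside its own cluster.
            return root_size + len(tail)
    raise KeyError(label)


def expected_outputs(clusters, label_counts):
    """Frequency-weighted average number of outputs scored per label."""
    total = sum(label_counts.values())
    weighted = sum(
        count * outputs_scored(clusters, label)
        for label, count in label_counts.items()
    )
    return weighted / total


# Hypothetical, heavily skewed label distribution.
counts = {"a": 50, "b": 30, "c": 5, "d": 3, "e": 1, "f": 1}
clusters = partition_labels(counts, cutoffs=[2, 4])
average_cost = expected_outputs(clusters, counts)
flat_cost = len(counts)  # a flat output layer scores every label
```

In this toy setting the two frequent labels dominate, so the frequency-weighted cost falls below the flat cost of scoring all six labels, even though a rare label on its own is no cheaper than the flat layer.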