Researcher profile

Yihang Liu

Yihang Liu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Trust: 19 (Unverified) | Verification: L1 | Unclaimed author
5 works
0 followers
4 topics
4 close collaborators

Published work

5 published items

Preprint | 2026 | arXiv

Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models

Are low-attention visual tokens truly redundant in vision-language reasoning? Existing pruning methods often assume so, ranking visual tokens by shallow text-to-image attention and discarding low-scoring patches to accelerate LVLM inference. We show that this scalar criterion is unreliable for compositional reasoning: tokens ignored in early layers can later become essential for resolving secondary objects, spatial relations, and contextual cues. Premature pruning can therefore induce Visual Aphasia, a failure mode in which the model loses visual grounding and falls back on language priors. We introduce COAST (COntrastive Adaptive Semantic Token Pruning), a training-free pruning framework that casts compression as adaptive semantic routing. COAST uses native cross-modal attention to identify query-specific anchors and estimate contextual dispersion via attention entropy, then adapts the retention trade-off between semantic evidence and spatial context. It further uses a contrastive routing score to preserve both anchor-aligned evidence and complementary spatial context. Across seven benchmarks, COAST reduces visual tokens by 77.8% and achieves a 2.15x latency speedup while retaining 98.64% of the original average performance. Beyond a single backbone or compression setting, COAST consistently outperforms strong pruning baselines across token budgets and generalizes across multiple LVLM families, showing that adaptive semantic routing is a robust alternative to one-shot scalar pruning.
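The abstract contrasts COAST with scalar pruning that ranks visual tokens by text-to-image attention. The Python sketch below illustrates only two ingredients it names: that scalar top-k baseline, and attention entropy as a measure of how dispersed attention is. It is not COAST's contrastive routing score. The function names, the averaging of attention over text tokens, and the 576-token example (a common LVLM setting, assumed here; the abstract gives no token count) are all assumptions.

```python
import numpy as np


def topk_attention_prune(attn: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Scalar baseline: keep the visual tokens with the highest mean text-to-image attention.

    attn has shape (num_text_tokens, num_visual_tokens); each row is a distribution.
    Returns the sorted indices of the retained visual tokens.
    """
    scores = attn.mean(axis=0)                      # one scalar score per visual token
    n_keep = max(1, int(round(keep_ratio * scores.size)))
    keep = np.argsort(scores)[::-1][:n_keep]        # highest-scoring tokens first
    return np.sort(keep)


def normalized_attention_entropy(attn: np.ndarray) -> float:
    """Entropy of the mean attention over visual tokens, scaled to [0, 1].

    Values near 1 mean attention is spread across the image (dispersed context);
    values near 0 mean it is concentrated on a few anchor tokens.
    """
    p = attn.mean(axis=0)
    p = p / p.sum()
    entropy = -np.sum(p * np.log(p + 1e-12))        # epsilon avoids log(0)
    return float(entropy / np.log(p.size))


# Usage: 8 text tokens attending over 576 visual tokens (random stand-in attention).
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 576))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
kept = topk_attention_prune(attn, keep_ratio=128 / 576)
print(len(kept), f"{1 - len(kept) / 576:.1%} of tokens pruned")
print(f"normalized entropy: {normalized_attention_entropy(attn):.3f}")
```

Keeping 128 of 576 tokens prunes 448/576, about 77.8%, the same ratio the abstract reports; whether that is the paper's actual configuration is an assumption. The entropy value is the kind of signal the abstract says COAST uses to shift budget between anchor evidence and spatial context, but the mapping from entropy to budget is not shown here.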

Preprint | 2026 | arXiv

PRISM: Iterative Cross-Modal Posterior Refinement for Dynamic Text-Attributed Graphs

Dynamic text-attributed graphs (DyTAGs) provide a powerful framework for modeling evolving systems in which node semantics and time-dependent interactions are tightly coupled. Recently, multimodal learning has emerged as a promising yet underexplored direction for enhancing DyTAG representation learning. However, existing methods typically rely on rigid modality partitions and one-shot fusion strategies, which limit their ability to capture the intrinsic and evolving dependencies between node semantics and interaction behaviors. To address these limitations, we propose PRISM, an iterative cross-modal posterior refinement framework for DyTAG representation learning. PRISM organizes DyTAG information into semantic and behavioral modalities, providing a more intrinsic alternative to carrier-level modality partitions. Instead of fusing the two modalities in a single step, PRISM learns a refinement trajectory that progressively transforms semantic priors into behavior-conditioned posterior states through cross-modal interaction with behavioral evidence. Extensive experiments on DTGB benchmark datasets show that PRISM achieves strong performance on temporal link prediction and destination node retrieval tasks. Further ablation studies validate the effectiveness of semantic-behavioral modeling and iterative posterior refinement.
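To make the contrast between one-shot fusion and a refinement trajectory concrete, here is a hypothetical Python sketch of a generic iterative cross-attention loop: node states initialized from semantic features repeatedly attend to behavioral evidence and move toward it. It is not PRISM's architecture. The parameter-free attention, the convex update, and the names `refine`, `n_steps`, and `step` are assumptions for illustration only.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def refine(semantic_prior: np.ndarray, behavior: np.ndarray,
           n_steps: int = 3, step: float = 0.5) -> list[np.ndarray]:
    """Hypothetical iterative cross-modal refinement (not PRISM itself).

    semantic_prior: (num_nodes, d) initial states derived from node text.
    behavior: (num_events, d) embeddings of observed interactions.
    Returns the trajectory [h_0, h_1, ..., h_T] of node states.
    """
    h = semantic_prior.copy()
    d = h.shape[-1]
    trajectory = [h]
    for _ in range(n_steps):
        attn = softmax(h @ behavior.T / np.sqrt(d))   # (num_nodes, num_events)
        message = attn @ behavior                     # behavioral evidence gathered per node
        h = (1.0 - step) * h + step * message         # move the state toward that evidence
        trajectory.append(h)
    return trajectory


rng = np.random.default_rng(0)
states = refine(rng.normal(size=(5, 16)), rng.normal(size=(20, 16)))
print(len(states), states[-1].shape)
```

Unlike a single weighted sum, each step here re-computes attention from the current state, so later steps are conditioned on what earlier steps already absorbed from the behavioral side.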

Preprint | 2023 | arXiv

Hierarchical Dynamic Masks for Visual Explanation of Neural Networks

Saliency methods, which generate visual explanatory maps representing the importance of image pixels for a model's classification, are a popular technique for explaining neural network decisions. This paper proposes hierarchical dynamic masks (HDM), a novel method for generating explanatory maps, to enhance the granularity and comprehensiveness of saliency maps. First, we propose dynamic masks (DM), which enable multiple small-sized benchmark mask vectors to roughly learn the critical information in the image through optimization. The benchmark mask vectors then guide the learning of large-sized auxiliary mask vectors, so that their superimposed mask can accurately learn fine-grained pixel importance information and is less sensitive to adversarial perturbations. In addition, we construct HDM by concatenating DM modules. These DM modules find and fuse, in a learning-based way, the regions of interest in the masked image that remain relevant to the network's classification decision. Because HDM forces each DM to analyze importance in different areas, the fused saliency map is more comprehensive. When tested on natural and medical datasets, the proposed method significantly outperformed previous approaches in recognition and localization capabilities.
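The dynamic-mask idea of learning a small mask by optimization, so that the region it keeps preserves the classifier's evidence, can be illustrated with the hypothetical Python sketch below. To stay self-contained it replaces the neural network with a toy linear scorer, `score(x) = sum(weights * x)`, and uses an assumed preservation objective with a sparsity weight `lam`. Neither is the loss used in the paper, and the sketch covers only the coarse (small-sized) mask, not the auxiliary masks or the hierarchy.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def learn_coarse_mask(image: np.ndarray, weights: np.ndarray, grid: int = 4,
                      lam: float = 2.0, lr: float = 1.0, steps: int = 200) -> np.ndarray:
    """Learn a grid x grid mask by gradient ascent, then upsample it to the image size.

    Objective (assumed): score(M * image) - lam * mean(M), where M is the
    nearest-neighbour upsampling of sigmoid(theta) and score is a toy linear model.
    """
    H, W = image.shape
    bh, bw = H // grid, W // grid
    contrib = weights * image                                   # per-pixel share of the score
    cell_contrib = contrib.reshape(grid, bh, grid, bw).sum(axis=(1, 3))
    grad_m = cell_contrib - lam * (bh * bw) / (H * W)           # d(objective)/d(cell value)
    theta = np.zeros((grid, grid))                              # every cell starts at 0.5
    for _ in range(steps):
        m = sigmoid(theta)
        theta += lr * grad_m * m * (1.0 - m)                    # chain rule through the sigmoid
    return np.kron(sigmoid(theta), np.ones((bh, bw)))


image = np.ones((16, 16))
weights = np.zeros((16, 16))
weights[:4, :4] = 1.0                                           # only the top-left block matters
mask = learn_coarse_mask(image, weights)
print(mask[:4, :4].mean().round(3), mask[4:, 4:].mean().round(3))
```

Because the toy scorer is linear, the learned mask simply opens the cells that add to the score and closes the rest; with a real network the gradient would come from backpropagation instead of the closed-form `grad_m`.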

Preprint | 2022 | arXiv

MDM: Multiple Dynamic Masks for Visual Explanation of Neural Networks

The Class Activation Map (CAM) of a neural network shows which regions the network focuses on when it makes a decision. Previous CAM search methods depended on a specific internal module of the network and therefore imposed specific constraints on the network's structure. To make CAM search both general and high-performing, we propose a learning-based algorithm, Multiple Dynamic Masks (MDM). It is based on the common observation that only the active features of an image that are related to classification affect the network's classification result, while other features hardly affect it. The masks generated by MDM conform to this observation. MDM trains mask vectors of different sizes by constraining mask values and activation consistency, then stacks masks of different scales to generate a CAM that balances spatial and semantic information. Compared with recent advanced CAM search methods, MDM achieves state-of-the-art performance. We applied MDM to the interpretable neural networks ProtoPNet and XProtoNet, improving their performance in explainable prototype search. Finally, we visualized the CAMs that MDM generates for neural networks of different architectures, verifying the generality of the method.
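The final MDM step, stacking masks of different scales into one CAM, can be sketched as follows. The abstract does not specify the fusion rule, so this hypothetical Python sketch assumes nearest-neighbour upsampling followed by a plain average and min-max rescaling; the function names and the random stand-in masks are illustrative. The intuition is that coarse masks contribute smooth, region-level evidence while fine masks contribute spatial detail.

```python
import numpy as np


def upsample_nearest(mask: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a square mask to (size, size)."""
    if size % mask.shape[0] != 0:
        raise ValueError("size must be a multiple of the mask side length")
    factor = size // mask.shape[0]
    return np.kron(mask, np.ones((factor, factor)))


def stack_masks(masks: list[np.ndarray], size: int) -> np.ndarray:
    """Hypothetical fusion: average the upsampled masks, then rescale to [0, 1]."""
    stacked = np.mean([upsample_nearest(m, size) for m in masks], axis=0)
    lo, hi = stacked.min(), stacked.max()
    return (stacked - lo) / (hi - lo + 1e-12)       # epsilon guards against a constant map


rng = np.random.default_rng(0)
masks = [rng.random((s, s)) for s in (2, 4, 8)]     # coarse-to-fine masks (random stand-ins)
cam = stack_masks(masks, size=32)
print(cam.shape, float(cam.min()), float(cam.max()))
```

A weighted average, or bilinear instead of nearest-neighbour upsampling, would be equally plausible choices; the sketch picks the simplest option.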

Preprint | 2020 | arXiv

High-speed and high-efficiency three-dimensional shape measurement based on Gray-coded light

Fringe projection profilometry has been increasingly sought after and applied in dynamic three-dimensional (3D) shape measurement. In this work, a robust and high-efficiency 3D measurement method based on Gray-coded light is proposed. First, unlike the traditional method, a novel tripartite phase unwrapping method is proposed to avoid jump errors at code-word boundaries, which are mainly caused by projector defocus and the motion of the tested object. Then, a time-overlapping coding strategy is presented that greatly increases coding efficiency, reducing the number of patterns projected in each group, e.g. from 7 (3 + 4) to 4 (3 + 1) for each restored 3D frame. Combining the two techniques makes it possible to reconstruct a pixel-wise, unambiguous 3D geometry of dynamic scenes with strong noise from every 4 projected patterns. The presented techniques preserve the high noise robustness of Gray-code-based methods while overcoming their drawbacks of jump errors and low coding efficiency. Experiments demonstrate that the proposed method achieves robust, high-efficiency 3D shape measurement of high-speed dynamic scenes, even when they are polluted by strong noise.
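For context on the traditional method that the tripartite unwrapping improves on, here is a Python sketch of conventional Gray-code-assisted phase unwrapping: three phase-shifted fringes give a wrapped phase φ, Gray-code patterns give the fringe order k, and the absolute phase is Φ = φ + 2πk. It simulates an ideal, noise-free, in-focus row of pixels, so it reproduces none of the jump errors the paper targets. Reading "3 + 4" as three phase-shifting patterns plus four Gray-code patterns is an assumption, as are the intensity parameters `A`, `B` and all names.

```python
import numpy as np

N_BITS = 4                      # Gray-code patterns (assumed reading of "3 + 4")
N_PERIODS = 2 ** N_BITS         # 4 bits can label 16 fringe periods
WIDTH = 1024                    # pixels in one simulated row
PERIOD_PX = WIDTH // N_PERIODS  # 64 pixels per fringe period


def gray_encode(n: np.ndarray) -> np.ndarray:
    """Binary-reflected Gray code: neighbouring integers differ in exactly one bit."""
    return n ^ (n >> 1)


def gray_decode(g: np.ndarray) -> np.ndarray:
    """Invert gray_encode by XOR-ing together all right shifts of the code."""
    n = g.copy()
    shift = g >> 1
    while np.any(shift):
        n ^= shift
        shift >>= 1
    return n


# Ground truth: the absolute phase grows linearly across the row.
x = np.arange(WIDTH, dtype=np.int64)
true_phase = 2 * np.pi * (x + 0.5) / PERIOD_PX
true_order = x // PERIOD_PX

# Three-step phase shifting with shifts -2pi/3, 0, +2pi/3; A and B are assumed values.
A, B = 0.5, 0.4
I1, I2, I3 = (A + B * np.cos(true_phase + d) for d in (-2 * np.pi / 3, 0.0, 2 * np.pi / 3))
wrapped = np.mod(np.arctan2(np.sqrt(3) * (I1 - I3), 2 * I2 - I1 - I3), 2 * np.pi)

# Project the Gray code of each fringe order as N_BITS binary patterns (MSB first).
codes = gray_encode(true_order)
patterns = [(codes >> (N_BITS - 1 - b)) & 1 for b in range(N_BITS)]

# Decode: pack the captured bits, convert Gray to binary to get k, then unwrap.
packed = np.zeros(WIDTH, dtype=np.int64)
for p in patterns:
    packed = (packed << 1) | p
order = gray_decode(packed)
unwrapped = wrapped + 2 * np.pi * order

print("max error (rad):", np.abs(unwrapped - true_phase).max())
print("patterns per frame:", 3 + N_BITS)
```

In a real system φ and k come from different captured patterns, so near period boundaries defocus or motion can make them disagree by one order, which shows up as the 2π jump errors mentioned in the abstract.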