Researcher profile

Liyuan Li

Liyuan Li contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified
Verification L1
Unclaimed author
4 works
0 followers
6 topics
4 close collaborators

Published work

4 published items

Preprint, 2026, arXiv

Towards Annotation-Free Validation of MLLMs: A Vision-Language Logical Consistency Metric

The dominant accuracy-based evaluation may reward unwarranted guessing by large language models, and it may not be applicable to validating models on novel tasks that lack ground-truth (gt) annotation. Based on basic logical principles, we propose a novel framework to evaluate the vision-language logical consistency of MLLMs on both sufficient and necessary cause-effect relations. We define the Vision-Language Logical Consistency Metric (VL-LCM) on traditional MC-VQA tests and on recent NaturalBench tests, without the need for gt annotation. Through systematic experiments on the representative VL benchmark MMMU and on recent VL challenges such as NaturalBench, we evaluated 11 recent open-source MLLMs from 4 frontier families. Our findings reveal that, despite the significant progress of recent MLLMs on accuracy, logical consistency lags significantly behind. Extensive evaluations of the correlations of VL-LCM with gt-based metrics, the reliability of LCM, and the relation of VL-LCM to the response distribution justify the validity and applicability of VL-LCM even without gt annotation. Our findings suggest that, beyond accuracy, logical consistency could be employed for both accuracy and reliability. VL-LCM can also be employed for MLLM selection, validation, and reliable answer justification in novel tasks without gt annotation.

Preprint, 2022, arXiv

Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action Recognition

Fine-grained action recognition is a challenging task in computer vision. Because fine-grained datasets have small inter-class variations in spatial and temporal space, a fine-grained action recognition model requires good temporal reasoning and discrimination of attribute action semantics. Leveraging the CNN's ability to capture high-level spatial-temporal feature representations and the Transformer's modeling efficiency in capturing latent semantics and global dependencies, we investigate two frameworks that combine a CNN vision backbone and a Transformer encoder to enhance fine-grained action recognition: 1) a vision-based encoder to learn latent temporal semantics, and 2) a multi-modal video-text cross encoder to exploit additional text input and learn cross-association between visual and text semantics. Our experimental results show that both Transformer encoder frameworks effectively learn latent temporal semantics and cross-modality association, with improved recognition performance over the CNN vision model. We achieve new state-of-the-art performance on the FineGym benchmark dataset for both proposed architectures.

Preprint, 2022, arXiv

TAILOR: Teaching with Active and Incremental Learning for Object Registration

When deploying a robot to a new task, one often has to train it to detect novel objects, which is time-consuming and labor-intensive. We present TAILOR, a method and system for object registration with active and incremental learning. When instructed by a human teacher to register an object, TAILOR automatically selects viewpoints to capture informative images by actively exploring viewpoints, and employs a fast incremental learning algorithm to learn new objects without potential forgetting of previously learned objects. We demonstrate the effectiveness of our method with a KUKA robot that learns novel objects used in a real-world gearbox assembly task through natural interactions.

Preprint, 2020, arXiv

Maximizing spin-orbit torque efficiency of Ta(O)/Py via modulating oxygen-induced interface orbital hybridization

Spin-orbit torques due to interfacial Rashba and spin Hall effects have been widely considered as a potentially more efficient approach than the conventional spin-transfer torque to control the magnetization of ferromagnets. We report a comprehensive study of spin-orbit torque efficiency in Ta(O)/Ni81Fe19 bilayers by tuning the low-level oxidation of β-phase tantalum, and find that the spin Hall angle θDL increases from ~ -0.18 for pure Ta/Py to a maximum value of ~ -0.30 for Ta(O)/Py with 7.8% oxidation. Furthermore, we distinguish the efficiency of the spin-orbit torque generated by the bulk spin Hall effect from that generated by the interfacial Rashba effect via a series of Py/Cu(0-2 nm)/Ta(O) control experiments. The latter shows more than a twofold enhancement, which is even more significant than that of the former at the optimum oxidation level. Our results indicate that the 65% enhancement of the efficiency should be related to the modulation of the interfacial Rashba-like spin-orbit torque due to oxygen-induced orbital hybridization across the interface. Our results suggest that the modulation of interfacial coupling via oxygen-induced orbital hybridization can be an alternative method to boost the charge-spin conversion rate.