Researcher profile

Xueqian Li

Xueqian Li contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 11 - UnverifiedVerification L1Unclaimed author
1works
0followers
1topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

1 published item(s)

preprint2026arXiv

CVBench: Benchmarking Cross-Video Synergies for Complex Multimodal Reasoning

While multimodal large language models (MLLMs) exhibit strong performance on single-video tasks (e.g., video question answering), their capability for spatiotemporal pattern reasoning across multiple videos remains a critical gap in pattern recognition research. However, this capability is essential for real-world applications, including multi-camera surveillance and cross-video procedural learning. To bridge this gap, we present CVBench, the first diagnostic benchmark designed to assess cross-video relational reasoning rigorously. CVBench comprises 1,000 question-answer pairs spanning three hierarchical tiers: cross-video object association (identifying shared entities), cross-video event association (linking temporal or causal event chains), and cross-video complex reasoning (integrating commonsense and domain knowledge). Built from five domain-diverse video clusters (e.g., sports, life records), the benchmark challenges models to analyze and integrate spatiotemporal patterns from dynamic visual streams. Extensive evaluation of 10+ leading MLLMs (including GPT-4o, Gemini-2.0-flash, Qwen2.5-VL) under zero-shot or chain-of-thought prompting paradigms. Key findings reveal stark performance gaps: even top models, such as GPT-4o, achieve only 63.5% accuracy on causal reasoning tasks, compared to the 91.3% accuracy of human performance. Crucially, our analysis reveals fundamental bottlenecks inherent in current MLLMs architectures, notably deficient inter-video context retention and poor disambiguation of overlapping entities. CVBench establishes a rigorous framework for advancing pattern recognition methodologies in multi-video scenarios, providing architectural insights for next-generation models. The data and evaluation code are available at: https://github.com/Hokhim2/CVBench.