Researcher profile

Yuqi Tang

Yuqi Tang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
7 topics
4 close collaborators

Published work

5 published items

Preprint, 2026, arXiv

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

Recent video generative models have greatly improved the realism of AI-generated videos, yet their outputs still exhibit artifacts such as temporal inconsistencies, structural distortions, and semantic incoherence. While Multimodal Large Language Models (MLLMs) show strong visual understanding capabilities, their ability to perceive and reason about such artifacts remains unclear. Existing benchmarks often lack systematic evaluation of artifact-aware perception and fine-grained diagnostic reasoning, especially across diverse AI-generated video domains beyond photorealistic content. To address this gap, we introduce Artifact-Bench, a comprehensive benchmark for evaluating MLLMs on AI-generated video artifact detection and analysis. We first establish a three-level hierarchical taxonomy of realism artifacts, covering photorealistic, animated, and CG-style videos. Based on this taxonomy, Artifact-Bench defines three complementary tasks: real vs. AI-generated video classification, pairwise realism comparison, and fine-grained artifact identification. Experiments on 19 leading MLLMs reveal substantial limitations in artifact perception and reasoning, with many models approaching random or even below-random performance in challenging settings. We further observe significant misalignment between MLLM judgments and human perceptual preferences, highlighting their limited reliability as general evaluators for AI-generated video realism.

Preprint, 2026, arXiv

Learning an Efficient Multi-Turn Dialogue Evaluator from Multiple LLM Judges

Evaluating the conversational abilities of large language models (LLMs) remains a challenging task. Current mainstream approaches primarily rely on the "LLM-as-a-judge" paradigm, where an LLM is prompted to serve as an evaluator to assess dialogue quality. However, such methods often suffer from various biases, which undermine the reliability and consistency of the evaluation results. To mitigate these biases, recent methods employ multiple LLMs as judges and aggregate their judgments to select the optimal assessment. Although effective, this multi-judge approach incurs significant computational overhead during inference. In this paper, we propose an efficient dialogue evaluator that captures the collective wisdom of multiple LLM judges by aggregating their preference knowledge into a single model. Our approach preserves the advantages of diverse multi-judge feedback while drastically reducing the evaluation cost, enabling fast, flexible, and fine-grained dialogue quality assessment. Extensive experiments on seven single rating and pairwise comparison dialogue evaluation benchmarks demonstrate that our method outperforms existing baselines across diverse scenarios, showcasing its efficiency and robustness.

Preprint, 2026, arXiv

Vision-Language Introspection: Mitigating Overconfident Hallucinations in MLLMs via Interpretable Bi-Causal Steering

Object hallucination critically undermines the reliability of Multimodal Large Language Models, often stemming from a fundamental failure in cognitive introspection, where models blindly trust linguistic priors over specific visual evidence. Existing mitigations remain limited: contrastive decoding approaches operate superficially without rectifying internal semantic misalignments, while current latent steering methods rely on static vectors that lack instance-specific precision. We introduce Vision-Language Introspection (VLI), a training-free inference framework that simulates a metacognitive self-correction process. VLI first performs Attributive Introspection to diagnose hallucination risks via probabilistic conflict detection and localize the causal visual anchors. It then employs Interpretable Bi-Causal Steering to actively modulate the inference process, dynamically isolating visual evidence from background noise while neutralizing blind confidence through adaptive calibration. VLI achieves state-of-the-art performance on advanced models, reducing object hallucination rates by 12.67% on MMHal-Bench and improving accuracy by 5.8% on POPE.

Preprint, 2021, arXiv

XAlgo: a Design Probe of Explaining Algorithms' Internal States via Question-Answering

Algorithms often appear as 'black boxes' to non-expert users. While prior work focuses on explainable representations and expert-oriented exploration, we propose and study an interactive approach using question answering to explain deterministic algorithms to non-expert users who need to understand the algorithms' internal states (e.g., students learning algorithms, operators monitoring robots, admins troubleshooting network routing). We construct XAlgo -- a formal model that first classifies the type of question based on a taxonomy and generates an answer based on a set of rules that extract information from representations of an algorithm's internal states, e.g., the pseudocode. A design probe in an algorithm learning scenario with 18 participants (9 for a Wizard-of-Oz XAlgo and 9 as a control group) reports findings and design implications based on what kinds of questions people ask, how well XAlgo responds, and what remain as challenges to bridge users' gulf of understanding algorithms.

Preprint, 2020, arXiv

3D Monte Carlo Simulation of Light Distribution in Mouse Brain in Quantitative Photoacoustic Computed Tomography

Photoacoustic computed tomography (PACT) detects light-induced ultrasound waves to reconstruct the optical absorption contrast of biological tissues. Due to its relatively deep penetration (several centimeters in soft tissue), high spatial resolution, and inherent functional sensitivity, PACT has great potential for imaging mouse brains with endogenous and exogenous contrasts, which is of immense interest to the neuroscience community. However, conventional PACT either assumes homogeneous optical fluence within the brain or uses a simplified attenuation model for optical fluence estimation. Both approaches underestimate the complexity of the fluence heterogeneity and can result in poor quantitative imaging accuracy. To optimize the quantitative performance of PACT, we explore for the first time 3D Monte Carlo simulation to study the optical fluence distribution in a complete mouse brain model. We apply the MCX Monte Carlo simulation package on a digital mouse (Digimouse) brain atlas that has complete anatomy information. To evaluate the impact of the brain vasculature on light delivery, we also incorporate the whole-brain vasculature in the Digimouse atlas. The simulation results clearly show that the optical fluence in the mouse brain is heterogeneous at the global level and can decrease by a factor of five with increasing depth. Moreover, the strong absorption and scattering of the brain vasculature also induce fluence disturbance at the local level. Our results suggest that both global and local fluence heterogeneity contribute to the reduced quantitative accuracy of the reconstructed PACT images of mouse brain.