Source author record

Zhen Ye

Zhen Ye appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · cond-mat.dis-nn · Artificial Intelligence · Computation and Language · cond-mat · eess.AS · Machine Learning · Robotics · Sound

Catalog footprint

What is connected

7 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

EARL: Towards a Unified Analysis-Guided Reinforcement Learning Framework for Egocentric Interaction Reasoning and Pixel Grounding

Understanding human-environment interactions from egocentric vision is essential for assistive robotics and embodied intelligent agents, yet existing multimodal large language models (MLLMs) still struggle with accurate interaction reasoning and fine-grained pixel grounding. To this end, this paper introduces EARL, an Egocentric Analysis-guided Reinforcement Learning framework that explicitly transfers coarse interaction semantics to query-oriented answering and grounding. Specifically, EARL adopts a two-stage parsing framework including coarse-grained interpretation and fine-grained response. The first stage holistically interprets egocentric interactions and generates a structured textual description. The second stage produces the textual answer and pixel-level mask in response to the user query. To bridge the two stages, we extract a global interaction descriptor as a semantic prior, which is integrated via a novel Analysis-guided Feature Synthesizer (AFS) for query-oriented reasoning. To optimize heterogeneous outputs, including textual answers, bounding boxes, and grounding masks, we design a multi-faceted reward function and train the response stage with GRPO. Experiments on Ego-IRGBench show that EARL achieves 65.48% cIoU for pixel grounding, outperforming previous RL-based methods by 8.37%, while OOD grounding results on EgoHOS indicate strong transferability to unseen egocentric grounding scenarios.
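The abstract reports pixel grounding quality as cIoU without defining it. The sketch below assumes the definition commonly used in referring-segmentation work: the total intersection over the total union, accumulated across all samples. The function name `cumulative_iou` and the nested-list mask format are illustrative choices, not taken from the paper.

```python
from typing import Iterable, Sequence, Tuple

# Assumption: a binary mask is given as rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def cumulative_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Sum intersections and unions over all (prediction, target) pairs,
    then divide once. Large objects therefore weigh more than small ones,
    unlike a per-sample mean IoU."""
    total_inter = 0
    total_union = 0
    for pred, target in pairs:
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                total_inter += 1 if (p and t) else 0
                total_union += 1 if (p or t) else 0
    # Convention chosen here: an empty union scores 0.0.
    return total_inter / total_union if total_union else 0.0


if __name__ == "__main__":
    sample_pairs = [
        ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # intersection 1, union 2
        ([[0, 1], [0, 1]], [[0, 1], [0, 1]]),  # intersection 2, union 2
    ]
    print(cumulative_iou(sample_pairs))
```

Under this assumed definition, a reported cIoU of 65.48% would mean the predicted masks overlap the ground truth on about 65% of the combined mask area across the benchmark.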

preprint · 2026 · arXiv

Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking

Large Language Models (LLMs) are increasingly deployed in real-world fact-checking systems, yet existing evaluations focus predominantly on claim verification and overlook the broader fact-checking workflow, including claim extraction and evidence retrieval. This narrow focus prevents current benchmarks from revealing systematic reasoning failures, factual blind spots, and robustness limitations of modern LLMs. To bridge this gap, we present FactArena, a fully automated arena-style evaluation framework that conducts comprehensive, stage-wise benchmarking of LLMs across the complete fact-checking pipeline. FactArena integrates three key components: (i) an LLM-driven fact-checking process that standardizes claim decomposition, evidence retrieval via tool-augmented interactions, and justification-based verdict prediction; (ii) an arena-styled judgment mechanism guided by consolidated reference guidelines to ensure unbiased and consistent pairwise comparisons across heterogeneous judge agents; and (iii) an arena-driven claim-evolution module that adaptively generates more challenging and semantically controlled claims to probe LLMs' factual robustness beyond fixed seed data. Across 16 state-of-the-art LLMs spanning seven model families, FactArena produces stable and interpretable rankings. Our analyses further reveal significant discrepancies between static claim-verification accuracy and end-to-end fact-checking competence, highlighting the necessity of holistic evaluation. The proposed framework offers a scalable and trustworthy paradigm for diagnosing LLMs' factual reasoning, guiding future model development, and advancing the reliable deployment of LLMs in safety-critical fact-checking applications.

preprint · 2024 · arXiv

CoMoSVC: Consistency Model-based Singing Voice Conversion

Diffusion-based Singing Voice Conversion (SVC) methods have achieved remarkable performance, producing natural audio with high similarity to the target timbre. However, the iterative sampling process results in slow inference speed, and acceleration thus becomes crucial. In this paper, we propose CoMoSVC, a consistency model-based SVC method, which aims to achieve both high-quality generation and high-speed sampling. A diffusion-based teacher model is first specially designed for SVC, and a student model is further distilled under self-consistency properties to achieve one-step sampling. Experiments on a single NVIDIA RTX 4090 GPU reveal that although CoMoSVC has a significantly faster inference speed than the state-of-the-art (SOTA) diffusion-based SVC system, it still achieves comparable or superior conversion performance based on both subjective and objective metrics. Audio samples and code are available at https://comosvc.github.io/.

preprint · 2021 · arXiv

Pairwise Point Cloud Registration using Graph Matching and Rotation-invariant Features

Registration is a fundamental and critical task in point cloud processing, which usually depends on finding element correspondences between two point clouds. However, finding reliable correspondences relies on establishing a robust and discriminative description of elements and on correctly matching corresponding elements. In this letter, we develop a coarse-to-fine registration strategy, which utilizes rotation-invariant features and a new weighted graph matching method for iteratively finding correspondences. In the graph matching method, the similarities of nodes and edges in Euclidean and feature space are formulated to construct the optimization function. The proposed strategy is evaluated using two benchmark datasets and compared with several state-of-the-art methods. According to the experimental results, our proposed method can achieve a fine registration with rotation errors of less than 0.2 degrees and translation errors of less than 0.1 m.
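The abstract quotes rotation errors in degrees and translation errors in meters but does not say how they are computed. A minimal sketch, assuming the common convention of the geodesic angle between the estimated and ground-truth rotation matrices and the Euclidean distance between translation vectors; the function names and the `rot_z` helper are hypothetical.

```python
import math


def rotation_error_deg(r_est, r_gt):
    """Angle (degrees) of the relative rotation R_est^T R_gt.
    trace(R_est^T R_gt) equals the element-wise product sum of the two matrices."""
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_error(t_est, t_gt):
    """Euclidean distance between translation vectors (same unit as the inputs)."""
    return math.dist(t_est, t_gt)


def rot_z(angle_deg):
    """Hypothetical helper: rotation about the z axis, used only for the demo."""
    a = math.radians(angle_deg)
    return [
        [math.cos(a), -math.sin(a), 0.0],
        [math.sin(a), math.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]


if __name__ == "__main__":
    print(rotation_error_deg(rot_z(0.2), rot_z(0.0)))
    print(translation_error((0.1, 0.0, 0.0), (0.0, 0.0, 0.0)))
```

Under this convention, the thresholds quoted in the abstract would correspond to `rotation_error_deg(...) < 0.2` and `translation_error(...) < 0.1` with translations expressed in meters.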

preprint · 2002 · arXiv

Are waves all localized in two dimensional random media?

It has been the dominant view for over two decades that all waves are localized in two dimensions for any given amount of disorder. Here, I would like to raise questions about this assertion. The discussion leads to the conclusion that conclusive and definite support for the claim is lacking. Rather, recent evidence tends to indicate that waves are not necessarily always localized in two-dimensional random systems. Reasons are elaborated.

preprint · 2001 · arXiv

Propagation inhibition and wave localization in a 2D random liquid medium

Acoustic propagation and scattering in water containing many parallel air-filled cylinders are studied. Two situations are considered and compared: (1) a wave propagating through the array of cylinders, imitating a traditional experimental setup, and (2) a wave transmitted from a source located inside the ensemble. We show that waves can be blocked from propagation by disorder in the first scenario, but the inhibition does not necessarily imply wave localization. Furthermore, the results reveal the phenomenon of wave localization in a range of frequencies.

preprint · 2000 · arXiv

Localization of acoustic waves in 1D random liquid media

We study acoustic propagation in one-dimensional water ducts containing many air-filled blocks. The acoustic band structures for periodic arrangements of the blocks are calculated, whereas the transmission for various random configurations of the blocks is computed by the transfer matrix method. It is shown that waves at all frequencies become localized even with a small amount of randomness. The spatial distribution of the localized energy is investigated, and the energy is shown not to be trapped near the source, contrary to higher-dimensional cases. The results also reveal a distinct collective behaviour for localized waves, a feature useful for distinguishing localization from the residual absorption effect.
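The transfer matrix method named in the abstract can be illustrated with a textbook normal-incidence model for layered fluids. The sketch below is not the paper's code: the material constants, the `random_stack` layout, and all function names are assumptions chosen for illustration. Each layer relates pressure and particle velocity across its thickness through a 2x2 matrix, and the product of the matrices gives the transmission of the whole stack.

```python
import math
import random

# Assumed illustrative material constants: (density kg/m^3, sound speed m/s).
WATER = (1000.0, 1500.0)
AIR = (1.2, 343.0)


def layer_matrix(rho, c, d, omega):
    """Transfer matrix of one lossless fluid layer of thickness d."""
    k = omega / c      # wavenumber in the layer
    z = rho * c        # characteristic impedance
    cos_kd, sin_kd = math.cos(k * d), math.sin(k * d)
    return ((cos_kd, 1j * z * sin_kd), (1j * sin_kd / z, cos_kd))


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def transmission(layers, omega, rho0=WATER[0], c0=WATER[1]):
    """Intensity transmission |t|^2 of a stack embedded in a uniform host medium."""
    m = ((1.0, 0.0), (0.0, 1.0))
    for rho, c, d in layers:
        m = matmul2(m, layer_matrix(rho, c, d, omega))
    z0 = rho0 * c0
    t = 2.0 / (m[0][0] + m[0][1] / z0 + z0 * m[1][0] + m[1][1])
    return abs(t) ** 2


def random_stack(n_blocks, block_width, mean_gap, disorder, rng):
    """Hypothetical layout: air blocks separated by water gaps whose widths
    are perturbed uniformly by a fraction `disorder` of the mean gap."""
    layers = []
    for _ in range(n_blocks):
        gap = mean_gap * (1.0 + disorder * rng.uniform(-1.0, 1.0))
        layers.append((WATER[0], WATER[1], gap))
        layers.append((AIR[0], AIR[1], block_width))
    return layers


if __name__ == "__main__":
    rng = random.Random(0)
    stack = random_stack(n_blocks=5, block_width=0.01, mean_gap=0.1, disorder=0.2, rng=rng)
    print(transmission(stack, omega=2.0 * math.pi * 1000.0))
```

Averaging the logarithm of the transmission over many random stacks and plotting it against the number of blocks is one common way to estimate a localization length from such a model; whether the paper follows that procedure is not stated in the abstract.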