Source author record

Yan Lu

Yan Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Artificial Intelligence, astro-ph.GA, Computation and Language, eess.AS, Human-Computer Interaction, Sound

Catalog footprint

What is connected

5 works

7 topics

4 close collaborators

preprint, 2026, arXiv

A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation

Neural codec language models achieve impressive zero-shot Text-to-Speech (TTS) by fully imitating the acoustic characteristics of a short speech prompt, including timbre, prosody, and paralinguistic information. However, such holistic imitation limits their ability to isolate and control individual attributes. In this paper, we present a unified codec language model SpeechEdit that extends zero-shot TTS with a selective control mechanism. By default, SpeechEdit reproduces the complete acoustic profile inferred from the speech prompt, but it selectively overrides only the attributes specified by explicit control instructions. To enable controllable modeling, SpeechEdit is trained on our newly constructed LibriEdit dataset, which provides delta (difference-aware) training pairs derived from LibriHeavy. Experimental results show that our approach maintains naturalness and robustness while offering flexible and localized control over desired attributes. Audio samples are available at https://speech-editing.github.io/speech-editing/.

preprint, 2026, arXiv

An Efficient Streaming Video Understanding Framework with Agentic Control

Streaming video requires handling dynamic information density under strict latency budgets. Yet, existing methods typically employ static strategies, such as fixed memory compression or reliance on a single model, forcing a trade-off: fast models fail on complex queries, while always-on heavy models violate real-time constraints and overcomplicate simple queries. Rather than fixing these decisions upfront, we propose R3-Streaming (Remember, Respond, Reason), which formulates streaming video understanding as a cascaded control problem: for each query, the system compresses memory, judges response readiness, and routes computation sequentially, so that each downstream decision builds on progressively refined information states. To optimize this pipeline, we introduce an age-aware forgetting policy for memory compression, as aggressively compressing historical frames can yield substantial performance gains. For compute routing, we propose TB-GRPO, a target-balanced reinforcement learning objective that routes hard queries to a stronger model while preventing mode collapse. Extensive evaluations demonstrate that R3-Streaming achieves state-of-the-art results among streaming MLLMs, reaching 57.92 on OVO-Bench and 76.36 on StreamingBench, while reducing visual token usage by 95 to 96 percent.
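The cascade described in this abstract (compress memory, judge response readiness, then route computation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Frame` type, the `recent_window` and `old_budget` parameters, the cap-by-age rule, and the `is_ready`, `is_hard`, `fast_model`, and `strong_model` callables are hypothetical and are not taken from R3-Streaming. In particular, the paper learns its routing with TB-GRPO, which this sketch replaces with a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    timestamp: float  # seconds since the stream started
    tokens: int       # visual tokens currently kept for this frame


def compress_memory(frames: List[Frame], now: float,
                    recent_window: float = 5.0,
                    old_budget: int = 4) -> List[Frame]:
    """Hypothetical age-aware rule: recent frames keep all tokens,
    older frames are capped at `old_budget` tokens."""
    compressed = []
    for frame in frames:
        age = now - frame.timestamp
        kept = frame.tokens if age <= recent_window else min(frame.tokens, old_budget)
        compressed.append(Frame(frame.timestamp, kept))
    return compressed


Model = Callable[[str, List[Frame]], str]
Judge = Callable[[str, List[Frame]], bool]


def answer_query(query: str, frames: List[Frame], now: float,
                 is_ready: Judge, is_hard: Judge,
                 fast_model: Model, strong_model: Model) -> Optional[str]:
    # Step 1: compress memory before any other decision is made.
    memory = compress_memory(frames, now)
    # Step 2: judge response readiness on the compressed memory.
    if not is_ready(query, memory):
        return None  # defer until more of the stream has arrived
    # Step 3: route to the stronger model only for hard queries.
    model = strong_model if is_hard(query, memory) else fast_model
    return model(query, memory)
```

Each stage consumes the output of the previous one, which is the sense in which downstream decisions build on a refined information state. Returning `None` stands in for waiting on further frames.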

preprint, 2026, arXiv

Breaking Coordinate Overfitting: Geometry-Aware WiFi Sensing for Cross-Layout 3D Pose Estimation

WiFi-based 3D human pose estimation offers a low-cost and privacy-preserving alternative to vision-based systems for smart interaction. However, existing approaches rely on visual 3D poses as supervision and directly regress CSI to a camera-based coordinate system. We find that this practice leads to coordinate overfitting: models memorize deployment-specific WiFi transceiver layouts rather than only learning activity-relevant representations, resulting in severe generalization failures. To address this challenge, we present PerceptAlign, the first geometry-conditioned framework for WiFi-based cross-layout pose estimation. PerceptAlign introduces a lightweight coordinate unification procedure that aligns WiFi and vision measurements in a shared 3D space using only two checkerboards and a few photos. Within this unified space, it encodes calibrated transceiver positions into high-dimensional embeddings and fuses them with CSI features, making the model explicitly aware of device geometry as a conditional variable. This design forces the network to disentangle human motion from deployment layouts, enabling robust and, for the first time, layout-invariant WiFi pose estimation. To support systematic evaluation, we construct the largest cross-domain 3D WiFi pose estimation dataset to date, comprising 21 subjects, 5 scenes, 18 actions, and 7 device layouts. Experiments show that PerceptAlign reduces in-domain error by 12.3% and cross-domain error by more than 60% compared to state-of-the-art baselines. These results establish geometry-conditioned learning as a viable path toward scalable and practical WiFi sensing.
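The idea of conditioning on device geometry, encoding transceiver positions as embeddings and fusing them with CSI features, can be sketched as follows. The abstract does not say which encoding or fusion PerceptAlign uses, so the sinusoidal encoding, the concatenation step, and the names `encode_position`, `fuse`, and `num_freqs` are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def encode_position(xyz: Sequence[float], num_freqs: int = 4) -> List[float]:
    """Map a 3D position to a higher-dimensional vector using sine and
    cosine at doubling frequencies (an assumed encoding, not the paper's)."""
    embedding = []
    for coord in xyz:
        for k in range(num_freqs):
            freq = 2.0 ** k
            embedding.append(math.sin(freq * coord))
            embedding.append(math.cos(freq * coord))
    return embedding


def fuse(csi_features: Sequence[float],
         tx_xyz: Sequence[float],
         rx_xyz: Sequence[float]) -> List[float]:
    """Concatenate CSI features with transmitter and receiver embeddings
    so a downstream model sees the layout as an explicit input."""
    return list(csi_features) + encode_position(tx_xyz) + encode_position(rx_xyz)


# The same CSI features under two layouts yield different fused inputs.
csi = [0.1, 0.2]
layout_a = fuse(csi, tx_xyz=(0.0, 0.0, 0.0), rx_xyz=(1.0, 0.0, 0.0))
layout_b = fuse(csi, tx_xyz=(0.0, 0.0, 0.0), rx_xyz=(2.0, 0.0, 0.0))
```

Because the layout enters as an input rather than being absorbed into the weights, a model trained this way has less reason to memorize one deployment's coordinates.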

preprint, 2026, arXiv

InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training

GUI agents that interact with graphical interfaces on behalf of users represent a promising direction for practical AI assistants. However, training such agents is hindered by the scarcity of suitable environments. We present InfiniteWeb, a system that automatically generates functional web environments at scale for GUI agent training. While LLMs perform well at generating a single webpage, building a realistic and functional website with many interconnected pages remains challenging. We address these challenges through unified specification, task-centric test-driven development, and the combination of a website seed with a reference design image to ensure diversity. Our system also generates verifiable task evaluators, enabling dense reward signals for reinforcement learning. Experiments show that InfiniteWeb surpasses commercial coding agents at realistic website construction, and GUI agents trained on our generated environments achieve significant performance improvements on OSWorld and Online-Mind2Web, demonstrating the effectiveness of the proposed system.

preprint, 2026, arXiv

Radio AGN feedback sustains quiescence only in a minority of massive galaxies

Radio active galactic nuclei (AGNs) eject a huge amount of energy into the surrounding medium and are thought to potentially prevent gas cooling and maintain the quiescence of massive galaxies. The short-lived, sporadic, and anisotropic nature of radio activity, coupled with the detection of abundant cold gas around some massive quiescent galaxies, raises questions about the efficiency of radio feedback in massive galaxies. Here we present an innovative method rooted in artificial intelligence to separate galaxies in which radio feedback is effective (RFE), regardless of current radio emission, from those in which radio feedback is ineffective (RFI), according to their optical images. Galaxies categorized as RFE are all dynamically hot, whereas quiescent RFI (RFI-Q) galaxies usually have extended cold-disk components. At a given stellar mass, dark matter halos hosting RFE galaxies are between four and ten times more massive than those of RFI-Q galaxies. We find, for the first time, that almost all RFE galaxies have scant cold gas, irrespective of AGN activity. In contrast, many RFI-Q galaxies are surrounded by substantial amounts of condensed atomic gas, indicating a different evolutionary path from RFE galaxies. Our finding provides direct and compelling evidence that a radio AGN has gone through about 300 on-off cycles and that radio feedback can prevent gas cooling over a timescale much longer than that of radio activity. Contrary to general belief, our analysis shows that only a small fraction of massive galaxies are influenced by strong radio AGNs, suggesting that current galaxy formation models need serious revision.
