Source author record

Yuji Wang

Yuji Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computer Vision, cond-mat.mes-hall, Multimedia, Sound

Catalog footprint

What is connected

2 works

4 topics

4 close collaborators

Preprint, 2026, arXiv

Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation

The landscape of joint audio and video generation has been fundamentally transformed by the advent of powerful foundation models. Despite these strides, achieving cohesive multimodal customization for the simultaneous preservation of visual identities and vocal timbres across multiple interacting subjects remains largely underexplored. To bridge this gap, we present Omni-Customizer, an end-to-end framework targeted at the precise binding and seamless fusion of multimodal identity information. Specifically, we introduce an Omni-Context Fusion (OCF) module that effectively enriches the base textual prompt with dense, multimodal identity cues, along with a Masked TTS Cross-Attention (MTP-CA) mechanism explicitly designed to prevent the severe "speech leakage" problem. Within this architecture, we propose Semantic-Anchored Multimodal RoPE (SA-MRoPE) to anchor visual and audio reference tokens, along with TTS embeddings, to their corresponding semantic descriptions, enabling structured multimodal fusion and robust identity binding. Furthermore, we devise a comprehensive training strategy that incorporates interleaved audio-video scheduling to rapidly adapt the audio branch to multilingual scenarios without degrading foundational priors, and a progressive in-pair to cross-pair curriculum to facilitate the learning of high-level and robust identity features. Extensive experiments demonstrate that Omni-Customizer achieves state-of-the-art performance in dual-modal customized generation, excelling across visual identity similarity, timbre consistency, precise audio-video synchronization, and overall video-audio fidelity.

Preprint, 2013, arXiv

Low frequency noise in chemical vapor deposited MoS2

Inherent low frequency noise is a ubiquitous phenomenon that limits the operation and performance of electronic devices and circuits. This limiting factor is especially important for nanoscale electronic devices, such as 2D semiconductor devices. In this work, low frequency noise in high mobility single crystal MoS2 grown by chemical vapor deposition (CVD) is investigated. The measured low frequency noise follows an empirical formulation of mobility fluctuations with Hooge's parameter ranging between 1.44E-3 and 3.51E-2. The small variation of Hooge's parameter suggests better material uniformity and processing control in CVD grown MoS2 devices than in previously reported single-layer MoS2 FETs. The extracted Hooge's parameter is one order of magnitude lower than that of CVD grown graphene. Hooge's parameter shows an inverse relationship with the field mobility.
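
For readers unfamiliar with Hooge's parameter, the sketch below uses the standard form of Hooge's empirical relation, S_I / I^2 = alpha_H / (f * N), where S_I is the current noise spectral density, I the current, f the frequency and N the number of carriers in the channel. The abstract does not state the exact expression the authors fitted, so this form, the function names and all numeric inputs except the 1.44E-3 bound quoted above are illustrative assumptions.

```python
def normalized_noise(alpha_h, freq_hz, n_carriers):
    """Normalized current noise S_I / I^2 from Hooge's empirical relation.

    Assumes the standard form S_I / I^2 = alpha_H / (f * N); the abstract
    does not spell out the expression actually used by the authors.
    """
    if freq_hz <= 0 or n_carriers <= 0:
        raise ValueError("frequency and carrier number must be positive")
    return alpha_h / (freq_hz * n_carriers)


def hooge_parameter(s_i, current_a, freq_hz, n_carriers):
    """Solve the same relation for alpha_H given a measured S_I (A^2/Hz)."""
    if current_a == 0:
        raise ValueError("current must be non-zero")
    if freq_hz <= 0 or n_carriers <= 0:
        raise ValueError("frequency and carrier number must be positive")
    return (s_i / current_a**2) * freq_hz * n_carriers


if __name__ == "__main__":
    # Hypothetical operating point: 10 Hz, 1e5 carriers, 1 uA drain current.
    # Only alpha_H = 1.44e-3 (the lower bound quoted above) is from the source.
    alpha_h = 1.44e-3
    s_over_i2 = normalized_noise(alpha_h, freq_hz=10.0, n_carriers=1e5)
    s_i = s_over_i2 * (1e-6) ** 2
    print(hooge_parameter(s_i, 1e-6, 10.0, 1e5))
```

In this form the normalized noise falls as 1/f and as 1/N, so a lower alpha_H at a fixed carrier number means a quieter device.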