Researcher profile

Xihua Wang

Xihua Wang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
3 topics
4 close collaborators

Published work

3 published items

preprint, 2026, arXiv

Qwen-Image-2.0 Technical Report

We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution photorealism, robust instruction following, and efficient deployment, especially in text-rich and compositionally complex scenarios. Qwen-Image-2.0 addresses these challenges by coupling Qwen3-VL as the condition encoder with a Multimodal Diffusion Transformer for joint condition-target modeling, supported by large-scale data curation and a customized multi-stage training pipeline. This enables strong multimodal understanding while preserving flexible generation and editing capabilities. The model supports instructions of up to 1K tokens for generating text-rich content such as slides, posters, infographics, and comics, while significantly improving multilingual text fidelity and typography. It also enhances photorealistic generation with richer details, more realistic textures, and coherent lighting, and follows complex prompts more reliably across diverse styles. Extensive human evaluations show that Qwen-Image-2.0 substantially outperforms previous Qwen-Image models in both generation and editing, marking a step toward more general, reliable, and practical image generation foundation models.

preprint, 2026, arXiv

SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning

Recent advancements in video-audio (V-A) joint generation have achieved remarkable success in semantic correspondence. However, achieving precise temporal synchronization, which requires fine-grained alignment between audio events and their visual triggers, remains a challenging problem. Post-training for joint generation is largely dominated by Supervised Fine-Tuning, but the commonly used Mean Squared Error loss provides insufficient penalties for subtle temporal misalignments. Direct Preference Optimization (DPO) offers an alternative by introducing explicit misaligned counterparts to improve temporal sensitivity. In this paper, we propose SyncDPO, a post-training framework that leverages DPO to improve the temporal sensitivity of V-A joint generation. Conventional DPO pipelines typically depend on costly sampling-and-ranking procedures to construct preference pairs, resulting in substantial computational cost. To improve efficiency, we introduce a suite of on-the-fly rule-based negative construction strategies that distort temporal structures without incurring additional annotation or sampling. We demonstrate that the temporal alignment capability can be effectively reinforced by providing explicit negative supervision through temporally distorted V-A pairs. Accordingly, we implement a curriculum learning strategy that progressively increases the difficulty of negative samples, transitioning from coarse misalignment to subtle inconsistencies. Extensive objective and subjective experiments across four diverse benchmarks, ranging from ambient sound videos to human speech videos, demonstrate that SyncDPO significantly outperforms other methods in improving the model's temporal alignment capability. It also demonstrates superior generalization on an out-of-distribution benchmark by capturing intrinsic motion-sound dynamics. Demo and code are available at https://syncdpo.github.io/syncdpo/.
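To make the idea of rule-based negatives with a curriculum concrete, here is a minimal illustrative sketch. It is not the SyncDPO implementation: the abstract does not specify the distortion rules, so the circular audio shift, the linear schedule, and every name and default below (`shift_audio`, `curriculum_shift`, `make_preference_pair`, `max_shift`, `min_shift`) are assumptions chosen for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[List[float], List[float]]  # (video frames, audio frames)


def shift_audio(audio: Sequence[float], shift: int) -> List[float]:
    """Circularly shift the audio track by `shift` steps to break sync.

    Positive values delay the audio, negative values advance it.
    Hypothetical distortion rule; a shift that is a multiple of
    len(audio) leaves the track unchanged.
    """
    if not audio:
        return []
    k = shift % len(audio)
    return list(audio[-k:]) + list(audio[:-k]) if k else list(audio)


def curriculum_shift(step: int, total_steps: int,
                     max_shift: int = 8, min_shift: int = 1) -> int:
    """Linearly shrink the misalignment from coarse (max) to subtle (min)."""
    if total_steps <= 1:
        return min_shift
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return round(max_shift - progress * (max_shift - min_shift))


def make_preference_pair(video: Sequence[float], audio: Sequence[float],
                         step: int, total_steps: int,
                         rng: Optional[random.Random] = None) -> Tuple[Pair, Pair]:
    """Build (chosen, rejected) on the fly, with no sampling or ranking.

    The chosen pair is the original aligned clip; the rejected pair keeps
    the video and misaligns the audio by the current curriculum shift.
    """
    rng = rng or random.Random(0)
    shift = curriculum_shift(step, total_steps) * rng.choice((-1, 1))
    chosen = (list(video), list(audio))
    rejected = (list(video), shift_audio(audio, shift))
    return chosen, rejected


if __name__ == "__main__":
    frames = [float(i) for i in range(16)]
    for step in (0, 9):
        chosen, rejected = make_preference_pair(frames, frames, step, 10)
        print(step, curriculum_shift(step, 10), rejected[1][:4])
```

In a DPO-style objective, the aligned pair would serve as the preferred sample and the shifted pair as the dispreferred one. Early steps use large, easy-to-detect offsets and later steps use small ones, mirroring the coarse-to-subtle progression described in the abstract.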

preprint, 2022, arXiv

Surface microlenses for much more efficient photodegradation in water treatment

The global need for clean water requires sustainable technology for purifying contaminated water. Highly efficient solar-driven photodegradation is a sustainable strategy for wastewater treatment. In this work, we demonstrate that the photodegradation efficiency of micropollutants in water can be improved by ~2-24 times by leveraging polymeric microlenses (MLs). These MLs are fabricated by in-situ polymerization of surface nanodroplets. We found that photodegradation efficiency (η) in water correlates approximately linearly with the sum of the intensity from all focal points of MLs, although no difference in the photodegradation pathway is detected from the chemical analysis of the byproducts. With the same overall power over a given surface area, η is doubled by using ordered arrays, compared to heterogeneous MLs on an unpatterned substrate. Higher η from ML arrays may be attributed to a coupled effect from the focal points on the same plane that creates high local concentrations of active species to further speed up the rate of photodegradation. As a proof-of-concept for ML-enhanced water treatment, MLs were formed on the inner wall of glass bottles that were used as containers for water to be treated. Three representative micropollutants (norfloxacin, sulfadiazine, and sulfamethoxazole) in the bottles functionalized by MLs were photodegraded 30% to 170% faster than in normal bottles. Our findings suggest that ML-enhanced photodegradation may lead to a highly efficient solar water purification approach without a large solar collector size. Such an approach may be particularly suitable for portable transparent bottles in remote regions.