Source author record

Qian He

Qian He appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Social and Information Networks

Catalog footprint

What is connected

5works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

DreamID-V:Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

Video Face Swapping (VFS) requires seamlessly injecting a source identity into a target video while meticulously preserving the original pose, expression, lighting, background, and dynamic information. Existing methods struggle to maintain identity similarity and attribute preservation while preserving temporal consistency. To address the challenge, we propose a comprehensive framework to seamlessly transfer the superiority of Image Face Swapping (IFS) to the video domain. We first introduce a novel data pipeline SyncID-Pipe that pre-trains an Identity-Anchored Video Synthesizer and combines it with IFS models to construct bidirectional ID quadruplets for explicit supervision. Building upon paired data, we propose the first Diffusion Transformer-based framework DreamID-V, employing a core Modality-Aware Conditioning module to discriminatively inject multi-model conditions. Meanwhile, we propose a Synthetic-to-Real Curriculum mechanism and an Identity-Coherence Reinforcement Learning strategy to enhance visual realism and identity consistency under challenging scenarios. To address the issue of limited benchmarks, we introduce IDBench-V, a comprehensive benchmark encompassing diverse scenes. Extensive experiments demonstrate DreamID-V outperforms state-of-the-art methods and further exhibits exceptional versatility, which can be seamlessly adapted to various swap-related tasks.

preprint2026arXiv

DreamStyle: A Unified Framework for Video Stylization

Video stylization, an important downstream task of video generation models, has not yet been thoroughly explored. Its input style conditions typically include text, style image, and stylized first frame. Each condition has a characteristic advantage: text is more flexible, style image provides a more accurate visual anchor, and stylized first frame makes long-video stylization feasible. However, existing methods are largely confined to a single type of style condition, which limits their scope of application. Additionally, their lack of high-quality datasets leads to style inconsistency and temporal flicker. To address these limitations, we introduce DreamStyle, a unified framework for video stylization, supporting (1) text-guided, (2) style-image-guided, and (3) first-frame-guided video stylization, accompanied by a well-designed data curation pipeline to acquire high-quality paired video data. DreamStyle is built on a vanilla Image-to-Video (I2V) model and trained using a Low-Rank Adaptation (LoRA) with token-specific up matrices that reduces the confusion among different condition tokens. Both qualitative and quantitative evaluations demonstrate that DreamStyle is competent in all three video stylization tasks, and outperforms the competitors in style consistency and video quality.

preprint2026arXiv

Lance: Unified Multimodal Modeling by Multi-Task Synergy

We present Lance, a lightweight native unified model supporting multimodal understanding, generation, and editing for both images and videos. Rather than relying on model capacity scaling or text-image-dominant designs, Lance explores a practical paradigm for unified multimodal modeling via collaborative multi-task training. It is grounded in two core principles: unified context modeling and decoupled capability pathways. Specifically, Lance is trained from scratch and employs a dual-stream mixture-of-experts architecture on shared interleaved multimodal sequences, enabling joint context learning while decoupling the pathways for understanding and generation. We further introduce modality-aware rotary positional encoding to mitigate interference among heterogeneous visual tokens and boost cross-task alignment. During training, Lance adopts a staged multi-task training paradigm with capability-oriented objectives and adaptive data scheduling to strengthen both semantic comprehension and visual generation performance. Experimental results demonstrate that Lance substantially outperforms existing open-source unified models in image and video generation, while retaining strong multimodal understanding capabilities. The homepage is available at https://lance-project.github.io.

preprint2026arXiv

Mobility Inequity and Risk Response After Hurricane Helene: Evidence from Real-Time Travel and Social Sentiment Data

Hurricanes severely disrupt infrastructure and restrict access to essential services. While the physical impacts on post-disaster mobility are well studied, less is known about how individual travel behaviors change during and after disasters, and how these responses are shaped by social and geographic disparities. This study examines mobility patterns following Hurricane Helene, a Category 4 storm that struck six southeastern U.S. states on September 26, 2024, causing over 230 fatalities. Using anonymized GPS mobility data, hurricane severity metrics, and county-level social media sentiment, we examine shifts in travel behavior and their implications for equity. We ask two questions: How do post-hurricane mobility patterns reflect community vulnerability and adaptive capacity? and How do sociodemographic conditions and public sentiment factors shape the direction and extent of mobility change? Results from robust linear and ordered logistic regressions indicate that evacuation orders increase mobility; however, severe storm conditions, particularly high wind speeds, can limit travel. Communities with lower incomes, located in rural areas, and with higher percentages of Black populations exhibit the steepest declines in mobility, suggesting resource constraints and infrastructural barriers, while wealthier, urban, and higher-education areas maintain greater flexibility. Results also show that positive social sentiment is associated with higher mobility and a greater likelihood of increased travel during the hurricane. Our findings highlight the need to address structural barriers and social conditions in post-disaster mobility and disaster response.

preprint2026arXiv

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

We present NextFlow, a unified decoder-only autoregressive transformer trained on 6 trillion interleaved text-image discrete tokens. By leveraging a unified vision representation within a unified autoregressive architecture, NextFlow natively activates multimodal understanding and generation capabilities, unlocking abilities of image editing, interleaved content and video generation. Motivated by the distinct nature of modalities - where text is strictly sequential and images are inherently hierarchical - we retain next-token prediction for text but adopt next-scale prediction for visual generation. This departs from traditional raster-scan methods, enabling the generation of 1024x1024 images in just 5 seconds - orders of magnitude faster than comparable AR models. We address the instabilities of multi-scale generation through a robust training recipe. Furthermore, we introduce a prefix-tuning strategy for reinforcement learning. Experiments demonstrate that NextFlow achieves state-of-the-art performance among unified models and rivals specialized diffusion baselines in visual quality.

Qian He

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

DreamID-V:Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

DreamStyle: A Unified Framework for Video Stylization

Lance: Unified Multimodal Modeling by Multi-Task Synergy

Mobility Inequity and Risk Response After Hurricane Helene: Evidence from Real-Time Travel and Social Sentiment Data

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation