Source author record

Xinglong Wu

Xinglong Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · math.AP · cond-mat.mtrl-sci · Artificial Intelligence · cond-mat.mes-hall · Machine Learning

Catalog footprint

What is connected

8 works

6 topics

4 close collaborators

preprint · 2026 · arXiv

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

We present NextFlow, a unified decoder-only autoregressive transformer trained on 6 trillion interleaved text-image discrete tokens. By leveraging a unified vision representation within a unified autoregressive architecture, NextFlow natively activates multimodal understanding and generation capabilities, unlocking abilities of image editing, interleaved content and video generation. Motivated by the distinct nature of modalities - where text is strictly sequential and images are inherently hierarchical - we retain next-token prediction for text but adopt next-scale prediction for visual generation. This departs from traditional raster-scan methods, enabling the generation of 1024x1024 images in just 5 seconds - orders of magnitude faster than comparable AR models. We address the instabilities of multi-scale generation through a robust training recipe. Furthermore, we introduce a prefix-tuning strategy for reinforcement learning. Experiments demonstrate that NextFlow achieves state-of-the-art performance among unified models and rivals specialized diffusion baselines in visual quality.
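
The speed argument in this abstract can be illustrated by counting sequential decoding steps under the two schemes. This is a minimal sketch only: the 64x64 token grid and the doubling scale schedule are illustrative assumptions, not values stated in the abstract, and the function names are hypothetical.

```python
def raster_scan_steps(side: int) -> int:
    """Sequential steps when exactly one token is predicted per step."""
    return side * side


def doubling_schedule(side: int) -> list[int]:
    """Hypothetical scale schedule 1, 2, 4, ..., side (side must be a power of two)."""
    if side < 1 or side & (side - 1):
        raise ValueError("side must be a positive power of two")
    scales = []
    s = 1
    while s <= side:
        scales.append(s)
        s *= 2
    return scales


def next_scale_steps(scales: list[int]) -> int:
    """Sequential steps when one whole token map (scale) is predicted per step."""
    return len(scales)


side = 64  # assumed token-grid side length, for illustration only
scales = doubling_schedule(side)
print(raster_scan_steps(side))        # 4096 sequential steps
print(next_scale_steps(scales))       # 7 sequential steps
print(sum(s * s for s in scales))     # 5461 tokens emitted across all scales
```

Under these assumptions, next-scale prediction emits more tokens in total but needs far fewer sequential steps, because every token within a scale is produced in parallel.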

preprint · 2026 · arXiv

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Visual generation is dominated by three paradigms: AutoRegressive (AR), diffusion, and Visual AutoRegressive (VAR) models. Unlike AR and diffusion, VARs operate on heterogeneous input structures across their generation steps, which creates severe asynchronous policy conflicts. This issue becomes particularly acute in reinforcement learning (RL) scenarios, leading to unstable training and suboptimal alignment. To resolve this, we propose a novel framework to enhance Group Relative Policy Optimization (GRPO) by explicitly managing these conflicts. Our method integrates three synergistic components: 1) a stabilizing intermediate reward to guide early-stage generation; 2) a dynamic time-step reweighting scheme for precise credit assignment; and 3) a novel mask propagation algorithm, derived from principles of Reward Feedback Learning (ReFL), designed to isolate optimization effects both spatially and temporally. Our approach demonstrates significant improvements in sample quality and objective alignment over the vanilla GRPO baseline, enabling robust and effective optimization for VAR models.
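
For orientation, the group-relative normalization behind the vanilla GRPO baseline can be sketched as follows. This illustrates only the baseline idea, not the three components proposed in the paper. The function name, the use of the population standard deviation, and the `eps` value are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each sample relative to the other samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    # eps keeps the division finite when every reward in the group is identical
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four images sampled from one prompt
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Samples that score above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to form the baseline.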

preprint · 2022 · arXiv

CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

Training a text-to-image generator in the general domain (e.g., DALL-E, CogView) requires huge amounts of paired text-image data, which is too expensive to collect. In this paper, we propose a self-supervised scheme named CLIP-GEN for general text-to-image generation with the language-image priors extracted with a pre-trained CLIP model. In our approach, we only require a set of unlabeled images in the general domain to train a text-to-image generator. Specifically, given an image without text labels, we first extract the embedding of the image in the unified language-vision embedding space with the image encoder of CLIP. Next, we convert the image into a sequence of discrete tokens in the VQGAN codebook space (the VQGAN model can be trained with the unlabeled image dataset in hand). Finally, we train an autoregressive transformer that generates the image tokens from the image's unified language-vision representation. Once trained, the transformer can generate coherent image tokens based on the text embedding extracted from the text encoder of CLIP upon an input text. Such a strategy enables us to train a strong and general text-to-image generator with a large text-free image dataset such as ImageNet. Qualitative and quantitative evaluations verify that our method significantly outperforms optimization-based text-to-image methods in terms of image quality while not compromising the text-image matching. Our method can even achieve performance comparable to flagship supervised models like CogView.
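
The data flow described in this abstract can be sketched as below. It is a structural sketch only: every function name and type alias is hypothetical, and the encoders, tokenizer and sampler are passed in as plain callables rather than taken from any real CLIP or VQGAN library.

```python
from typing import Callable, Sequence

Embedding = Sequence[float]  # vector in the joint language-vision space
Tokens = Sequence[int]       # indices into a discrete codebook


def make_training_pair(
    image: object,
    clip_image_encoder: Callable[[object], Embedding],
    vqgan_tokenizer: Callable[[object], Tokens],
) -> tuple[Embedding, Tokens]:
    """Build a (condition, target) pair from an unlabeled image; no caption is used."""
    condition = clip_image_encoder(image)  # step 1: embed the image
    target = vqgan_tokenizer(image)        # step 2: discretize the image
    return condition, target               # step 3: a transformer is trained on condition -> target


def generate_from_text(
    text: str,
    clip_text_encoder: Callable[[str], Embedding],
    transformer_sample: Callable[[Embedding], Tokens],
) -> Tokens:
    """At inference the text embedding takes the place of the image embedding."""
    return transformer_sample(clip_text_encoder(text))


# Toy stand-ins so the sketch runs end to end; they carry no real model behavior.
condition, target = make_training_pair(
    "image-placeholder",
    clip_image_encoder=lambda im: [1.0, 0.0],
    vqgan_tokenizer=lambda im: [3, 1, 4],
)
print(condition, target)
```

The swap in `generate_from_text` relies on the premise stated in the abstract: image and text embeddings live in one shared space, so a model conditioned on image embeddings during training can be conditioned on text embeddings afterwards.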

preprint · 2016 · arXiv

Efficient Thermal Conductance in Organometallic Perovskite CH3NH3PbI3 Films

Perovskite-based optoelectronic devices have shown great promise for solar conversion and other optoelectronic applications, but their long-term performance instability is regarded as a major obstacle to their widespread deployment. Previous works have shown that the ultralow thermal conductivity and inefficient heat spreading might put an intrinsic limit on the lifetime of perovskite devices. Here, we report the observation of a remarkably efficient thermal conductance, with conductivity of 11.2 +/- 0.8 W m^-1 K^-1 at room temperature, in densely-packed perovskite CH3NH3PbI3 films, via noncontact time-domain thermal reflectance measurements. The temperature-dependent experiments suggest the important roles of organic cations and structural phase transitions, which are further confirmed by temperature-dependent Raman spectra. The thermal conductivity at room temperature observed here is over one order of magnitude larger than that in the early report, suggesting that perovskite device performance will not be limited by thermal stability.

preprint · 2015 · arXiv

Raman vibrational spectra of bulk to monolayer ReS2 with lower symmetry

Lattice structure and symmetry of two-dimensional (2D) layered materials are of key importance to their fundamental mechanical, thermal, electronic and optical properties. Raman spectroscopy, as a convenient and nondestructive tool, however, has limitations in identifying all symmetry-allowed Raman modes and determining the corresponding crystal structure of 2D layered materials with high symmetry like graphene and MoS2. Owing to the lower structural symmetry and extraordinarily weak interlayer coupling of ReS2, we successfully identified all 18 first-order Raman active modes for bulk and monolayer ReS2. Without van der Waals (vdW) correction, our local density approximation (LDA) calculations successfully reproduce all the Raman modes. Our calculations also suggest no surface reconstruction effect and the absence of low-frequency rigid-layer Raman modes below 100 cm^-1. Combining Raman spectroscopy with LDA calculations thus provides a general approach for studying the vibrational and structural properties of 2D layered materials with lower symmetry.

preprint · 2014 · arXiv

The blow-up phenomena and exponential decay of solutions for a three-component Camassa-Holm equations

The present paper is mainly concerned with the blow-up phenomena and exponential decay of solutions for a three-component Camassa-Holm equation. Compared with the result of Hu et al. in [1], a new wave-breaking solution is obtained. The exponential decay results in our paper cover and extend the corresponding results in [12, 19, 22].

preprint · 2014 · arXiv

The Well-posedness and Blow-up rate of Solution for the Generalized Zakharov equations with Magnetic field in R^d

The present paper is devoted to the study of the well-posedness and the lower bound of the blow-up rate for the Cauchy problem of the generalized Zakharov (GZ) equations with magnetic field in R^d. The well-posedness of the GZ system is based on the local well-posedness theory in [9]. First, the existence, uniqueness and continuity of the solution to the GZ system with magnetic field in R^d are proved. Next, we establish the lower bound of the blow-up rate of blow-up solutions in Sobolev spaces to the GZ system, which is almost a critical index. Finally, we obtain the long-time behavior of the global solution, whose H^k-norm grows k-exponentially in time.

preprint · 2013 · arXiv

Global Existence and Nonlinear Stability for the Coupled CGL Burgers Equations for Sequential flames in R^N

The present paper is devoted to the study of the global solution and nonlinear stability for the coupled complex Ginzburg-Landau and Burgers equations for sequential flames