Source author record

Shiwen Zhang

Shiwen Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, math-ph, math.MP, math.SP, math.AP, math.DS, math.FA, math.NA, Numerical Analysis

Catalog footprint

What is connected

11 works

9 topics

4 close collaborators

preprint, 2026, arXiv

QwenStyle: Content-Preserving Style Transfer with Qwen-Image-Edit

Content-preserving style transfer, given content and style references, remains challenging for Diffusion Transformers (DiTs) because their internal content and style features are entangled. In this technical report, we propose the first content-preserving style transfer model trained on Qwen-Image-Edit, which activates Qwen-Image-Edit's strong content preservation and style customization capability. We collected and filtered high-quality data for a limited set of specific styles and synthesized triplets with thousands of categories of in-the-wild style images. We introduce the Curriculum Continual Learning framework to train QwenStyle on such a mixture of clean and noisy triplets, which enables QwenStyle to generalize to unseen styles without degrading its precise content preservation capability. Our QwenStyle V1 achieves state-of-the-art performance in three core metrics: style similarity, content consistency, and aesthetic quality.

preprint, 2025, arXiv

TeleWorld: Towards Dynamic Multimodal Synthesis with a 4D World Model

World models aim to endow AI systems with the ability to represent, generate, and interact with dynamic environments in a coherent and temporally consistent manner. While recent video generation models have demonstrated impressive visual quality, they remain limited in real-time interaction, long-horizon consistency, and persistent memory of dynamic scenes, hindering their evolution into practical world models. In this report, we present TeleWorld, a real-time multimodal 4D world modeling framework that unifies video generation, dynamic scene reconstruction, and long-term world memory within a closed-loop system. TeleWorld introduces a novel generation-reconstruction-guidance paradigm, where generated video streams are continuously reconstructed into a dynamic 4D spatio-temporal representation, which in turn guides subsequent generation to maintain spatial, temporal, and physical consistency. To support long-horizon generation with low latency, we employ an autoregressive diffusion-based video model enhanced with Macro-from-Micro Planning (MMPL), a hierarchical planning method that reduces error accumulation from frame-level to segment-level, alongside efficient Distribution Matching Distillation (DMD), enabling real-time synthesis under practical computational budgets. Our approach achieves seamless integration of dynamic object modeling and static scene representation within a unified 4D framework, advancing world models toward practical, interactive, and computationally accessible systems. Extensive experiments demonstrate that TeleWorld achieves strong performance in both static and dynamic world understanding, long-term consistency, and real-time generation efficiency, positioning it as a practical step toward interactive, memory-enabled world models for multimodal generation and embodied intelligence.

preprint, 2022, arXiv

Approximating the ground state eigenvalue via the effective potential

In this paper, we study 1-d random Schrödinger operators on a finite interval with Dirichlet boundary conditions. We are interested in the approximation of the ground state energy using the minimum of the effective potential. For the 1-d continuous Anderson Bernoulli model, we show that the ratio of the ground state energy to the minimum of the effective potential approaches $\frac{\pi^2}{8}$ as the domain size approaches infinity. We also discuss various approximations to the ratio in different situations. Numerical experiments support our main results for the ground state energy as well as the approximations for the excited-state energies.

preprint, 2022, arXiv

On the Reduced Hartree-Fock Equations with a Small Anderson Type Background Charge Distribution

We demonstrate that the reduced Hartree-Fock equation (REHF) with a small Anderson type background charge distribution has a unique stationary solution by explicitly computing a screening mass at positive temperature.

preprint, 2022, arXiv

TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning

Temporal reasoning is an important capability for vision intelligence. In the computer vision research community, temporal reasoning is usually studied in the form of video classification, for which many state-of-the-art neural network structures and dataset benchmarks have been proposed in recent years, especially 3D CNNs and Kinetics. However, some recent works found that current video classification benchmarks contain strong biases towards static features and thus cannot accurately reflect temporal modeling ability. New video classification benchmarks aiming to eliminate static biases have been proposed, with experiments on these new benchmarks showing that current clip-based 3D CNNs are outperformed by RNN structures and recent video transformers. In this paper, we find that 3D CNNs and their efficient depthwise variants, when a video-level sampling strategy is used, are actually able to beat RNNs and recent vision transformers by significant margins on static-unbiased temporal reasoning benchmarks. Further, we propose the Temporal Fully Connected Block (TFC Block), an efficient and effective component that approximates fully connected layers along the temporal dimension to obtain a video-level receptive field, enhancing the spatiotemporal reasoning ability. With TFC blocks inserted into Video-level 3D CNNs (V3D), our proposed TFCNets establish new state-of-the-art results on the synthetic temporal reasoning benchmark CATER and the real-world static-unbiased dataset Diving48, surpassing all previous methods.

preprint, 2021, arXiv

The landscape law for tight binding Hamiltonians

The present paper extends the landscape theory pioneered in [FM, ADFJM2, DFM] to the tight-binding Schrödinger operator on $\mathbb{Z}^d$. In particular, we establish upper and lower bounds for the integrated density of states in terms of the counting function based upon the localization landscape.

preprint, 2020, arXiv

Knowledge Integration Networks for Action Recognition

In this work, we propose Knowledge Integration Networks (referred to as KINet) for video action recognition. KINet is capable of aggregating meaningful context features that are of great importance for identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition and two auxiliary branches for human parsing and scene recognition, which allow the model to encode knowledge of the human and the scene for action recognition. We explore two pre-trained models as teacher networks to distill the knowledge of human and scene for training the auxiliary tasks of KINet. Furthermore, we propose a two-level knowledge encoding mechanism which contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features, and an Action Knowledge Graph (AKG) for effectively fusing high-level context information. This results in an end-to-end trainable framework where the three tasks can be trained collaboratively, allowing the model to compute strong context knowledge efficiently. The proposed KINet achieves state-of-the-art performance on the large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate that KINet transfers well by applying the Kinetics-trained model to UCF-101, where it obtains 97.8% top-1 accuracy.

preprint, 2020, arXiv

Large deviation estimates and Hölder regularity of the Lyapunov exponents for quasi-periodic Schrödinger cocycles

We consider one-dimensional quasi-periodic Schrödinger operators with analytic potentials. In the positive Lyapunov exponent regime, we prove large deviation estimates which lead to optimal Hölder continuity of the Lyapunov exponents and the integrated density of states, in both small Lyapunov exponent and large coupling regimes. Our results cover all the Diophantine frequencies and some Liouville frequencies.

preprint, 2020, arXiv

V4D:4D Convolutional Neural Networks for Video-level Representation Learning

Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider video-level temporal evolution of spatio-temporal features. In this paper, we propose Video-level 4D Convolutional Neural Networks, referred to as V4D, to model the evolution of long-range spatio-temporal representation with 4D convolutions and, at the same time, to preserve strong 3D spatio-temporal representation with residual connections. Specifically, we design a new 4D residual block able to capture inter-clip interactions, which could enhance the representation power of the original clip-level 3D CNNs. The 4D residual blocks can be easily integrated into existing 3D CNNs to perform long-range modeling hierarchically. We further introduce the training and inference methods for the proposed V4D. Extensive experiments are conducted on three video recognition benchmarks, where V4D achieves excellent results, surpassing recent 3D CNNs by a large margin.

preprint, 2015, arXiv

Mixed spectral types for the one frequency discrete quasi-periodic Schrödinger operator

We consider a family of one frequency discrete analytic quasi-periodic Schrödinger operators which appear in [Bjer]. We show that this family provides an example of coexistence of absolutely continuous and point spectrum for some parameters as well as coexistence of absolutely continuous and singular continuous spectrum for some other parameters.

preprint, 2015, arXiv

Quantitative continuity of singular continuous spectral measures and arithmetic criteria for quasiperiodic Schrödinger operators

We introduce a notion of $\beta$-almost periodicity and prove quantitative lower spectral/quantum dynamical bounds for general bounded $\beta$-almost periodic potentials. Applications include a sharp arithmetic criterion of full spectral dimensionality for analytic quasiperiodic Schrödinger operators in the positive Lyapunov exponent regime and arithmetic criteria for families with zero Lyapunov exponents, with applications to Sturmian potentials and the critical almost Mathieu operator.
