Researcher profile

Wen Huang

Wen Huang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
17 works
0 followers
14 topics
4 close collaborators

Published work

17 published items

preprint, 2026, arXiv

AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training

In video generation models, particularly world models, training large-scale video diffusion Transformers (such as DiT and MMDiT) poses significant computational challenges due to the extreme variance in sequence lengths within mixed-mode datasets. Existing bucket-based data loading strategies typically rely on "equal token length" constraints. This approach fails to account for the quadratic complexity of self-attention mechanisms, leading to severe load imbalance and underutilization of GPU resources. This paper proposes AdaptiveLoad, an integrated optimization framework consisting of two core components: (1) a dual-constraint adaptive load balancing system, which eliminates long-sequence bottlenecks by simultaneously limiting memory consumption and computational load ($B \times S^p \le M_{\text{comp}}$); (2) a fused LayerNorm-Modulate CUDA kernel, which uses a D-tile coalesced reduction strategy to increase throughput and alleviate memory pressure. Experimental results on the Wan 2.1 world model demonstrate that our method reduces the computational imbalance rate from 39% to 18.9%, improves peak VRAM utilization efficiency by 22.7%, and achieves an overall training throughput increase of 27.2%.
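The dual-constraint idea in this abstract can be illustrated with a minimal sketch. Only the compute constraint $B \times S^p \le M_{\text{comp}}$ comes from the abstract; the linear memory constraint `B * S <= mem_budget`, the default exponent `p = 2`, the names `BucketLimits` and `max_batch_size`, and the budget values are assumptions made for illustration, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class BucketLimits:
    mem_budget: float   # hypothetical cap on B * S (assumed memory model)
    comp_budget: float  # M_comp from the abstract: cap on B * S**p
    p: float = 2.0      # assumed exponent; 2 models quadratic self-attention


def max_batch_size(seq_len: int, limits: BucketLimits) -> int:
    """Largest B with B*S <= mem_budget and B*S**p <= comp_budget.

    Falls back to 1 when even a single sequence exceeds a budget,
    so that an over-long sequence is still scheduled alone.
    """
    by_memory = limits.mem_budget // seq_len
    by_compute = limits.comp_budget // (seq_len ** limits.p)
    return max(1, int(min(by_memory, by_compute)))


# Illustrative budgets only (not taken from the paper).
limits = BucketLimits(mem_budget=65536, comp_budget=2**28)
for s in (1024, 8192, 32768):
    print(s, max_batch_size(s, limits))
```

With these illustrative budgets, an equal-token rule alone would allow 8 sequences of length 8192, while the compute constraint lowers that to 4; short sequences remain limited by memory.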

preprint, 2026, arXiv

D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models

The rapid evolution of Embodied AI has enabled Vision-Language-Action (VLA) models to excel in multimodal perception and task execution. However, applying Reinforcement Learning (RL) to these massive models in large-scale distributed environments faces severe systemic bottlenecks, primarily due to the resource conflict between high-fidelity physical simulation and the intensive VRAM/bandwidth demands of deep learning. This conflict often leaves overall throughput constrained by execution-phase inefficiencies. To address these challenges, we propose D-VLA, a high-concurrency, low-latency distributed RL framework for large-scale embodied foundation models. D-VLA introduces "Plane Decoupling," physically isolating high-frequency training data from low-frequency weight control to eliminate interference between simulation and optimization. We further design a four-thread asynchronous "Swimlane" pipeline, enabling full parallel overlap of sampling, inference, gradient computation, and parameter distribution. Additionally, a dual-pool VRAM management model and topology-aware replication resolve memory fragmentation and optimize communication efficiency. Experiments on benchmarks like LIBERO show that D-VLA significantly outperforms mainstream RL frameworks in throughput and sampling efficiency for billion-parameter VLA models. In trillion-parameter scalability tests, our framework maintains exceptional stability and linear speedup, providing a robust system for high-performance general-purpose embodied agents.

preprint, 2026, arXiv

Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

Asynchronous reinforcement learning improves rollout throughput for large language model agents by decoupling sample generation from policy optimization, but it also introduces a critical failure mode for PPO-style off-policy correction. In heterogeneous training systems, the total importance ratio should ideally be decomposed into two semantically distinct factors: a training-inference discrepancy term that aligns inference-side and training-side distributions at the same behavior-policy version, and a policy-staleness term that constrains the update from the historical policy to the current policy. We show that practical asynchronous pipelines with delayed updates and partial rollouts often lose the required historical training-side logits, or old logits. This missing-old-logit problem entangles discrepancy repair with staleness correction, breaks the intended semantics of decoupled correction, and makes clipping and masking thresholds interact undesirably. To address this issue, we study both exact and approximate correction routes. We propose three exact old-logit acquisition strategies: snapshot-based version tracking, a dedicated old-logit model, and synchronization via partial rollout interruption, and compare their system trade-offs. From the perspective of approximate correction, we focus on preserving the benefits of decoupled correction through a more appropriate approximate policy when exact old logits cannot be recovered at low cost, without incurring extra system overhead. Following this analysis, we adopt a revised PPO-EWMA method, which achieves significant gains in both training speed and optimization performance.
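The two-factor decomposition described in this abstract can be sketched for a single token as follows. The function names, the clip range `eps = 0.2`, and the sample probabilities are assumptions for illustration; the sketch shows only the decomposition itself, not the paper's repair strategies or its revised PPO-EWMA method.

```python
import math


def decoupled_ratios(logp_infer: float, logp_old_train: float, logp_current: float):
    """Split the total importance ratio pi_current / pi_infer into two factors.

    discrepancy: training-side vs inference-side probability at the same
                 (old) behavior-policy version; this needs the old logits.
    staleness:   current policy vs the old training-side policy.
    """
    discrepancy = math.exp(logp_old_train - logp_infer)
    staleness = math.exp(logp_current - logp_old_train)
    return discrepancy, staleness


def clipped_staleness(staleness: float, eps: float = 0.2) -> float:
    """PPO-style clipping applied to the staleness factor only (assumed eps)."""
    return min(max(staleness, 1.0 - eps), 1.0 + eps)


# Hypothetical token probabilities: inference engine 0.5, old trainer 0.4, current 0.6.
d, s = decoupled_ratios(math.log(0.5), math.log(0.4), math.log(0.6))
total = math.exp(math.log(0.6) - math.log(0.5))
print(d, s, d * s, total)
```

When `logp_old_train` is unavailable, the two factors cannot be formed separately and only the total ratio remains, which is the entanglement the abstract refers to.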

preprint, 2026, arXiv

Multiple nodal superconducting phases and order-parameter evolution in pressurized UTe$_2$

Spin-triplet superconductivity (SC) offers a unique avenue for realizing non-Abelian Majorana zero modes and thus fault-tolerant topological quantum computation, and has attracted a broad audience for both fundamental research and potential applications. The recently discovered heavy-fermion spin-triplet superconductor candidate UTe$_2$ has sparked great interest for its ultrahigh upper critical field and reentrant SC phases in proximity to a field-polarized magnetic state. Despite extensive studies on the phase diagrams and competing orders induced by pressure and magnetic field, little is known about its SC order parameters and their evolution with these control parameters, largely due to the lack of appropriate symmetry-sensitive probes. Here, we report comprehensive point-contact spectroscopy measurements of pressurized UTe$_2$ on the (0~0~1) surface. The observation of an Andreev bound state strongly suggests the presence of a $p_z$ component in the SC order parameters. Quantitative analysis based on an extended Blonder-Tinkham-Klapwijk model unveils $B_{2u}$ or $B_{3u}$ as the most likely representation for both ambient and pressurized UTe$_2$, and remarkably, the multiple SC phases can be distinguished by a single parameter $\langle Δ_{z}\rangle/\langle Δ_{x(y)}\rangle$, the relative weight between the $p_z$-wave and $p_{x(y)}$-wave pairings. These findings not only impose stringent constraints on the superconducting order parameter in UTe$_2$, but also provide key spectroscopic evidence for the existence of multiple SC phases tuned through pressure.

preprint, 2026, arXiv

Sword: Style-Robust World Models as Simulators via Dynamic Latent Bootstrapping for VLA Policy Post-Training

The integration of Vision-Language-Action (VLA) models with World Models has gained increasing attention. One representative approach treats learned World Models as generative simulators, enabling policy optimization entirely within "imagination." However, when deployed as simulators for specific environments such as the LIBERO benchmark, existing World Models often suffer from poor generalization and long-horizon error accumulation. During closed-loop rollouts, these models are highly sensitive to initial-state perturbations; minor changes in color, illumination, and other visual factors can trigger cascading hallucinations, leading to severe blurriness or overexposure. Moreover, long-horizon error accumulation further degrades the quality and fidelity of predicted future states. These issues limit the reliability of World Models as simulators. To mitigate these problems, we propose Sword, a robust World Model framework. Our method introduces Structure-Guided Style Augmentation to disentangle the visual textures of interactive environments from task-relevant dynamics, thereby improving generalization. We further propose Dynamic Latent Bootstrapping, which maintains consistency between training and inference while keeping memory consumption low. Extensive experiments on the LIBERO benchmark show that our method significantly outperforms the baseline WoVR in terms of generalization, generation quality, robustness, fidelity, and the success rate of reinforcement-learning post-training for VLA models.

preprint, 2023, arXiv

Polynomial Furstenberg joinings and its applications

In this paper, a polynomial version of Furstenberg joining is introduced and its structure is investigated. Particularly, it is shown that if all polynomials are non-linear, then almost every ergodic component of the joining is a direct product of an infinite-step pro-nilsystem and a Bernoulli system. As applications, some new convergence theorems are obtained. Particularly, it is proved that if $T$ and $S$ are ergodic measure-preserving transformations on a probability space $(X,{\mathcal X},μ)$ and $T$ has zero entropy, then for all $c_i\in {\mathbb Z}\setminus \{0\}$, all integral polynomials $p_j$ with $\deg{p_j}\ge 2$, and for all $f_i, g_j\in L^\infty(X,μ)$, $1\le i\le m$ and $1\le j\le d$, $$\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1}f_1(T^{c_1n}x)\cdots f_m(T^{c_mn}x)\cdot g_1(S^{p_1(n)}x)\cdots g_d(S^{p_d(n)}x)$$ exists in $L^2(X,μ)$, which extends the recent result by Host and Frantzikinakis. Moreover, it is shown that for an ergodic measure-preserving system $(X,{\mathcal X},μ,T)$, a non-linear integral polynomial $p$ and $f\in L^\infty(X,μ)$, the Furstenberg systems of $\big(f(T^{p(n)}x)\big)_{n\in {\mathbb Z}}$ are ergodic and isomorphic to direct products of infinite-step pro-nilsystems and Bernoulli systems for almost every $x\in X$, which answers a problem by Frantzikinakis.

preprint, 2023, arXiv

Topological dynamical systems induced by polynomials and combinatorial consequences

Let $d\in {\mathbb N}$ and $p_i$ be an integral polynomial with $p_i(0)=0$, $1\le i\le d$. It is shown that if $S$ is piecewise syndetic in $\mathbb Z$, then $$\{(m,n)\in{\mathbb Z}^2: m+p_1(n),\ldots,m+p_d(n)\in S\}$$ is piecewise syndetic in ${\mathbb Z}^2$, which extends the result by Glasner and Furstenberg for linear polynomials. Our result is obtained by showing the density of minimal points of a dynamical system of ${\mathbb Z}^2$ action associated with the piecewise syndetic set $S$ and the polynomials $\{p_1,\ldots,p_d\}$. Moreover, it is proved that if $(X,T)$ is minimal, then for each non-empty open subset $U$ of $X$, there is $x\in U$ with $\{n\in {\mathbb Z}: T^{p_1(n)}x\in U, \ldots, T^{p_d(n)}x\in U\}$ piecewise syndetic.

preprint, 2022, arXiv

Solving Stackelberg Prediction Game with Least Squares Loss via Spherically Constrained Least Squares Reformulation

The Stackelberg prediction game (SPG) is popular in characterizing strategic interactions between a learner and an attacker. As an important special case, the SPG with least squares loss (SPG-LS) has recently received much research attention. Although initially formulated as a difficult bi-level optimization problem, SPG-LS admits tractable reformulations which can be solved globally in polynomial time by semidefinite programming or second-order cone programming. However, none of the available approaches is well suited for handling large-scale datasets, especially those with huge numbers of features. In this paper, we explore an alternative reformulation of the SPG-LS. By a novel nonlinear change of variables, we rewrite the SPG-LS as a spherically constrained least squares (SCLS) problem. Theoretically, we show that an $ε$-optimal solution to the SCLS (and the SPG-LS) can be achieved in $\tilde{O}(N/\sqrt{ε})$ floating-point operations, where $N$ is the number of nonzero entries in the data matrix. Practically, we apply two well-known methods for solving this new reformulation, i.e., the Krylov subspace method and the Riemannian trust region method. Both algorithms are factorization-free, so they are suitable for solving large-scale problems. Numerical results on both synthetic and real-world datasets indicate that the SPG-LS, equipped with the SCLS reformulation, can be solved orders of magnitude faster than the state of the art.

preprint, 2020, arXiv

Edge current and orbital angular momentum of chiral superfluids revisited

Cooper pairs in chiral superfluids carry quantized units of relative orbital angular momentum (OAM). Various predictions of the intrinsic OAM density or the macroscopic OAM of a two-dimensional chiral superfluid differ by several orders of magnitude, which constitutes the so-called Angular Momentum Paradox. Following several previous studies, we substantiate the semiclassical Bogoliubov-de Gennes theory of the single-particle edge current and OAM in two-dimensional chiral superfluids in the BCS limit. The analysis provides a simple intuitive understanding of the vanishing of OAM for a non-p-wave chiral superfluid (such as $d+id$) confined in a rigid potential. When generalized to anisotropic chiral superconductors and three-dimensional chiral superfluids, the theory similarly returns an accurate description. We also present a detailed numerical study of the chiral phases in the BEC limit. Our study suggests that, in both BCS and BEC phases, the relative OAM of the individual Cooper pairs contributes to the total OAM additively, and that in both phases the corresponding macroscopic OAM density distribution is localized at the boundary.

preprint, 2020, arXiv

Half-Magnetic Topological Insulator

Topological magnets are a new family of quantum materials providing great potential to realize emergent phenomena, such as the quantum anomalous Hall effect and the axion-insulator state. Here we present our discovery that the stoichiometric ferromagnet MnBi8Te13 with natural heterostructure MnBi2Te4-(Bi2Te3)3 is an unprecedented half-magnetic topological insulator, with the magnetization existing at the MnBi2Te4 surface but not at the opposite surface terminated by triple Bi2Te3 layers. Our angle-resolved photoemission spectroscopy measurements unveil a massive Dirac gap at the MnBi2Te4 surface, and a gapless Dirac cone on the other side. Remarkably, the Dirac gap (~28 meV) at the MnBi2Te4 surface decreases monotonically with increasing temperature and closes right at the Curie temperature, thereby representing the first smoking-gun spectroscopic evidence of a magnetization-induced topological surface gap among all known magnetic topological materials. We further demonstrate theoretically that the half-magnetic topological insulator is desirable to realize the half-quantized surface anomalous Hall effect, which serves as a direct proof of the general concept of axion electrodynamics in condensed matter systems.

preprint, 2020, arXiv

Half-quantum vortices on c-axis domain walls in chiral p-wave superconductors

Chiral superconductors are two-fold degenerate and domains of opposite chirality can form, separated by domain walls. There are indications of such domain formation in the quasi-two-dimensional putative chiral $p$-wave superconductor Sr$_2$RuO$_4$, yet no experiment has explicitly resolved individual domains in this material. In this work, $c$-axis domain walls lying parallel to the layers in chiral $p$-wave superconductors are explored from a theoretical point of view. First, using both a phenomenological Ginzburg-Landau and a quasiclassical Bogoliubov-de Gennes approach, a consistent qualitative description of the domain wall structure is obtained. While these domains are decoupled in the isotropic limit, there is a finite coupling in anisotropic systems and the domain wall can be treated as an effective Josephson junction. In the second part, the formation and structure of half-quantum vortices (HQV) on such $c$-axis domain walls are discussed.

preprint, 2020, arXiv

Minimal systems with finitely many ergodic measures

In this paper it is proved that if a minimal system has the property that its sequence entropy is uniformly bounded for all sequences, then it has only finitely many ergodic measures and is an almost finite-to-one extension of its maximal equicontinuous factor. This result is obtained as an application of a general criterion which states that if a minimal system is an almost finite-to-one extension of its maximal equicontinuous factor and has no infinite independent sets of length $k$ for some $k\ge 2$, then it has only finitely many ergodic measures.

preprint, 2020, arXiv

Polynomial mean complexity and Logarithmic Sarnak conjecture

In this paper, we reduce the logarithmic Sarnak conjecture to the $\{0,1\}$-symbolic systems with polynomial mean complexity. By showing that the logarithmic Sarnak conjecture holds for any topological dynamical system with sublinear complexity, we provide a variant of the $1$-Fourier uniformity conjecture, where the frequencies are restricted to any subset of $[0,1]$ with packing dimension less than one.

preprint, 2020, arXiv

Positive entropy implies chaos along any infinite sequence

Let $G$ be an infinite countable discrete amenable group. For any $G$-action on a compact metric space $(X,ρ)$, it turns out that if the action has positive topological entropy, then for any sequence $\{s_i\}_{i=1}^{+\infty}$ with pairwise distinct elements in $G$ there exists a Cantor subset $K$ of $X$ which is Li-Yorke chaotic along this sequence, that is, for any two distinct points $x,y\in K$, one has \[\limsup_{i\to+\infty}ρ(s_i x,s_iy)>0,\ \text{and}\ \liminf_{i\to+\infty}ρ(s_ix,s_iy)=0.\]

preprint, 2020, arXiv

Topological characteristic factors and nilsystems

We prove that the maximal infinite step pro-nilfactor $X_\infty$ of a minimal dynamical system $(X,T)$ is the topological characteristic factor in a certain sense. Namely, we show that by an almost one to one modification of $π:X \rightarrow X_\infty$, the induced open extension $π^*:X^* \rightarrow X^*_\infty$ has the following property: for $x$ in a dense $G_δ$ set of $X^*$, the orbit closure $L_x=\overline{\mathcal{O}}((x,x,\ldots,x), T\times T^2\times \ldots \times T^d)$ is $(π^*)^{(d)}$-saturated, i.e. $L_x=((π^*)^{(d)})^{-1}(π^*)^{(d)}(L_x)$. Using results derived from the above fact, we are able to answer several open questions: (1) if $(X,T^k)$ is minimal for some $k\ge 2$, then for any $d\in {\mathbb N}$ and any $0\le j<k$ there is a sequence $\{n_i\}$ of $\mathbb Z$ with $n_i\equiv j\ (\text{mod}\ k)$ such that $T^{n_i}x\rightarrow x, T^{2n_i}x\rightarrow x, \ldots, T^{dn_i}x\rightarrow x$ for $x$ in a dense $G_δ$ subset of $X$; (2) if $(X,T)$ is totally minimal, then $\{T^{n^2}x:n\in {\mathbb Z}\}$ is dense in $X$ for $x$ in a dense $G_δ$ subset of $X$; (3) for any $d\in\mathbb N$ and any minimal system, which is an open extension of its maximal distal factor, ${\bf RP}^{[d]}={\bf AP}^{[d]}$, where the latter is the regionally proximal relation of order $d$ along arithmetic progressions.

preprint, 2020, arXiv

Vortex end Majorana zero modes in superconducting Dirac and Weyl semimetals

Time-reversal invariant (TRI) Dirac and Weyl semimetals in three dimensions (3D) can host open Fermi arcs and spin-momentum locking Fermi loops on the surfaces. We find that when they become superconducting with $s$-wave pairing and the doping is lower than a critical level, straight $π$-flux vortex lines terminating at surfaces with Fermi arcs or spin-momentum locking Fermi loops can realize 1D topological superconductivity and harbor Majorana zero modes at their ends. Remarkably, we find that the vortex-generation-associated Zeeman field can open (when the surfaces have only Fermi arcs) or enhance the topological gap protecting Majorana zero modes, which is contrary to the situation in superconducting topological insulators. By studying the tilting effect of bulk Dirac and Weyl cones, we further find that type-I Dirac and Weyl semimetals in general have a much broader topological regime than type-II ones. Our findings build up a connection between TRI Dirac and Weyl semimetals and Majorana zero modes in vortices.

preprint, 2019, arXiv

Synchronization in discrete-time, discrete-state Random Dynamical Systems

We characterize the synchronization phenomenon in discrete-time, discrete-state random dynamical systems, with random and probabilistic Boolean networks as particular examples. In terms of multiplicative ergodic properties of the induced linear cocycle, we show that such a random dynamical system with finite state space synchronizes if and only if the Lyapunov exponent $0$ has simple multiplicity. For the case of countable state space, a characterization of synchronization is provided in terms of the spectral subspace corresponding to the Lyapunov exponent $-\infty$. In addition, for both finite and countable state spaces, the mechanism of partial synchronization is described by partitioning the state set into synchronized subsets. Applications to biological networks are also discussed.
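As a rough illustration of what synchronization means for a finite state space, the sketch below drives every initial state with the same random sequence of maps and tracks the set of surviving positions; the system has synchronized along that sequence once the set collapses to a single state. The function name, the list-based encoding of maps, and the example maps are assumptions for illustration. The sketch simulates the phenomenon directly and does not compute the Lyapunov exponents used in the paper's characterization.

```python
import random


def image_after(maps, n_states, steps, rng):
    """Apply `steps` maps, each drawn at random from `maps`, to all states at once.

    Each map is a list f with f[s] giving the next state of s.
    Returns the set of positions reached from all initial states.
    """
    current = list(range(n_states))  # current[i]: position of the trajectory started at i
    for _ in range(steps):
        f = rng.choice(maps)
        current = [f[s] for s in current]
    return set(current)


rng = random.Random(0)

# Two permutations: distinct trajectories never merge, so no synchronization.
permutations = [[1, 2, 0], [0, 1, 2]]
print(len(image_after(permutations, 3, 50, rng)))

# A single constant map collapses every trajectory after one step.
constant = [[0, 0, 0]]
print(image_after(constant, 3, 1, rng))
```

Partial synchronization, in this picture, corresponds to the surviving set stabilizing at more than one state, with the initial states grouped by which survivor they follow.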