Researcher profile

Xiangfeng Wang

Xiangfeng Wang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

STEP3-VL-10B Technical Report

We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish intrinsic vision-language synergy; and second, a scaled post-training pipeline featuring over 1k iterations of reinforcement learning. Crucially, we implement Parallel Coordinated Reasoning (PaCoRe) to scale test-time compute, allocating resources to scalable perceptual reasoning that explores and synthesizes diverse visual hypotheses. Consequently, despite its compact 10B footprint, STEP3-VL-10B rivals or surpasses models 10$\times$-20$\times$ larger (e.g., GLM-4.6V-106B, Qwen3-VL-235B) and top-tier proprietary flagships like Gemini 2.5 Pro and Seed-1.5-VL. Delivering best-in-class performance, it records 92.2% on MMBench and 80.11% on MMMU, while excelling in complex reasoning with 94.43% on AIME2025 and 75.95% on MathVision. We release the full model suite to provide the community with a powerful, efficient, and reproducible baseline.

preprint2022arXiv

Dealing with Non-Stationarity in MARL via Trust-Region Decomposition

Non-stationarity is one thorny issue in cooperative multi-agent reinforcement learning (MARL). One of the reasons is the policy changes of agents during the learning process. Some existing works have discussed various consequences caused by non-stationarity with several kinds of measurement indicators. This makes the objectives or goals of existing algorithms are inevitably inconsistent and disparate. In this paper, we introduce a novel notion, the $δ$-measurement, to explicitly measure the non-stationarity of a policy sequence, which can be further proved to be bounded by the KL-divergence of consecutive joint policies. A straightforward but highly non-trivial way is to control the joint policies' divergence, which is difficult to estimate accurately by imposing the trust-region constraint on the joint policy. Although it has lower computational complexity to decompose the joint policy and impose trust-region constraints on the factorized policies, simple policy factorization like mean-field approximation will lead to more considerable policy divergence, which can be considered as the trust-region decomposition dilemma. We model the joint policy as a pairwise Markov random field and propose a trust-region decomposition network (TRD-Net) based on message passing to estimate the joint policy divergence more accurately. The Multi-Agent Mirror descent policy algorithm with Trust region decomposition, called MAMT, is established by adjusting the trust-region of the local policies adaptively in an end-to-end manner. MAMT can approximately constrain the consecutive joint policies' divergence to satisfy $δ$-stationarity and alleviate the non-stationarity problem. Our method can bring noticeable and stable performance improvement compared with baselines in cooperative tasks of different complexity.

preprint2022arXiv

Multi-Agent Path Finding with Prioritized Communication Learning

Multi-agent pathfinding (MAPF) has been widely used to solve large-scale real-world problems, e.g., automation warehouses. The learning-based, fully decentralized framework has been introduced to alleviate real-time problems and simultaneously pursue optimal planning policy. However, existing methods might generate significantly more vertex conflicts (or collisions), which lead to a low success rate or more makespan. In this paper, we propose a PrIoritized COmmunication learning method (PICO), which incorporates the \textit{implicit} planning priorities into the communication topology within the decentralized multi-agent reinforcement learning framework. Assembling with the classic coupled planners, the implicit priority learning module can be utilized to form the dynamic communication topology, which also builds an effective collision-avoiding mechanism. PICO performs significantly better in large-scale MAPF tasks in success rates and collision rates than state-of-the-art learning-based planners.

preprint2021arXiv

Structured Diversification Emergence via Reinforced Organization Control and Hierarchical Consensus Learning

When solving a complex task, humans will spontaneously form teams and to complete different parts of the whole task, respectively. Meanwhile, the cooperation between teammates will improve efficiency. However, for current cooperative MARL methods, the cooperation team is constructed through either heuristics or end-to-end blackbox optimization. In order to improve the efficiency of cooperation and exploration, we propose a structured diversification emergence MARL framework named {\sc{Rochico}} based on reinforced organization control and hierarchical consensus learning. {\sc{Rochico}} first learns an adaptive grouping policy through the organization control module, which is established by independent multi-agent reinforcement learning. Further, the hierarchical consensus module based on the hierarchical intentions with consensus constraint is introduced after team formation. Simultaneously, utilizing the hierarchical consensus module and a self-supervised intrinsic reward enhanced decision module, the proposed cooperative MARL algorithm {\sc{Rochico}} can output the final diversified multi-agent cooperative policy. All three modules are organically combined to promote the structured diversification emergence. Comparative experiments on four large-scale cooperation tasks show that {\sc{Rochico}} is significantly better than the current SOTA algorithms in terms of exploration efficiency and cooperation strength.

preprint2020arXiv

Electrical detection of the inverse Edelstein effect on the surface of SmB$_6$

We report the measurement of spin current induced charge accumulation, the inverse Edelstein effect (IEE), on the surface of a candidate topological Kondo insulator SmB6 single crystal. Robust surface conduction channel of SmB6 has been shown to exhibit large degree of spin-momentum locking, and spin polarized current through an external ferromagnetic contact induces the spin dependent charge accumulation on the surface of SmB6. The dependences of the IEE signal on the bias current, an external magnetic field direction and temperature are consistent with the anticlockwise spin texture for the surface band in SmB6 in the momentum space, and the direction and magnitude of the effect compared with the normal Edelstein signal are clearly explained by the Onsager reciprocal relation. Furthermore, we estimate spin-to-charge conversion efficiency, the IEE length, as 4.46 nm that is an order of magnitude larger than the efficiency found in other typical Rashba interfaces, implying that the Rashba contribution to the IEE signal could be small. Building upon existing reports on the surface charge and spin conduction nature on this material, our results provide additional evidence that the surface of SmB6 supports spin polarized conduction channel.

preprint2020arXiv

FDA3 : Federated Defense Against Adversarial Attacks for Cloud-Based IIoT Applications

Along with the proliferation of Artificial Intelligence (AI) and Internet of Things (IoT) techniques, various kinds of adversarial attacks are increasingly emerging to fool Deep Neural Networks (DNNs) used by Industrial IoT (IIoT) applications. Due to biased training data or vulnerable underlying models, imperceptible modifications on inputs made by adversarial attacks may result in devastating consequences. Although existing methods are promising in defending such malicious attacks, most of them can only deal with limited existing attack types, which makes the deployment of large-scale IIoT devices a great challenge. To address this problem, we present an effective federated defense approach named FDA3 that can aggregate defense knowledge against adversarial examples from different sources. Inspired by federated learning, our proposed cloud-based architecture enables the sharing of defense capabilities against different attacks among IIoT devices. Comprehensive experimental results show that the generated DNNs by our approach can not only resist more malicious attacks than existing attack-specific adversarial training methods, but also can prevent IIoT applications from new attacks.

preprint2020arXiv

Learning Structured Communication for Multi-agent Reinforcement Learning

This work explores the large-scale multi-agent communication mechanism under a multi-agent reinforcement learning (MARL) setting. We summarize the general categories of topology for communication structures in MARL literature, which are often manually specified. Then we propose a novel framework termed as Learning Structured Communication (LSC) by using a more flexible and efficient communication topology. Our framework allows for adaptive agent grouping to form different hierarchical formations over episodes, which is generated by an auxiliary task combined with a hierarchical routing protocol. Given each formed topology, a hierarchical graph neural network is learned to enable effective message information generation and propagation among inter- and intra-group communications. In contrast to existing communication mechanisms, our method has an explicit while learnable design for hierarchical communication. Experiments on challenging tasks show the proposed LSC enjoys high communication efficiency, scalability, and global cooperation capability.

preprint2019arXiv

Definitive Surface Magnetotransport Study of SmB$_{6}$

After the theoretical prediction that SmB$_6$ is a topological Kondo insulator, there has been an explosion of studies on the SmB$_6$ surface. However, there is not yet an agreement on even the most basic quantities such as the surface carrier density and mobility. In this paper, we carefully revisit Corbino disk magnetotransport studies to find those surface transport parameters. We first show that subsurface cracks exist in the SmB$_6$ crystals, arising both from surface preparation and during the crystal growth. We provide evidence that these hidden subsurface cracks are additional conduction channels, and the large disagreement between earlier surface SmB$_6$ studies may originate from previous interpretations not taking this extra conduction path into account. We provide an update of a more reliable magnetotransport data than the previous one (Phys. Rev. B 92, 115110) and find that the orders-of-magnitude large disagreements in carrier density and mobility come from the surface preparation and the transport geometry rather than the intrinsic sample quality. From this magnetotransport study, we find an updated estimate of the carrier density and mobility of 2.71$\times$10$^{13}$ (1/cm$^2$) and 104.5 (cm$^{2}$/V$\cdot$sec), respectively. We compare our results with other studies of the SmB$_6$ surface. By this comparison, we provide insight into the disagreements and agreements of the previously reported angle-resolved photoemission spectroscopy, scanning tunneling microscopy, and magnetotorque quantum oscillations measurements.

preprint2018arXiv

Imaging emergent heavy Dirac fermions of a topological Kondo insulator

Kondo insulators are primary candidates in the search for strongly correlated topological quantum phases, which may host topological order, fractionalization, and non-Abelian statistics. Within some Kondo insulators, the hybridization gap is predicted to protect a nontrivial topological invariant and to harbor emergent heavy Dirac fermion surface modes. We use high-energy-resolution spectroscopic imaging in real and momentum space on the Kondo insulator, SmB$_6$. On cooling through $T^*_Δ\approx$ 35 K we observe the opening of an insulating gap that expands to $Δ\approx$ 10 meV at 2 K. Within the gap, we image the formation of linearly dispersing surface states with effective masses reaching $m^* = (410\pm20)m_e$. We thus demonstrate existence of a strongly correlated topological Kondo insulator phase hosting the heaviest known Dirac fermions.