Source author record

Zhi-Xin Yang

Zhi-Xin Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, quant-ph, eess.IV, Neural and Evolutionary Computing, physics.optics, Robotics

Catalog footprint

What is connected

8 works

6 topics

4 close collaborators

preprint, 2026, arXiv

SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation

Vision-and-Language Navigation (VLN) aims to enable an embodied agent to follow natural-language instructions and navigate to a target location in unseen 3D environments. We argue that adapting VLMs to VLN requires endowing them with two complementary capabilities for acquiring such awareness, namely backward action reasoning (why) and forward transition prediction (how). Based on this insight, we propose SpaAct, a simple yet effective training framework that activates the dynamic spatial awareness in VLMs. Specifically, SpaAct introduces two spatial activation tasks: Action Retrospection, which asks the model to infer the executed action sequence from visual transitions, and Future Frame Selection, which forces the model to predict the visual transitions conditioned on history and action. These two objectives provide lightweight supervision on both backward action reasoning and forward transition prediction, encouraging the model to build dynamic spatial awareness in a VLM-friendly way. To further stabilize adaptation, we design TriPA, a Tri-factor Progressive Adaptive curriculum learning method that organizes training samples from easy to hard, allowing the model to gradually acquire navigation skills from basic locomotion to long-horizon reasoning. Experiments on standard VLN-CE benchmarks show that SpaAct consistently improves VLM-based navigation and achieves state-of-the-art performance. We will release the code and models to support future research.

preprint, 2026, arXiv

VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness

The safe deployment of autonomous driving (AD) systems is fundamentally hindered by the long-tail problem, where rare yet critical driving scenarios are severely underrepresented in real-world data. Existing solutions, including safety-critical scenario generation and closed-loop learning, often rely on rule-based heuristics, resampling methods and generative models learned from offline datasets, limiting their ability to produce diverse and novel challenges. While recent works leverage Vision Language Models (VLMs) to produce scene descriptions that guide a separate, downstream model in generating hazardous trajectories for agents, such a two-stage framework constrains the generative potential of VLMs, as the diversity of the final trajectories is ultimately limited by the generalization ceiling of the downstream algorithm. To overcome these limitations, we introduce VILTA (VLM-In-the-Loop Trajectory Adversary), a novel framework that integrates a VLM into the closed-loop training of AD agents. Unlike prior works, VILTA actively participates in the training loop by comprehending the dynamic driving environment and strategically generating challenging scenarios through direct, fine-grained editing of surrounding agents' future trajectories. This direct-editing approach fully leverages the VLM's powerful generalization capabilities to create a diverse curriculum of plausible yet challenging scenarios that extend beyond the scope of traditional methods. We demonstrate that our approach substantially enhances the safety and robustness of the resulting AD policy, particularly in its ability to navigate critical long-tail events.

preprint, 2022, arXiv

Efficient Jacobian-Based Inverse Kinematics with Sim-to-Real Transfer of Soft Robots by Learning

This paper presents an efficient learning-based method to solve the inverse kinematic (IK) problem on soft robots with highly non-linear deformation. The major challenge of efficiently computing IK for such robots is due to the lack of analytical formulation for either forward or inverse kinematics. To address this challenge, we employ neural networks to learn both the mapping function of forward kinematics and also the Jacobian of this function. As a result, Jacobian-based iteration can be applied to solve the IK problem. A sim-to-real training transfer strategy is conducted to make this approach more practical. We first generate a large number of samples in a simulation environment for learning both the kinematic and the Jacobian networks of a soft robot design. Thereafter, a sim-to-real layer of differentiable neurons is employed to map the results of simulation to the physical hardware, where this sim-to-real layer can be learned from a very limited number of training samples generated on the hardware. The effectiveness of our approach has been verified on pneumatic-driven soft robots for path following and interactive positioning.
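
The Jacobian-based iteration mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a planar two-link arm with unit link lengths and closed-form `forward` and `jacobian` functions stands in for the learned kinematic and Jacobian networks, and the damped least-squares update, the function names, and all parameter values are assumptions made for the example.

```python
import math


def forward(q):
    """Forward kinematics of a planar 2-link arm with unit links.

    Hypothetical stand-in for a learned forward-kinematics network.
    """
    x = math.cos(q[0]) + math.cos(q[0] + q[1])
    y = math.sin(q[0]) + math.sin(q[0] + q[1])
    return (x, y)


def jacobian(q):
    """Analytic 2x2 Jacobian of `forward` (stand-in for a learned Jacobian network)."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return ((-s1 - s12, -s12),
            (c1 + c12, c12))


def solve_ik(target, q0, damping=1e-3, tol=1e-8, max_iter=200):
    """Iterate q <- q + dq, where (J^T J + damping * I) dq = J^T e."""
    q = list(q0)
    for _ in range(max_iter):
        x, y = forward(q)
        ex, ey = target[0] - x, target[1] - y  # task-space error e
        if math.hypot(ex, ey) < tol:
            break
        (a, b), (c, d) = jacobian(q)
        # Entries of the symmetric matrix J^T J + damping * I
        m11 = a * a + c * c + damping
        m12 = a * b + c * d
        m22 = b * b + d * d + damping
        # Right-hand side J^T e
        g1 = a * ex + c * ey
        g2 = b * ex + d * ey
        det = m11 * m22 - m12 * m12  # positive because damping > 0
        q[0] += (m22 * g1 - m12 * g2) / det
        q[1] += (m11 * g2 - m12 * g1) / det
    return q
```

Calling `solve_ik((1.0, 1.0), (0.2, 1.2))` returns joint angles whose forward kinematics lands on the target. In the setting the abstract describes, both stand-in functions would be replaced by network evaluations.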

preprint, 2022, arXiv

Group frame neural network of moving object ghost imaging combined with frame merging algorithm

The need for multiple samples to extract correlation information limits the applications of ghost imaging to moving objects. A novel multi-to-one neural network is proposed and the concept of "batch frame" is introduced to improve the serial imaging method. The neural network extracts more correlation information from a small number of samples, thus reducing the sampling ratio of the ghost imaging technique. We combine the correlation characteristics between images to propose a frame merging algorithm, which eliminates the dynamic blur of high-speed moving objects and further improves the reconstruction quality of moving object images at a low sampling ratio. The experimental results are consistent with the simulation results.

preprint, 2022, arXiv

PU-EVA: An Edge Vector based Approximation Solution for Flexible-scale Point Cloud Upsampling

High-quality point clouds have practical significance for point-based rendering, semantic understanding, and surface reconstruction. Upsampling sparse, noisy and nonuniform point clouds for a denser and more regular approximation of target objects is a desirable but challenging task. Most existing methods duplicate point features for upsampling, constraining the upsampling scale to a fixed rate. In this work, flexible upsampling rates are achieved via edge vector based affine combinations, and a novel design of Edge Vector based Approximation for Flexible-scale Point clouds Upsampling (PU-EVA) is proposed. The edge vector based approximation encodes the neighboring connectivity via affine combinations based on edge vectors, and restricts the approximation error within the second-order term of the Taylor expansion. The EVA upsampling decouples the upsampling scales from the network architecture, achieving flexible upsampling rates in one-time training. Qualitative and quantitative evaluations demonstrate that the proposed PU-EVA outperforms the state-of-the-art in terms of proximity-to-surface, distribution uniformity, and geometric details preservation.

preprint, 2020, arXiv

AMPSO: Artificial Multi-Swarm Particle Swarm Optimization

In this paper we propose a novel artificial multi-swarm PSO which consists of an exploration swarm, an artificial exploitation swarm and an artificial convergence swarm. The exploration swarm is a set of equal-sized sub-swarms randomly distributed around the particle space, the exploitation swarm is artificially generated from a perturbation of the best particle of the exploration swarm for a fixed period of iterations, and the convergence swarm is artificially generated from a Gaussian perturbation of the best particle in the exploitation swarm once it has stagnated. The exploration and exploitation operations are carried out alternately until the evolution rate of the exploitation is smaller than a threshold or the maximum number of iterations is reached. An adaptive inertia weight strategy is applied to different swarms to guarantee their performance in exploration and exploitation. To guarantee the accuracy of the results, a novel diversity scheme based on the positions and fitness values of the particles is proposed to control the exploration, exploitation and convergence processes of the swarms. To mitigate the inefficiency caused by the use of diversity, two swarm update techniques are proposed to discard poor particles so that good results can be achieved within a fixed number of iterations. The effectiveness of AMPSO is validated on all the functions in the CEC2015 test suite, by comparison with a comprehensive set of 16 algorithms, including the most recent well-performing PSO variants and some other non-PSO optimization algorithms.
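
For readers unfamiliar with the baseline that AMPSO extends, the sketch below shows a plain global-best PSO with a linearly decreasing inertia weight. It is not AMPSO: there is no multi-swarm structure, diversity scheme, or swarm update technique, and the schedule here is a fixed linear one rather than the adaptive strategy the abstract describes. The function names, the sphere objective, and every parameter value are assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=20, n_iter=200, bound=5.0,
        w_start=0.9, w_end=0.4, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO with a linearly decreasing inertia weight."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]     # personal best fitness
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iter):
        # Inertia weight shrinks from w_start to w_end over the run.
        w = w_start - (w_start - w_end) * t / max(1, n_iter - 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A call such as `best, value = pso(sphere)` returns the best position found and its fitness. A large early inertia weight favors exploration and a small late one favors exploitation, which is the trade-off the abstract's adaptive strategy is meant to manage per swarm.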

preprint, 2020, arXiv

Ground state cooling of magnomechanical resonator in PT-symmetric cavity magnomechanical system at room temperature

We propose to realize the ground state cooling of a magnomechanical resonator in a parity-time (PT)-symmetric cavity magnomechanical system composed of a lossy ferromagnetic sphere and a gain microwave cavity. In the scheme, the magnomechanical resonator can be cooled close to its ground state via the magnomechanical interaction, and it is found that the cooling effect in the PT-symmetric system is much stronger than that in the non-PT-symmetric system. Resorting to the magnetic force noise spectrum, we investigate the final mean phonon number with experimentally feasible parameters and find, surprisingly, that the ground state cooling of the magnomechanical resonator can be directly achieved at room temperature. Furthermore, we also illustrate that the ground state cooling can be flexibly controlled via the external magnetic field.

preprint, 2020, arXiv

Magnon blockade in a PT-symmetric-like cavity magnomechanical system

We investigate the magnon blockade effect in a parity-time (PT) symmetric-like three-mode cavity magnomechanical system involving the magnon-photon and magnon-phonon interactions. In the broken and unbroken PT-symmetric regions, we respectively calculate the second-order correlation function analytically and numerically and further determine the optimal value of detuning. By adjusting different system parameters, we study the different blockade mechanisms and find that the perfect magnon blockade effect can be observed under the weak parameter mechanism. Our work paves the way toward achieving magnon blockade in experiments.
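
As background for the abstract above, the second-order correlation function used to quantify blockade is conventionally the equal-time form below. The abstract does not write it out, so the symbol $m$ for the magnon annihilation operator and the choice of the equal-time form are assumptions here.

```latex
g^{(2)}(0) = \frac{\langle m^{\dagger} m^{\dagger} m\, m \rangle}{\langle m^{\dagger} m \rangle^{2}}
```

Values of $g^{(2)}(0) < 1$ indicate antibunching, and $g^{(2)}(0) \to 0$ corresponds to the perfect blockade limit.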
