Source author record

Han Gao

Han Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Catalog footprint

What is connected

17 works

20 topics

4 close collaborators


preprint, 2026, arXiv

CoINS: Counterfactual Interactive Navigation via Skill-Aware VLM

Recent Vision-Language Models (VLMs) have demonstrated significant potential in robotic planning. However, they typically function as semantic reasoners, lacking an intrinsic understanding of the specific robot's physical capabilities. This limitation is particularly critical in interactive navigation, where robots must actively modify cluttered environments to create traversable paths. Existing VLM-based navigators are predominantly confined to passive obstacle avoidance, failing to reason about when and how to interact with objects to clear blocked paths. To bridge this gap, we propose Counterfactual Interactive Navigation via Skill-aware VLM (CoINS), a hierarchical framework that integrates skill-aware reasoning and robust low-level execution. Specifically, we fine-tune a VLM, named InterNav-VLM, which incorporates skill affordance and concrete constraint parameters into the input context and grounds them into a metric-scale environmental representation. By internalizing the logic of counterfactual reasoning through fine-tuning on the proposed InterNav dataset, the model learns to implicitly evaluate the causal effects of object removal on navigation connectivity, thereby determining interaction necessity and target selection. To execute the generated high-level plans, we develop a comprehensive skill library through reinforcement learning, specifically introducing traversability-oriented strategies to manipulate diverse objects for path clearance. A systematic benchmark in Isaac Sim is proposed to evaluate both the reasoning and execution aspects of interactive navigation. Extensive simulations and real-world experiments demonstrate that CoINS significantly outperforms representative baselines, achieving a 17% higher overall success rate and over 80% improvement in complex long-horizon scenarios compared to the best-performing baseline.

preprint, 2025, arXiv

AINav: Large Language Model-Based Adaptive Interactive Navigation

Robotic navigation in complex environments remains a critical research challenge. Traditional navigation methods focus on optimal trajectory generation within a fixed free workspace and therefore struggle in environments lacking viable paths to the goal, such as disaster zones or cluttered warehouses. To address this problem, we propose AINav, an adaptive interactive navigation approach that proactively interacts with environments to create feasible paths to achieve originally unreachable goals. Specifically, we present a primitive skill tree for task planning with large language models (LLMs), facilitating effective reasoning to determine interaction objects and sequences. To ensure robust subtask execution, we adopt reinforcement learning to pre-train a comprehensive skill library containing versatile locomotion and interaction behaviors for motion planning. Furthermore, we introduce an adaptive replanning approach featuring two LLM-based modules: an advisor serving as a flexible replanning trigger and an arborist for autonomous plan adjustment. Integrated with the tree structure, the replanning mechanism allows for convenient node addition and pruning, enabling rapid plan adaptation in a priori unknown environments. Comprehensive simulations and experiments have demonstrated AINav's effectiveness and adaptivity in diverse scenarios. The supplementary video is available at: https://youtu.be/CjXm5KFx9AI.

preprint, 2025, arXiv

ReSPIRe: Informative and Reusable Belief Tree Search for Robot Probabilistic Search and Tracking in Unknown Environments

Target search and tracking (SAT) is a fundamental problem for various robotic applications such as search and rescue and environmental exploration. This paper proposes an informative trajectory planning approach, namely ReSPIRe, for SAT in unknown cluttered environments under considerably inaccurate prior target information and limited sensing field of view. We first develop a novel sigma point-based approximation approach to quickly and accurately estimate the mutual information (MI) reward under non-Gaussian belief distributions, utilizing informative sampling in state and observation spaces to mitigate the computational intractability of integral calculation. To tackle significant uncertainty associated with inadequate prior target information, we propose the hierarchical particle structure in ReSPIRe, which not only extracts critical particles for global route guidance, but also adjusts the particle number adaptively for planning efficiency. Building upon the hierarchical structure, we develop the reusable belief tree search approach to build a policy tree for online trajectory planning under uncertainty, which reuses rollout evaluation to improve planning efficiency. Extensive simulations and real-world experiments demonstrate that ReSPIRe outperforms representative benchmark methods with smaller MI approximation error, higher search efficiency, and more stable tracking performance, while maintaining outstanding computational efficiency.

preprint, 2025, arXiv

Training Report of TeleChat3-MoE

TeleChat3-MoE is the latest series of TeleChat large language models, featuring a Mixture-of-Experts (MoE) architecture with parameter counts ranging from 105 billion to over one trillion, trained end-to-end on an Ascend NPU cluster. This technical report mainly presents the underlying training infrastructure that enables reliable and efficient scaling to frontier model sizes. We detail systematic methodologies for operator-level and end-to-end numerical accuracy verification, ensuring consistency across hardware platforms and distributed parallelism strategies. Furthermore, we introduce a suite of performance optimizations, including interleaved pipeline scheduling, attention-aware data scheduling for long-sequence training, hierarchical and overlapped communication for expert parallelism, and DVM-based operator fusion. A systematic parallelization framework, leveraging analytical estimation and integer linear programming, is also proposed to optimize multi-dimensional parallelism configurations. Additionally, we present methodological approaches to cluster-level optimizations, addressing host- and device-bound bottlenecks during large-scale training tasks. These infrastructure advancements yield significant throughput improvements and near-linear scaling on clusters comprising thousands of devices, providing a robust foundation for large-scale language model development on hardware ecosystems.

preprint, 2025, arXiv

VLA-RAIL: A Real-Time Asynchronous Inference Linker for VLA Models and Robots

Vision-Language-Action (VLA) models have achieved remarkable breakthroughs in robotics, with the action chunk playing a dominant role in these advances. Given the real-time and continuous nature of robotic motion control, the strategies for fusing a queue of successive action chunks have a profound impact on the overall performance of VLA models. Existing methods suffer from jitter, stalling, or even pauses in robotic action execution, which not only limits the achievable execution speed but also reduces the overall success rate of task completion. This paper introduces VLA-RAIL (A Real-Time Asynchronous Inference Linker), a novel framework designed to address these issues by conducting model inference and robot motion control asynchronously and guaranteeing smooth, continuous, and high-speed action execution. The core contributions of the paper are twofold: a Trajectory Smoother that effectively filters out the noise and jitter in the trajectory of one action chunk using polynomial fitting, and a Chunk Fuser that seamlessly aligns the currently executing trajectory and the newly arrived chunk, ensuring position, velocity, and acceleration continuity between two successive action chunks. We validate the effectiveness of VLA-RAIL on a benchmark of dynamic simulation tasks and several real-world manipulation tasks. Experimental results demonstrate that VLA-RAIL significantly reduces motion jitter, enhances execution speed, and improves task success rates, which will become a key infrastructure for the large-scale deployment of VLA models.

preprint, 2022, arXiv

Bone marrow sparing for cervical cancer radiotherapy on multimodality medical images

Cervical cancer seriously threatens women's health. Radiotherapy is one of the main therapy methods but carries a high risk of acute hematologic toxicity. Delineating the bone marrow (BM) for sparing using computed tomography (CT) images to plan before radiotherapy can effectively avoid this risk. Compared with magnetic resonance (MR) images, CT lacks the ability to express the activity of BM. Thus, in current clinical practice, medical practitioners manually delineate the BM on CT images by corresponding to MR images. However, time-consuming manual delineation of BM cannot guarantee accuracy due to the inconsistency of the CT-MR multimodal images. In this study, we propose a multimodal image oriented automatic registration method for pelvic BM sparing, which consists of three-dimensional bone point cloud reconstruction and a local spherical system iterative closest point registration for marking BM on CT images. Experiments on a patient dataset reveal that our proposed method can enhance the multimodal image registration accuracy and efficiency for medical practitioners in sparing BM of cervical cancer radiotherapy. The method proposed in this contribution might also provide references for similar studies in other clinical applications.

preprint, 2022, arXiv

Depth-Assisted ResiDualGAN for Cross-Domain Aerial Images Semantic Segmentation

Unsupervised domain adaptation (UDA) is an approach to minimizing domain gap. Generative methods are common approaches to minimizing the domain gap of aerial images, which improves the performance of downstream tasks, e.g., cross-domain semantic segmentation. For aerial images, the digital surface model (DSM) is usually available in both the source domain and the target domain. Depth information in DSM brings external information to generative models. However, little research utilizes it. In this paper, depth-assisted ResiDualGAN (DRDG) is proposed, where depth supervised loss (DSL) and depth cycle consistency loss (DCCL) are used to bring depth information into the generative model. Experimental results show that DRDG reaches state-of-the-art accuracy among generative methods in cross-domain semantic segmentation tasks.

preprint, 2022, arXiv

Observation of Coexisting Dirac Bands and Moiré Flat Bands in Magic-Angle Twisted Trilayer Graphene

Moiré superlattices that consist of two or more layers of two-dimensional materials stacked together with a small twist angle have emerged as a tunable platform to realize various correlated and topological phases, such as Mott insulators, unconventional superconductivity and quantum anomalous Hall effect. Recently, the magic-angle twisted trilayer graphene (MATTG) has shown both robust superconductivity similar to magic-angle twisted bilayer graphene (MATBG) and other unique properties, including the Pauli-limit violating and re-entrant superconductivity. These rich properties are deeply rooted in its electronic structure under the influence of distinct moiré potential and mirror symmetry. Here, combining nanometer-scale spatially resolved angle-resolved photoemission spectroscopy (nano-ARPES) and scanning tunneling microscopy/spectroscopy (STM/STS), we systematically measure the yet unexplored band structure of MATTG near charge neutrality. Our measurements reveal the coexistence of the distinct dispersive Dirac band with the emergent moiré flat band, showing good agreement with the theoretical calculations. These results serve as a stepping stone for further understanding of the unconventional superconductivity in MATTG.

preprint, 2022, arXiv

Predicting Physics in Mesh-reduced Space with Temporal Attention

Graph-based next-step prediction models have recently been very successful in modeling complex high-dimensional physical systems on irregular meshes. However, due to their short temporal attention span, these models suffer from error accumulation and drift. In this paper, we propose a new method that captures long-term dependencies through a transformer-style temporal attention model. We introduce an encoder-decoder structure to summarize features and create a compact mesh representation of the system state, to allow the temporal model to operate on a low-dimensional mesh representation in a memory-efficient manner. Our method outperforms a competitive GNN baseline on several complex fluid dynamics prediction tasks, from sonic shocks to vascular flow. We demonstrate stable rollouts without the need for training noise and show perfectly phase-stable predictions even for very long sequences. More broadly, we believe our approach paves the way to bringing the benefits of attention-based sequence models to solving high-dimensional complex physics tasks.

preprint, 2022, arXiv

Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language

Silent Speech Decoding (SSD), based on articulatory neuromuscular activities, has become a prevalent task of Brain-Computer Interface (BCI) in recent years. Many works have been devoted to decoding surface electromyography (sEMG) from articulatory neuromuscular activities. However, restoring silent speech in tonal languages such as Mandarin Chinese is still difficult. This paper proposes an optimized Sequence-to-Sequence (Seq2Seq) approach to synthesize voice from the sEMG-based silent speech. We extract duration information to regulate the sEMG-based silent speech using the audio length. Then, we provide a deep-learning model with an encoder-decoder structure and a state-of-the-art vocoder to generate the audio waveform. Experiments based on six Mandarin Chinese speakers demonstrate that the proposed model can successfully decode silent speech in Mandarin Chinese and achieve a character error rate (CER) of 6.41% on average with human evaluation.

preprint, 2022, arXiv

Towards Efficient Visual Simplification of Computational Graphs in Deep Neural Networks

A computational graph in a deep neural network (DNN) denotes a specific data flow diagram (DFD) composed of many tensors and operators. Existing toolkits for visualizing computational graphs are not applicable when the structure is highly complicated and large-scale (e.g., BERT [1]). To address this problem, we propose leveraging a suite of visual simplification techniques, including a cycle-removing method, a module-based edge-pruning algorithm, and an isomorphic subgraph stacking strategy. We design and implement an interactive visualization system that is suitable for computational graphs with up to 10 thousand elements. Experimental results and usage scenarios demonstrate that our tool reduces the number of elements by 60% on average and hence enhances the performance for recognizing and diagnosing DNN models. Our contributions are integrated into an open-source DNN visualization toolkit, namely, MindInsight [2].

preprint, 2021, arXiv

Physics-informed graph neural Galerkin networks: A unified framework for solving PDE-governed forward and inverse problems

Despite the great promise of the physics-informed neural networks (PINNs) in solving forward and inverse problems, several technical challenges are present as roadblocks for more complex and realistic applications. First, most existing PINNs are based on point-wise formulation with fully-connected networks to learn continuous functions, which suffer from poor scalability and hard boundary enforcement. Second, the infinite search space over-complicates the non-convex optimization for network training. Third, although the convolutional neural network (CNN)-based discrete learning can significantly improve training efficiency, CNNs struggle to handle irregular geometries with unstructured meshes. To properly address these challenges, we present a novel discrete PINN framework based on graph convolutional network (GCN) and variational structure of PDE to solve forward and inverse partial differential equations (PDEs) in a unified manner. The use of a piecewise polynomial basis can reduce the dimension of search space and facilitate training and convergence. Without the need of tuning penalty parameters in classic PINNs, the proposed method can strictly impose boundary conditions and assimilate sparse data in both forward and inverse settings. The flexibility of GCNs is leveraged for irregular geometries with unstructured meshes. The effectiveness and merit of the proposed method are demonstrated over a variety of forward and inverse computational mechanics problems governed by both linear and nonlinear PDEs.

preprint, 2020, arXiv

Context Detection for Advanced Self-Aware Navigation using Smartphone Sensors

Navigation and positioning systems depend on both the operating environment and the behaviour of the host vehicle or user. The environment determines the type and quality of radio signals available for positioning and the behaviour can contribute additional information to the navigation solution. In order to operate across different contexts, a context-adaptive navigation solution introduces an element of self-awareness by detecting the operating context and configuring the positioning system accordingly. This paper presents the detection of both environmental and behavioural contexts as a whole, building the foundation of a context-adaptive navigation system. Behavioural contexts are classified using measurements from accelerometers, gyroscopes, magnetometers and the barometer by supervised machine learning algorithms, yielding an overall 95% classification accuracy. A connectivity dependent filter is then implemented to improve the behavioural detection results. Environmental contexts are detected from GNSS measurements. They are classified into indoor, intermediate and outdoor categories using a probabilistic support vector machine (SVM), followed by a hidden Markov model (HMM) used for time-domain filtering. As there will never be completely reliable context detection, the paper also shows how environment and behaviour association can contribute to reducing the chances of the context determination algorithms selecting an incorrect context. Finally, the proposed context-determination algorithms are tested in a series of multi-context scenarios.

preprint, 2020, arXiv

Non-intrusive model reduction of large-scale, nonlinear dynamical systems using deep learning

Projection-based model reduction has become a popular approach to reduce the cost associated with integrating large-scale dynamical systems so they can be used in many-query settings such as optimization and uncertainty quantification. For nonlinear systems, significant cost reduction is only possible with an additional layer of approximation to reduce the computational bottleneck of evaluating the projected nonlinear terms. Prevailing methods to approximate the nonlinear terms are code intrusive, potentially requiring years of development time to integrate into an existing codebase, and have been known to lack parametric robustness. This work develops a non-intrusive method to efficiently and accurately approximate the expensive nonlinear terms that arise in reduced nonlinear dynamical systems using deep neural networks. The neural network is trained using only the simulation data used to construct the reduced basis and evaluations of the nonlinear terms at these snapshots. Once trained, the neural network-based reduced-order model only requires forward and backward propagation through the network to evaluate the nonlinear term and its derivative, which are used to integrate the reduced dynamical system at a new parameter configuration. We provide two numerical experiments (the dynamical systems result from the semi-discretization of parametrized, nonlinear, hyperbolic partial differential equations) that show, in addition to non-intrusivity, the proposed approach provides more stable and accurate approximations to each dynamical system across a large number of training and testing points than the popular empirical interpolation method.

preprint, 2020, arXiv

Photon Self-energy in Magnetized Chiral Plasma from Kinetic Theory

We study the photon self-energy in magnetized chiral plasma by solving the response of electromagnetic field perturbations in chiral kinetic theory with Landau level states. With the lowest Landau level approximation and in the collisionless limit, we find solutions for three particular perturbations: parallel electric field, static perpendicular electric and magnetic field, corresponding to chiral magnetic wave, drift state and tilted state, from which we extract components of photon self-energy in different kinematics. We show no solution is possible for more general field perturbations. We argue this is an artifact of the collisionless limit: while static solutions corresponding to drift state and tilted state can be found, they cannot be realized dynamically without interaction between Landau levels. We also discuss possible manifestation of side-jump effect due to both boost and rotation, with the latter due to the presence of background magnetic field.

preprint, 2020, arXiv

Structural Controllability of Undirected Diffusive Networks with Vector-Weighted Edges

In this paper, controllability of undirected networked systems with diffusively coupled subsystems is considered, where each subsystem is of identically fixed general high-order single-input-multi-output dynamics. The underlying graph of the network topology is vector-weighted, rather than scalar-weighted. The aim is to find conditions under which the networked system is structurally controllable, i.e., for almost all vector values for interaction links of the network topology, the corresponding system is controllable. It is proven that the networked system is structurally controllable if and only if each subsystem is controllable and observable, and the network topology is globally input-reachable. These conditions are further extended to the cases with multi-input-multi-output subsystems and matrix-weighted edges, or where both directed and undirected interaction links exist.

preprint, 2014, arXiv

Color Filtering Localization for Three-Dimensional Underwater Acoustic Sensor Networks

Accurate localization for mobile nodes has been an important and fundamental problem in underwater acoustic sensor networks (UASNs). The detection information returned from a mobile node is meaningful only if its location is known. In this paper, we propose two localization algorithms based on color filtering technology called PCFL and ACFL. PCFL and ACFL aim at collaboratively accomplishing accurate localization of underwater mobile nodes with minimum energy expenditure. They both adopt the overlapping signal region of task anchors which can communicate with the mobile node directly as the current sampling area. PCFL employs the projected distances between each of the task projections and the mobile node, while ACFL adopts the direct distance between each of the task anchors and the mobile node. Also, the proportion factor of distance is proposed to weight the RGB values. By comparing the nearness degrees of the RGB sequences between the samples and the mobile node, samples can be filtered out. The normalized nearness degrees are then used as weights to calculate the coordinates of the mobile nodes. The simulation results show that the proposed methods have excellent localization performance and can localize the mobile node in a timely manner. The average localization error of PCFL is about 30.4% lower than that of the AFLA method.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2503.22942:author:6:han-gao

Imported May 21, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2512.24157:author:11:han-gao

Imported May 21, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2512.24673:author:6:han-gao

Imported May 21, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2512.24680:author:3:han-gao

Imported May 21, 2026. Synced May 21, 2026.

3 works

Chang Liu

Researcher

3 works

Jian-Xun Wang

Researcher

3 works

Kangjie Zhou

Researcher

2 works

Matthew J. Zahr

Researcher
