Researcher profile

Yang Xing

Yang Xing contributes to research discovery and scholarly infrastructure.

Researcher; Affiliation not imported; Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging; Verification L1; Unclaimed author
11 works
0 followers
9 topics
4 close collaborators

Published work

11 published items

preprint, 2026, arXiv

MedVL-SAM2: A unified 3D medical vision-language model for multimodal reasoning and prompt-driven segmentation

Recent progress in medical vision-language models (VLMs) has achieved strong performance on image-level text-centric tasks such as report generation and visual question answering (VQA). However, achieving fine-grained visual grounding and volumetric spatial reasoning in 3D medical VLMs remains challenging, particularly when aiming to unify these capabilities within a single, generalizable framework. To address this challenge, we propose MedVL-SAM2, a unified 3D medical multimodal model that concurrently supports report generation, VQA, and multi-paradigm segmentation, including semantic, referring, and interactive segmentation. MedVL-SAM2 integrates image-level reasoning and pixel-level perception through a cohesive architecture tailored for 3D medical imaging, and incorporates a SAM2-based volumetric segmentation module to enable precise multi-granular spatial reasoning. The model is trained in a multi-stage pipeline: it is first pre-trained on a large-scale corpus of 3D CT image-text pairs to align volumetric visual features with radiology-language embeddings. It is then jointly optimized with both language-understanding and segmentation objectives using a comprehensive 3D CT segmentation dataset. This joint training enables flexible interaction via language, point, or box prompts, thereby unifying high-level visual reasoning with spatially precise localization. Our unified architecture delivers state-of-the-art performance across report generation, VQA, and multiple 3D segmentation tasks. Extensive analyses further show that the model provides reliable 3D visual grounding, controllable interactive segmentation, and robust cross-modal reasoning, demonstrating that high-level semantic reasoning and precise 3D localization can be jointly achieved within a unified 3D medical VLM.

preprint, 2022, arXiv

Multi-task Driver Steering Behaviour Modeling Using Time-Series Transformer

Human intention prediction provides an augmented solution for the design of assistants and collaboration between the human driver and intelligent vehicles. In this study, a multi-task sequential learning framework is developed to predict future steering torques and steering postures based on upper limb neuromuscular electromyography (EMG) signals. A single-right-hand driving mode is studied in particular, and three different driving postures are evaluated for this mode. A multi-task time-series transformer network (MTS-Trans) is then developed to predict the steering torques and driving postures. To evaluate the multi-task learning performance, four different frameworks are assessed. Twenty-one participants took part in the driving simulator-based experiment. The proposed model achieved accurate results on future steering torque prediction and driving posture recognition for single-hand driving modes. The proposed system can contribute to the development of advanced driver steering assistance systems and ensure mutual understanding between human drivers and intelligent vehicles.

preprint, 2021, arXiv

Human-Machine Adaptive Shared Control for Safe Automated Driving under Automation Degradation

In this paper, a human-machine adaptive shared control method is proposed for automated vehicles (AVs) under automation performance degradation. First, a novel risk assessment module is proposed to monitor driving behavior and evaluate automation performance degradation for AVs. Then, an adaptive control authority allocation module is developed. When performance degradation is detected, the control authority allocated to the automation system is decreased based on the assessed risk to reduce the potential risk of vehicle motion. Consequently, the control authority allocated to the human driver is adaptively increased, which requires more driver engagement in the control loop to compensate for the automation degradation and ensure AV safety. Experimental validation is conducted under different driving scenarios. The testing results show that the proposed approach is able to effectively compensate for the performance degradation of vehicle automation through human-machine adaptive shared control, ensuring the safety of automated driving.
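
The authority allocation idea described in this abstract can be illustrated with a minimal sketch. It assumes a risk score already normalized to [0, 1] and a simple linear mapping from risk to the automation's share of control; the function names, the linear rule, and the weighted blending of steering commands are illustrative assumptions, not the method from the paper.

```python
def allocate_authority(risk, min_authority=0.0, max_authority=1.0):
    """Return the automation's share of control for a given risk score.

    Assumption: risk is a normalized score in [0, 1]; higher risk means
    more automation degradation, so the automation receives less authority.
    """
    risk = min(max(risk, 0.0), 1.0)  # clamp out-of-range risk values
    return max_authority - risk * (max_authority - min_authority)


def blend_steering(u_automation, u_human, risk):
    """Blend automation and human steering commands by allocated authority.

    The human's share is the complement of the automation's share, so it
    grows as the assessed risk grows.
    """
    alpha = allocate_authority(risk)
    return alpha * u_automation + (1.0 - alpha) * u_human


if __name__ == "__main__":
    # Hypothetical steering commands for three assessed risk levels.
    for risk in (0.0, 0.5, 1.0):
        print(risk, blend_steering(0.2, -0.1, risk))
```

With zero risk the automation's command passes through unchanged, and with maximum risk the human's command does.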

preprint, 2020, arXiv

A Unified Multi-scale and Multi-task Learning Framework for Driver Behaviors Reasoning

Mutual understanding between driver and vehicle is critically important to the design of intelligent vehicles and customized interaction interfaces. In this study, a unified driver behavior reasoning system for multi-scale and multi-task behavior recognition is proposed. Specifically, a multi-scale driver behavior recognition system is designed to recognize both the driver's physical and mental states based on a deep encoder-decoder framework. This system can jointly recognize three driver behaviors with different time scales based on the shared encoder network. Driver body postures and mental behaviors, including intention and emotion, are studied and identified. The encoder network is designed based on a deep convolutional neural network (CNN), and several decoders for estimating different driver states are proposed with fully connected (FC) and long short-term memory (LSTM) based recurrent neural networks (RNN). The joint feature learning with the CNN encoder increases the computational efficiency and feature diversity, while the customized decoders enable efficient multi-task inference. The proposed framework can be used to exploit the relationship between different driver states, and it is found that when drivers form lane change intentions, their emotions usually remain neutral and they focus more on the task. Two naturalistic datasets are used to investigate model performance: a local highway dataset, CranData, and a public dataset from Brain4Cars. The testing results on these two datasets show that the framework is accurate and outperforms existing methods on driver posture, intention, and emotion recognition.

preprint, 2020, arXiv

An Integrated Framework of Decision Making and Motion Planning for Autonomous Vehicles Considering Social Behaviors

This paper presents a novel integrated approach to decision making and motion planning for lane-change maneuvers of autonomous vehicles (AVs) considering the social behaviors of surrounding traffic occupants. Reflected by the driving styles and intentions of surrounding vehicles, the social behaviors are taken into consideration during the modelling process. Stackelberg game theory is then applied to solve the decision-making problem, which is formulated as a non-cooperative game. In addition, a potential field is adopted in the motion planning model, which uses different potential functions to describe surrounding vehicles with different behaviors and road constraints. Model Predictive Control (MPC) is then utilized to predict the state and trajectory of the autonomous vehicle. Finally, decision making and motion planning are integrated into a constrained multi-objective optimization problem. Three testing scenarios considering different social behaviors of surrounding vehicles are carried out to validate the performance of the proposed approach. Testing results show that the integrated approach is able to address different social interactions with other traffic participants, and make proper and safe decisions and plans for autonomous vehicles, demonstrating its feasibility and effectiveness.

preprint, 2020, arXiv

Deep Convolutional Neural Network-based Bernoulli Heatmap for Head Pose Estimation

Head pose estimation is a crucial problem for many tasks, such as driver attention, fatigue detection, and human behaviour analysis. It is well known that neural networks are better at handling classification problems than regression problems. It is an extremely nonlinear process to let the network output the angle value directly for optimization learning, and the weight constraint of the loss function will be relatively weak. This paper proposes a novel Bernoulli heatmap for head pose estimation from a single RGB image. Our method can achieve the positioning of the head area while estimating the angles of the head. The Bernoulli heatmap makes it possible to construct fully convolutional neural networks without fully connected layers and provides a new idea for the output form of head pose estimation. A deep convolutional neural network (CNN) structure with multiscale representations is adopted to maintain high-resolution information and low-resolution information in parallel. This kind of structure can maintain rich, high-resolution representations. In addition, channelwise fusion is adopted to make the fusion weights learnable instead of simple addition with equal weights. As a result, the estimation is spatially more precise and potentially more accurate. The effectiveness of the proposed method is empirically demonstrated by comparing it with other state-of-the-art methods on public datasets.

preprint, 2020, arXiv

Defining Digital Quadruplets in the Cyber-Physical-Social Space for Parallel Driving

Parallel driving is a novel framework to synthesize vehicle intelligence and transport automation. This article aims to define digital quadruplets in parallel driving. In the cyber-physical-social systems (CPSS), based on the ACP method, the digital quadruplets are first named: descriptive, predictive, prescriptive and real vehicles. The objectives of the three virtual digital vehicles are to interact with, guide, simulate and improve the real vehicles. Then, the three virtual components of the digital quadruplets are introduced in detail and their applications are illustrated. Finally, the real vehicles in the parallel driving system and the research process of the digital quadruplets are depicted. The presented digital quadruplets in parallel driving are expected to make future connected automated driving safe, efficient and synergistic.

preprint, 2020, arXiv

Driving Conditions-Driven Energy Management for Hybrid Electric Vehicles: A Review

Motivated by concerns about transportation fuel consumption and global air pollution, industrial engineers and academic researchers have made many efforts to construct more efficient and environment-friendly vehicles. Hybrid electric vehicles (HEVs) are representative examples because they can satisfy the power demand by coordinating energy supply among different energy storage devices. To achieve this goal, energy management approaches are a crucial technology, and driving cycles are a critical influencing factor. Therefore, this paper aims to summarize driving cycle-driven energy management strategies (EMSs) for HEVs. First, the definition and significance of driving cycles in the energy management field are clarified, and the recent literature in this research domain is reviewed. In addition, according to the known information of driving cycles, the EMSs are divided into three categories, and the relevant study directions, such as standard driving cycles, long-term driving cycle generation (LT-DCG) and short-term driving cycle prediction (ST-DCP), are described and analyzed. Furthermore, existing databases of highway and urban driving cycles are presented and discussed. Finally, this article elaborates on the future prospects of energy management technologies related to driving cycles. This paper aims to help researchers understand the state of the art in HEV energy management and recognize its future development directions.

preprint, 2020, arXiv

Human-Machine Collaboration for Automated Vehicles via an Intelligent Two-Phase Haptic Interface

Prior to realizing fully autonomous driving, human intervention will be required periodically to guarantee vehicle safety. This fact poses a new challenge in human-machine interaction, particularly during control authority transition from the automated functionality to a human driver. This paper addresses this challenge by proposing an intelligent haptic interface based on a newly developed two-phase human-machine interaction model. The intelligent haptic torque is applied on the steering wheel and switches its functionality between predictive guidance and haptic assistance according to the varying state and control ability of human drivers, helping drivers gradually resume manual control during takeover. The developed approach is validated by conducting vehicle experiments with 26 human participants. The results suggest that the proposed method can effectively enhance the driving state recovery and control performance of human drivers during takeover compared with an existing approach, further improving the safety and smoothness of the human-machine interaction in automated vehicles.

preprint, 2020, arXiv

Interaction-Aware Trajectory Prediction of Connected Vehicles using CNN-LSTM Networks

Predicting the future trajectory of a surrounding vehicle in congested traffic is one of the basic abilities of an autonomous vehicle. In congestion, a vehicle's future movement is the result of its interaction with surrounding vehicles. A vehicle in congestion may have many neighbors within a relatively short distance, but only a small subset of them strongly affects its future trajectory. In this work, an interaction-aware method that predicts the future trajectory of an ego vehicle considering its interaction with eight surrounding vehicles is proposed. The dynamics of vehicles are encoded by LSTMs with shared weights, and the interaction is extracted with a simple CNN. The proposed model is trained and tested on trajectories extracted from the publicly accessible NGSIM US-101 dataset. Quantitative experimental results show that the proposed model outperforms previous models in terms of root-mean-square error (RMSE). Visualization of the results shows that the model is able to predict the future trajectory induced by a lane change before the vehicle makes any obvious lateral movement to initiate it.
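
For readers unfamiliar with the RMSE metric mentioned in this abstract, the following is a minimal sketch of how it can be computed for a predicted trajectory. It assumes trajectories are given as equal-length lists of (x, y) positions and that the error at each step is the Euclidean distance between predicted and actual positions; the function name and data layout are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math


def trajectory_rmse(predicted, actual):
    """Root-mean-square error between two trajectories of (x, y) points.

    Assumption: both trajectories are sampled at the same time steps, so
    points are compared pairwise in order.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("trajectories must be non-empty and equal in length")
    squared_errors = [
        (px - ax) ** 2 + (py - ay) ** 2
        for (px, py), (ax, ay) in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical two-step trajectories, positions in metres.
    predicted = [(0.0, 0.0), (3.0, 4.0)]
    actual = [(0.0, 0.0), (0.0, 0.0)]
    print(trajectory_rmse(predicted, actual))
```

A lower value means the predicted positions stay closer to the recorded ones over the prediction horizon.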

preprint, 2020, arXiv

Multi-modal Sensor Fusion-Based Deep Neural Network for End-to-end Autonomous Driving with Scene Understanding

This study aims to improve the performance and generalization capability of end-to-end autonomous driving with scene understanding by leveraging deep learning and multimodal sensor fusion techniques. The designed end-to-end deep neural network takes the visual image and associated depth information as input at an early fusion level and concurrently outputs pixel-wise semantic segmentation, as scene understanding, and vehicle control commands. The end-to-end deep learning-based autonomous driving model is tested in high-fidelity simulated urban driving conditions and compared on the CoRL2017 and NoCrash benchmarks. The testing results show that the proposed approach has better performance and generalization ability, achieving a 100% success rate in static navigation tasks in both training and unobserved situations, as well as better success rates in other tasks than the prior models. A further ablation study shows that the model without multimodal sensor fusion or scene understanding performs poorly in the new environment because of false perception. The results verify that the performance of our model is improved by the synergy of multimodal sensor fusion with the scene understanding subtask, demonstrating the feasibility and effectiveness of the developed deep neural network with multimodal sensor fusion.