Researcher profile

Mao Shan

Mao Shan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2022arXiv

A Novel Probabilistic V2X Data Fusion Framework for Cooperative Perception

The paper addresses the vehicle-to-X (V2X) data fusion for cooperative or collective perception (CP). This emerging and promising intelligent transportation systems (ITS) technology has enormous potential for improving efficiency and safety of road transportation. Recent advances in V2X communication primarily address the definition of V2X messages and data dissemination amongst ITS stations (ITS-Ss) in a traffic environment. Yet, a largely unsolved problem is how a connected vehicle (CV) can efficiently and consistently fuse its local perception information with the data received from other ITS-Ss. In this paper, we present a novel data fusion framework to fuse the local and V2X perception data for CP that considers the presence of cross-correlation. The proposed approach is validated through comprehensive results obtained from numerical simulation, CARLA simulation, and real-world experimentation that incorporates V2X-enabled intelligent platforms. The real-world experiment includes a CV, a connected and automated vehicle (CAV), and an intelligent roadside unit (IRSU) retrofitted with vision and lidar sensors. We also demonstrate how the fused CP information can improve the awareness of vulnerable road users (VRU) for CV/CAV, and how this information can be considered in path planning/decision making within the CAV to facilitate safe interactions.

preprint2022arXiv

Viewer-Centred Surface Completion for Unsupervised Domain Adaptation in 3D Object Detection

Every autonomous driving dataset has a different configuration of sensors, originating from distinct geographic regions and covering various scenarios. As a result, 3D detectors tend to overfit the datasets they are trained on. This causes a drastic decrease in accuracy when the detectors are trained on one dataset and tested on another. We observe that lidar scan pattern differences form a large component of this reduction in performance. We address this in our approach, SEE-VCN, by designing a novel viewer-centred surface completion network (VCN) to complete the surfaces of objects of interest within an unsupervised domain adaptation framework, SEE. With SEE-VCN, we obtain a unified representation of objects across datasets, allowing the network to focus on learning geometry, rather than overfitting on scan patterns. By adopting a domain-invariant representation, SEE-VCN can be classed as a multi-target domain adaptation approach where no annotations or re-training is required to obtain 3D detections for new scan patterns. Through extensive experiments, we show that our approach outperforms previous domain adaptation methods in multiple domain adaptation settings. Our code and data are available at https://github.com/darrenjkt/SEE-VCN.

preprint2021arXiv

Demonstrations of Cooperative Perception: Safety and Robustness in Connected and Automated Vehicle Operations

Cooperative perception, or collective perception (CP) is an emerging and promising technology for intelligent transportation systems (ITS). It enables an ITS station (ITS-S) to share its local perception information with others by means of vehicle-to-X (V2X) communication, thereby achieving improved efficiency and safety in road transportation. In this paper, we present our recent progress on the development of a connected and automated vehicle (CAV) and intelligent roadside unit (IRSU). We present three different experiments to demonstrate the use of CP service within intelligent infrastructure to improve awareness of vulnerable road users (VRU) and thus safety for CAVs in various traffic scenarios. We demonstrate in the experiments that a connected vehicle (CV) can "see" a pedestrian around the corners. More importantly, we demonstrate how CAVs can autonomously and safely interact with walking and running pedestrians, relying only on the CP information from the IRSU through vehicle-to-infrastructure (V2I) communication. This is one of the first demonstrations of urban vehicle automation using only CP information. We also address in the paper the handling of collective perception messages (CPMs) received from the IRSU, and passing them through a pipeline of CP information coordinate transformation with uncertainty, multiple road user tracking, and eventually path planning/decision making within the CAV. The experimental results were obtained with manually driven CV, fully autonomous CAV, and an IRSU retrofitted with vision and laser sensors and a road user tracking system.

preprint2020arXiv

Camera-Lidar Integration: Probabilistic sensor fusion for semantic mapping

An automated vehicle operating in an urban environment must be able to perceive and recognise object/obstacles in a three-dimensional world while navigating in a constantly changing environment. In order to plan and execute accurate sophisticated driving maneuvers, a high-level contextual understanding of the surroundings is essential. Due to the recent progress in image processing, it is now possible to obtain high definition semantic information in 2D from monocular cameras, though cameras cannot reliably provide the highly accurate 3D information provided by lasers. The fusion of these two sensor modalities can overcome the shortcomings of each individual sensor, though there are a number of important challenges that need to be addressed in a probabilistic manner. In this paper, we address the common, yet challenging, lidar/camera/semantic fusion problems which are seldom approached in a wholly probabilistic manner. Our approach is capable of using a multi-sensor platform to build a three-dimensional semantic voxelized map that considers the uncertainty of all of the processes involved. We present a probabilistic pipeline that incorporates uncertainties from the sensor readings (cameras, lidar, IMU and wheel encoders), compensation for the motion of the vehicle, and heuristic label probabilities for the semantic images. We also present a novel and efficient viewpoint validation algorithm to check for occlusions from the camera frames. A probabilistic projection is performed from the camera images to the lidar point cloud. Each labelled lidar scan then feeds into an octree map building algorithm that updates the class probabilities of the map voxels every time a new observation is available. We validate our approach using a set of qualitative and quantitative experimental tests on the USyd Dataset.

preprint2020arXiv

Long-term map maintenance pipeline for autonomous vehicles

For autonomous vehicles to operate persistently in a typical urban environment, it is essential to have high accuracy position information. This requires a mapping and localisation system that can adapt to changes over time. A localisation approach based on a single-survey map will not be suitable for long-term operation as it does not incorporate variations in the environment. In this paper, we present new algorithms to maintain a featured-based map. A map maintenance pipeline is proposed that can continuously update a map with the most relevant features taking advantage of the changes in the surroundings. Our pipeline detects and removes transient features based on their geometrical relationships with the vehicle's pose. Newly identified features became part of a new feature map and are assessed by the pipeline as candidates for the localisation map. By purging out-of-date features and adding newly detected features, we continually update the prior map to more accurately represent the most recent environment. We have validated our approach using the USyd Campus Dataset, which includes more than 18 months of data. The results presented demonstrate that our maintenance pipeline produces a resilient map which can provide sustained localisation performance over time.

preprint2020arXiv

Probabilistic Crowd GAN: Multimodal Pedestrian Trajectory Prediction using a Graph Vehicle-Pedestrian Attention Network

Understanding and predicting the intention of pedestrians is essential to enable autonomous vehicles and mobile robots to navigate crowds. This problem becomes increasingly complex when we consider the uncertainty and multimodality of pedestrian motion, as well as the implicit interactions between members of a crowd, including any response to a vehicle. Our approach, Probabilistic Crowd GAN, extends recent work in trajectory prediction, combining Recurrent Neural Networks (RNNs) with Mixture Density Networks (MDNs) to output probabilistic multimodal predictions, from which likely modal paths are found and used for adversarial training. We also propose the use of Graph Vehicle-Pedestrian Attention Network (GVAT), which models social interactions and allows input of a shared vehicle feature, showing that inclusion of this module leads to improved trajectory prediction both with and without the presence of a vehicle. Through evaluation on various datasets, we demonstrate improvements on the existing state of the art methods for trajectory prediction and illustrate how the true multimodal and uncertain nature of crowd interactions can be directly modelled.

preprint2020arXiv

Probabilistic Egocentric Motion Correction of Lidar Point Cloud and Projection to Camera Images for Moving Platforms

The fusion of sensor data from heterogeneous sensors is crucial for robust perception in various robotics applications that involve moving platforms, for instance, autonomous vehicle navigation. In particular, combining camera and lidar sensors enables the projection of precise range information of the surrounding environment onto visual images. It also makes it possible to label each lidar point with visual segmentation/classification results for 3D mapping, which facilitates a higher level understanding of the scene. The task is however considered non-trivial due to intrinsic and extrinsic sensor calibration, and the distortion of lidar points resulting from the ego-motion of the platform. Despite the existence of many lidar ego-motion correction methods, the errors in the correction process due to uncertainty in ego-motion estimation are not possible to remove completely. It is thus essential to consider the problem a probabilistic process where the ego-motion estimation uncertainty is modelled and considered consistently. The paper investigates the probabilistic lidar ego-motion correction and lidar-to-camera projection, where both the uncertainty in the ego-motion estimation and time jitter in sensory measurements are incorporated. The proposed approach is validated both in simulation and using real-world data collected from an electric vehicle retrofitted with wide-angle cameras and a 16-beam scanning lidar.

preprint2020arXiv

Semantic sensor fusion: from camera to sparse lidar information

To navigate through urban roads, an automated vehicle must be able to perceive and recognize objects in a three-dimensional environment. A high-level contextual understanding of the surroundings is necessary to plan and execute accurate driving maneuvers. This paper presents an approach to fuse different sensory information, Light Detection and Ranging (lidar) scans and camera images. The output of a convolutional neural network (CNN) is used as classifier to obtain the labels of the environment. The transference of semantic information between the labelled image and the lidar point cloud is performed in four steps: initially, we use heuristic methods to associate probabilities to all the semantic classes contained in the labelled images. Then, the lidar points are corrected to compensate for the vehicle's motion given the difference between the timestamps of each lidar scan and camera image. In a third step, we calculate the pixel coordinate for the corresponding camera image. In the last step we perform the transfer of semantic information from the heuristic probability images to the lidar frame, while removing the lidar information that is not visible to the camera. We tested our approach in the Usyd Dataset \cite{usyd_dataset}, obtaining qualitative and quantitative results that demonstrate the validity of our probabilistic sensory fusion approach.