Source author record

Liangjun Zhang

Liangjun Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · Robotics · eess.IV · eess.SY · Systems and Control

Catalog footprint

What is connected

12 works

5 topics

4 close collaborators


preprint · 2022 · arXiv

A Generalized Continuous Collision Detection Framework of Polynomial Trajectory for Mobile Robots in Cluttered Environments

In this paper, we introduce a generalized continuous collision detection (CCD) framework for mobile robots moving along polynomial trajectories in cluttered environments containing various static obstacle models. Specifically, we find that the collision conditions between robots and obstacles can be transformed into a set of polynomial inequalities, whose roots can be found efficiently by the proposed solver. In addition, we test different types of mobile robots with various kinematic and dynamic constraints in our generalized CCD framework and validate that it provides provable collision checking and computes the exact time of impact. Furthermore, we integrate our framework with the path planner of a navigation system. Benefiting from our CCD method, the mobile robot is able to work safely in several challenging scenarios.
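To make the idea of turning a collision condition into a polynomial inequality concrete, the following is a minimal sketch of the simplest case, written for this page rather than taken from the paper: a robot moving along a degree-1 polynomial (straight-line) trajectory past a circular obstacle. The function name `time_of_impact` and the disc geometry are illustrative assumptions, and the robot's own radius is assumed to be folded into `radius` (obstacle inflation). Higher-degree trajectories produce higher-degree polynomials that need a general root solver such as the one the paper proposes.

```python
import math


def time_of_impact(p0, v, center, radius, t_max):
    """Earliest t in [0, t_max] at which p(t) = p0 + v*t touches a disc obstacle.

    Hypothetical helper for illustration. The collision condition
    |p(t) - c|^2 - r^2 <= 0 is the quadratic inequality a*t^2 + b*t + k <= 0.
    Returns None if no contact occurs within the time window.
    """
    dx, dy = p0[0] - center[0], p0[1] - center[1]
    a = v[0] ** 2 + v[1] ** 2
    b = 2.0 * (dx * v[0] + dy * v[1])
    k = dx * dx + dy * dy - radius * radius
    if k <= 0.0:
        return 0.0  # already in contact at t = 0
    if a == 0.0:
        return None  # stationary and outside the obstacle
    disc = b * b - 4.0 * a * k
    if disc < 0.0:
        return None  # the line never comes within `radius` of the center
    # With k > 0 and a > 0 both roots share a sign; the smaller one is first contact.
    t_hit = (-b - math.sqrt(disc)) / (2.0 * a)
    return t_hit if 0.0 <= t_hit <= t_max else None
```

For example, `time_of_impact((0.0, 0.0), (1.0, 0.0), (5.0, 0.0), 1.0, 10.0)` returns `4.0`: the point reaches the obstacle boundary at x = 4. Because the root is computed in closed form rather than by sampling, a contact inside the window cannot be skipped, which is the property the abstract calls provable collision checking.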

preprint · 2022 · arXiv

Autonomous Wheel Loader Trajectory Tracking Control Using LPV-MPC

In this paper, we present a systematic approach for high-performance and efficient trajectory tracking control of autonomous wheel loaders. With the nonlinear dynamic model of a wheel loader, nonlinear model predictive control (MPC) is used in offline trajectory planning to obtain a high-performance state-control trajectory that satisfies the state and control constraints. In tracking control, the nonlinear model is embedded into a Linear Parameter Varying (LPV) model, and the LPV-MPC strategy is used to achieve fast online computation and good tracking performance. To demonstrate the effectiveness and advantages of LPV-MPC, we test three model predictive control strategies in a high-fidelity simulation environment: with the planned trajectory, LPV-MPC, nonlinear MPC, and LTI-MPC are simulated and compared in terms of computational burden and tracking performance. LPV-MPC achieves better performance than conventional LTI-MPC because the LPV model captures the nominal system dynamics more accurately. Compared with nonlinear MPC, LPV-MPC achieves slightly worse tracking performance but tremendously improved computational efficiency. A video of loading cycles completed by our autonomous wheel loader in the simulation environment can be found here: https://youtu.be/QbNfS_wZKKA.
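The step of embedding a nonlinear model into an LPV model can be illustrated on a much simpler system than a wheel loader. The sketch below assumes a unicycle model (state `[px, py, theta]`, input `[v, omega]`), which is not the paper's wheel loader dynamics: treating the heading inside `cos`/`sin` as a scheduling parameter `rho` makes the update linear in state and input, so an MPC can use `A(rho)`, `B(rho)` evaluated along a planned trajectory. All function names are hypothetical.

```python
import math


def lpv_matrices(rho, dt):
    """A(rho), B(rho) for a unicycle whose heading is the scheduling parameter.

    Hypothetical example model, not the wheel loader model from the paper.
    The nonlinear update px += dt*v*cos(theta), py += dt*v*sin(theta),
    theta += dt*omega is linear in (x, u) once cos/sin are evaluated at rho.
    """
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    B = [[dt * math.cos(rho), 0.0], [dt * math.sin(rho), 0.0], [0.0, dt]]
    return A, B


def lpv_step(x, u, rho, dt):
    """One linear prediction step x+ = A(rho) x + B(rho) u."""
    A, B = lpv_matrices(rho, dt)
    return [
        sum(A[i][j] * x[j] for j in range(3)) + sum(B[i][j] * u[j] for j in range(2))
        for i in range(3)
    ]


def nonlinear_step(x, u, dt):
    """Reference nonlinear unicycle update, for comparison."""
    px, py, th = x
    v, om = u
    return [px + dt * v * math.cos(th), py + dt * v * math.sin(th), th + dt * om]
```

When `rho` equals the actual heading, `lpv_step` reproduces `nonlinear_step` up to rounding. In an MPC, `rho` would instead be taken from the offline planned trajectory over the horizon, giving a time-varying linear prediction model and hence a quadratic program that is much cheaper to solve than nonlinear MPC.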

preprint · 2022 · arXiv

Excavation Reinforcement Learning Using Geometric Representation

Excavation of irregular rigid objects in clutter, such as fragmented rocks and wood blocks, is very challenging due to their complex interaction dynamics and highly variable geometries. In this paper, we adopt reinforcement learning (RL) to tackle this challenge and learn policies that plan a sequence of excavation trajectories for irregular rigid objects, given point clouds of excavation scenes. Moreover, we separately learn a compact representation of the point cloud on geometric tasks that do not require human labeling. We show that using the representation reduces RL training time while achieving asymptotic performance similar to that of an end-to-end RL algorithm. When a policy trained in simulation is applied directly to a real scene, the policy trained with the representation outperforms end-to-end RL. To the best of our knowledge, this paper presents the first application of RL to planning a sequence of excavation trajectories for irregular rigid objects in clutter.

preprint · 2022 · arXiv

PCW-Net: Pyramid Combination and Warping Cost Volume for Stereo Matching

Existing deep learning based stereo matching methods either focus on achieving optimal performance on the target dataset at the cost of poor generalization to other datasets, or focus on cross-domain generalization by suppressing domain-sensitive features, which significantly sacrifices accuracy. To tackle these problems, we propose PCW-Net, a Pyramid Combination and Warping cost volume-based network that achieves good performance in both cross-domain generalization and stereo matching accuracy on various benchmarks. In particular, PCW-Net is designed for two purposes. First, we construct combination volumes on the upper levels of the pyramid and develop a cost volume fusion module to integrate them for initial disparity estimation. Fusing multi-scale combination volumes covers multi-scale receptive fields, so domain-invariant features can be extracted. Second, we construct the warping volume at the last level of the pyramid for disparity refinement. The proposed warping volume narrows the residual search range from the initial disparity search range to a fine-grained one, which dramatically reduces the difficulty of finding the correct residual in an unconstrained residual search space. When trained on synthetic datasets and generalized to unseen real datasets, our method shows strong cross-domain generalization and outperforms existing state-of-the-art methods by a large margin. After fine-tuning on the real datasets, our method ranks first on KITTI 2012, second on KITTI 2015, and first on Argoverse among all published methods as of 7 March 2022. The code will be available at https://github.com/gallenszl/PCWNet.
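The benefit of a warping volume, searching only a small residual range around an initial disparity, can be shown with a toy version. This sketch is a simplification written for this page: it works on one scanline of scalar features with integer disparities and an absolute-difference cost, whereas PCW-Net uses learned feature maps and continuous warping. The names `warped_residual_volume`, `refine`, and `max_res` are hypothetical.

```python
def warped_residual_volume(left, right, init_disp, max_res=2):
    """Costs for residuals r in [-max_res, max_res] around an initial disparity.

    left, right: 1-D feature rows (one scanline); init_disp: integer initial
    disparity per pixel. Matching the left pixel x against right pixel
    x - (d0 + r) is the same as warping the right row by d0 and then searching
    only the small residual range. Out-of-range matches get infinite cost.
    """
    width = len(left)
    volume = []
    for x in range(width):
        costs = []
        for r in range(-max_res, max_res + 1):
            xr = x - (init_disp[x] + r)  # matching column in the right row
            costs.append(abs(left[x] - right[xr]) if 0 <= xr < width else float("inf"))
        volume.append(costs)
    return volume


def refine(init_disp, volume, max_res=2):
    """Winner-takes-all residual per pixel, added to the initial disparity."""
    refined = []
    for d0, costs in zip(init_disp, volume):
        best = min(range(len(costs)), key=costs.__getitem__)
        refined.append(d0 + best - max_res)
    return refined
```

With a true disparity of 2 and an initial estimate of 1 everywhere, each pixel only evaluates 2 * max_res + 1 = 5 candidates instead of the full disparity range, and `refine` recovers disparity 2 wherever a valid match exists.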

preprint · 2022 · arXiv

ProposalContrast: Unsupervised Pre-training for LiDAR-based 3D Object Detection

Existing approaches for unsupervised point cloud pre-training are constrained to either scene-level or point/voxel-level instance discrimination. Scene-level methods tend to lose local details that are crucial for recognizing road objects, while point/voxel-level methods inherently suffer from a limited receptive field that cannot perceive large objects or the surrounding context. Considering that region-level representations are more suitable for 3D object detection, we devise a new unsupervised point cloud pre-training framework, called ProposalContrast, that learns robust 3D representations by contrasting region proposals. Specifically, with an exhaustive set of region proposals sampled from each point cloud, geometric point relations within each proposal are modeled to create expressive proposal representations. To better accommodate 3D detection properties, ProposalContrast optimizes both inter-cluster and inter-proposal separation, i.e., it sharpens the discriminativeness of proposal representations across semantic classes and object instances. The generalizability and transferability of ProposalContrast are verified on various 3D detectors (i.e., PV-RCNN, CenterPoint, PointPillars and PointRCNN) and datasets (i.e., KITTI, Waymo and ONCE).
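Contrasting region proposals is usually done with an InfoNCE-style objective, where the same proposal seen in two augmented views forms the positive pair and the other proposals act as negatives. The sketch below is a generic InfoNCE loss over proposal embeddings, offered as an assumption about the general shape of such an objective; it does not reproduce ProposalContrast's specific inter-proposal and inter-cluster terms. The name `info_nce` and the default `temperature` are hypothetical.

```python
import math


def info_nce(z1, z2, temperature=0.1):
    """Generic InfoNCE loss; z1[i] and z2[i] embed the same proposal in two views.

    Rows are L2-normalised (they must be non-zero), similarities are dot
    products divided by the temperature, and the matching row of the other
    view is the positive for each proposal.
    """
    def normalise(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    a = [normalise(v) for v in z1]
    b = [normalise(v) for v in z2]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += lse - logits[i]  # cross-entropy with the matching proposal as target
    return loss / len(a)
```

Embeddings that agree across views (`info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]])`) give a loss close to zero, while swapping the second view's rows gives a loss near 10, so minimizing the loss pulls matching proposals together and pushes different proposals apart.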

preprint · 2022 · arXiv

Semi-supervised 3D Object Detection with Proficient Teachers

Dominant point cloud-based 3D object detectors in autonomous driving scenarios rely heavily on huge amounts of accurately labeled samples; however, 3D annotation of point clouds is extremely tedious, expensive and time-consuming. To reduce the dependence on extensive supervision, semi-supervised learning (SSL) based approaches have been proposed. Pseudo-labeling is commonly used in SSL frameworks; however, low-quality predictions from the teacher model have seriously limited its performance. In this work, we propose a new pseudo-labeling framework for semi-supervised 3D object detection that enhances the teacher model into a proficient one through several necessary designs. First, to improve the recall of pseudo labels, a Spatial-temporal Ensemble (STE) module is proposed to generate sufficient seed boxes. Second, to improve the precision of recalled boxes, a Clustering-based Box Voting (CBV) module is designed to obtain aggregated votes from the clustered seed boxes. This also eliminates the need for sophisticated thresholds to select pseudo labels. Furthermore, to reduce the negative influence of wrongly pseudo-labeled samples during training, a soft supervision signal based on Box-wise Contrastive Learning (BCL) is proposed. The effectiveness of our model is verified on both the ONCE and Waymo datasets. For example, on ONCE, our approach significantly improves the baseline by 9.51 mAP. Moreover, with half of the annotations, our model outperforms the oracle model trained with full annotations on Waymo.
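The idea of clustering seed boxes and letting each cluster vote can be sketched in a few lines. This is a simplified stand-in written for this page, not the CBV module itself: it uses 2D axis-aligned boxes `(x1, y1, x2, y2)` instead of 3D boxes, greedy IoU clustering, and score-weighted averaging, with the number of votes per cluster available as a confidence signal in place of a hand-tuned score threshold. The names `iou`, `box_vote`, and `iou_thr` are hypothetical, and scores are assumed to be positive.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_vote(seeds, iou_thr=0.5):
    """Greedy IoU clustering of (box, score) seeds, then score-weighted voting.

    Returns one (fused_box, num_votes, mean_score) tuple per cluster,
    highest-scoring cluster first. Scores are assumed to be positive.
    """
    remaining = sorted(seeds, key=lambda s: s[1], reverse=True)
    results = []
    while remaining:
        anchor = remaining[0]  # highest-scoring unclustered seed
        cluster = [anchor] + [s for s in remaining[1:] if iou(anchor[0], s[0]) >= iou_thr]
        remaining = [s for s in remaining[1:] if iou(anchor[0], s[0]) < iou_thr]
        total = sum(score for _, score in cluster)
        fused = tuple(sum(box[i] * score for box, score in cluster) / total for i in range(4))
        results.append((fused, len(cluster), total / len(cluster)))
    return results
```

Two overlapping seeds for the same object fuse into one box with two votes, while an isolated seed becomes a separate cluster with a single vote; a downstream rule could then prefer clusters with more votes.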

preprint · 2022 · arXiv

Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary

With the advance of deep learning technology, automatic video generation from audio or text has become an emerging and promising research topic. In this paper, we present a novel approach to synthesize video from text. The method builds a phoneme-pose dictionary and trains a generative adversarial network (GAN) to generate video from interpolated phoneme poses. Compared to audio-driven video generation algorithms, our approach has a number of advantages: 1) it needs only a fraction of the training data used by an audio-driven approach; 2) it is more flexible and less vulnerable to speaker variation; 3) it significantly reduces preprocessing, training and inference time. We perform extensive experiments to compare the proposed method with state-of-the-art talking face generation methods on a benchmark dataset and on datasets of our own. The results demonstrate the effectiveness and superiority of our approach.
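The step of turning a phoneme-pose dictionary into a per-frame pose sequence can be sketched as follows. This assumes, for illustration only, that each phoneme is timestamped, that its dictionary entry is a plain list of pose parameters, and that poses are blended linearly between consecutive phonemes; the paper's actual interpolation scheme and pose representation may differ. The names `interpolate_poses`, `timed_phonemes`, and `fps` are hypothetical.

```python
def interpolate_poses(timed_phonemes, dictionary, fps=25):
    """Per-frame poses by linear interpolation between phoneme key poses.

    timed_phonemes: list of (phoneme, time_in_seconds), sorted by time.
    dictionary: maps each phoneme to a pose vector (list of floats).
    Returns one interpolated pose vector per video frame.
    """
    keys = [(t, dictionary[p]) for p, t in timed_phonemes]
    start, end = keys[0][0], keys[-1][0]
    n_frames = int(round((end - start) * fps)) + 1
    frames = []
    seg = 0  # index of the key pose that starts the current segment
    for f in range(n_frames):
        t = start + f / fps
        while seg < len(keys) - 2 and t > keys[seg + 1][0]:
            seg += 1  # move to the segment that contains time t
        (t0, p0), (t1, p1) = keys[seg], keys[min(seg + 1, len(keys) - 1)]
        w = 0.0 if t1 == t0 else min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        frames.append([a + w * (b - a) for a, b in zip(p0, p1)])
    return frames
```

For two phonemes 0.2 s apart at 10 frames per second, the sketch yields three frames whose pose moves halfway between the two key poses in the middle frame; in the described pipeline, frames like these would then be rendered into video by the GAN.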

preprint · 2022 · arXiv

TNS: Terrain Traversability Mapping and Navigation System for Autonomous Excavators

We present a terrain traversability mapping and navigation system (TNS) for autonomous excavator applications in unstructured environments. We use an efficient approach to extract terrain features from RGB images and 3D point clouds and incorporate them into a global map for planning and navigation. Our system can adapt to changing environments and update the terrain information in real time. Moreover, we present a novel dataset, the Complex Worksite Terrain (CWT) dataset, which consists of RGB images from construction sites with seven categories based on navigability. Our novel algorithms improve mapping accuracy over previous state-of-the-art (SOTA) methods by 4.17-30.48% and reduce MSE on the traversability map by 13.8-71.4%. We have combined our mapping approach with planning and control modules in an autonomous excavator navigation system and observe a 49.3% improvement in the overall success rate. Based on TNS, we demonstrate the first autonomous excavator that can navigate through unstructured environments consisting of deep pits, steep hills, rock piles, and other complex terrain features.

preprint · 2021 · arXiv

Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation

Holistically understanding an object and its 3D movable parts through visual perception models is essential for enabling an autonomous agent to interact with the world. For autonomous driving, the dynamics and states of vehicle parts such as doors, the trunk, and the bonnet can provide meaningful semantic information and interaction states, which are essential to ensuring the safety of the self-driving vehicle. Existing visual perception models mainly focus on coarse parsing such as object bounding box detection or pose estimation and rarely tackle these situations. In this paper, we address this important autonomous driving problem by solving three critical issues. First, to deal with data scarcity, we propose an effective training data generation process that fits a 3D car model with dynamic parts to vehicles in real images before reconstructing human-vehicle interaction (VHI) scenarios. Our approach is fully automatic, requiring no human interaction, and can generate a large number of vehicles in uncommon states (VUS) for training deep neural networks (DNNs). Second, to perform fine-grained vehicle perception, we present a multi-task network for VUS parsing and a multi-stream network for VHI parsing. Third, to quantitatively evaluate the effectiveness of our data augmentation approach, we build the first VUS dataset in real traffic scenarios (e.g., getting in/out or placing/removing luggage). Experimental results show that our approach outperforms other baseline methods in 2D detection and instance segmentation by a large margin (over 8%). In addition, our network yields large improvements in discovering and understanding these uncommon cases. Moreover, we have released the source code, the dataset, and the trained model on GitHub (https://github.com/zongdai/EditingForDNN).

preprint · 2021 · arXiv

IAFA: Instance-aware Feature Aggregation for 3D Object Detection from a Single Image

3D object detection from a single image is an important task in Autonomous Driving (AD), for which various approaches have been proposed. However, the task is intrinsically ambiguous and challenging, as single-image depth estimation is already an ill-posed problem. In this paper, we propose an instance-aware approach to aggregate useful information for improving the accuracy of 3D object detection, with the following contributions. First, an instance-aware feature aggregation (IAFA) module is proposed to collect local and global features for 3D bounding box regression. Second, we empirically find that the spatial attention module can be learned well by taking coarse-level instance annotations as a supervision signal. The proposed module significantly boosts the performance of the baseline method on both 3D detection and 2D bird's-eye-view detection across all three categories. Third, our proposed method outperforms all single image-based approaches (even those trained with depth as an auxiliary input) and achieves state-of-the-art 3D detection performance on the KITTI benchmark.

preprint · 2020 · arXiv

PerMO: Perceiving More at Once from a Single Image for Autonomous Driving

We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image for autonomous driving. Our approach combines the strengths of deep learning and the elegance of traditional techniques from part-based deformable model representation to produce high-quality 3D models in the presence of severe occlusions. We present a new part-based deformable vehicle model that is used for instance segmentation, and we automatically generate a dataset that contains dense correspondences between 2D images and 3D models. We also present a novel end-to-end deep neural network to predict dense 2D/3D mapping and highlight its benefits. Based on the dense mapping, we are able to compute precise 6-DoF poses and 3D reconstruction results at almost interactive rates on a commodity GPU. We have integrated these algorithms with an autonomous driving system. In practice, our method outperforms the state-of-the-art methods for all major vehicle parsing tasks: 2D instance segmentation by 4.4 points (mAP), 6-DoF pose estimation by 9.11 points, and 3D detection by 1.37. Moreover, we have released all of the source code, the dataset, and the trained model on GitHub.

preprint · 2020 · arXiv

Time Variable Minimum Torque Trajectory Optimization for Autonomous Excavator

In this paper, we present a minimum-torque, time-variable trajectory optimization method for autonomous excavators that considers soil-tool interaction. The method formulates excavation motion generation as a trajectory optimization problem that takes into account geometric, kinematic and dynamic constraints. To generate time-efficient trajectories and improve overall optimization efficiency, we propose a time-variable trajectory optimization mechanism in which the time intervals between keypoints along the trajectory are themselves subject to optimization. As a result, the method uses few keypoints and reduces the total number of optimization variables. We further introduce a soil-tool interaction force model that considers the geometric shape of the bucket and the physical properties of the soil. Experimental results on a high-fidelity dynamic simulator show that our method can generate feasible trajectories that satisfy excavation task constraints and adapt to different soil conditions.
