Researcher profile

Xibin Song

Xibin Song contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2022arXiv

A Representation Separation Perspective to Correspondences-free Unsupervised 3D Point Cloud Registration

3D point cloud registration in remote sensing field has been greatly advanced by deep learning based methods, where the rigid transformation is either directly regressed from the two point clouds (correspondences-free approaches) or computed from the learned correspondences (correspondences-based approaches). Existing correspondences-free methods generally learn the holistic representation of the entire point cloud, which is fragile for partial and noisy point clouds. In this paper, we propose a correspondences-free unsupervised point cloud registration (UPCR) method from the representation separation perspective. First, we model the input point cloud as a combination of pose-invariant representation and pose-related representation. Second, the pose-related representation is used to learn the relative pose wrt a "latent canonical shape" for the source and target point clouds respectively. Third, the rigid transformation is obtained from the above two learned relative poses. Our method not only filters out the disturbance in pose-invariant representation but also is robust to partial-to-partial point clouds or noise. Experiments on benchmark datasets demonstrate that our unsupervised method achieves comparable if not better performance than state-of-the-art supervised registration methods.

preprint2022arXiv

End-to-end Learning the Partial Permutation Matrix for Robust 3D Point Cloud Registration

Even though considerable progress has been made in deep learning-based 3D point cloud processing, how to obtain accurate correspondences for robust registration remains a major challenge because existing hard assignment methods cannot deal with outliers naturally. Alternatively, the soft matching-based methods have been proposed to learn the matching probability rather than hard assignment. However, in this paper, we prove that these methods have an inherent ambiguity causing many deceptive correspondences. To address the above challenges, we propose to learn a partial permutation matching matrix, which does not assign corresponding points to outliers, and implements hard assignment to prevent ambiguity. However, this proposal poses two new problems, i.e., existing hard assignment algorithms can only solve a full rank permutation matrix rather than a partial permutation matrix, and this desired matrix is defined in the discrete space, which is non-differentiable. In response, we design a dedicated soft-to-hard (S2H) matching procedure within the registration pipeline consisting of two steps: solving the soft matching matrix (S-step) and projecting this soft matrix to the partial permutation matrix (H-step). Specifically, we augment the profit matrix before the hard assignment to solve an augmented permutation matrix, which is cropped to achieve the final partial permutation matrix. Moreover, to guarantee end-to-end learning, we supervise the learned partial permutation matrix but propagate the gradient to the soft matrix instead. Our S2H matching procedure can be easily integrated with existing registration frameworks, which has been verified in representative frameworks including DCP, RPMNet, and DGR. Extensive experiments have validated our method, which creates a new state-of-the-art performance for robust 3D point cloud registration. The code will be made public.

preprint2022arXiv

PCW-Net: Pyramid Combination and Warping Cost Volume for Stereo Matching

Existing deep learning based stereo matching methods either focus on achieving optimal performances on the target dataset while with poor generalization for other datasets or focus on handling the cross-domain generalization by suppressing the domain sensitive features which results in a significant sacrifice on the performance. To tackle these problems, we propose PCW-Net, a Pyramid Combination and Warping cost volume-based network to achieve good performance on both cross-domain generalization and stereo matching accuracy on various benchmarks. In particular, our PCW-Net is designed for two purposes. First, we construct combination volumes on the upper levels of the pyramid and develop a cost volume fusion module to integrate them for initial disparity estimation. Multi-scale receptive fields can be covered by fusing multi-scale combination volumes, thus, domain-invariant features can be extracted. Second, we construct the warping volume at the last level of the pyramid for disparity refinement. The proposed warping volume can narrow down the residue searching range from the initial disparity searching range to a fine-grained one, which can dramatically alleviate the difficulty of the network to find the correct residue in an unconstrained residue searching space. When training on synthetic datasets and generalizing to unseen real datasets, our method shows strong cross-domain generalization and outperforms existing state-of-the-arts with a large margin. After fine-tuning on the real datasets, our method ranks first on KITTI 2012, second on KITTI 2015, and first on the Argoverse among all published methods as of 7, March 2022. The code will be available at https://github.com/gallenszl/PCWNet.

preprint2021arXiv

IAFA: Instance-aware Feature Aggregation for 3D Object Detection from a Single Image

3D object detection from a single image is an important task in Autonomous Driving (AD), where various approaches have been proposed. However, the task is intrinsically ambiguous and challenging as single image depth estimation is already an ill-posed problem. In this paper, we propose an instance-aware approach to aggregate useful information for improving the accuracy of 3D object detection with the following contributions. First, an instance-aware feature aggregation (IAFA) module is proposed to collect local and global features for 3D bounding boxes regression. Second, we empirically find that the spatial attention module can be well learned by taking coarse-level instance annotations as a supervision signal. The proposed module has significantly boosted the performance of the baseline method on both 3D detection and 2D bird-eye's view of vehicle detection among all three categories. Third, our proposed method outperforms all single image-based approaches (even these methods trained with depth as auxiliary inputs) and achieves state-of-the-art 3D detection performance on the KITTI benchmark.

preprint2020arXiv

Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution

Despite the remarkable progresses made in deep-learning based depth map super-resolution (DSR), how to tackle real-world degradation in low-resolution (LR) depth maps remains a major challenge. Existing DSR model is generally trained and tested on synthetic dataset, which is very different from what would get from a real depth sensor. In this paper, we argue that DSR models trained under this setting are restrictive and not effective in dealing with real-world DSR tasks. We make two contributions in tackling real-world degradation of different depth sensors. First, we propose to classify the generation of LR depth maps into two types: non-linear downsampling with noise and interval downsampling, for which DSR models are learned correspondingly. Second, we propose a new framework for real-world DSR, which consists of four modules : 1) An iterative residual learning module with deep supervision to learn effective high-frequency components of depth maps in a coarse-to-fine manner; 2) A channel attention strategy to enhance channels with abundant high-frequency components; 3) A multi-stage fusion module to effectively re-exploit the results in the coarse-to-fine process; and 4) A depth refinement module to improve the depth map by TGV regularization and input loss. Extensive experiments on benchmarking datasets demonstrate the superiority of our method over current state-of-the-art DSR methods.

preprint2020arXiv

Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point Problem

Conventional absolute camera pose via a Perspective-n-Point (PnP) solver often assumes that the correspondences between 2D image pixels and 3D points are given. When the correspondences between 2D and 3D points are not known a priori, the task becomes the much more challenging blind PnP problem. This paper proposes a deep CNN model which simultaneously solves for both the 6-DoF absolute camera pose and 2D--3D correspondences. Our model comprises three neural modules connected in sequence. First, a two-stream PointNet-inspired network is applied directly to both the 2D image keypoints and the 3D scene points in order to extract discriminative point-wise features harnessing both local and contextual information. Second, a global feature matching module is employed to estimate a matchability matrix among all 2D--3D pairs. Third, the obtained matchability matrix is fed into a classification module to disambiguate inlier matches. The entire network is trained end-to-end, followed by a robust model fitting (P3P-RANSAC) at test time only to recover the 6-DoF camera pose. Extensive tests on both real and simulated data have shown that our method substantially outperforms existing approaches, and is capable of processing thousands of points a second with the state-of-the-art accuracy.

preprint2020arXiv

PerMO: Perceiving More at Once from a Single Image for Autonomous Driving

We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image for autonomous driving. Our approach combines the strengths of deep learning and the elegance of traditional techniques from part-based deformable model representation to produce high-quality 3D models in the presence of severe occlusions. We present a new part-based deformable vehicle model that is used for instance segmentation and automatically generate a dataset that contains dense correspondences between 2D images and 3D models. We also present a novel end-to-end deep neural network to predict dense 2D/3D mapping and highlight its benefits. Based on the dense mapping, we are able to compute precise 6-DoF poses and 3D reconstruction results at almost interactive rates on a commodity GPU. We have integrated these algorithms with an autonomous driving system. In practice, our method outperforms the state-of-the-art methods for all major vehicle parsing tasks: 2D instance segmentation by 4.4 points (mAP), 6-DoF pose estimation by 9.11 points, and 3D detection by 1.37. Moreover, we have released all of the source code, dataset, and the trained model on Github.

preprint2020arXiv

Time Variable Minimum Torque Trajectory Optimization for Autonomous Excavator

In this paper, we present a minimal torque and time variable trajectory optimization method for autonomous excavator considering the soil-tool interaction. The method formulates the excavation motion generation as a trajectory optimization problem and takes into account geometric, kinematic and dynamics constraints. To generate time-efficient trajectory and improve the overall optimization efficiency, we propose a time variable trajectory optimization mechanism so that the time intervals between the keypoints along the trajectory subject to the optimization. As a result, the method uses few keypoints and reduces the total number of optimization variables. We further introduce a soil-tool interaction force model, which considers the geometric shape of the bucket and the physical properties of the soil. The experimental result on a high fidelity dynamic simulator shows our method can generate feasible trajectories, which satisfy excavation task constraints and are adaptive to different soil conditions.