Researcher profile

Rong Xiong

Rong Xiong contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
26works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

26 published item(s)

preprint2023arXiv

One RING to Rule Them All: Radon Sinogram for Place Recognition, Orientation and Translation Estimation

LiDAR-based global localization is a fundamental problem for mobile robots. It consists of two stages, place recognition and pose estimation, which yields the current orientation and translation, using only the current scan as query and a database of map scans. Inspired by the definition of a recognized place, we consider that a good global localization solution should keep the pose estimation accuracy with a lower place density. Following this idea, we propose a novel framework towards sparse place-based global localization, which utilizes a unified and learning-free representation, Radon sinogram (RING), for all sub-tasks. Based on the theoretical derivation, a translation invariant descriptor and an orientation invariant metric are proposed for place recognition, achieving certifiable robustness against arbitrary orientation and large translation between query and map scan. In addition, we also utilize the property of RING to propose a global convergent solver for both orientation and translation estimation, arriving at global localization. Evaluation of the proposed RING based framework validates the feasibility and demonstrates a superior performance even under a lower place density.

preprint2022arXiv

Depth-Independent Depth Completion via Least Square Estimation

The depth completion task aims to complete a per-pixel dense depth map from a sparse depth map. In this paper, we propose an efficient least square based depth-independent method to complete the sparse depth map utilizing the RGB image and the sparse depth map in two independent stages. In this way can we decouple the neural network and the sparse depth input, so that when some features of the sparse depth map change, such as the sparsity, our method can still produce a promising result. Moreover, due to the positional encoding and linear procession in our pipeline, we can easily produce a super-resolution dense depth map of high quality. We also test the generalization of our method on different datasets compared to some state-of-the-art algorithms. Experiments on the benchmark show that our method produces competitive performance.

preprint2022arXiv

DXQ-Net: Differentiable LiDAR-Camera Extrinsic Calibration Using Quality-aware Flow

Accurate LiDAR-camera extrinsic calibration is a precondition for many multi-sensor systems in mobile robots. Most calibration methods rely on laborious manual operations and calibration targets. While working online, the calibration methods should be able to extract information from the environment to construct the cross-modal data association. Convolutional neural networks (CNNs) have powerful feature extraction ability and have been used for calibration. However, most of the past methods solve the extrinsic as a regression task, without considering the geometric constraints involved. In this paper, we propose a novel end-to-end extrinsic calibration method named DXQ-Net, using a differentiable pose estimation module for generalization. We formulate a probabilistic model for LiDAR-camera calibration flow, yielding a prediction of uncertainty to measure the quality of LiDAR-camera data association. Testing experiments illustrate that our method achieves a competitive with other methods for the translation component and state-of-the-art performance for the rotation component. Generalization experiments illustrate that the generalization performance of our method is significantly better than other deep learning-based methods.

preprint2022arXiv

Efficient Distance-Optimal Tethered Path Planning in Planar Environments: The Workspace Convexity

The main contribution of this paper is the proof of the convexity of the omni-directional tethered robot workspace (namely, the set of all tether-length-admissible robot configurations), as well as a set of distance-optimal tethered path planning algorithms that leverage the workspace convexity. The workspace is proven to be topologically a simply-connected subset and geometrically a convex subset of the set of all configurations. As a direct result, the tether-length-admissible optimal path between two configurations is proven exactly the untethered collision-free locally shortest path in the homotopy specified by the concatenation of the tether curve of the given configurations, which can be simply constructed by performing an untethered path shortening process in the 2D environment instead of a path searching process in the pre-calculated workspace. The convexity is an intrinsic property to the tethered robot kinematics, thus has universal impacts on all high-level distance-optimal tethered path planning tasks: The most time-consuming workspace pre-calculation (WP) process is replaced with a goal configuration pre-calculation (GCP) process, and the homotopy-aware path searching process is replaced with untethered path shortening processes. Motivated by the workspace convexity, efficient algorithms to solve the following problems are naturally proposed: (a) The optimal tethered reconfiguration (TR) planning problem is solved by a locally untethered path shortening (UPS) process, (b) The classic optimal tethered path (TP) planning problem (from a starting configuration to a goal location whereby the target tether state is not assigned) is solved by a GCP process and $n$ UPS processes, where $n$ is the number of tether-length-admissible configurations that visit the goal location, (c) The optimal tethered motion to visit a sequence of multiple goal locations, referred to as

preprint2022arXiv

Efficient Object Manipulation to an Arbitrary Goal Pose: Learning-based Anytime Prioritized Planning

We focus on the task of object manipulation to an arbitrary goal pose, in which a robot is supposed to pick an assigned object to place at the goal position with a specific orientation. However, limited by the execution space of the manipulator with gripper, one-step picking, moving and releasing might be failed, where a reorientation object pose is required as a transition. In this paper, we propose a learning-driven anytime prioritized search-based solver to find a feasible solution with low path cost in a short time. In our work, the problem is formulated as a hierarchical learning problem, with the high level finding a reorientation object pose, and the low level planning paths between adjacent grasps. We learn an offline-training path cost estimator to predict approximate path planning costs, which serve as pseudo rewards to allow for pre-training the high-level planner without interacting with the simulator. To deal with the problem of distribution mismatch of the cost net and the actual execution cost space, a refined training stage is conducted with simulation interaction. A series of experiments carried out in simulation and real world indicate that our system can achieve better performances in the object manipulation task with less time and less cost.

preprint2022arXiv

Efficient Search of the k Shortest Non-Homotopic Paths by Eliminating Non-k-Optimal Topologies

An efficient algorithm to solve the $k$ shortest non-homotopic path planning ($k$-SNPP) problem in a 2D environment is proposed in this paper. Motivated by accelerating the inefficient exploration of the homotopy-augmented space of the 2D environment, our fundamental idea is to identify the non-$k$-optimal path topologies as early as possible and terminate the pathfinding along them. This is a non-trivial practice because it has to be done at an intermediate state of the path planning process when locally shortest paths have not been fully constructed. In other words, the paths to be compared have not rendezvoused at the goal location, which makes the homotopy theory, modelling the spatial relationship among the paths having the same endpoint, not applicable. This paper is the first work that develops a systematic distance-based topology simplification mechanism to solve the $k$-SNPP task, whose core contribution is to assert the distance-based order of non-homotopic locally shortest paths before constructing them. If the order can be predicted, then those path topologies having more than $k$ better topologies are proven free of the desired $k$ paths and thus can be safely discarded during the path planning process. To this end, a hierarchical topological tree is proposed as an implementation of the mechanism, whose nodes are proven to expand in non-homotopic directions and edges (collision-free path segments) are proven locally shortest. With efficient criteria that observe the order relations between partly constructed locally shortest paths being imparted into the tree, the tree nodes that expand in non-$k$-optimal topologies will not be expanded. As a result, the computational time for solving the $k$-SNPP problem is reduced by near two orders of magnitude.

preprint2022arXiv

FEJ-VIRO: A Consistent First-Estimate Jacobian Visual-Inertial-Ranging Odometry

In recent years, Visual-Inertial Odometry (VIO) has achieved many significant progresses. However, VIO methods suffer from localization drift over long trajectories. In this paper, we propose a First-Estimates Jacobian Visual-Inertial-Ranging Odometry (FEJ-VIRO) to reduce the localization drifts of VIO by incorporating ultra-wideband (UWB) ranging measurements into the VIO framework \textit{consistently}. Considering that the initial positions of UWB anchors are usually unavailable, we propose a long-short window structure to initialize the UWB anchors' positions as well as the covariance for state augmentation. After initialization, the FEJ-VIRO estimates the UWB anchors' positions simultaneously along with the robot poses. We further analyze the observability of the visual-inertial-ranging estimators and proved that there are \textit{four} unobservable directions in the ideal case, while one of them vanishes in the actual case due to the gain of spurious information. Based on these analyses, we leverage the FEJ technique to enforce the unobservable directions, hence reducing inconsistency of the estimator. Finally, we validate our analysis and evaluate the proposed FEJ-VIRO with both simulation and real-world experiments.

preprint2022arXiv

Kinematic Motion Retargeting via Neural Latent Optimization for Learning Sign Language

Motion retargeting from a human demonstration to a robot is an effective way to reduce the professional requirements and workload of robot programming, but faces the challenges resulting from the differences between humans and robots. Traditional optimization-based methods are time-consuming and rely heavily on good initialization, while recent studies using feedforward neural networks suffer from poor generalization to unseen motions. Moreover, they neglect the topological information in human skeletons and robot structures. In this paper, we propose a novel neural latent optimization approach to address these problems. Latent optimization utilizes a decoder to establish a mapping between the latent space and the robot motion space. Afterward, the retargeting results that satisfy robot constraints can be obtained by searching for the optimal latent vector. Alongside with latent optimization, neural initialization exploits an encoder to provide a better initialization for faster and better convergence of optimization. Both the human skeleton and the robot structure are modeled as graphs to make better use of topological information. We perform experiments on retargeting Chinese sign language, which involves two arms and two hands, with additional requirements on the relative relationships among joints. Experiments include retargeting various human demonstrations to YuMi, NAO, and Pepper in the simulation environment and to YuMi in the real-world environment. Both efficiency and accuracy of the proposed method are verified.

preprint2022arXiv

Learning to Fill the Seam by Vision: Sub-millimeter Peg-in-hole on Unseen Shapes in Real World

In the peg insertion task, human pays attention to the seam between the peg and the hole and tries to fill it continuously with visual feedback. By imitating the human behavior, we design architectures with position and orientation estimators based on the seam representation for pose alignment, which proves to be general to the unseen peg geometries. By putting the estimators into the closed-loop control with reinforcement learning, we further achieve a higher or comparable success rate, efficiency, and robustness compared with the baseline methods. The policy is trained totally in simulation without any manual intervention. To achieve sim-to-real, a learnable segmentation module with automatic data collecting and labeling can be easily trained to decouple the perception and the policy, which helps the model trained in simulation quickly adapt to the real world with negligible effort. Results are presented in simulation and on a physical robot. Code, videos, and supplemental material are available at https://github.com/xieliang555/SFN.git

preprint2022arXiv

Map-based Visual-Inertial Localization: Consistency and Complexity

Drift-free localization is essential for autonomous vehicles. In this paper, we address the problem by proposing a filter-based framework, which integrates the visual-inertial odometry and the measurements of the features in the pre-built map. In this framework, the transformation between the odometry frame and the map frame is augmented into the state and estimated on the fly. Besides, we maintain only the keyframe poses in the map and employ Schmidt extended Kalman filter to update the state partially, so that the uncertainty of the map information can be consistently considered with low computational cost. Moreover, we theoretically demonstrate that the ever-changing linearization points of the estimated state can introduce spurious information to the augmented system and make the original four-dimensional unobservable subspace vanish, leading to inconsistent estimation in practice. To relieve this problem, we employ first-estimate Jacobian (FEJ) to maintain the correct observability properties of the augmented system. Furthermore, we introduce an observability-constrained updating method to compensate for the significant accumulated error after the long-term absence (can be 3 minutes and 1 km) of map-based measurements. Through simulations, the consistent estimation of our proposed algorithm is validated. Through real-world experiments, we demonstrate that our proposed algorithm runs successfully on four kinds of datasets with the lower computational cost (20% time-saving) and the better estimation accuracy (45% trajectory error reduction) compared with the baseline algorithm VINS-Fusion, whereas VINS-Fusion fails to give bounded localization performance on three of four datasets because of its inconsistent estimation.

preprint2022arXiv

Toward Consistent and Efficient Map-based Visual-inertial Localization: Theory Framework and Filter Design

This paper focuses on designing a consistent and efficient filter for map-based visual-inertial localization. First, we propose a new Lie group with its algebra, based on which a novel invariant extended Kalman filter (invariant EKF) is designed. We theoretically prove that, when we do not consider the uncertainty of the map information, the proposed invariant EKF can naturally maintain the correct observability properties of the system. To consider the uncertainty of the map information, we introduce a Schmidt filter. With the Schmidt filter, the uncertainty of the map information can be taken into consideration to avoid over-confident estimation while the computation cost only increases linearly with the size of the map keyframes. In addition, we introduce an easily implemented observability-constrained technique because directly combining the invariant EKF with the Schmidt filter cannot maintain the correct observability properties of the system that considers the uncertainty of the map information. Finally, we validate our proposed system's high consistency, accuracy, and efficiency via extensive simulations and real-world experiments.

preprint2022arXiv

Towards Two-view 6D Object Pose Estimation: A Comparative Study on Fusion Strategy

Current RGB-based 6D object pose estimation methods have achieved noticeable performance on datasets and real world applications. However, predicting 6D pose from single 2D image features is susceptible to disturbance from changing of environment and textureless or resemblant object surfaces. Hence, RGB-based methods generally achieve less competitive results than RGBD-based methods, which deploy both image features and 3D structure features. To narrow down this performance gap, this paper proposes a framework for 6D object pose estimation that learns implicit 3D information from 2 RGB images. Combining the learned 3D information and 2D image features, we establish more stable correspondence between the scene and the object models. To seek for the methods best utilizing 3D information from RGB inputs, we conduct an investigation on three different approaches, including Early- Fusion, Mid-Fusion, and Late-Fusion. We ascertain the Mid- Fusion approach is the best approach to restore the most precise 3D keypoints useful for object pose estimation. The experiments show that our method outperforms state-of-the-art RGB-based methods, and achieves comparable results with RGBD-based methods.

preprint2022arXiv

Translation Invariant Global Estimation of Heading Angle Using Sinogram of LiDAR Point Cloud

Global point cloud registration is an essential module for localization, of which the main difficulty exists in estimating the rotation globally without initial value. With the aid of gravity alignment, the degree of freedom in point cloud registration could be reduced to 4DoF, in which only the heading angle is required for rotation estimation. In this paper, we propose a fast and accurate global heading angle estimation method for gravity-aligned point clouds. Our key idea is that we generate a translation invariant representation based on Radon Transform, allowing us to solve the decoupled heading angle globally with circular cross-correlation. Besides, for heading angle estimation between point clouds with different distributions, we implement this heading angle estimator as a differentiable module to train a feature extraction network end- to-end. The experimental results validate the effectiveness of the proposed method in heading angle estimation and show better performance compared with other methods.

preprint2021arXiv

Deep Samplable Observation Model for Global Localization and Kidnapping

Global localization and kidnapping are two challenging problems in robot localization. The popular method, Monte Carlo Localization (MCL) addresses the problem by iteratively updating a set of particles with a "sampling-weighting" loop. Sampling is decisive to the performance of MCL [1]. However, traditional MCL can only sample from a uniform distribution over the state space. Although variants of MCL propose different sampling models, they fail to provide an accurate distribution or generalize across scenes. To better deal with these problems, we present a distribution proposal model, named Deep Samplable Observation Model (DSOM). DSOM takes a map and a 2D laser scan as inputs and outputs a conditional multimodal probability distribution of the pose, making the samples more focusing on the regions with higher likelihood. With such samples, the convergence is expected to be more effective and efficient. Considering that the learning-based sampling model may fail to capture the true pose sometimes, we furthermore propose the Adaptive Mixture MCL (AdaM MCL), which deploys a trusty mechanism to adaptively select updating mode for each particle to tolerate this situation. Equipped with DSOM, AdaM MCL can achieve more accurate estimation, faster convergence and better scalability compared to previous methods in both synthetic and real scenes. Even in real environments with long-term changing, AdaM MCL is able to localize the robot using DSOM trained only by simulation observations from a SLAM map or a blueprint map.

preprint2021arXiv

DiSCO: Differentiable Scan Context with Orientation

Global localization is essential for robot navigation, of which the first step is to retrieve a query from the map database. This problem is called place recognition. In recent years, LiDAR scan based place recognition has drawn attention as it is robust against the appearance change. In this paper, we propose a LiDAR-based place recognition method, named Differentiable Scan Context with Orientation (DiSCO), which simultaneously finds the scan at a similar place and estimates their relative orientation. The orientation can further be used as the initial value for the down-stream local optimal metric pose estimation, improving the pose estimation especially when a large orientation between the current scan and retrieved scan exists. Our key idea is to transform the feature into the frequency domain. We utilize the magnitude of the spectrum as the place signature, which is theoretically rotation-invariant. In addition, based on the differentiable phase correlation, we can efficiently estimate the global optimal relative orientation using the spectrum. With such structural constraints, the network can be learned in an end-to-end manner, and the backbone is fully shared by the two tasks, achieving interpretability and light weight. Finally, DiSCO is validated on three datasets with long-term outdoor conditions, showing better performance than the compared methods.

preprint2021arXiv

Dynamic Movement Primitive based Motion Retargeting for Dual-Arm Sign Language Motions

We aim to develop an efficient programming method for equipping service robots with the skill of performing sign language motions. This paper addresses the problem of transferring complex dual-arm sign language motions characterized by the coordination among arms and hands from human to robot, which is seldom considered in previous studies of motion retargeting techniques. In this paper, we propose a novel motion retargeting method that leverages graph optimization and Dynamic Movement Primitives (DMPs) for this problem. We employ DMPs in a leader-follower manner to parameterize the original trajectories while preserving motion rhythm and relative movements between human body parts, and adopt a three-step optimization procedure to find deformed trajectories for robot motion planning while ensuring feasibility for robot execution. Experimental results of several Chinese Sign Language (CSL) motions have been successfully performed on ABB's YuMi dual-arm collaborative robot (14-DOF) with two 6-DOF Inspire-Robotics' multi-fingered hands, a system with 26 DOFs in total.

preprint2021arXiv

Imitation Learning of Hierarchical Driving Model: from Continuous Intention to Continuous Trajectory

One of the challenges to reduce the gap between the machine and the human level driving is how to endow the system with the learning capacity to deal with the coupled complexity of environments, intentions, and dynamics. In this paper, we propose a hierarchical driving model with explicit model of continuous intention and continuous dynamics, which decouples the complexity in the observation-to-action reasoning in the human driving data. Specifically, the continuous intention module takes the route planning map obtained by GPS and IMU, perception from a RGB camera and LiDAR as input to generate a potential map encoded with obstacles and intentions being expressed as grid based potentials. Then, the potential map is regarded as a condition, together with the current dynamics, to generate a continuous trajectory as output by a continuous function approximator network, whose derivatives can be used for supervision without additional parameters. Finally, we validate our method on both datasets and simulator, demonstrating that our method has higher prediction accuracy of displacement and velocity and generates smoother trajectories. The method is also deployed on the real vehicle with loop latency, validating its effectiveness. To the best of our knowledge, this is the first work to produce the driving trajectory using a continuous function approximator network.

preprint2021arXiv

Learn to Differ: Sim2Real Small Defection Segmentation Network

Recent studies on deep-learning-based small defection segmentation approaches are trained in specific settings and tend to be limited by fixed context. Throughout the training, the network inevitably learns the representation of the background of the training data before figuring out the defection. They underperform in the inference stage once the context changed and can only be solved by training in every new setting. This eventually leads to the limitation in practical robotic applications where contexts keep varying. To cope with this, instead of training a network context by context and hoping it to generalize, why not stop misleading it with any limited context and start training it with pure simulation? In this paper, we propose the network SSDS that learns a way of distinguishing small defections between two images regardless of the context, so that the network can be trained once for all. A small defection detection layer utilizing the pose sensitivity of phase correlation between images is introduced and is followed by an outlier masking layer. The network is trained on randomly generated simulated data with simple shapes and is generalized across the real world. Finally, SSDS is validated on real-world collected data and demonstrates the ability that even when trained in cheap simulation, SSDS can still find small defections in the real world showing the effectiveness and its potential for practical applications.

preprint2021arXiv

RaLL: End-to-end Radar Localization on Lidar Map Using Differentiable Measurement Model

Compared to the onboard camera and laser scanner, radar sensor provides lighting and weather invariant sensing, which is naturally suitable for long-term localization under adverse conditions. However, radar data is sparse and noisy, resulting in challenges for radar mapping. On the other hand, the most popular available map currently is built by lidar. In this paper, we propose an end-to-end deep learning framework for Radar Localization on Lidar Map (RaLL) to bridge the gap, which not only achieves the robust radar localization but also exploits the mature lidar mapping technique, thus reducing the cost of radar mapping. We first embed both sensor modals into a common feature space by a neural network. Then multiple offsets are added to the map modal for exhaustive similarity evaluation against the current radar modal, yielding the regression of the current pose. Finally, we apply this differentiable measurement model to a Kalman Filter (KF) to learn the whole sequential localization process in an end-to-end manner. \textit{The whole learning system is differentiable with the network based measurement model at the front-end and KF at the back-end.} To validate the feasibility and effectiveness, we employ multi-session multi-scene datasets collected from the real world, and the results demonstrate that our proposed system achieves superior performance over $90km$ driving, even in generalization scenarios where the model training is in UK, while testing in South Korea. We also release the source code publicly.

preprint2021arXiv

REDE: End-to-end Object 6D Pose Robust Estimation Using Differentiable Outliers Elimination

Object 6D pose estimation is a fundamental task in many applications. Conventional methods solve the task by detecting and matching the keypoints, then estimating the pose. Recent efforts bringing deep learning into the problem mainly overcome the vulnerability of conventional methods to environmental variation due to the hand-crafted feature design. However, these methods cannot achieve end-to-end learning and good interpretability at the same time. In this paper, we propose REDE, a novel end-to-end object pose estimator using RGB-D data, which utilizes network for keypoint regression, and a differentiable geometric pose estimator for pose error back-propagation. Besides, to achieve better robustness when outlier keypoint prediction occurs, we further propose a differentiable outliers elimination method that regresses the candidate result and the confidence simultaneously. Via confidence weighted aggregation of multiple candidates, we can reduce the effect from the outliers in the final estimation. Finally, following the conventional method, we apply a learnable refinement process to further improve the estimation. The experimental results on three benchmark datasets show that REDE slightly outperforms the state-of-the-art approaches and is more robust to object occlusion.

preprint2020arXiv

2-Entity RANSAC for robust visual localization in changing environment

Visual localization has attracted considerable attention due to its low-cost and stable sensor, which is desired in many applications, such as autonomous driving, inspection robots and unmanned aerial vehicles. However, current visual localization methods still struggle with environmental changes across weathers and seasons, as there is significant appearance variation between the map and the query image. The crucial challenge in this situation is that the percentage of outliers, i.e. incorrect feature matches, is high. In this paper, we derive minimal closed form solutions for 3D-2D localization with the aid of inertial measurements, using only 2 pairs of point matches or 1 pair of point match and 1 pair of line match. These solutions are further utilized in the proposed 2-entity RANSAC, which is more robust to outliers as both line and point features can be used simultaneously and the number of matches required for pose calculation is reduced. Furthermore, we introduce three feature sampling strategies with different advantages, enabling an automatic selection mechanism. With the mechanism, our 2-entity RANSAC can be adaptive to the environments with different distribution of feature types in different segments. Finally, we evaluate the method on both synthetic and real-world datasets, validating its performance and effectiveness in inter-session scenarios.

preprint2020arXiv

Cellular Decomposition for Non-repetitive Coverage Task with Minimum Discontinuities

A mechanism to derive non-repetitive coverage path solutions with a proven minimal number of discontinuities is proposed in this work, with the aim to avoid unnecessary, costly end effector lift-offs for manipulators. The problem is motivated by the automatic polishing of an object. Due to the non-bijective mapping between the workspace and the joint-space, a continuous coverage path in the workspace may easily be truncated in the joint-space, incuring undesirable end effector lift-offs. Inversely, there may be multiple configuration choices to cover the same point of a coverage path through the solution of the Inverse Kinematics. The solution departs from the conventional local optimisation of the coverage path shape in task space, or choosing appropriate but possibly disconnected configurations, to instead explicitly explore the leaast number of discontinuous motions through the analysis of the structure of valid configurations in joint-space. The two novel contributions of this paper include proof that the least number of path discontinuities is predicated on the surrounding environment, independent from the choice of the actual coverage path; thus has a minimum. And an efficient finite cellular decomposition method to optimally divide the workspace into the minimum number of cells, each traversable without discontinuties by any arbitrary coverage path within. Extensive simulation examples and real-world results on a 5 DoF manipulator are presented to prove the validity of the proposed strategy in realistic settings.

preprint2020arXiv

Efficient two step optimization for large embedded deformation graph based SLAM

Embedded deformation nodes based formulation has been widely applied in deformable geometry and graphical problems. Though being promising in stereo (or RGBD) sensor based SLAM applications, it remains challenging to keep constant speed in deformation nodes parameter estimation when model grows larger. In practice, the processing time grows rapidly in accordance with the expansion of maps. In this paper, we propose an approach to decouple nodes of deformation graph in large scale dense deformable SLAM and keep the estimation time to be constant. We observe that only partial deformable nodes in the graph are connected to visible points. Based on this fact, sparsity of original Hessian matrix is utilized to split parameter estimation in two independent steps. With this new technique, we achieve faster parameter estimation with amortized computation complexity reduced from O(n^2) to closing O(1). As a result, the computation cost barely increases as the map keeps growing. Based on our strategy, computational bottleneck in large scale embedded deformation graph based applications will be greatly mitigated. The effectiveness is validated by experiments, featuring large scale deformation scenarios.

preprint2020arXiv

Globally optimal consensus maximization for robust visual inertial localization in point and line map

Map based visual inertial localization is a crucial step to reduce the drift in state estimation of mobile robots. The underlying problem for localization is to estimate the pose from a set of 3D-2D feature correspondences, of which the main challenge is the presence of outliers, especially in changing environment. In this paper, we propose a robust solution based on efficient global optimization of the consensus maximization problem, which is insensitive to high percentage of outliers. We first introduce translation invariant measurements (TIMs) for both points and lines to decouple the consensus maximization problem into rotation and translation subproblems, allowing for a two-stage solver with reduced solution dimensions. Then we show that (i) the rotation can be calculated by minimizing TIMs using only 1-dimensional branch-and-bound (BnB), (ii) the translation can be found by running 1-dimensional search for three times with prioritized progressive voting. Compared with the popular randomized solver, our solver achieves deterministic global convergence without depending on an initial value. While compared with existing BnB based methods, ours is exponentially faster. Finally, by evaluating the performance on both simulation and real-world datasets, our approach gives accurate pose even when there are 90\% outliers (only 2 inliers).

preprint2020arXiv

Learning hierarchical behavior and motion planning for autonomous driving

Learning-based driving solution, a new branch for autonomous driving, is expected to simplify the modeling of driving by learning the underlying mechanisms from data. To improve the tactical decision-making for learning-based driving solution, we introduce hierarchical behavior and motion planning (HBMP) to explicitly model the behavior in learning-based solution. Due to the coupled action space of behavior and motion, it is challenging to solve HBMP problem using reinforcement learning (RL) for long-horizon driving tasks. We transform HBMP problem by integrating a classical sampling-based motion planner, of which the optimal cost is regarded as the rewards for high-level behavior learning. As a result, this formulation reduces action space and diversifies the rewards without losing the optimality of HBMP. In addition, we propose a sharable representation for input sensory data across simulation platforms and real-world environment, so that models trained in a fast event-based simulator, SUMO, can be used to initialize and accelerate the RL training in a dynamics based simulator, CARLA. Experimental results demonstrate the effectiveness of the method. Besides, the model is successfully transferred to the real-world, validating the generalization capability.

preprint2019arXiv

Champion Team Paper: Dynamic Passing-Shooting Algorithm Based on CUDA of The RoboCup SSL 2019 Champion

ZJUNlict became the Small Size League Champion of RoboCup 2019 with 6 victories and 1 tie for their 7 games. The overwhelming ability of ball-handling and passing allows ZJUNlict to greatly threaten its opponent and almost kept its goal clear without being threatened. This paper presents the core technology of its ball-handling and robot movement which consist of hardware optimization, dynamic passing and shooting strategy, and multi-agent cooperation and formation. We first describe the mechanical optimization on the placement of the capacitors, the redesign of the damping system of the dribbler and the electrical optimization on the replacement of the core chip. We then describe our passing point algorithm. The passing and shooting strategy can be separated into two different parts, where we search the passing point on SBIP-DPPS and evaluate the point based on the ball model. The statements and the conclusion should be supported by the performances and log of games on Small Size League RoboCup 2019.