Source author record

Sebastian Scherer

Sebastian Scherer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Robotics · Computer Vision · Machine Learning · eess.SY · Systems and Control

Catalog footprint

What is connected

20 works

5 topics

4 close collaborators

Preprint · 2023 · arXiv

Design, Modeling and Control for a Tilt-rotor VTOL UAV in the Presence of Actuator Failure

Enabling vertical take-off and landing while providing the ability to fly long ranges opens the door to a wide range of new real-world aircraft applications while improving many existing tasks. Tiltrotor vertical take-off and landing (VTOL) unmanned aerial vehicles (UAVs) are a better choice than fixed-wing and multirotor aircraft for such applications. Prior works on these aircraft have addressed aerodynamic performance, design, modeling, and control. However, a less explored area is their potential fault tolerance: their inherent redundancy allows them to tolerate some degree of actuation failure. This paper introduces tolerance to several types of actuator failures in a tiltrotor VTOL aircraft. We discuss the design and modeling of a custom tiltrotor VTOL UAV, which is a combination of a fixed-wing aircraft and a quadrotor with tilting rotors, where the four propellers can be rotated individually. Then, we analyze the feasible wrench space the vehicle can generate and design the dynamic control allocation so that the system can adapt to actuator failures, benefiting from the configuration redundancy. The proposed approach is lightweight and is implemented as an extension to an already-existing flight control stack. Extensive experiments validate that the system can maintain controlled flight under different actuator failures. To the best of our knowledge, this work is the first study of tiltrotor VTOL fault tolerance that exploits configuration redundancy. The source code and simulation can be accessed at https://theairlab.org/vtol.
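The idea of redistributing commands after an actuator failure can be illustrated with a deliberately simplified sketch. It assumes a fixed, linear effectiveness matrix `B` (rows are wrench components, columns are actuators) and uses a minimum-norm pseudo-inverse allocator, a standard textbook technique swapped in for illustration; it is not the authors' dynamic, optimization-based allocation, and it ignores rotor tilt angles and actuator limits. A failed actuator is modelled by zeroing its column so the remaining actuators absorb its share. The names `allocate` and `solve` are hypothetical.

```python
from typing import List, Set

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular: remaining actuators cannot span the wrench")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def allocate(B: Matrix, wrench: List[float], failed: Set[int]) -> List[float]:
    """Minimum-norm allocation u = B'^T (B' B'^T)^-1 w, where B' zeroes failed columns."""
    rows, cols = len(B), len(B[0])
    Bf = [[0.0 if j in failed else B[i][j] for j in range(cols)] for i in range(rows)]
    # Gram matrix B' B'^T (rows x rows)
    G = [[sum(Bf[i][k] * Bf[j][k] for k in range(cols)) for j in range(rows)]
         for i in range(rows)]
    lam = solve(G, wrench)
    return [sum(Bf[i][j] * lam[i] for i in range(rows)) for j in range(cols)]


if __name__ == "__main__":
    # illustrative rows: thrust, roll torque, pitch torque for four actuators
    B = [[1.0, 1.0, 1.0, 1.0], [-1.0, 1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
    print(allocate(B, [4.0, 0.0, 0.0], set()))  # healthy vehicle
    print(allocate(B, [4.0, 0.0, 0.0], {3}))    # actuator 3 failed
```

With this illustrative matrix, a pure-thrust command is spread evenly when all actuators are healthy; after actuator 3 fails, the allocation works out to approximately `[2, 0, 2, 0]`. If too many actuators fail for the rest to span the requested wrench, `solve` raises instead of returning an infeasible command.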

Preprint · 2022 · arXiv

AirCode: A Robust Object Encoding Method

Object encoding and identification are crucial for many robotic tasks such as autonomous exploration and semantic relocalization. Existing works rely heavily on tracking detected objects but have difficulty recalling revisited objects precisely. In this paper, we propose AirCode, a novel object encoding method based on a graph of key-points. To be robust to the number of key-points detected, we propose a feature sparse encoding and object dense encoding method that ensures each key-point can affect only a small part of the object descriptors, making them robust to viewpoint changes, scaling, occlusion, and even object deformation. In the experiments, we show that it outperforms state-of-the-art algorithms in object identification and provides reliable semantic relocalization. It is a plug-and-play module, and we expect it to play an important role in various applications.

Preprint · 2022 · arXiv

ALITA: A Large-scale Incremental Dataset for Long-term Autonomy

For long-term autonomy, most place recognition methods are evaluated mainly on simplified scenarios or simulated datasets, which cannot provide solid evidence of the readiness of current Simultaneous Localization and Mapping (SLAM) systems. In this paper, we present a long-term place recognition dataset for mobile localization in large-scale dynamic environments. The dataset includes a campus-scale track and a city-scale track: 1) the campus track focuses on the long-term property: we record data from a LiDAR device and an omnidirectional camera along 10 trajectories, and each trajectory is recorded 8 times under varying illumination conditions; 2) the city track focuses on the large-scale property: we mount the LiDAR device on a vehicle traversing a 120 km trajectory that contains open streets, residential areas, natural terrain, etc. They include 200 hours of raw data covering all kinds of scenarios within urban environments. Ground-truth positions for both tracks are provided for each trajectory, obtained from the Global Positioning System with an additional Generalized-ICP-based point cloud refinement. To simplify the evaluation procedure, we also provide a Python API with a set of place recognition metrics to quickly load our dataset and evaluate recognition performance across different methods. This dataset aims to identify methods with high place recognition accuracy and robustness, and to provide real robotic systems with long-term autonomy. The dataset and the provided tools can be accessed from https://github.com/MetaSLAM/ALITA.
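Place recognition benchmarks of this kind are commonly scored with recall@N: a query counts as correctly recognized if any of its N nearest database descriptors was captured within some distance of the query's ground-truth position. The sketch below assumes Euclidean descriptor distance and a hypothetical 25 m radius; `recall_at_n` is an illustrative name and is not the ALITA Python API.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def recall_at_n(query_desc: Sequence[Vec], db_desc: Sequence[Vec],
                query_pos: Sequence[Vec], db_pos: Sequence[Vec],
                n: int = 1, radius: float = 25.0) -> float:
    """Fraction of queries whose top-n retrieved places lie within `radius` of the true position."""
    hits = 0
    for qd, qp in zip(query_desc, query_pos):
        # rank database entries by descriptor (appearance) distance
        ranked = sorted(range(len(db_desc)), key=lambda i: math.dist(qd, db_desc[i]))
        # a hit if any top-n candidate is geometrically close to the query's ground truth
        if any(math.dist(qp, db_pos[i]) <= radius for i in ranked[:n]):
            hits += 1
    return hits / len(query_desc)
```

Reporting recall@N for several values of N shows how often a correct match appears in a short candidate list, which matters when a downstream geometric check verifies candidates.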

Preprint · 2022 · arXiv

ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization

We present the ALTO dataset, a vision-focused dataset for the development and benchmarking of Visual Place Recognition and Localization methods for Unmanned Aerial Vehicles. The dataset is composed of two long (approximately 150 km and 260 km) trajectories flown by a helicopter over Ohio and Pennsylvania, and it includes high-precision GPS-INS ground-truth location data, high-precision accelerometer readings, laser altimeter readings, and downward-facing RGB camera imagery. In addition, we provide reference imagery over the flight paths, which makes this dataset suitable for VPR benchmarking and other tasks common in localization, such as image registration and visual odometry. To the authors' knowledge, this is the largest real-world aerial-vehicle dataset of this kind. Our dataset is available at https://github.com/MetaSLAM/ALTO.

Preprint · 2022 · arXiv

BioSLAM: A Bio-inspired Lifelong Memory System for General Place Recognition

We present BioSLAM, a lifelong SLAM framework for incrementally learning new appearances while maintaining accurate place recognition for previously visited areas. Unlike humans, artificial neural networks suffer from catastrophic forgetting and may forget previously visited areas when trained on new arrivals. In humans, researchers have found a memory replay mechanism in the brain that keeps neurons active for previous events. Inspired by this discovery, BioSLAM designs a gated generative replay to control the robot's learning behavior based on feedback rewards. Specifically, BioSLAM provides a novel dual-memory mechanism for maintenance: 1) a dynamic memory to efficiently learn new observations and 2) a static memory to balance new and old knowledge. When combined with a visual- or LiDAR-based SLAM system, the complete processing pipeline helps the agent incrementally update its place recognition ability, robust to the increasing complexity of long-term place recognition. We demonstrate BioSLAM in two incremental SLAM scenarios. In the first scenario, a LiDAR-based agent continuously travels through a city-scale environment along a 120 km trajectory and encounters different types of 3D geometries (open streets, residential areas, commercial buildings). We show that BioSLAM can incrementally update the agent's place recognition ability and outperforms the state-of-the-art incremental approach, Generative Replay, by 24%. In the second scenario, a LiDAR-vision-based agent repeatedly travels through a campus-scale area on a 4.5 km trajectory. BioSLAM outperforms the state-of-the-art approaches in place recognition accuracy by 15% under different appearances. To our knowledge, BioSLAM is the first memory-enhanced lifelong SLAM system to support incremental place recognition in long-term navigation tasks.

Preprint · 2022 · arXiv

General Place Recognition Survey: Towards the Real-world Autonomy Age

Place recognition is a fundamental module that can assist Simultaneous Localization and Mapping (SLAM) in loop-closure detection and re-localization for long-term navigation. The place recognition community has made astonishing progress over the last 20 years, and this has attracted widespread research interest and application in multiple fields such as computer vision and robotics. However, few methods have shown promising place recognition performance in complex real-world scenarios, where long-term and large-scale appearance changes usually result in failures. Additionally, the state of the art lacks an integrated framework that can handle all of the challenges in place recognition, which include appearance changes, viewpoint differences, robustness to unknown areas, and efficiency in real-world applications. In this work, we survey the state-of-the-art methods that target long-term localization and discuss future directions and opportunities. We start by investigating the formulation of place recognition in long-term autonomy and the major challenges in real-world environments. We then review the recent works in place recognition for different sensor modalities and current strategies for dealing with various place recognition challenges. Finally, we review the existing datasets for long-term localization and introduce our datasets and evaluation API for different approaches. This paper can serve as a tutorial for researchers new to the place recognition community and those who care about long-term robotics autonomy. We also provide our opinion on a frequently asked question in robotics: Do robots need accurate localization for long-term autonomy? A summary of this work, together with our datasets and evaluation API, is publicly available to the robotics community at: https://github.com/MetaSLAM/GPRS.

Preprint · 2022 · arXiv

iSimLoc: Visual Global Localization for Previously Unseen Environments with Simulated Images

The visual camera is an attractive device for beyond visual line of sight (B-VLOS) drone operation, since cameras are low in size, weight, power, and cost, and can provide a redundant modality in case of GPS failures. However, state-of-the-art visual localization algorithms are unable to match visual data that have a significantly different appearance due to illumination or viewpoint changes. This paper presents iSimLoc, a condition- and viewpoint-consistent hierarchical global re-localization approach. The place features of iSimLoc can be used to search target images under changing appearances and viewpoints. Additionally, our hierarchical global re-localization module refines estimates in a coarse-to-fine manner, allowing iSimLoc to perform fast and accurate estimation. We evaluate our method on one dataset with appearance variations and one dataset that focuses on demonstrating large-scale matching over a long flight in complicated environments. On our two datasets, iSimLoc achieves 88.7% and 83.8% successful retrieval rates with 1.5 s inference time, compared to 45.8% and 39.7% using the next best method. These results demonstrate robust localization in a range of environments.
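Coarse-to-fine retrieval in general can be sketched in two stages: a cheap global descriptor shortlists `k` candidates, and a finer descriptor re-ranks only that shortlist. This assumes precomputed descriptor vectors and Euclidean distance; it illustrates hierarchical retrieval generically, not iSimLoc's actual features or refinement module, and `coarse_to_fine` is a hypothetical name.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def coarse_to_fine(q_coarse: Vec, q_fine: Vec,
                   db_coarse: Sequence[Vec], db_fine: Sequence[Vec], k: int = 2) -> int:
    """Return the index of the best database match using a two-stage search."""
    # stage 1: shortlist the k nearest candidates by the coarse descriptor
    shortlist = sorted(range(len(db_coarse)),
                       key=lambda i: math.dist(q_coarse, db_coarse[i]))[:k]
    # stage 2: re-rank only the shortlist with the fine descriptor
    return min(shortlist, key=lambda i: math.dist(q_fine, db_fine[i]))
```

The design trade-off is that the expensive fine comparison scales with `k` rather than with the database size, at the risk of missing a true match that the coarse stage ranks outside the shortlist.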

Preprint · 2022 · arXiv

Lifelong Graph Learning

Graph neural networks (GNN) are powerful models for many graph-structured tasks. Existing models often assume that the complete structure of the graph is available during training. In practice, however, graph-structured data is usually formed in a streaming fashion so that learning a graph continuously is often necessary. In this paper, we bridge GNN and lifelong learning by converting a continual graph learning problem to a regular graph learning problem so GNN can inherit the lifelong learning techniques developed for convolutional neural networks (CNN). We propose a new topology, the feature graph, which takes features as new nodes and turns nodes into independent graphs. This successfully converts the original problem of node classification to graph classification. In the experiments, we demonstrate the efficiency and effectiveness of feature graph networks (FGN) by continuously learning a sequence of classical graph datasets. We also show that FGN achieves superior performance in two applications, i.e., lifelong human action recognition with wearable devices and feature matching. To the best of our knowledge, FGN is the first method to bridge graph learning and lifelong learning via a novel graph topology. Source code is available at https://github.com/wang-chen/LGL
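A rough sketch of the feature-graph conversion follows. It assumes (our reading, to be checked against the paper and its source code) that node i's feature adjacency is the sum over its neighbours j of the signed square root of the outer product x_i x_j^T, with unit edge weights. Each original node then becomes a d-node graph whose nodes are feature dimensions, so a graph classifier can consume it. `feature_graph` and `sgnroot` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple


def sgnroot(v: float) -> float:
    """Signed square root: keeps the sign, compresses the magnitude."""
    return math.copysign(math.sqrt(abs(v)), v)


def feature_graph(x: Dict[int, List[float]], neighbors: Dict[int, List[int]],
                  i: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Turn node i into its own small graph whose nodes are feature dimensions."""
    d = len(x[i])
    adj = [[0.0] * d for _ in range(d)]
    for j in neighbors.get(i, []):
        # accumulate the signed-root outer product x_i x_j^T (unit edge weights assumed)
        for p in range(d):
            for q in range(d):
                adj[p][q] += sgnroot(x[i][p] * x[j][q])
    # each feature dimension becomes a node carrying a 1-D attribute
    nodes = [[v] for v in x[i]]
    return nodes, adj
```

Because every original node yields an independent graph, a stream of new nodes becomes a stream of independent classification samples, which is what lets standard lifelong-learning techniques apply.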

Preprint · 2022 · arXiv

Mission-level Robustness with Rapidly-deployed, Autonomous Aerial Vehicles by Carnegie Mellon Team Tartan at MBZIRC 2020

For robotic systems to succeed in high-risk, real-world situations, they have to be quickly deployable and robust to environmental changes, under-performing hardware, and mission subtask failures. These robots are often designed to consider a single sequence of mission events, with complex algorithms lowering individual subtask failure rates under some critical constraints. Our approach utilizes common techniques in vision and control, and encodes robustness into mission structure through outcome monitoring and recovery strategies. In addition, our system infrastructure enables rapid deployment and requires no central communication. This report also includes lessons in rapid field robotic development and testing. We developed and evaluated our systems through real-robot experiments at an outdoor test site in Pittsburgh, Pennsylvania, USA, as well as in the 2020 Mohamed Bin Zayed International Robotics Challenge. All competition trials were completed in fully autonomous mode without RTK-GPS. Our system placed fourth in Challenge 2 and seventh in the Grand Challenge, with notable achievements such as popping five balloons (Challenge 1), successfully picking and placing a block (Challenge 2), and dispensing the most water onto an outdoor, real fire with an autonomous UAV (Challenge 3).

Preprint · 2022 · arXiv

Present and Future of SLAM in Extreme Underground Environments

This paper reports on the state of the art in underground SLAM by discussing different SLAM strategies and results across six teams that participated in the three-year-long SubT competition. In particular, the paper has four main goals. First, we review the algorithms, architectures, and systems adopted by the teams; particular emphasis is put on lidar-centric SLAM solutions (the go-to approach for virtually all teams in the competition), heterogeneous multi-robot operation (including both aerial and ground robots), and real-world underground operation (from the presence of obscurants to the need to handle tight computational constraints). We do not shy away from discussing the dirty details behind the different SubT SLAM systems, which are often omitted from technical papers. Second, we discuss the maturity of the field by highlighting what is possible with the current SLAM systems and what we believe is within reach with some good systems engineering. Third, we outline what we believe are fundamental open problems, that are likely to require further research to break through. Finally, we provide a list of open-source SLAM implementations and datasets that have been produced during the SubT challenge and related efforts, and constitute a useful resource for researchers and practitioners.

Preprint · 2022 · arXiv

Robotic Interestingness via Human-Informed Few-Shot Object Detection

Interestingness recognition is crucial for decision making in autonomous exploration for mobile robots. Previous methods proposed an unsupervised online learning approach that can adapt to environments and detect interesting scenes quickly, but they lack the ability to adapt to human-informed interesting objects. To solve this problem, we introduce a human-interactive framework, AirInteraction, that can detect human-informed objects via few-shot online learning. To reduce the communication bandwidth, we first apply an online unsupervised learning algorithm on the unmanned vehicle for interestingness recognition and then send only the potentially interesting scenes to a base station for human inspection. The human operator can draw and provide bounding box annotations for particular interesting objects, which are sent back to the robot to detect similar objects via few-shot learning. Using only a few human-labeled examples, the robot can learn novel interesting object categories during the mission and detect interesting scenes that contain the objects. We evaluate our method on various interesting scene recognition datasets. To the best of our knowledge, it is the first human-informed few-shot object detection framework for autonomous exploration.

Preprint · 2022 · arXiv

Robust Modeling and Controls for Racing on the Edge

Race cars are routinely driven to the edge of their handling limits in dynamic scenarios well above 200 mph. Similar challenges are posed in autonomous racing, where a software stack, instead of a human driver, interacts within a multi-agent environment. For an Autonomous Racing Vehicle (ARV), operating at the edge of handling limits and acting safely in these dynamic environments is still an unsolved problem. In this paper, we present a baseline controls stack for an ARV capable of operating safely up to 140 mph. Additionally, limitations in the current approach are discussed to highlight the need for improved dynamics modeling and learning.

Preprint · 2022 · arXiv

TartanDrive: A Large-Scale Dataset for Learning Off-Road Dynamics Models

We present TartanDrive, a large-scale dataset for learning dynamics models for off-road driving. We collected a dataset of roughly 200,000 off-road driving interactions on a modified Yamaha Viking ATV with seven unique sensing modalities in diverse terrains. To the authors' knowledge, this is the largest real-world multi-modal off-road driving dataset, both in terms of number of interactions and sensing modalities. We also benchmark several state-of-the-art methods for model-based reinforcement learning from high-dimensional observations on this dataset. We find that extending these models to multi-modality leads to significant performance improvements in off-road dynamics prediction, especially in more challenging terrains. We also identify some shortcomings of current neural network architectures for the off-road driving task. Our dataset is available at https://github.com/castacks/tartan_drive.

Preprint · 2022 · arXiv

VTOL Failure Detection and Recovery by Utilizing Redundancy

Offering vertical take-off and landing (VTOL) capabilities and the ability to travel great distances are crucial for Urban Air Mobility (UAM) vehicles. These capabilities make hybrid VTOLs the clear front-runners among UAM platforms. On the other hand, concerns regarding the safety and reliability of autonomous aircraft have grown in response to the recent growth in aerial vehicle usage. As a result, monitoring the aircraft status to report any failures and recovering to prevent the loss of control when a failure happens are becoming increasingly important. Hybrid VTOLs can withstand some degree of actuator failure due to their intrinsic redundancy. Their aerodynamic performance, design, modeling, and control have all been addressed in previous studies. However, their potential fault tolerance is still a less investigated field. In this workshop, we will present a summary of our work on aircraft fault detection and the recovery of our hybrid VTOL. First, we will go over our real-time aircraft-independent system for detecting actuator failures and abnormal behaviors. Then, in the context of our custom tiltrotor VTOL aircraft design, we will discuss our optimization-based control allocation system, which utilizes the vehicle's configuration redundancy to recover from different actuation failures. Finally, we will explore how these parts can work together to provide a fail-safe system. We present our simulation and real-life experiments.

Preprint · 2020 · arXiv

A Robust Laser-Inertial Odometry and Mapping Method for Large-Scale Highway Environments

In this paper, we propose a novel laser-inertial odometry and mapping method to achieve real-time, low-drift and robust pose estimation in large-scale highway environments. The proposed method is mainly composed of four sequential modules, namely the scan pre-processing module, dynamic object detection module, laser-inertial odometry module and laser mapping module. The scan pre-processing module uses inertial measurements to compensate for the motion distortion of each laser scan. Then, the dynamic object detection module detects and removes dynamic objects from each laser scan by applying a CNN segmentation network. After obtaining the undistorted point cloud without moving objects, the laser-inertial odometry module uses an Error State Kalman Filter to fuse the laser and IMU data and output a coarse pose estimate at high frequency. Finally, the laser mapping module performs a fine processing step, using a "Frame-to-Model" scan matching strategy to create a static global map. We compare the performance of our method with two state-of-the-art methods, LOAM and SuMa, using the KITTI dataset and a real highway scene dataset. Experimental results show that our method performs better than the state-of-the-art methods in real highway environments and achieves competitive accuracy on the KITTI dataset.
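Motion-distortion compensation can be illustrated with a planar deskewing sketch. It assumes constant velocity (expressed in the scan-start frame) and constant yaw rate over one sweep, for example integrated from the IMU, plus per-point timestamps measured from the scan start; the paper's module works on 3D scans, and `deskew` is a hypothetical name. Each point is moved from the sensor frame at its capture time into the sensor frame at the end of the sweep.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


def deskew(points: Sequence[Point2], stamps: Sequence[float], yaw_rate: float,
           velocity: Point2, t_end: float) -> List[Point2]:
    """Re-express each point in the sensor frame at t_end (constant-velocity motion model)."""
    # inverse of the sensor pose at t_end, relative to the scan-start frame
    c_e, s_e = math.cos(-yaw_rate * t_end), math.sin(-yaw_rate * t_end)
    px_e, py_e = velocity[0] * t_end, velocity[1] * t_end
    out: List[Point2] = []
    for (x, y), t in zip(points, stamps):
        c, s = math.cos(yaw_rate * t), math.sin(yaw_rate * t)
        # sensor frame at capture time t -> scan-start frame
        sx = c * x - s * y + velocity[0] * t
        sy = s * x + c * y + velocity[1] * t
        # scan-start frame -> sensor frame at t_end
        dx, dy = sx - px_e, sy - py_e
        out.append((c_e * dx - s_e * dy, s_e * dx + c_e * dy))
    return out
```

For example, with the sensor moving forward at 1 m/s and no rotation, a point seen 5 m ahead at the start of a 1 s sweep is re-expressed as 4 m ahead in the end-of-sweep frame, while a point captured at the very end is unchanged.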

Preprint · 2020 · arXiv

Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction

This work presents dense stereo reconstruction using high-resolution images for infrastructure inspections. State-of-the-art stereo reconstruction methods, both learning-based and non-learning ones, consume too many computational resources on high-resolution data. Recent learning-based methods achieve top ranks on most benchmarks. However, they suffer from generalization issues due to a lack of task-specific training data. We propose to use a less resource-demanding non-learning method, guided by a learning-based model, to handle high-resolution images and achieve accurate stereo reconstruction. The deep-learning model produces an initial disparity prediction with uncertainty for each pixel of the down-sampled stereo image pair. The uncertainty serves both as a self-measurement of the model's generalization ability and as the per-pixel search range around the initially predicted disparity. The downstream process performs a modified version of the Semi-Global Block Matching method with the up-sampled per-pixel search range. The proposed deep-learning assisted method is evaluated on the Middlebury dataset and on high-resolution stereo images collected by our customized binocular stereo camera. The combination of learning and non-learning methods achieves better performance on 12 out of 15 cases of the Middlebury dataset. In our infrastructure inspection experiments, the average 3D reconstruction error is less than 0.004 m.
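How a coarse prediction can bound the full-resolution search may be sketched as follows: each full-resolution pixel inherits the coarse disparity and uncertainty of its low-resolution parent, both scaled by the upsampling factor (disparity is measured in pixels, so it grows with image width), and searches only within plus or minus `k` standard deviations. Nearest-neighbour upsampling, the default `k = 3`, and the name `search_ranges` are assumptions for illustration; the paper's modified SGBM is not reproduced here.

```python
from typing import List, Optional, Tuple


def search_ranges(disp: List[List[float]], sigma: List[List[float]], scale: int,
                  k: float = 3.0, max_disp: Optional[float] = None
                  ) -> List[List[Tuple[float, float]]]:
    """Per-pixel (low, high) disparity search interval at full resolution."""
    h, w = len(disp), len(disp[0])
    out = []
    for y in range(h * scale):
        row = []
        for x in range(w * scale):
            # nearest-neighbour lookup of the low-resolution parent pixel
            d = disp[y // scale][x // scale] * scale
            r = k * sigma[y // scale][x // scale] * scale
            lo = max(0.0, d - r)
            hi = d + r if max_disp is None else min(max_disp, d + r)
            row.append((lo, hi))
        out.append(row)
    return out
```

Confident pixels get a narrow interval and cheap matching, while uncertain pixels fall back to a wider search, which is where the resource savings over a full-range search would come from.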

Preprint · 2020 · arXiv

Learning Visuomotor Policies for Aerial Navigation Using Cross-Modal Representations

Machines are a long way from robustly solving open-world perception-control tasks, such as first-person view (FPV) aerial navigation. While recent advances in end-to-end Machine Learning, especially Imitation and Reinforcement Learning, appear promising, they are constrained by the need for large amounts of difficult-to-collect labeled real-world data. Simulated data, on the other hand, is easy to generate, but generally does not render safe behaviors in diverse real-life scenarios. In this work we propose a novel method for learning robust visuomotor policies for real-world deployment which can be trained purely with simulated data. We develop rich state representations that combine supervised and unsupervised environment data. Our approach takes a cross-modal perspective, where separate modalities correspond to the raw camera data and the system states relevant to the task, such as the relative pose of gates to the drone in the case of drone racing. We feed both data modalities into a novel factored architecture, which learns a joint low-dimensional embedding via Variational Autoencoders. This compact representation is then fed into a control policy, which we train using imitation learning with expert trajectories in a simulator. We analyze the rich latent spaces learned with our proposed representations, and show that the use of our cross-modal architecture significantly improves control policy performance as compared to end-to-end learning or purely unsupervised feature extractors. We also present real-world results for drone navigation through gates in different track configurations and environmental conditions. Our proposed method, which runs fully onboard, can successfully generalize the learned representations and policies across simulation and reality, significantly outperforming baseline approaches. Supplementary video: https://youtu.be/VKc3A5HlUU8

Preprint · 2020 · arXiv

Monocular Camera Localization in Prior LiDAR Maps with 2D-3D Line Correspondences

Light-weight camera localization in existing maps is essential for vision-based navigation. Currently, visual and visual-inertial odometry (VO and VIO) techniques are well-developed for state estimation but suffer from inevitable accumulated drift and pose jumps upon loop closure. To overcome these problems, we propose an efficient monocular camera localization method in prior LiDAR maps using direct 2D-3D line correspondences. To handle the appearance differences and modality gaps between LiDAR point clouds and images, geometric 3D lines are extracted offline from LiDAR maps while robust 2D lines are extracted online from video sequences. With the pose prediction from VIO, we can efficiently obtain coarse 2D-3D line correspondences. Then the camera poses and 2D-3D correspondences are iteratively optimized by minimizing the projection error of correspondences and rejecting outliers. Experimental results on the EuRoC MAV dataset and our collected dataset demonstrate that the proposed method can efficiently estimate camera poses without accumulated drift or pose jumps in structured environments.

Preprint · 2020 · arXiv

RGB-D SLAM in Dynamic Environments Using Point Correlations

In this paper, a simultaneous localization and mapping (SLAM) method that eliminates the influence of moving objects in dynamic environments is proposed. This method utilizes the correlation between map points to separate points that are part of the static scene and points that are part of different moving objects into different groups. A sparse graph is first created using Delaunay triangulation from all map points. In this graph, the vertices represent map points, and each edge represents the correlation between adjacent points. If the relative position between two points remains consistent over time, there is correlation between them, and they are considered to be moving together rigidly. If not, they are considered to have no correlation and to be in separate groups. After the edges between the uncorrelated points are removed during point-correlation optimization, the remaining graph separates the map points of the moving objects from the map points of the static scene. The largest group is assumed to be the group of reliable static map points. Finally, motion estimation is performed using only these points. The proposed method was implemented for RGB-D sensors, evaluated with a public RGB-D benchmark, and tested in several additional challenging environments. The experimental results demonstrate that robust and accurate performance can be achieved by the proposed SLAM method in both slightly and highly dynamic environments. Compared with other state-of-the-art methods, the proposed method can provide competitive accuracy with good real-time performance.
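The grouping step can be sketched with a union-find over graph edges: an edge survives only if the distance between its two map points is preserved between frames (rigid co-motion), and the largest connected group is taken as the static scene. For brevity the edge list is given directly (the paper builds it with Delaunay triangulation), a fixed tolerance replaces the paper's point-correlation optimization, and `static_group` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def static_group(prev: Sequence[Point], curr: Sequence[Point],
                 edges: List[Tuple[int, int]], tol: float = 0.05) -> List[int]:
    """Indices of the largest group of points that moved together rigidly."""
    parent = list(range(len(prev)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        # keep the edge only if the inter-point distance is (nearly) preserved
        if abs(math.dist(prev[a], prev[b]) - math.dist(curr[a], curr[b])) <= tol:
            parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for i in range(len(prev)):
        groups.setdefault(find(i), []).append(i)
    # the largest group is assumed to be the static scene
    return sorted(max(groups.values(), key=len))
```

In a scene where four points stay put and two others translate together, the edge linking the two clusters is cut, leaving the four static points as the largest group for motion estimation.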

Preprint · 2020 · arXiv

TartanAir: A Dataset to Push the Limits of Visual SLAM

We present a challenging dataset, TartanAir, for robot navigation tasks and more. The data is collected in photo-realistic simulation environments with the presence of moving objects, changing light and various weather conditions. By collecting data in simulations, we are able to obtain multi-modal sensor data and precise ground truth labels such as stereo RGB images, depth images, segmentation, optical flow, camera poses, and LiDAR point clouds. We set up large numbers of environments with various styles and scenes, covering challenging viewpoints and diverse motion patterns that are difficult to achieve by using physical data collection platforms. In order to enable data collection at such a large scale, we develop an automatic pipeline, including mapping, trajectory sampling, data processing, and data verification. We evaluate the impact of various factors on visual SLAM algorithms using our data. The results of state-of-the-art algorithms reveal that the visual SLAM problem is far from solved. Methods that show good performance on established datasets such as KITTI do not perform well in more difficult scenarios. Although the data are simulated, our goal is to push the limits of Visual SLAM algorithms in the real world by providing a challenging benchmark for testing new methods, while also providing large and diverse training data for learning-based methods. Our dataset is available at http://theairlab.org/tartanair-dataset.
