Researcher profile

Max Schwarz

Max Schwarz contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

ConvPoseCNN2: Prediction and Refinement of Dense 6D Object Poses

Object pose estimation is a key perceptual capability in robotics. We propose a fully-convolutional extension of the PoseCNN method, which densely predicts object translations and orientations. This has several advantages such as improving the spatial resolution of the orientation predictions -- useful in highly-cluttered arrangements, significant reduction in parameters by avoiding full connectivity, and fast inference. We propose and discuss several aggregation methods for dense orientation predictions that can be applied as a post-processing step, such as averaging and clustering techniques. We demonstrate that our method achieves the same accuracy as PoseCNN on the challenging YCB-Video dataset and provide a detailed ablation study of several variants of our method. Finally, we demonstrate that the model can be further improved by inserting an iterative refinement module into the middle of the network, which enforces consistency of the prediction.

preprint2022arXiv

FaDIV-Syn: Fast Depth-Independent View Synthesis using Soft Masks and Implicit Blending

Novel view synthesis is required in many robotic applications, such as VR teleoperation and scene reconstruction. Existing methods are often too slow for these contexts, cannot handle dynamic scenes, and are limited by their explicit depth estimation stage, where incorrect depth predictions can lead to large projection errors. Our proposed method runs in real time on live streaming data and avoids explicit depth estimation by efficiently warping input images into the target frame for a range of assumed depth planes. The resulting plane sweep volume (PSV) is directly fed into our network, which first estimates soft PSV masks in a self-supervised manner, and then directly produces the novel output view. This improves efficiency and performance on transparent, reflective, thin, and feature-less scene parts. FaDIV-Syn can perform both interpolation and extrapolation tasks at 540p in real-time and outperforms state-of-the-art extrapolation methods on the large-scale RealEstate10k dataset. We thoroughly evaluate ablations, such as removing the Soft-Masking network, training from fewer examples as well as generalization to higher resolutions and stronger depth discretization. Our implementation is available.

preprint2022arXiv

Predicting Physical Object Properties from Video

We present a novel approach to estimating physical properties of objects from video. Our approach consists of a physics engine and a correction estimator. Starting from the initial observed state, object behavior is simulated forward in time. Based on the simulated and observed behavior, the correction estimator then determines refined physical parameters for each object. The method can be iterated for increased precision. Our approach is generic, as it allows for the use of an arbitrary - not necessarily differentiable - physics engine and correction estimator. For the latter, we evaluate both gradient-free hyperparameter optimization and a deep convolutional neural network. We demonstrate faster and more robust convergence of the learned method in several simulated 2D scenarios focusing on bin situations.

preprint2022arXiv

Target Chase, Wall Building, and Fire Fighting: Autonomous UAVs of Team NimbRo at MBZIRC 2020

The Mohamed Bin Zayed International Robotics Challenge (MBZIRC) 2020 posed diverse challenges for unmanned aerial vehicles (UAVs). We present our four tailored UAVs, specifically developed for individual aerial-robot tasks of MBZIRC, including custom hardware- and software components. In Challenge 1, a target UAV is pursued using a high-efficiency, onboard object detection pipeline to capture a ball from the target UAV. A second UAV uses a similar detection method to find and pop balloons scattered throughout the arena. For Challenge 2, we demonstrate a larger UAV capable of autonomous aerial manipulation: Bricks are found and tracked from camera images. Subsequently, they are approached, picked, transported, and placed on a wall. Finally, in Challenge 3, our UAV autonomously finds fires using LiDAR and thermal cameras. It extinguishes the fires with an onboard fire extinguisher. While every robot features task-specific subsystems, all UAVs rely on a standard software stack developed for this particular and future competitions. We present our mostly open-source software solutions, including tools for system configuration, monitoring, robust wireless communication, high-level control, and agile trajectory generation. For solving the MBZIRC 2020 tasks, we advanced the state of the art in multiple research areas like machine vision and trajectory generation. We present our scientific contributions that constitute the foundation for our algorithms and systems and analyze the results from the MBZIRC competition 2020 in Abu Dhabi, where our systems reached second place in the Grand Challenge. Furthermore, we discuss lessons learned from our participation in this complex robotic challenge.

preprint2020arXiv

A VR System for Immersive Teleoperation and Live Exploration with a Mobile Robot

Applications like disaster management and industrial inspection often require experts to enter contaminated places. To circumvent the need for physical presence, it is desirable to generate a fully immersive individual live teleoperation experience. However, standard video-based approaches suffer from a limited degree of immersion and situation awareness due to the restriction to the camera view, which impacts the navigation. In this paper, we present a novel VR-based practical system for immersive robot teleoperation and scene exploration. While being operated through the scene, a robot captures RGB-D data that is streamed to a SLAM-based live multi-client telepresence system. Here, a global 3D model of the already captured scene parts is reconstructed and streamed to the individual remote user clients where the rendering for e.g. head-mounted display devices (HMDs) is performed. We introduce a novel lightweight robot client component which transmits robot-specific data and enables a quick integration into existing robotic systems. This way, in contrast to first-person exploration systems, the operators can explore and navigate in the remote site completely independent of the current position and view of the capturing robot, complementing traditional input devices for teleoperation. We provide a proof-of-concept implementation and demonstrate the capabilities as well as the performance of our system regarding interactive object measurements and bandwidth-efficient data streaming and visualization. Furthermore, we show its benefits over purely video-based teleoperation in a user study revealing a higher degree of situation awareness and a more precise navigation in challenging environments.

preprint2020arXiv

Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics

Training data is the key ingredient for deep learning approaches, but difficult to obtain for the specialized domains often encountered in robotics. We describe a synthesis pipeline capable of producing training data for cluttered scene perception tasks such as semantic segmentation, object detection, and correspondence or pose estimation. Our approach arranges object meshes in physically realistic, dense scenes using physics simulation. The arranged scenes are rendered using high-quality rasterization with randomized appearance and material parameters. Noise and other transformations introduced by the camera sensors are simulated. Our pipeline can be run online during training of a deep neural network, yielding applications in life-long learning and in iterative render-and-compare approaches. We demonstrate the usability by learning semantic segmentation on the challenging YCB-Video dataset without actually using any training frames, where our method achieves performance comparable to a conventionally trained model. Additionally, we show successful application in a real-world regrasping system.

preprint2020arXiv

Visual Descriptor Learning from Monocular Video

Correspondence estimation is one of the most widely researched and yet only partially solved area of computer vision with many applications in tracking, mapping, recognition of objects and environment. In this paper, we propose a novel way to estimate dense correspondence on an RGB image where visual descriptors are learned from video examples by training a fully convolutional network. Most deep learning methods solve this by training the network with a large set of expensive labeled data or perform labeling through strong 3D generative models using RGB-D videos. Our method learns from RGB videos using contrastive loss, where relative labeling is estimated from optical flow. We demonstrate the functionality in a quantitative analysis on rendered videos, where ground truth information is available. Not only does the method perform well on test data with the same background, it also generalizes to situations with a new background. The descriptors learned are unique and the representations determined by the network are global. We further show the applicability of the method to real-world videos.