Source author record

Michael Kaess

Michael Kaess appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Robotics, Computer Vision, Computational Geometry, eess.SP

Catalog footprint

What is connected

8 works

4 topics

4 close collaborators

preprint, 2025, arXiv

Acoustic Neural 3D Reconstruction Under Pose Drift

We consider the problem of optimizing neural implicit surfaces for 3D reconstruction using acoustic images collected with drifting sensor poses. The accuracy of current state-of-the-art 3D acoustic modeling algorithms is highly dependent on accurate pose estimation; small errors in sensor pose can lead to severe reconstruction artifacts. In this paper, we propose an algorithm that jointly optimizes the neural scene representation and sonar poses. Our algorithm does so by parameterizing the 6DoF poses as learnable parameters and backpropagating gradients through the neural renderer and implicit representation. We validated our algorithm on both real and simulated datasets. It produces high-fidelity 3D reconstructions even under significant pose drift.

preprint, 2022, arXiv

LEO: Learning Energy-based Models in Factor Graph Optimization

We address the problem of learning observation models end-to-end for estimation. Robots operating in partially observable environments must infer latent states from multiple sensory inputs using observation models that capture the joint distribution between latent states and observations. This inference problem can be formulated as an objective over a graph that optimizes for the most likely sequence of states using all previous measurements. Prior work uses observation models that are either known a priori or trained on surrogate losses independent of the graph optimizer. In this paper, we propose a method to directly optimize end-to-end tracking performance by learning observation models with the graph optimizer in the loop. This direct approach may appear, however, to require the inference algorithm to be fully differentiable, which many state-of-the-art graph optimizers are not. Our key insight is to instead formulate the problem as that of energy-based learning. We propose a novel approach, LEO, for learning observation models end-to-end with graph optimizers that may be non-differentiable. LEO alternates between sampling trajectories from the graph posterior and updating the model to match these samples to ground truth trajectories. We propose a way to generate such samples efficiently using incremental Gauss-Newton solvers. We compare LEO against baselines on datasets drawn from two distinct tasks: navigation and real-world planar pushing. We show that LEO is able to learn complex observation models with lower errors and fewer samples. Supplementary video: https://youtu.be/YqzlUPudfkA

preprint, 2022, arXiv

Long-term Visual Map Sparsification with Heterogeneous GNN

We address the problem of map sparsification for long-term visual localization. For map sparsification, a commonly employed assumption is that the pre-built map and the later captured localization query are consistent. However, this assumption can be easily violated in the dynamic world. Additionally, the map size grows as new data accumulate through time, causing large data overhead in the long term. In this paper, we aim to overcome the environmental changes and reduce the map size at the same time by selecting points that are valuable to future localization. Inspired by the recent progress in Graph Neural Networks (GNNs), we propose the first work that models SfM maps as heterogeneous graphs and predicts 3D point importance scores with a GNN, which enables us to directly exploit the rich information in the SfM map graph. Two novel supervisions are proposed: 1) a data-fitting term for selecting points valuable to future localization based on training queries; 2) a K-Cover term for selecting sparse points with full map coverage. The experiments show that our method selected map points on stable and widely visible structures and outperformed baselines in localization performance.
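The K-Cover idea mentioned above can be illustrated with the classic greedy heuristic: keep adding map points until every image sees at least K selected points. This is only a sketch of the coverage objective, not the GNN supervision term from the abstract; the function name `greedy_k_cover` and the toy point and image ids are hypothetical.

```python
def greedy_k_cover(observations, k):
    """Greedy K-cover heuristic (illustrative, not the paper's method).

    observations: dict mapping point id -> set of image ids that see it.
    Returns a list of point ids such that each image is seen by at least
    k selected points, when the input makes that possible.
    """
    images = set().union(*observations.values())
    need = {img: k for img in images}      # coverage still missing per image
    remaining = dict(observations)
    selected = []

    def gain(point):
        # Number of still-under-covered images this point would help.
        return sum(1 for img in remaining[point] if need[img] > 0)

    while remaining and any(n > 0 for n in need.values()):
        # sorted() makes ties deterministic: the smallest id wins.
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break                          # no point can improve coverage
        selected.append(best)
        for img in remaining.pop(best):
            if need[img] > 0:
                need[img] -= 1
    return selected


# Toy map: four points observed by three images.
toy_map = {"p1": {"A", "B"}, "p2": {"B", "C"}, "p3": {"A", "C"}, "p4": {"A"}}
print(greedy_k_cover(toy_map, k=1))
```

With K = 1 two points are enough to cover all three images in this toy map; raising K forces more points to be kept, which is the trade-off between map size and coverage.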

preprint, 2022, arXiv

PatchGraph: In-hand tactile tracking with learned surface normals

We address the problem of tracking 3D object poses from touch during in-hand manipulations. Specifically, we look at tracking small objects using vision-based tactile sensors that provide high-dimensional tactile image measurements at the point of contact. While prior work has relied on a priori information about the object being localized, we remove this requirement. Our key insight is that an object is composed of several local surface patches, each informative enough to achieve reliable object tracking. Moreover, we can recover the geometry of this local patch online by extracting local surface normal information embedded in each tactile image. We propose a novel two-stage approach. First, we learn a mapping from tactile images to surface normals using an image translation network. Second, we use these surface normals within a factor graph to both reconstruct a local patch map and use it to infer 3D object poses. We demonstrate reliable object tracking for over 100 contact sequences across unique shapes with four objects in simulation and two objects in the real world. Supplementary video: https://youtu.be/FHks--haOGY

preprint, 2022, arXiv

Revisiting LiDAR Registration and Reconstruction: A Range Image Perspective

Spinning LiDAR data are prevalent for 3D vision tasks. Since LiDAR data are presented in the form of point clouds, expensive 3D operations are usually required. This paper revisits spinning LiDAR scan formation and presents a cylindrical range image representation with a ray-wise projection/unprojection model. It is built upon raw scans and supports lossless conversion from 2D to 3D, allowing fast 2D operations, including 2D index-based neighbor search and downsampling. We then propose, to the best of our knowledge, the first multi-scale registration and dense signed distance function (SDF) reconstruction system for LiDAR range images. We further collect a dataset of indoor and outdoor LiDAR scenes in the posed range image format. A comprehensive evaluation of registration and reconstruction is conducted on the proposed dataset and the KITTI dataset. Experiments demonstrate that our approach outperforms surface reconstruction baselines and achieves similar performance to state-of-the-art LiDAR registration methods, including a modern learning-based registration approach. Thanks to its simplicity, our registration runs at 100 Hz and SDF reconstruction runs in real time. The dataset and a modularized C++/Python toolbox will be released.
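To make the range image idea concrete, here is a minimal sketch of a projection/unprojection pair and an index-based neighbor lookup. The sensor layout (16 evenly spaced beams, 360 azimuth columns, a field of view of plus or minus 15 degrees) and all names are assumptions chosen for illustration; the abstract does not specify the paper's ray-wise model, and real sensors often have non-uniform beam angles.

```python
import math

# Hypothetical sensor layout, not taken from the paper.
ROWS, COLS = 16, 360
ELEV_MIN, ELEV_MAX = math.radians(-15.0), math.radians(15.0)


def unproject(row, col, rng):
    """Pixel (row, col) holding range rng -> 3D point in the sensor frame."""
    azimuth = 2.0 * math.pi * (col + 0.5) / COLS - math.pi   # column center
    elevation = ELEV_MAX - (ELEV_MAX - ELEV_MIN) * row / (ROWS - 1)
    horizontal = rng * math.cos(elevation)
    return (horizontal * math.cos(azimuth),
            horizontal * math.sin(azimuth),
            rng * math.sin(elevation))


def project(x, y, z):
    """3D point in the sensor frame -> (row, col, range)."""
    rng = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)
    elevation = math.asin(z / rng)
    col = int((azimuth + math.pi) / (2.0 * math.pi) * COLS) % COLS
    row = round((ELEV_MAX - elevation) / (ELEV_MAX - ELEV_MIN) * (ROWS - 1))
    row = min(max(row, 0), ROWS - 1)       # clamp points outside the FOV
    return row, col, rng


def neighbors(row, col):
    """4-neighborhood by index; columns wrap around in azimuth, rows do not."""
    out = [(row, (col - 1) % COLS), (row, (col + 1) % COLS)]
    if row > 0:
        out.append((row - 1, col))
    if row < ROWS - 1:
        out.append((row + 1, col))
    return out


point = unproject(7, 180, 12.5)
print(project(*point))
print(neighbors(0, 0))
```

Because neighbors are found by index arithmetic rather than a 3D spatial search, operations such as downsampling reduce to simple 2D array manipulation in this kind of representation.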

preprint, 2022, arXiv

ShapeMap 3-D: Efficient shape mapping through dense touch and vision

Knowledge of 3-D object shape is of great importance to robot manipulation tasks, but may not be readily available in unstructured environments. While vision is often occluded during robot-object interaction, high-resolution tactile sensors can give a dense local perspective of the object. However, tactile sensors have a limited sensing area, and the shape representation must faithfully approximate non-contact areas. In addition, a key challenge is efficiently incorporating these dense tactile measurements into a 3-D mapping framework. In this work, we propose an incremental shape mapping method using a GelSight tactile sensor and a depth camera. Local shape is recovered from tactile images via a learned model trained in simulation. Through efficient inference on a spatial factor graph informed by a Gaussian process, we build an implicit surface representation of the object. We demonstrate visuo-tactile mapping in both simulated and real-world experiments, to incrementally build 3-D reconstructions of household objects.

preprint, 2020, arXiv

An Efficient Planar Bundle Adjustment Algorithm

This paper presents an efficient algorithm for the least-squares problem using the point-to-plane cost, which aims to jointly optimize depth sensor poses and plane parameters for 3D reconstruction. We call this least-squares problem Planar Bundle Adjustment (PBA), due to the similarity between this problem and the original Bundle Adjustment (BA) in visual reconstruction. As planes ubiquitously exist in the man-made environment, they are generally used as landmarks in SLAM algorithms for various depth sensors. PBA is important to reduce drift and improve the quality of the map. However, directly adopting the well-established BA framework in visual reconstruction will result in a very inefficient solution for PBA. This is because a 3D point only has one observation at a camera pose. In contrast, a depth sensor can record hundreds of points in a plane at a time, which results in a very large nonlinear least-squares problem even for a small-scale space. Fortunately, we find that the PBA problem has a special structure. We introduce a reduced Jacobian matrix and a reduced residual vector, and prove that they can replace the original Jacobian matrix and residual vector in the generally adopted Levenberg-Marquardt (LM) algorithm. This significantly reduces the computational cost. Besides, when planes are combined with other features for 3D reconstruction, the reduced Jacobian matrix and residual vector can also replace the corresponding parts derived from planes. Our experimental results verify that our algorithm can significantly reduce the computational time compared to the solution using the traditional BA framework. Besides, our algorithm is faster, more accurate, and more robust to initialization errors than the state-of-the-art solution using the plane-to-plane cost.
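One way to see why many points on a plane need not produce many independent terms is the standard identity below: for a plane (a, b, c, d) with unit normal, the summed squared point-to-plane distance over N points equals a quadratic form in a single 4x4 matrix accumulated from the points. This sketch only illustrates that identity for points in one fixed frame; it is not the paper's reduced Jacobian or reduced residual vector, and the function names are hypothetical.

```python
import math


def point_to_plane_cost(points, plane):
    """Direct cost: sum of squared signed distances, one term per point."""
    a, b, c, d = plane
    return sum((a * x + b * y + c * z + d) ** 2 for x, y, z in points)


def accumulate(points):
    """4x4 matrix M = sum of h * h^T over homogeneous points h = (x, y, z, 1)."""
    M = [[0.0] * 4 for _ in range(4)]
    for x, y, z in points:
        h = (x, y, z, 1.0)
        for i in range(4):
            for j in range(4):
                M[i][j] += h[i] * h[j]
    return M


def compact_cost(M, plane):
    """Same cost evaluated as plane^T M plane, independent of the point count."""
    return sum(plane[i] * M[i][j] * plane[j]
               for i in range(4) for j in range(4))


# Three noisy samples of the plane z = 1, written as (0, 0, 1, -1).
samples = [(0.0, 0.0, 1.1), (1.0, 0.0, 0.9), (0.0, 1.0, 1.0)]
plane = (0.0, 0.0, 1.0, -1.0)
M = accumulate(samples)
print(point_to_plane_cost(samples, plane), compact_cost(M, plane))
print(math.isclose(point_to_plane_cost(samples, plane), compact_cost(M, plane)))
```

The matrix M is built once per set of points; afterwards the cost for any candidate plane is evaluated in constant time, regardless of how many points were recorded.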

preprint, 2016, arXiv

The Manifold Particle Filter for State Estimation on High-dimensional Implicit Manifolds

We estimate the state of a noisy robot arm and underactuated hand using an Implicit Manifold Particle Filter (MPF) informed by touch sensors. As the robot touches the world, its state space collapses to a contact manifold that we represent implicitly using a signed distance field. This allows us to extend the MPF to higher (six or more) dimensional state spaces. Earlier work (which explicitly represents the contact manifold) only shows the MPF in two or three dimensions. Through a series of experiments, we show that the implicit MPF converges faster and is more accurate than a conventional particle filter during periods of persistent contact. We present three methods of sampling the implicit contact manifold, and compare them in experiments.
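As a toy illustration of an implicitly represented manifold, the sketch below treats the zero level set of a signed distance function as the contact manifold and pulls particles onto it with gradient steps. The abstract does not describe its three sampling methods, so this projection scheme is an assumption made for illustration only; the unit circle SDF, the 2D state, and the function names are all hypothetical.

```python
import math
import random


def sdf(p):
    """Signed distance to the unit circle: the toy 'contact manifold'."""
    return math.hypot(p[0], p[1]) - 1.0


def numeric_grad(f, p, eps=1e-6):
    """Central-difference gradient of f at p."""
    grad = []
    for i in range(len(p)):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad


def project_to_manifold(f, p, iters=20, tol=1e-9):
    """Move p toward the zero level set of f with Newton-style steps."""
    p = list(p)
    for _ in range(iters):
        d = f(p)
        if abs(d) < tol:
            break
        g = numeric_grad(f, p)
        norm2 = sum(gi * gi for gi in g)
        if norm2 == 0.0:
            break                           # flat gradient, cannot step
        p = [pi - d * gi / norm2 for pi, gi in zip(p, g)]
    return p


random.seed(0)
particles = [[random.uniform(-2.0, 2.0), random.uniform(-2.0, 2.0)]
             for _ in range(100)]
projected = [project_to_manifold(sdf, q) for q in particles]
print(max(abs(sdf(q)) for q in projected))
```

Only the function `sdf` encodes the manifold here, so the same projection code would apply unchanged to a higher-dimensional state, which is the practical appeal of an implicit representation.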
