Source author record

Andrew J. Davison

Andrew J. Davison appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computer Vision; Robotics; Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Graphics; Computational Geometry; Neural and Evolutionary Computing; Performance

Catalog footprint

What is connected

21 works

9 topics

4 close collaborators


preprint, 2026, arXiv

Radiant Foam Rendering on a Graph Processor

Many emerging many-core accelerators replace a single large device memory with hundreds to thousands of lightweight cores, each owning only a small local SRAM and exchanging data via explicit on-chip communication. This organization offers high aggregate bandwidth, but it breaks a key assumption behind many volumetric rendering techniques: that rays can randomly access a large, unified scene representation. Rendering efficiently on such hardware therefore requires distributing both data and computation, keeping ray traversal mostly local, and structuring communication into predictable routes. We present a fully in-SRAM, distributed renderer for the Radiant Foam Voronoi-cell volumetric representation on the Graphcore Mk2 IPU (Intelligence Processing Unit), a many-core accelerator with tile-local SRAM and explicit inter-tile communication. Our system shards the scene across tiles and forwards rays between shards through a hierarchical routing overlay, enabling ray marching entirely from on-chip SRAM with predictable communication. On Mip-NeRF 360 scenes, the system attains near-interactive throughput of approximately 1 fps at 640x480 with image and depth map quality close to the original GPU-based Radiant Foam implementation, while keeping all scene data and ray state in on-chip SRAM. Beyond demonstrating feasibility, we analyze routing, memory, and scheduling bottlenecks that inform how future distributed-memory accelerators can better support irregular, data-movement-heavy rendering workloads.

preprint, 2024, arXiv

Fit-NGP: Fitting Object Models to Neural Graphics Primitives

Accurate 3D object pose estimation is key to enabling many robotic applications that involve challenging object interactions. In this work, we show that the density field created by a state-of-the-art efficient radiance field reconstruction method is suitable for highly accurate and robust pose estimation for objects with known 3D models, even when they are very small and with challenging reflective surfaces. We present a fully automatic object pose estimation system based on a robot arm with a single wrist-mounted camera, which can scan a scene from scratch, then detect and estimate the 6-Degrees of Freedom (DoF) poses of multiple objects within a couple of minutes of operation. Small objects such as bolts and nuts are estimated with accuracy on the order of 1 mm.

preprint, 2022, arXiv

Auto-Lambda: Disentangling Dynamic Task Relationships

Understanding the structure of multiple related tasks allows for multi-task learning to improve the generalisation ability of one or all of them. However, it usually requires training each pairwise combination of tasks together in order to capture task relationships, at an extremely high computational cost. In this work, we learn task relationships via an automated weighting framework, named Auto-Lambda. Unlike previous methods where task relationships are assumed to be fixed, Auto-Lambda is a gradient-based meta learning framework which explores continuous, dynamic task relationships via task-specific weightings, and can optimise any choice of combination of tasks through the formulation of a meta-loss, where the validation loss automatically influences task weightings throughout training. We apply the proposed framework to both multi-task and auxiliary learning problems in computer vision and robotics, and show that Auto-Lambda achieves state-of-the-art performance, even when compared to optimisation strategies designed specifically for each problem and data domain. Finally, we observe that Auto-Lambda can discover interesting learning behaviors, leading to new insights in multi-task learning. Code is available at https://github.com/lorenmt/auto-lambda.

preprint, 2022, arXiv

Bootstrapping Semantic Segmentation with Regional Contrast

We present ReCo, a contrastive learning framework designed at a regional level to assist learning in semantic segmentation. ReCo performs semi-supervised or supervised pixel-level contrastive learning on a sparse set of hard negative pixels, with minimal additional memory footprint. ReCo is easy to implement, being built on top of off-the-shelf segmentation networks, and consistently improves performance in both semi-supervised and supervised semantic segmentation methods, achieving smoother segmentation boundaries and faster convergence. The strongest effect is in semi-supervised learning with very few labels. With ReCo, we achieve high-quality semantic segmentation models, requiring only 5 examples of each semantic class. Code is available at https://github.com/lorenmt/reco.

preprint, 2022, arXiv

Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation

We present a coarse-to-fine discretisation method that enables the use of discrete reinforcement learning approaches in place of unstable and data-inefficient actor-critic methods in continuous robotics domains. This approach builds on the recently released ARM algorithm, which replaces the continuous next-best pose agent with a discrete one, with coarse-to-fine Q-attention. Given a voxelised scene, coarse-to-fine Q-attention learns what part of the scene to 'zoom' into. When this 'zooming' behaviour is applied iteratively, it results in a near-lossless discretisation of the translation space, and allows the use of a discrete action, deep Q-learning method. We show that our new coarse-to-fine algorithm achieves state-of-the-art performance on several difficult sparsely rewarded RLBench vision-based robotics tasks, and can train real-world policies, tabula rasa, in a matter of minutes, with as few as 3 demonstrations.
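
The iterative 'zooming' described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the learned Q-function is replaced by a hypothetical `score` callable (here, negative squared distance to an assumed target), and the workspace bounds, 4 voxels per axis, and 3 levels are all assumptions chosen for illustration.

```python
from itertools import product


def coarse_to_fine(score, lo, hi, cells=4, depth=3):
    """Pick the best of cells**3 voxels, then re-voxelise inside it, `depth` times.

    `score` is a stand-in for a learned Q-function (an assumption of this
    sketch): it maps a voxel centre (x, y, z) to a scalar value.
    """
    lo, hi = list(lo), list(hi)
    for _ in range(depth):
        size = [(h - l) / cells for l, h in zip(lo, hi)]
        # Evaluate every voxel centre at the current resolution.
        best = max(
            product(range(cells), repeat=3),
            key=lambda idx: score(
                tuple(l + (i + 0.5) * s for l, i, s in zip(lo, idx, size))
            ),
        )
        # 'Zoom': the chosen voxel becomes the next, smaller workspace.
        lo = [l + i * s for l, i, s in zip(lo, best, size)]
        hi = [l + s for l, s in zip(lo, size)]
    centre = tuple((l + h) / 2 for l, h in zip(lo, hi))
    return centre, hi[0] - lo[0]


target = (0.31, -0.12, 0.77)  # hypothetical goal position in metres


def neg_dist(p):
    # Placeholder score: higher is better the closer p is to the target.
    return -sum((a - b) ** 2 for a, b in zip(p, target))


centre, voxel = coarse_to_fine(neg_dist, lo=(-1, -1, 0), hi=(1, 1, 2))
```

Under these assumed settings, three levels of 4x4x4 voxels evaluate 3 x 64 = 192 candidates while reaching the resolution of a flat 64x64x64 grid (262,144 voxels), which is the sense in which a small discrete action space can cover the translation space finely.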

preprint, 2022, arXiv

Incremental Abstraction in Distributed Probabilistic SLAM Graphs

Scene graphs represent the key components of a scene in a compact and semantically rich way, but are difficult to build during incremental SLAM operation because of the challenges of robustly identifying abstract scene elements and optimising continually changing, complex graphs. We present a distributed, graph-based SLAM framework for incrementally building scene graphs based on two novel components. First, we propose an incremental abstraction framework in which a neural network proposes abstract scene elements that are incorporated into the factor graph of a feature-based monocular SLAM system. Scene elements are confirmed or rejected through optimisation and incrementally replace the points yielding a more dense, semantic and compact representation. Second, enabled by our novel routing procedure, we use Gaussian Belief Propagation (GBP) for distributed inference on a graph processor. The time per iteration of GBP is structure-agnostic and we demonstrate the speed advantages over direct methods for inference of heterogeneous factor graphs. We run our system on real indoor datasets using planar abstractions and recover the major planes with significant compression.

preprint, 2022, arXiv

Learning to Complete Object Shapes for Object-level Mapping in Dynamic Scenes

In this paper, we propose a novel object-level mapping system that can simultaneously segment, track, and reconstruct objects in dynamic scenes. It can further predict and complete their full geometries by conditioning on reconstructions from depth inputs and a category-level shape prior with the aim that completed object geometry leads to better object reconstruction and tracking accuracy. For each incoming RGB-D frame, we perform instance segmentation to detect objects and build data associations between the detection and the existing object maps. A new object map will be created for each unmatched detection. For each matched object, we jointly optimise its pose and latent geometry representations using geometric residual and differential rendering residual towards its shape prior and completed geometry. Our approach shows better tracking and reconstruction performance compared to methods using traditional volumetric mapping or learned shape prior approaches. We evaluate its effectiveness by quantitatively and qualitatively testing it in both synthetic and real-world sequences.

preprint, 2022, arXiv

Q-attention: Enabling Efficient Learning for Vision-based Robotic Manipulation

Despite the success of reinforcement learning methods, they have yet to have their breakthrough moment when applied to a broad range of robotic manipulation tasks. This is partly due to the fact that reinforcement learning algorithms are notoriously difficult and time consuming to train, which is exacerbated when training from images rather than full-state inputs. As humans perform manipulation tasks, our eyes closely monitor every step of the process with our gaze focusing sequentially on the objects being manipulated. With this in mind, we present our Attention-driven Robotic Manipulation (ARM) algorithm, which is a general manipulation algorithm that can be applied to a range of sparse-rewarded tasks, given only a small number of demonstrations. ARM splits the complex task of manipulation into a 3-stage pipeline: (1) a Q-attention agent that extracts relevant pixel locations from RGB and point cloud inputs, (2) a next-best pose agent that accepts crops from the Q-attention agent and outputs poses, and (3) a control agent that takes the goal pose and outputs joint actions. We show that current learning algorithms fail on a range of RLBench tasks, whilst ARM is successful.

preprint, 2022, arXiv

ReorientBot: Learning Object Reorientation for Specific-Posed Placement

Robots need the capability of placing objects in arbitrary, specific poses to rearrange the world and achieve various valuable tasks. Object reorientation plays a crucial role in this as objects may not initially be oriented such that the robot can grasp and then immediately place them in a specific goal pose. In this work, we present a vision-based manipulation system, ReorientBot, which consists of 1) visual scene understanding with pose estimation and volumetric reconstruction using an onboard RGB-D camera; 2) learned waypoint selection for successful and efficient motion generation for reorientation; 3) traditional motion planning to generate a collision-free trajectory from the selected waypoints. We evaluate our method using the YCB objects in both simulation and the real world, achieving 93% overall success, 81% improvement in success rate, and 22% improvement in execution time compared to a heuristic approach. We demonstrate extended multi-object rearrangement showing the general capability of the system.

preprint, 2022, arXiv

SafePicking: Learning Safe Object Extraction via Object-Level Mapping

Robots need object-level scene understanding to manipulate objects while reasoning about contact, support, and occlusion among objects. Given a pile of objects, object recognition and reconstruction can identify the boundary of object instances, giving important cues as to how the objects form and support the pile. In this work, we present a system, SafePicking, that integrates object-level mapping and learning-based motion planning to generate a motion that safely extracts occluded target objects from a pile. Planning is done by learning a deep Q-network that receives observations of predicted poses and a depth-based heightmap to output a motion trajectory, trained to maximize a safety metric reward. Our results show that the observation fusion of poses and depth-sensing gives both better performance and robustness to the model. We evaluate our methods using the YCB objects in both simulation and the real world, achieving safe object extraction from piles.

preprint, 2022, arXiv

Simultaneous Localisation and Mapping with Quadric Surfaces

There are many possibilities for how to represent the map in simultaneous localisation and mapping (SLAM). While sparse, keypoint-based SLAM systems have achieved impressive levels of accuracy and robustness, their maps may not be suitable for many robotic tasks. Dense SLAM systems are capable of producing dense reconstructions, but can be computationally expensive and, like sparse systems, lack higher-level information about the structure of a scene. Human-made environments contain a lot of structure, and we seek to take advantage of this by enabling the use of quadric surfaces as features in SLAM systems. We introduce a minimal representation for quadric surfaces and show how this can be included in a least-squares formulation. We also show how our representation can be easily extended to include additional constraints on quadrics such as those found in quadrics of revolution. Finally, we introduce a proof-of-concept SLAM system using our representation, and provide some experimental results using an RGB-D dataset.

preprint, 2021, arXiv

End-to-End Egospheric Spatial Memory

Spatial memory, or the ability to remember and recall specific locations and objects, is central to autonomous agents' ability to carry out tasks in real environments. However, most existing artificial memory modules are not very adept at storing spatial information. We propose a parameter-free module, Egospheric Spatial Memory (ESM), which encodes the memory in an ego-sphere around the agent, enabling expressive 3D representations. ESM can be trained end-to-end via either imitation or reinforcement learning, and improves both training efficiency and final performance against other memory baselines on both drone and manipulator visuomotor control tasks. The explicit egocentric geometry also enables us to seamlessly combine the learned controller with other non-learned modalities, such as local obstacle avoidance. We further show applications to semantic segmentation on the ScanNet dataset, where ESM naturally combines image-level and map-level inference modalities. Through our broad set of experiments, we show that ESM provides a general computation graph for embodied spatial reasoning, and the module forms a bridge between real-time mapping systems and differentiable memory architectures. Implementation at: https://github.com/ivy-dl/memory.

preprint, 2020, arXiv

Bundle Adjustment on a Graph Processor

Graph processors such as Graphcore's Intelligence Processing Unit (IPU) are part of the major new wave of novel computer architecture for AI, and have a general design with massively parallel computation, distributed on-chip memory and very high inter-core communication bandwidth which allows breakthrough performance for message passing algorithms on arbitrary graphs. We show for the first time that the classical computer vision problem of bundle adjustment (BA) can be solved extremely fast on a graph processor using Gaussian Belief Propagation. Our simple but fully parallel implementation uses the 1216 cores on a single IPU chip to, for instance, solve a real BA problem with 125 keyframes and 1919 points in under 40ms, compared to 1450ms for the Ceres CPU library. Further code optimisation will surely increase this difference on static problems, but we argue that the real promise of graph processing is for flexible in-place optimisation of general, dynamically changing factor graphs representing Spatial AI problems. We give indications of this with experiments showing the ability of GBP to efficiently solve incremental SLAM problems, and deal with robust cost functions and different types of factors.
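
To give a feel for the message passing that Gaussian Belief Propagation relies on, here is a toy sketch on a one-dimensional pose chain. It is only an illustration under stated assumptions, not the paper's IPU implementation: variables are scalars, every factor is a linear odometry measurement, and the function name `gbp_chain` and its parameters are hypothetical.

```python
def gbp_chain(prior_mean, prior_prec, odometry, w=1.0, iters=10):
    """Scalar Gaussian Belief Propagation on a pose chain x0 - x1 - ... - xn.

    Factor k encodes the measurement x[k+1] - x[k] = odometry[k] with
    precision w. All quantities are in information form:
    eta = precision * mean, lam = precision.
    """
    n = len(odometry) + 1
    prior = [(0.0, 0.0)] * n
    prior[0] = (prior_prec * prior_mean, prior_prec)  # anchor the first pose
    # msgs[(k, v)]: message from factor k to variable v (v is k or k + 1)
    msgs = {(k, v): (0.0, 0.0) for k in range(n - 1) for v in (k, k + 1)}

    def belief(v):
        # A variable's belief is its prior times all incoming factor messages.
        eta, lam = prior[v]
        for (_, u), (m_eta, m_lam) in msgs.items():
            if u == v:
                eta, lam = eta + m_eta, lam + m_lam
        return eta, lam

    for _ in range(iters):
        new = {}
        for k, z in enumerate(odometry):
            for src, dst, sign in ((k, k + 1, 1.0), (k + 1, k, -1.0)):
                # Variable-to-factor message: belief minus what this factor sent.
                b_eta, b_lam = belief(src)
                in_eta = b_eta - msgs[(k, src)][0]
                in_lam = b_lam - msgs[(k, src)][1]
                # Marginalise src out of the two-variable Gaussian factor.
                out_lam = w - w * w / (w + in_lam)
                out_eta = sign * w * z + w * (in_eta - sign * w * z) / (w + in_lam)
                new[(k, dst)] = (out_eta, out_lam)
        msgs = new  # synchronous update: every message uses the previous round

    return [eta / lam for eta, lam in map(belief, range(n))]


means = gbp_chain(prior_mean=2.0, prior_prec=100.0, odometry=[1.0, 2.0, 0.5])
```

Each message depends only on a factor and its two neighbouring variables, so all messages in a round can be computed independently. That locality is what makes this style of inference a natural fit for hardware with many cores and distributed memory.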

preprint, 2020, arXiv

DeepFactors: Real-Time Probabilistic Dense Monocular SLAM

The ability to estimate rich geometry and camera motion from monocular imagery is fundamental to future interactive robotics and augmented reality applications. Different approaches have been proposed that vary in scene geometry representation (sparse landmarks, dense maps), the consistency metric used for optimising the multi-view problem, and the use of learned priors. We present a SLAM system that unifies these methods in a probabilistic framework while still maintaining real-time performance. This is achieved through the use of a learned compact depth map representation and reformulating three different types of errors: photometric, reprojection and geometric, which we make use of within standard factor graph software. We evaluate our system on trajectory estimation and depth reconstruction on real-world sequences and present various examples of estimated dense geometry.

preprint, 2020, arXiv

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Robots and other smart devices need efficient object-based scene representations from their on-board vision systems to reason about contact, physics and occlusion. Recognized precise object models will play an important role alongside non-parametric reconstructions of unrecognized structures. We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision. Our approach makes 3D object pose proposals from single RGB-D views, accumulates pose estimates and non-parametric occupancy information from multiple views as the camera moves, and performs joint optimization to estimate consistent, non-intersecting poses for multiple objects in contact. We verify the accuracy and robustness of our approach experimentally on 2 object datasets: YCB-Video, and our own challenging Cluttered YCB-Video. We demonstrate a real-time robotics application where a robot arm precisely and orderly disassembles complicated piles of objects, using only on-board RGB-D vision.

preprint, 2016, arXiv

Comparative Design Space Exploration of Dense and Semi-Dense SLAM

SLAM has matured significantly over the past few years, and is beginning to appear in serious commercial products. While new SLAM systems are being proposed at every conference, evaluation is often restricted to qualitative visualizations or accuracy estimation against a ground truth. This is due to the lack of benchmarking methodologies which can holistically and quantitatively evaluate these systems. Further investigation at the level of individual kernels and parameter spaces of SLAM pipelines is non-existent, which is absolutely essential for systems research and integration. We extend the recently introduced SLAMBench framework to allow comparing two state-of-the-art SLAM pipelines, namely KinectFusion and LSD-SLAM, along the metrics of accuracy, energy consumption, and processing frame rate on two different hardware platforms, namely a desktop and an embedded device. We also analyze the pipelines at the level of individual kernels and explore their algorithmic and hardware design spaces for the first time, yielding valuable insights.

preprint, 2016, arXiv

Deep Learning a Grasp Function for Grasping under Gripper Pose Uncertainty

This paper presents a new method for parallel-jaw grasping of isolated objects from depth images, under large gripper pose uncertainty. Whilst most approaches aim to predict the single best grasp pose from an image, our method first predicts a score for every possible grasp pose, which we denote the grasp function. With this, it is possible to achieve grasping robust to the gripper's pose uncertainty, by smoothing the grasp function with the pose uncertainty function. Therefore, if the single best pose is adjacent to a region of poor grasp quality, that pose will no longer be chosen, and instead a pose will be chosen which is surrounded by a region of high grasp quality. To learn this function, we train a Convolutional Neural Network which takes as input a single depth image of an object, and outputs a score for each grasp pose across the image. Training data for this is generated by use of physics simulation and depth image simulation with 3D object meshes, to enable acquisition of sufficient data without requiring exhaustive real-world experiments. We evaluate with both synthetic and real experiments, and show that the learned grasp score is more robust to gripper pose uncertainty than when this uncertainty is not accounted for.
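
The smoothing step described here can be sketched in one dimension. This is an illustrative toy, not the paper's network or data: the grasp scores are made-up values, the pose uncertainty is assumed to be Gaussian, and `smooth_grasp_function` is a hypothetical helper name.

```python
import math


def smooth_grasp_function(scores, sigma):
    """Convolve a 1D grasp-score profile with a Gaussian pose-uncertainty kernel."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]  # normalise so the weights sum to 1
    smoothed = []
    for i in range(len(scores)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - radius
            # Poses outside the profile are treated as zero-quality grasps.
            if 0 <= idx < len(scores):
                acc += k * scores[idx]
        smoothed.append(acc)
    return smoothed


# Made-up scores: an isolated peak at index 2 and a broad plateau at 6..10.
scores = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.0]
best_raw = max(range(len(scores)), key=scores.__getitem__)
robust = smooth_grasp_function(scores, sigma=1.0)
best_robust = max(range(len(robust)), key=robust.__getitem__)
```

With these values the raw maximum is the isolated peak at index 2, whereas the smoothed maximum moves to index 8, the centre of the plateau. This mirrors the abstract's point that a best pose adjacent to poor grasp quality is no longer chosen once uncertainty is accounted for.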

preprint, 2016, arXiv

Effective Backscatter Approximation for Photometry in Murky Water

Shading-based approaches like Photometric Stereo assume that the image formation model can be effectively optimized for the scene normals. However, in murky water this is a very challenging problem. The light from artificial sources is not only reflected by the scene but it is also scattered by the medium particles, yielding the backscatter component. Backscatter corresponds to a complex term with several unknown variables, and makes the problem of normal estimation hard. In this work, we show that instead of trying to optimize the complex backscatter model or use previous unrealistic simplifications, we can approximate the per-pixel backscatter signal directly from the captured images. Our method is based on the observation that backscatter is saturated beyond a certain distance, i.e. it becomes scene-depth independent, and finally corresponds to a smoothly varying signal which depends strongly on the light position with respect to each pixel. Our backscatter approximation method facilitates imaging and scene reconstruction in murky water when the illumination is artificial as in Photometric Stereo. Specifically, we show that it allows accurate scene normal estimation and offers potentials like single image restoration. We evaluate our approach using numerical simulations and real experiments within both the controlled environment of a big water-tank and real murky port-waters.

preprint, 2016, arXiv

Pairwise Decomposition of Image Sequences for Active Multi-View Recognition

A multi-view image sequence provides a much richer capacity for object recognition than from a single image. However, most existing solutions to multi-view recognition typically adopt hand-crafted, model-based geometric methods, which do not readily embrace recent trends in deep learning. We propose to bring Convolutional Neural Networks to generic multi-view recognition, by decomposing an image sequence into a set of image pairs, classifying each pair independently, and then learning an object classifier by weighting the contribution of each pair. This allows for recognition over arbitrary camera trajectories, without requiring explicit training over the potentially infinite number of camera paths and lengths. Building these pairwise relationships then naturally extends to the next-best-view problem in an active recognition framework. To achieve this, we train a second Convolutional Neural Network to map directly from an observed image to next viewpoint. Finally, we incorporate this into a trajectory optimisation task, whereby the best recognition confidence is sought for a given trajectory length. We present state-of-the-art results in both guided and unguided multi-view recognition on the ModelNet dataset, and show how our method can be used with depth images, greyscale images, or both.

preprint, 2015, arXiv

Interactive 3D Face Stylization Using Sculptural Abstraction

Sculptors often deviate from geometric accuracy in order to enhance the appearance of their sculpture. These subtle stylizations may emphasize anatomy, draw the viewer's focus to characteristic features of the subject, or symbolize textures that might not be accurately reproduced in a particular sculptural medium, while still retaining fidelity to the unique proportions of an individual. In this work we demonstrate an interactive system for enhancing face geometry using a class of stylizations based on visual decomposition into abstract semantic regions, which we call sculptural abstraction. We propose an interactive two-scale optimization framework for stylization based on sculptural abstraction, allowing real-time adjustment of both global and local parameters. We demonstrate this system's effectiveness in enhancing physical 3D prints of scans from various sources.

preprint, 2015, arXiv

Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM

Real-time dense computer vision and SLAM offer great potential for a new level of scene modelling, tracking and real environmental interaction for many types of robot, but their high computational requirements mean that use on mass market embedded platforms is challenging. Meanwhile, trends in low-cost, low-power processing are towards massive parallelism and heterogeneity, making it difficult for robotics and vision researchers to implement their algorithms in a performance-portable way. In this paper we introduce SLAMBench, a publicly-available software framework which represents a starting point for quantitative, comparable and validatable experimental research to investigate trade-offs in performance, accuracy and energy consumption of a dense RGB-D SLAM system. SLAMBench provides a KinectFusion implementation in C++, OpenMP, OpenCL and CUDA, and harnesses the ICL-NUIM dataset of synthetic RGB-D sequences with trajectory and scene ground truth for reliable accuracy comparison of different implementations and algorithms. We present an analysis and breakdown of the constituent algorithmic elements of KinectFusion, and experimentally investigate their execution time on a variety of multicore and GPU-accelerated platforms. For a popular embedded platform, we also present an analysis of energy efficiency for different configuration alternatives.
