Researcher profile

Daniele De Martini

Daniele De Martini contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
15 works
0 followers
8 topics
4 close collaborators

Published work

15 published items

Preprint, 2026, arXiv

Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation

Commonly available prior information, such as BIM models, floor plans, and remote sensing images, can provide valuable geometric and semantic context for autonomous robotic systems. In this paper, we treat observations from fixed external RGB cameras as Common Prior Maps (CPMs): wide-field views of the environment that initialize a semantic and geometric scene prior before any robot motion begins. We present an RGB-only framework for active, incremental 3D scene graph (3DSG) generation that seamlessly fuses observations from both onboard robot cameras and fixed external cameras within a single hardware-agnostic pipeline. By relying solely on RGB observations processed by a feed-forward 3D reconstruction model, the system treats all cameras - onboard or external - identically, requiring no hardware modifications. A graph-based active semantic exploration framework then directly leverages the partial scene graph to guide the robot toward regions of high semantic uncertainty, progressively completing and refining the prior. Experiments demonstrate that bootstrapping the scene graph with even a single external camera increases initial object recall by up to +79%, and that the richer context of the prior significantly improves the efficiency of subsequent active exploration.

Preprint, 2026, arXiv

RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots

Current approaches to 3D scene graph generation rely on dedicated depth sensors, such as LiDAR or RGB-D cameras, for metric 3D reconstruction. This limits deployment to specialized robotic platforms and excludes settings where only RGB cameras are available, such as fixed external infrastructure. Existing pipelines also typically operate on passively collected observation trajectories, rather than selecting viewpoints based on the partially built scene representation, and therefore fail to effectively exploit the semantic and spatial information encoded within the graph during exploration. This paper presents a fully visual framework for the active, incremental construction of 3D scene graphs from RGB input only, addressing both limitations. The proposed approach unifies perception and planning around a shared structured representation that captures object semantics, 3D geometry, relational context, and information from multiple viewpoints. Because the framework is hardware-agnostic and relies only on RGB observations, it can incorporate inputs from both onboard robot cameras and fixed external cameras within the same representation. Experiments on the Replica dataset show that the RGB-only pipeline achieves F1-score parity with baselines using ground-truth depth. Active exploration experiments on ReplicaCAD further show that semantic-driven viewpoint selection detects more than twice as many objects as a geometric frontier-based baseline under the same exploration budget. Finally, the external-camera setting demonstrates that complementary RGB views can effectively bootstrap the scene graph and improve contextual understanding at no additional exploration cost.

Preprint, 2022, arXiv

BoxGraph: Semantic Place Recognition and Pose Estimation from 3D LiDAR

This paper is about extremely robust and lightweight localisation using LiDAR point clouds based on instance segmentation and graph matching. We model 3D point clouds as fully-connected graphs of semantically identified components where each vertex corresponds to an object instance and encodes its shape. Optimal vertex association across graphs allows for full 6-Degree-of-Freedom (DoF) pose estimation and place recognition by measuring similarity. This representation is very concise, condensing the size of maps by a factor of 25 against the state-of-the-art, requiring only 3kB to represent a 1.4MB laser scan. We verify the efficacy of our system on the SemanticKITTI dataset, where we achieve a new state-of-the-art in place recognition, with an average of 88.4% recall at 100% precision where the next closest competitor follows with 64.9%. We also show accurate metric pose estimation performance - estimating 6-DoF pose with median errors of 10 cm and 0.33 deg.

Preprint, 2022, arXiv

Depth-SIMS: Semi-Parametric Image and Depth Synthesis

In this paper we present a compositing image synthesis method that generates RGB canvases with well aligned segmentation maps and sparse depth maps, coupled with an in-painting network that transforms the RGB canvases into high quality RGB images and the sparse depth maps into pixel-wise dense depth maps. We benchmark our method in terms of structural alignment and image quality, showing an increase in mIoU over SOTA by 3.7 percentage points and a highly competitive FID. Furthermore, we analyse the quality of the generated data as training data for semantic segmentation and depth completion, and show that our approach is more suited for this purpose than other methods.

Preprint, 2022, arXiv

Fast-MbyM: Leveraging Translational Invariance of the Fourier Transform for Efficient and Accurate Radar Odometry

Masking By Moving (MByM) provides robust and accurate radar odometry measurements through an exhaustive correlative search across discretised pose candidates. However, this dense search creates a significant computational bottleneck which hinders real-time performance when high-end GPUs are not available. Utilising the translational invariance of the Fourier Transform, in our approach, f-MByM, we decouple the search for angle and translation. By maintaining end-to-end differentiability, a neural network is used to mask scans and is trained by supervising pose prediction directly. As well as training faster and with less memory, the decoupled search allows f-MByM to achieve significant run-time performance improvements on a CPU (168%) and to run in real-time on embedded devices, in stark contrast to MByM. Throughout, our approach remains accurate and competitive with the best radar odometry variants available in the literature, achieving an end-point drift of 2.01% in translation and 6.3 deg/km on the Oxford Radar RobotCar Dataset.
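
The shift property this abstract relies on can be checked numerically. The sketch below is not f-MByM itself: it is a one-dimensional illustration, written with the standard library only, showing that the magnitude of a discrete Fourier transform does not change when the input is circularly shifted. The names `dft`, `magnitude_spectrum`, `circular_shift` and the sample values in `scan_row` are hypothetical and chosen for illustration.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a sequence of numbers."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_spectrum(signal):
    """Absolute value of each DFT coefficient (the phase is discarded)."""
    return [abs(coefficient) for coefficient in dft(signal)]


def circular_shift(signal, offset):
    """Shift the sequence to the right by `offset` samples, wrapping around."""
    offset %= len(signal)
    return signal[-offset:] + signal[:-offset]


# Hypothetical sample values standing in for one row of a sensor scan.
scan_row = [0.0, 1.0, 4.0, 2.0, 0.5, 0.0, 0.0, 3.0]
shifted_row = circular_shift(scan_row, 3)

# A shift only changes the phase of each coefficient, so the magnitudes agree
# up to floating-point error.
max_difference = max(
    abs(a - b)
    for a, b in zip(magnitude_spectrum(scan_row), magnitude_spectrum(shifted_row))
)
print(max_difference < 1e-9)
```

Because the magnitude spectrum ignores translation, a quantity such as rotation can in principle be estimated from it first and the translation recovered afterwards, which is one reading of the decoupled search the abstract describes.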

Preprint, 2022, arXiv

Sampling, Communication, and Prediction Co-Design for Synchronizing the Real-World Device and Digital Model in Metaverse

The metaverse has the potential to revolutionize the next generation of the Internet by supporting highly interactive services with the help of Mixed Reality (MR) technologies; still, to provide a satisfactory experience for users, the synchronization between the physical world and its digital models is crucial. This work proposes a sampling, communication and prediction co-design framework to minimize the communication load subject to a constraint on the tracking Mean Squared Error (MSE) between a real-world device and its digital model in the metaverse. To optimize the sampling rate and the prediction horizon, we exploit expert knowledge and develop a constrained Deep Reinforcement Learning (DRL) algorithm, named Knowledge-assisted Constrained Twin-Delayed Deep Deterministic (KC-TD3) policy gradient algorithm. We validate our framework on a prototype composed of a real-world robotic arm and its digital model. Compared with existing approaches: (1) When the tracking error constraint is stringent (MSE=0.002 degrees), our policy degenerates into the policy in the sampling-communication co-design framework. (2) When the tracking error constraint is mild (MSE=0.007 degrees), our policy degenerates into the policy in the prediction-communication co-design framework. (3) Our framework achieves a better trade-off between the average MSE and the average communication load compared with a communication system without sampling and prediction. For example, the average communication load can be reduced by up to 87% when the tracking error constraint is 0.002 degrees. (4) Our policy outperforms the benchmark with the static sampling rate and prediction horizon optimized by exhaustive search, in terms of the tail probability of the tracking error. Furthermore, with the assistance of expert knowledge, the proposed algorithm KC-TD3 achieves better convergence time, stability, and final policy performance.

Preprint, 2022, arXiv

What Goes Around: Leveraging a Constant-curvature Motion Constraint in Radar Odometry

This paper presents a method that leverages vehicle motion constraints to refine data associations in a point-based radar odometry system. By using the strong prior on how a non-holonomic robot is constrained to move smoothly through its environment, we develop the necessary framework to estimate ego-motion from a single landmark association rather than considering all of these correspondences at once. This allows for informed outlier detection of poor matches that are a dominant source of pose estimate error. By refining the subset of matched landmarks, we see an absolute decrease of 2.15% (from 4.68% to 2.53%) in translational error, approximately halving the odometry error (a relative reduction of 45.94%) compared with using the full set of correspondences. This contribution is relevant to other point-based odometry implementations that rely on a range sensor and provides a lightweight and interpretable means of incorporating vehicle dynamics for ego-motion estimation.
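
The two figures quoted in this abstract, an absolute decrease of 2.15% and a reduction of 45.94%, are the absolute and relative forms of the same change. A small check, using only the numbers stated above (the function name `error_reduction` is hypothetical):

```python
def error_reduction(before_pct, after_pct):
    """Return (absolute, relative) reduction of an error given in percent.

    The absolute value is in percentage points; the relative value is a
    percentage of the starting error.
    """
    absolute = before_pct - after_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative


# Translational error before and after refinement, as quoted in the abstract.
absolute, relative = error_reduction(4.68, 2.53)
print(f"absolute: {absolute:.2f} points, relative: {relative:.2f}%")
```

The relative figure is the absolute decrease divided by the starting error, 2.15 / 4.68, which is why the abstract describes the result as approximately halving the error.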

Preprint, 2021, arXiv

CPG-ACTOR: Reinforcement Learning for Central Pattern Generators

Central Pattern Generators (CPGs) have several properties desirable for locomotion: they generate smooth trajectories, are robust to perturbations and are simple to implement. Although conceptually promising, we argue that the full potential of CPGs has so far been limited by insufficient sensory-feedback information. This paper proposes a new methodology that allows tuning CPG controllers through gradient-based optimization in a Reinforcement Learning (RL) setting. To the best of our knowledge, this is the first time CPGs have been trained in conjunction with a Multilayer Perceptron (MLP) network in a Deep-RL context. In particular, we show how CPGs can directly be integrated as the Actor in an Actor-Critic formulation. Additionally, we demonstrate how this change permits us to integrate highly non-linear feedback directly from sensory perception to reshape the oscillators' dynamics. Our results on a locomotion task using a single-leg hopper demonstrate that explicitly using the CPG as the Actor rather than as part of the environment results in a significant increase in the reward gained over time (6x more) compared with previous approaches. Furthermore, we show that our method without feedback reproduces results similar to prior work with feedback. Finally, we demonstrate how our closed-loop CPG progressively improves the hopping behaviour for longer training epochs relying only on basic reward functions.

Preprint, 2021, arXiv

Fool Me Once: Robust Selective Segmentation via Out-of-Distribution Detection with Contrastive Learning

In this work, we train a network to simultaneously perform segmentation and pixel-wise Out-of-Distribution (OoD) detection, such that the segmentation of unknown regions of scenes can be rejected. This is made possible by leveraging an OoD dataset with a novel contrastive objective and data augmentation scheme. By combining data including unknown classes in the training data, a more robust feature representation can be learned with known classes represented distinctly from those unknown. When presented with unknown classes or conditions, many current approaches for segmentation frequently exhibit high confidence in their inaccurate segmentations and cannot be trusted in many operational environments. We validate our system on a real-world dataset of unusual driving scenes, and show that by selectively segmenting scenes based on what is predicted as OoD, we can increase the segmentation accuracy by an IoU of 0.2 with respect to alternative techniques.

Preprint, 2020, arXiv

Keep off the Grass: Permissible Driving Routes from Radar with Weak Audio Supervision

Reliable outdoor deployment of mobile robots requires the robust identification of permissible driving routes in a given environment. The performance of LiDAR and vision-based perception systems deteriorates significantly if certain environmental factors are present, e.g. rain, fog, or darkness. Perception systems based on FMCW scanning radar maintain full performance regardless of environmental conditions and with a longer range than alternative sensors. Learning to segment a radar scan based on driveability in a fully supervised manner is not feasible, as labelling each radar scan on a bin-by-bin basis is both difficult and time-consuming to do by hand. We therefore weakly supervise the training of the radar-based classifier through an audio-based classifier that is able to predict the terrain type underneath the robot. By combining odometry, GPS and the terrain labels from the audio classifier, we are able to construct a terrain-labelled trajectory of the robot in the environment, which is then used to label the radar scans. Using a curriculum learning procedure, we then train a radar segmentation network to generalise beyond the initial labelling and to detect all permissible driving routes in the environment.

Preprint, 2020, arXiv

Kidnapped Radar: Topological Radar Localisation using Rotationally-Invariant Metric Learning

This paper presents a system for robust, large-scale topological localisation using Frequency-Modulated Continuous-Wave (FMCW) scanning radar. We learn a metric space for embedding polar radar scans using CNN and NetVLAD architectures traditionally applied to the visual domain. However, we tailor the feature extraction for more suitability to the polar nature of radar scan formation using cylindrical convolutions, anti-aliasing blurring, and azimuth-wise max-pooling; all in order to bolster the rotational invariance. The enforced metric space is then used to encode a reference trajectory, serving as a map, which is queried for nearest neighbours (NNs) for recognition of places at run-time. We demonstrate the performance of our topological localisation system over the course of many repeat forays using the largest radar-focused mobile autonomy dataset released to date, totalling 280 km of urban driving, a small portion of which we also use to learn the weights of the modified architecture. As this work represents a novel application for FMCW radar, we analyse the utility of the proposed method via a comprehensive set of metrics which provide insight into the efficacy when used in a realistic system, showing improved performance over the root architecture even in the face of random rotational perturbation.
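
Of the three mechanisms this abstract lists, azimuth-wise max-pooling is the simplest to illustrate. The sketch below is a simplification under stated assumptions: it pools raw values of a tiny polar scan, whereas the abstract describes pooling inside a learned CNN and NetVLAD pipeline. It assumes a scan is stored as one row per azimuth bin and one column per range bin, so that rotating the sensor corresponds to a circular shift of the rows. The names `rotate_scan`, `azimuth_max_pool` and the values in `scan` are hypothetical.

```python
def rotate_scan(polar_scan, azimuth_shift):
    """Simulate a sensor rotation by circularly shifting the azimuth rows."""
    shift = azimuth_shift % len(polar_scan)
    return polar_scan[shift:] + polar_scan[:shift]


def azimuth_max_pool(polar_scan):
    """Take the maximum over all azimuth bins for each range bin."""
    return [max(range_column) for range_column in zip(*polar_scan)]


# Hypothetical 4-azimuth by 3-range scan; rows are azimuth bins.
scan = [
    [0.1, 0.9, 0.0],
    [0.4, 0.2, 0.7],
    [0.0, 0.3, 0.5],
    [0.8, 0.1, 0.2],
]

pooled = azimuth_max_pool(scan)

# The pooled descriptor is identical for every rotation of the input,
# because a maximum does not depend on the order of its arguments.
invariant = all(
    azimuth_max_pool(rotate_scan(scan, shift)) == pooled
    for shift in range(len(scan))
)
print(pooled, invariant)
```

Pooling over the whole azimuth axis discards where in the sweep a return occurred, which is what makes the result insensitive to the heading of the vehicle.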

Preprint, 2020, arXiv

Look Around You: Sequence-based Radar Place Recognition with Learned Rotational Invariance

This paper details an application which yields significant improvements to the adeptness of place recognition with Frequency-Modulated Continuous-Wave radar - a commercially promising sensor poised for exploitation in mobile autonomy. We show how a rotationally-invariant metric embedding for radar scans can be integrated into sequence-based trajectory matching systems typically applied to videos taken by visual sensors. Due to the complete horizontal field of view inherent to the radar scan formation process, we show how this off-the-shelf sequence-based trajectory matching system can be manipulated to detect place matches when the vehicle is travelling down a previously visited stretch of road in the opposite direction. We demonstrate the efficacy of the approach on 26 km of challenging urban driving taken from the largest radar-focused urban autonomy dataset released to date -- showing a boost of 30% in recall at high levels of precision over a nearest neighbour approach.

Preprint, 2020, arXiv

RSL-Net: Localising in Satellite Images From a Radar on the Ground

This paper is about localising a vehicle in an overhead image using FMCW radar mounted on a ground vehicle. FMCW radar offers extraordinary promise and efficacy for vehicle localisation. It is impervious to all weather types and lighting conditions. However, the complexity of the interactions between millimetre radar waves and the physical environment makes it a challenging domain. Infrastructure-free large-scale radar-based localisation is in its infancy. Typically here a map is built and suitable techniques, compatible with the nature of the sensor, are brought to bear. In this work we eschew the need for a radar-based map; instead we simply use an overhead image -- a resource readily available everywhere. This paper introduces a method that not only naturally deals with the complexity of the signal type but does so in the context of cross-modal processing.

Preprint, 2020, arXiv

RSS-Net: Weakly-Supervised Multi-Class Semantic Segmentation with FMCW Radar

This paper presents an efficient annotation procedure and an application thereof to end-to-end, rich semantic segmentation of the sensed environment using FMCW scanning radar. We advocate radar over the traditional sensors used for this task as it operates at longer ranges and is substantially more robust to adverse weather and illumination conditions. We avoid laborious manual labelling by exploiting the largest radar-focused urban autonomy dataset collected to date, correlating radar scans with RGB cameras and LiDAR sensors, for which semantic segmentation is an already consolidated procedure. The training procedure leverages a state-of-the-art natural image segmentation system which is publicly available and as such, in contrast to previous approaches, allows for the production of copious labels for the radar stream by incorporating four camera and two LiDAR streams. Additionally, the losses are computed taking into account labels to the radar sensor horizon by accumulating LiDAR returns along a pose-chain ahead of and behind the current vehicle position. Finally, we present the network with multi-channel radar scan inputs in order to deal with ephemeral and dynamic scene objects.

Preprint, 2020, arXiv

Sense-Assess-eXplain (SAX): Building Trust in Autonomous Vehicles in Challenging Real-World Driving Scenarios

This paper discusses ongoing work in demonstrating research in mobile autonomy in challenging driving scenarios. In our approach, we address fundamental technical issues to overcome critical barriers to assurance and regulation for large-scale deployments of autonomous systems. To this end, we present how we build robots that (1) can robustly sense and interpret their environment using traditional as well as unconventional sensors; (2) can assess their own capabilities; and (3) vitally, for the purpose of assurance and trust, can provide causal explanations of their interpretations and assessments. As it is essential that robots are safe and trusted, we design, develop, and demonstrate fundamental technologies in real-world applications to overcome critical barriers which impede the current deployment of robots in economically and socially important areas. Finally, we describe ongoing work in the collection of an unusual, rare, and highly valuable dataset.