Researcher profile

Gonzalo Ferrer

Gonzalo Ferrer contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
5 topics
4 close collaborators

Published work

8 published items

Preprint, 2026, arXiv

CADFS: A Big CAD Program Dataset and Framework for Computer-Aided Design with Large Language Models

We introduce CADFS, a data-centric framework that enables large vision-language models to generate complex CAD design histories. Existing generative CAD systems are restricted to sketch-extrude operations due to simplified representations and limited datasets. We address this by introducing a FeatureScript-based representation and constructing a dataset of 450k real-world CAD models spanning 15 modeling operations. We obtain the dataset via a new pipeline that reconstructs clean, executable FeatureScript programs and provides multimodal annotations. Fine-tuning a VLM on this representation yields state-of-the-art results in text-conditioned CAD generation and image-based reconstruction, producing more accurate, diverse, and feature-rich designs than prior frameworks. Ablations show that each individual component of our framework, i.e., the FeatureScript representation, the extended operation set, and representation-aligned textual descriptions, significantly improves performance. Our framework substantially broadens the complexity and realism achievable in generative CAD. The CADFS framework and the new dataset are available at https://voyleg.github.io/cadfs/.

Preprint, 2026, arXiv

DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigation

Traditional Simultaneous Localization and Mapping (SLAM) algorithms rely heavily on the static environment assumption, which severely limits their applicability in real-world spaces populated by moving entities, such as pedestrians. In this work, we propose DynoSLAM, a tightly-coupled Dynamic GraphSLAM architecture that integrates socially-aware Graph Neural Networks (GNNs) directly into the factor graph optimization. Unlike conventional approaches that use rigid constant-velocity heuristics or deterministic single-agent neural priors, our framework formulates pedestrian motion forecasting as a stochastic World Model. By utilizing Monte Carlo rollouts from a trained GNN, we capture the multimodal epistemic uncertainty of human interactions and embed it into the SLAM graph via a dynamic Mahalanobis distance factor. We demonstrate through extensive simulated experiments that this stochastic formulation not only maintains highly accurate retrospective tracking but also prevents the optimization failures caused by the deterministic "argmax problem". Ultimately, extracting the empirical mean and covariance matrices of future pedestrian states provides a mathematically rigorous, probabilistic safety envelope for downstream local planners, enabling anticipatory and collision-free robot navigation in densely crowded environments.
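The step of extracting an empirical mean and covariance from sampled future states, and scoring a position with a Mahalanobis distance, can be sketched as follows. This is a minimal 2D illustration and not the DynoSLAM implementation: the function names are hypothetical, the samples are drawn from a seeded Gaussian that merely stands in for Monte Carlo rollouts of a trained GNN, and the abstract does not specify how the factor itself is formulated.

```python
import math
import random


def empirical_mean_cov(samples):
    """Return the sample mean and unbiased 2x2 covariance of (x, y) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2D point from a Gaussian (mean, cov)."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # Quadratic form with the closed-form inverse of [[a, b], [b, d]].
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(max(q, 0.0))


# Hypothetical stand-in for rollouts of one pedestrian's position at a future step.
random.seed(0)
rollouts = [(random.gauss(2.0, 0.3), random.gauss(1.0, 0.1)) for _ in range(500)]
mean, cov = empirical_mean_cov(rollouts)

# How many "standard deviations" a candidate robot position is from the forecast.
distance = mahalanobis((2.5, 1.0), mean, cov)
```

A planner could treat positions with a small distance as lying inside the predicted pedestrian region; the threshold to use is a design choice that the abstract does not state.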

Preprint, 2023, arXiv

DDPEN: Trajectory Optimisation With Sub Goal Generation Model

Differential dynamic programming (DDP) is a widely used and powerful trajectory optimization technique; however, due to its internal structure, it is not exempt from local minima. In this paper, we present Differential Dynamic Programming with Escape Network (DDPEN), a novel approach that avoids DDP local minima by adding a term to the optimization criterion that points in the direction the robot should move to escape a local minimum. To produce these directions, we propose a deep model that takes as input a costmap of the environment together with the desired goal position. The model produces possible future directions that lead to the goal while avoiding local minima, and it can run in real time. The model is trained on a synthetic dataset, and the overall system is evaluated in the Gazebo simulator. We show that the proposed method avoids local minima of the trajectory optimization algorithm and successfully executes a 278 m long trajectory among various convex and nonconvex obstacles.

Preprint, 2022, arXiv

Conditioned Human Trajectory Prediction using Iterative Attention Blocks

Human motion prediction is key to understanding social environments, with direct applications in robotics, surveillance, and other areas. We present a simple yet effective model for predicting pedestrian positions in urban-like environments, conditioned on the environment: the map and surrounding agents. Our model is a neural architecture that runs several layers of attention blocks and transformers in an iterative, sequential fashion, allowing it to capture the features of the environment that improve prediction. We show that, without explicitly introducing social masks, dynamical models, social pooling layers, or complicated graph-like structures, it is possible to produce results on par with SoTA models, which makes our approach easily extendable and configurable depending on the data available. We report results comparable to SoTA models on publicly available and extensively used datasets, using the unimodal prediction metrics ADE and FDE.
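For readers unfamiliar with the two metrics named above: ADE (Average Displacement Error) is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE (Final Displacement Error) is that distance at the last step only. A minimal sketch for a single trajectory follows; the function name and the toy coordinates are illustrative assumptions, not taken from the paper.

```python
import math


def ade_fde(predicted, ground_truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) positions."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step Euclidean displacement between prediction and ground truth.
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors), errors[-1]


# Toy trajectories (hypothetical values, in metres).
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observed = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = ade_fde(predicted, observed)
```

Benchmarks usually average these values over all pedestrians and scenes in a dataset; the sketch covers only the per-trajectory computation.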

Preprint, 2022, arXiv

EVOPS Benchmark: Evaluation of Plane Segmentation from RGBD and LiDAR Data

This paper provides the EVOPS dataset for plane segmentation from 3D data, both from RGBD images and LiDAR point clouds. We have designed two annotation methodologies (RGBD and LiDAR) running on well-known and widely used datasets for SLAM evaluation, and we have provided a complete set of benchmarking tools including point, plane, and segmentation metrics. The data includes a total of 10k RGBD and 7k LiDAR frames over different selected scenes, which consist of high-quality segmented planes. The experiments report the quality of SOTA methods for RGBD plane segmentation on our annotated data. We have also provided a learnable baseline for plane segmentation in LiDAR point clouds. All labeled data and benchmark tools used have been made publicly available at https://evops.netlify.app/.

Preprint, 2022, arXiv

Open-Source LiDAR Time Synchronization System by Mimicking GNSS-clock

Data fusion algorithms that employ LiDAR measurements, such as Visual-LiDAR, LiDAR-Inertial, or Multiple LiDAR Odometry and simultaneous localization and mapping (SLAM), rely on precise timestamping schemes that grant synchronicity to data from LiDAR and other sensors. Poor synchronization performance, due to an incorrect timestamping procedure, may negatively affect the algorithms' state estimation results. To provide highly accurate and precise synchronization between the sensors, we introduce an open-source hardware-software system for synchronizing a LiDAR with other sensors. It exploits the LiDAR's dedicated hardware time synchronization interface by providing an emulated GNSS clock to this interface; no physical GNSS receiver is needed. The emulator is based on a general-purpose microcontroller and, due to its concise hardware and software architecture, can be easily modified or extended to synchronize sets of different sensors such as cameras, inertial measurement units (IMUs), wheel encoders, other LiDARs, etc. In the paper, we provide an example of such a system with synchronized LiDAR and IMU sensors. We evaluated the accuracy and precision of the sensor synchronization and report 1 microsecond performance. We compared our results with timestamping provided by ROS software and by a LiDAR inner clocking scheme to underline clear advantages over these two baseline methods.

Preprint, 2022, arXiv

SmartDepthSync: Open Source Synchronized Video Recording System of Smartphone RGB and Depth Camera Range Image Frames with Sub-millisecond Precision

Nowadays, smartphones can produce a synchronized (synced) stream of high-quality data, including RGB images, inertial measurements, and other data. Therefore, smartphones are becoming appealing sensor systems in the robotics community. Unfortunately, there is still a need for external supporting sensing hardware, such as a depth camera precisely synced with the smartphone sensors. In this paper, we propose a heterogeneous hardware-software recording system that combines a smartphone and an external depth camera to record mutually synchronized visual, depth, and inertial data. The system is synced at the time and frame levels: every RGB image frame from the smartphone camera is exposed at the same moment as a depth camera frame, with sub-millisecond precision. We provide a method and a tool for sync performance evaluation that can be applied to any pair of depth and RGB cameras. Our system can be replicated, modified, or extended by employing our open-sourced materials.

Preprint, 2022, arXiv

SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis

We present a dataset of 1000 video sequences of human portraits recorded in real and uncontrolled conditions using a handheld smartphone accompanied by an external high-quality depth camera. The collected dataset contains 200 people captured in different poses and locations, and its main purpose is to bridge the gap between raw measurements obtained from a smartphone and downstream applications, such as state estimation, 3D reconstruction, view synthesis, etc. The sensors employed in data collection are the smartphone's camera and Inertial Measurement Unit (IMU), and an external Azure Kinect DK depth camera that is software-synchronized with sub-millisecond precision to the smartphone system. During the recording, the smartphone flash is used to provide a periodic secondary source of lighting. An accurate mask of the foremost person is provided, as well as its impact on camera alignment accuracy. For evaluation purposes, we compare multiple state-of-the-art camera alignment methods by using a Motion Capture system. We provide a smartphone visual-inertial benchmark for portrait capturing, where we report results for multiple methods and motivate further use of the provided trajectories, available in the dataset, in view synthesis and 3D reconstruction tasks.