Source author record

Patricio A. Vela

Patricio A. Vela appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Robotics Computer Vision Machine Learning eess.SY Systems and Control

Catalog footprint

What is connected

14works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

Dynamic Gap: Safe Gap-based Navigation in Dynamic Environments

This paper extends the family of gap-based local planners to unknown dynamic environments through generating provable collision-free properties for hierarchical navigation systems. Existing perception-informed local planners that operate in dynamic environments rely on emergent or empirical robustness for collision avoidance as opposed to performing formal analysis of dynamic obstacles. In addition to this, the obstacle tracking that is performed in these existent planners is often achieved with respect to a global inertial frame, subjecting such tracking estimates to transformation errors from odometry drift. The proposed local planner, dynamic gap, shifts the tracking paradigm to modeling how the free space, represented as gaps, evolves over time. Gap crossing and closing conditions are developed to aid in determining the feasibility of passage through gaps, and a breadth of simulation benchmarking is performed against other navigation planners in the literature where the proposed dynamic gap planner achieves the highest success rate out of all planners tested in all environments.

preprint2022arXiv

In-Place Rotation for Enhancing Snake-like Robot Mobility

Gaits engineered for snake-like robots to rotate in-place instrumentally fill a gap in the set of locomotive gaits that have traditionally prioritized translation. This paper designs a Turn-in-Place gait and demonstrates the ability of a shape-centric modeling framework to capture the gait's locomotive properties. Shape modeling for turning involves a time-varying continuous body curve described by a standing wave. Presumed viscous robot-ground frictional interactions lead to body dynamics conditioned on the time-varying shape model. The dynamic equations describing the Turn-in-Place gait are validated by an articulated snake-like robot using a physics-based simulator and a physical robot. The results affirm the shape-centric modeling framework's capacity to model a variety of snake-like robot gaits with fundamentally different body-ground contact patterns. As an applied demonstration, example locomotion scenarios partner the shape-centric Turn-in-Place gait with a Rectilinear gait for maneuvering through constrained environments based on a multi-modal locomotive planning strategy. Unified shape-centric modeling facilitates trajectory planning and tracking for a snake-like robot to successfully negotiate non-trivial obstacle configurations.

preprint2022arXiv

Keypoint-Based Category-Level Object Pose Tracking from an RGB Sequence with Uncertainty Estimation

We propose a single-stage, category-level 6-DoF pose estimation algorithm that simultaneously detects and tracks instances of objects within a known category. Our method takes as input the previous and current frame from a monocular RGB video, as well as predictions from the previous frame, to predict the bounding cuboid and 6-DoF pose (up to scale). Internally, a deep network predicts distributions over object keypoints (vertices of the bounding cuboid) in image coordinates, after which a novel probabilistic filtering process integrates across estimates before computing the final pose using PnP. Our framework allows the system to take previous uncertainties into consideration when predicting the current frame, resulting in predictions that are more accurate and stable than single frame methods. Extensive experiments show that our method outperforms existing approaches on the challenging Objectron benchmark of annotated object videos. We also demonstrate the usability of our work in an augmented reality setting.

preprint2022arXiv

Primitive Shape Recognition for Object Grasping

Shape informs how an object should be grasped, both in terms of where and how. As such, this paper describes a segmentation-based architecture for decomposing objects sensed with a depth camera into multiple primitive shapes, along with a post-processing pipeline for robotic grasping. Segmentation employs a deep network, called PS-CNN, trained on synthetic data with 6 classes of primitive shapes and generated using a simulation engine. Each primitive shape is designed with parametrized grasp families, permitting the pipeline to identify multiple grasp candidates per shape region. The grasps are rank ordered, with the first feasible one chosen for execution. For task-free grasping of individual objects, the method achieves a 94.2% success rate placing it amongst the top performing grasp methods when compared to top-down and SE(3)-based approaches. Additional tests involving variable viewpoints and clutter demonstrate robustness to setup. For task-oriented grasping, PS-CNN achieves a 93.0% success rate. Overall, the outcomes support the hypothesis that explicitly encoding shape primitives within a grasping pipeline should boost grasping performance, including task-free and task-relevant grasp prediction.

preprint2022arXiv

SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following

This paper investigates robot manipulation based on human instruction with ambiguous requests. The intent is to compensate for imperfect natural language via visual observations. Early symbolic methods, based on manually defined symbols, built modular framework consist of semantic parsing and task planning for producing sequences of actions from natural language requests. Modern connectionist methods employ deep neural networks to automatically learn visual and linguistic features and map to a sequence of low-level actions, in an endto-end fashion. These two approaches are blended to create a hybrid, modular framework: it formulates instruction following as symbolic goal learning via deep neural networks followed by task planning via symbolic planners. Connectionist and symbolic modules are bridged with Planning Domain Definition Language. The vision-and-language learning network predicts its goal representation, which is sent to a planner for producing a task-completing action sequence. For improving the flexibility of natural language, we further incorporate implicit human intents with explicit human instructions. To learn generic features for vision and language, we propose to separately pretrain vision and language encoders on scene graph parsing and semantic textual similarity tasks. Benchmarking evaluates the impacts of different components of, or options for, the vision-and-language learning model and shows the effectiveness of pretraining strategies. Manipulation experiments conducted in the simulator AI2THOR show the robustness of the framework to novel scenarios.

preprint2022arXiv

Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image

Prior work on 6-DoF object pose estimation has largely focused on instance-level processing, in which a textured CAD model is available for each object being detected. Category-level 6-DoF pose estimation represents an important step toward developing robotic vision systems that operate in unstructured, real-world scenarios. In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation that operates on unknown object instances within a known category using a single RGB image as input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions. These quantities are estimated in a sequential fashion, leveraging the recent idea of convGRU for propagating information from easier tasks to those that are more difficult. We favor simplicity in our design choices: generic cuboid vertex coordinates, single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric (27.6% higher than the MobilePose single-stage approach and 7.1% higher than the related two-stage approach).

preprint2021arXiv

NavTuner: Learning a Scene-Sensitive Family of Navigation Policies

The advent of deep learning has inspired research into end-to-end learning for a variety of problem domains in robotics. For navigation, the resulting methods may not have the generalization properties desired let alone match the performance of traditional methods. Instead of learning a navigation policy, we explore learning an adaptive policy in the parameter space of an existing navigation module. Having adaptive parameters provides the navigation module with a family of policies that can be dynamically reconfigured based on the local scene structure, and addresses the common assertion in machine learning that engineered solutions are inflexible. Of the methods tested, reinforcement learning (RL) is shown to provide a significant performance boost to a modern navigation method through reduced sensitivity of its success rate to environmental clutter. The outcomes indicate that RL as a meta-policy learner, or dynamic parameter tuner, effectively robustifies algorithms sensitive to external, measurable nuisance factors.

preprint2020arXiv

Closed-Loop Benchmarking of Stereo Visual-Inertial SLAM Systems: Understanding the Impact of Drift and Latency on Tracking Accuracy

Visual-inertial SLAM is essential for robot navigation in GPS-denied environments, e.g. indoor, underground. Conventionally, the performance of visual-inertial SLAM is evaluated with open-loop analysis, with a focus on the drift level of SLAM systems. In this paper, we raise the question on the importance of visual estimation latency in closed-loop navigation tasks, such as accurate trajectory tracking. To understand the impact of both drift and latency on visual-inertial SLAM systems, a closed-loop benchmarking simulation is conducted, where a robot is commanded to follow a desired trajectory using the feedback from visual-inertial estimation. By extensively evaluating the trajectory tracking performance of representative state-of-the-art visual-inertial SLAM systems, we reveal the importance of latency reduction in visual estimation module of these systems. The findings suggest directions of future improvements for visual-inertial SLAM.

preprint2020arXiv

Good Feature Matching: Towards Accurate, Robust VO/VSLAM with Low Latency

Analysis of state-of-the-art VO/VSLAM system exposes a gap in balancing performance (accuracy & robustness) and efficiency (latency). Feature-based systems exhibit good performance, yet have higher latency due to explicit data association; direct & semidirect systems have lower latency, but are inapplicable in some target scenarios or exhibit lower accuracy than feature-based ones. This paper aims to fill the performance-efficiency gap with an enhancement applied to feature-based VSLAM. We present good feature matching, an active map-to-frame feature matching method. Feature matching effort is tied to submatrix selection, which has combinatorial time complexity and requires choosing a scoring metric. Via simulation, the Max-logDet matrix revealing metric is shown to perform best. For real-time applicability, the combination of deterministic selection and randomized acceleration is studied. The proposed algorithm is integrated into monocular & stereo feature-based VSLAM systems. Extensive evaluations on multiple benchmarks and compute hardware quantify the latency reduction and the accuracy & robustness preservation.

preprint2020arXiv

Good Graph to Optimize: Cost-Effective, Budget-Aware Bundle Adjustment in Visual SLAM

The cost-efficiency of visual(-inertial) SLAM (VSLAM) is a critical characteristic of resource-limited applications. While hardware and algorithm advances have been significantly improved the cost-efficiency of VSLAM front-ends, the cost-efficiency of VSLAM back-ends remains a bottleneck. This paper describes a novel, rigorous method to improve the cost-efficiency of local BA in a BA-based VSLAM back-end. An efficient algorithm, called Good Graph, is developed to select size-reduced graphs optimized in local BA with condition preservation. To better suit BA-based VSLAM back-ends, the Good Graph predicts future estimation needs, dynamically assigns an appropriate size budget, and selects a condition-maximized subgraph for BA estimation. Evaluations are conducted on two scenarios: 1) VSLAM as standalone process, and 2) VSLAM as part of closed-loop navigation system. Results from the first scenario show Good Graph improves accuracy and robustness of VSLAM estimation, when computational limits exist. Results from the second scenario, indicate that Good Graph benefits the trajectory tracking performance of VSLAM-based closed-loop navigation systems, which is a primary application of VSLAM.

preprint2020arXiv

Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks

Affordance information about a scene provides important clues as to what actions may be executed in pursuit of meeting a specified goal state. Thus, integrating affordance-based reasoning into symbolic action plannning pipelines would enhance the flexibility of robot manipulation. Unfortunately, the top performing affordance recognition methods use object category priors to boost the accuracy of affordance detection and segmentation. Object priors limit generalization to unknown object categories. This paper describes an affordance recognition pipeline based on a category-agnostic region proposal network for proposing instance regions of an image across categories. To guide affordance learning in the absence of category priors, the training process includes the auxiliary task of explicitly inferencing existing affordances within a proposal. Secondly, a self-attention mechanism trained to interpret each proposal learns to capture rich contextual dependencies through the region. Visual benchmarking shows that the trained network, called AffContext, reduces the performance gap between object-agnostic and object-informed affordance recognition. AffContext is linked to the Planning Domain Definition Language (PDDL) with an augmented state keeper for action planning across temporally spaced goal-oriented tasks. Manipulation experiments show that AffContext can successfully parse scene content to seed a symbolic planner problem specification, whose execution completes the target task. Additionally, task-oriented grasping for cutting and pounding actions demonstrate the exploitation of multiple affordances for a given object to complete specified tasks.

preprint2016arXiv

Learning Binary Features Online from Motion Dynamics for Incremental Loop-Closure Detection and Place Recognition

This paper proposes a simple yet effective approach to learn visual features online for improving loop-closure detection and place recognition, based on bag-of-words frameworks. The approach learns a codeword in bag-of-words model from a pair of matched features from two consecutive frames, such that the codeword has temporally-derived perspective invariance to camera motion. The learning algorithm is efficient: the binary descriptor is generated from the mean image patch, and the mask is learned based on discriminative projection by minimizing the intra-class distances among the learned feature and the two original features. A codeword for bag-of-words models is generated by packaging the learned descriptor and mask, with a masked Hamming distance defined to measure the distance between two codewords. The geometric properties of the learned codewords are then mathematically justified. In addition, hypothesis constraints are imposed through temporal consistency in matched codewords, which improves precision. The approach, integrated in an incremental bag-of-words system, is validated on multiple benchmark data sets and compared to state-of-the-art methods. Experiments demonstrate improved precision/recall outperforming state of the art with little loss in runtime.

preprint2015arXiv

Reduced-Set Kernel Principal Components Analysis for Improving the Training and Execution Speed of Kernel Machines

This paper presents a practical, and theoretically well-founded, approach to improve the speed of kernel manifold learning algorithms relying on spectral decomposition. Utilizing recent insights in kernel smoothing and learning with integral operators, we propose Reduced Set KPCA (RSKPCA), which also suggests an easy-to-implement method to remove or replace samples with minimal effect on the empirical operator. A simple data point selection procedure is given to generate a substitute density for the data, with accuracy that is governed by a user-tunable parameter . The effect of the approximation on the quality of the KPCA solution, in terms of spectral and operator errors, can be shown directly in terms of the density estimate error and as a function of the parameter . We show in experiments that RSKPCA can improve both training and evaluation time of KPCA by up to an order of magnitude, and compares favorably to the widely-used Nystrom and density-weighted Nystrom methods.

preprint2012arXiv

A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.

Patricio A. Vela

What is connected

Connect this record

See the researcher in context

Building this map preview

14 published item(s)

Dynamic Gap: Safe Gap-based Navigation in Dynamic Environments

In-Place Rotation for Enhancing Snake-like Robot Mobility

Keypoint-Based Category-Level Object Pose Tracking from an RGB Sequence with Uncertainty Estimation

Primitive Shape Recognition for Object Grasping

SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following

Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image

NavTuner: Learning a Scene-Sensitive Family of Navigation Policies

Closed-Loop Benchmarking of Stereo Visual-Inertial SLAM Systems: Understanding the Impact of Drift and Latency on Tracking Accuracy

Good Feature Matching: Towards Accurate, Robust VO/VSLAM with Low Latency

Good Graph to Optimize: Cost-Effective, Budget-Aware Bundle Adjustment in Visual SLAM

Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks

Learning Binary Features Online from Motion Dynamics for Incremental Loop-Closure Detection and Place Recognition

Reduced-Set Kernel Principal Components Analysis for Improving the Training and Execution Speed of Kernel Machines

A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm