Source author record

Tucker Hermans

Tucker Hermans appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Robotics Artificial Intelligence Machine Learning Computer Vision Computation and Language Computational Engineering, Finance, and Science

Catalog footprint

What is connected

17works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification

Reasoning Vision Language Action (VLA) models improve robotic instruction-following by generating step-by-step textual plans before low-level actions, an approach inspired by Chain-of-Thought (CoT) reasoning in language models. Yet even with a correct textual plan, the generated actions can still miss the intended outcomes in the plan, especially in out-of-distribution (OOD) scenarios. We formalize this phenomenon as a lack of embodied CoT faithfulness, and introduce a training-free, runtime policy steering method for reasoning-action alignment. Given a reasoning VLA's intermediate textual plan, our framework samples multiple candidate action sequences from the same model, predicts their outcomes via simulation, and uses a pre-trained Vision-Language Model (VLM) to select the sequence whose outcome best aligns with the VLA's own textual plan. Only executing action sequences that align with the textual reasoning turns our base VLA's natural action diversity from a source of error into a strength, boosting robustness to semantic and visual OOD perturbations and enabling novel behavior composition without costly re-training. We also contribute a reasoning-annotated extension of LIBERO-100, environment variations tailored for OOD evaluation, and demonstrate up to 15% performance gain over prior work on behavior composition tasks and scales with compute and data diversity. Project Website at: https://yilin-wu98.github.io/steering-reasoning-vla/

preprint2022arXiv

Active Learning of Probabilistic Movement Primitives

A Probabilistic Movement Primitive (ProMP) defines a distribution over trajectories with an associated feedback policy. ProMPs are typically initialized from human demonstrations and achieve task generalization through probabilistic operations. However, there is currently no principled guidance in the literature to determine how many demonstrations a teacher should provide and what constitutes a "good" demonstration for promoting generalization. In this paper, we present an active learning approach to learning a library of ProMPs capable of task generalization over a given space. We utilize uncertainty sampling techniques to generate a task instance for which a teacher should provide a demonstration. The provided demonstration is incorporated into an existing ProMP if possible, or a new ProMP is created from the demonstration if it is determined that it is too dissimilar from existing demonstrations. We provide a qualitative comparison between common active learning metrics; motivated by this comparison we present a novel uncertainty sampling approach named Greatest Mahalanobis Distance. We perform grasping experiments on a real KUKA robot and show our novel active learning measure achieves better task generalization with fewer demonstrations than a random sampling over the space.

preprint2022arXiv

Comparing Piezoresistive Substrates for Tactile Sensing in Dexterous Hands

While tactile skins have been shown to be useful for detecting collisions between a robotic arm and its environment, they have not been extensively used for improving robotic grasping and in-hand manipulation. We propose a novel sensor design for use in covering existing multi-fingered robot hands. We analyze the performance of four different piezoresistive materials using both fabric and anti-static foam substrates in benchtop experiments. We find that although the piezoresistive foam was designed as packing material and not for use as a sensing substrate, it performs comparably with fabrics specifically designed for this purpose. While these results demonstrate the potential of piezoresistive foams for tactile sensing applications, they do not fully characterize the efficacy of these sensors for use in robot manipulation. As such, we use a low density foam substrate to develop a scalable tactile skin that can be attached to the palm of a robotic hand. We demonstrate several robotic manipulation tasks using this sensor to show its ability to reliably detect and localize contact, as well as analyze contact patterns during grasping and transport tasks. Our project website provides details on all materials, software, and data used in the sensor development and analysis: https://sites.google.com/gcloud.utah.edu/piezoresistive-tactile-sensing/.

preprint2022arXiv

Correcting Robot Plans with Natural Language Feedback

When humans design cost or goal specifications for robots, they often produce specifications that are ambiguous, underspecified, or beyond planners' ability to solve. In these cases, corrections provide a valuable tool for human-in-the-loop robot control. Corrections might take the form of new goal specifications, new constraints (e.g. to avoid specific objects), or hints for planning algorithms (e.g. to visit specific waypoints). Existing correction methods (e.g. using a joystick or direct manipulation of an end effector) require full teleoperation or real-time interaction. In this paper, we explore natural language as an expressive and flexible tool for robot correction. We describe how to map from natural language sentences to transformations of cost functions. We show that these transformations enable users to correct goals, update robot motions to accommodate additional user preferences, and recover from planning errors. These corrections can be leveraged to get 81% and 93% success rates on tasks where the original planner failed, with either one or two language corrections. Our method makes it possible to compose multiple constraints and generalizes to unseen scenes, objects, and sentences in simulated environments and real-world environments.

preprint2022arXiv

DefGraspSim: Physics-based simulation of grasp outcomes for 3D deformable objects

Robotic grasping of 3D deformable objects (e.g., fruits/vegetables, internal organs, bottles/boxes) is critical for real-world applications such as food processing, robotic surgery, and household automation. However, developing grasp strategies for such objects is uniquely challenging. Unlike rigid objects, deformable objects have infinite degrees of freedom and require field quantities (e.g., deformation, stress) to fully define their state. As these quantities are not easily accessible in the real world, we propose studying interaction with deformable objects through physics-based simulation. As such, we simulate grasps on a wide range of 3D deformable objects using a GPU-based implementation of the corotational finite element method (FEM). To facilitate future research, we open-source our simulated dataset (34 objects, 1e5 Pa elasticity range, 6800 grasp evaluations, 1.1M grasp measurements), as well as a code repository that allows researchers to run our full FEM-based grasp evaluation pipeline on arbitrary 3D object models of their choice. Finally, we demonstrate good correspondence between grasp outcomes on simulated objects and their real counterparts.

preprint2022arXiv

DULA and DEBA: Differentiable Ergonomic Risk Models for Postural Assessment and Optimization in Ergonomically Intelligent pHRI

Ergonomics and human comfort are essential concerns in physical human-robot interaction applications. Defining an accurate and easy-to-use ergonomic assessment model stands as an important step in providing feedback for postural correction to improve operator health and comfort. Common practical methods in the area suffer from inaccurate ergonomics models in performing postural optimization. In order to retain assessment quality, while improving computational considerations, we propose a novel framework for postural assessment and optimization for ergonomically intelligent physical human-robot interaction. We introduce DULA and DEBA, differentiable and continuous ergonomics models learned to replicate the popular and scientifically validated RULA and REBA assessments with more than 99% accuracy. We show that DULA and DEBA provide assessment comparable to RULA and REBA while providing computational benefits when being used in postural optimization. We evaluate our framework through human and simulation experiments. We highlight DULA and DEBA's strength in a demonstration of postural optimization for a simulated pHRI task.

preprint2022arXiv

Learning Task Constraints from Demonstration for Hybrid Force/Position Control

We present a novel method for learning hybrid force/position control from demonstration. We learn a dynamic constraint frame aligned to the direction of desired force using Cartesian Dynamic Movement Primitives. In contrast to approaches that utilize a fixed constraint frame, our approach easily accommodates tasks with rapidly changing task constraints over time. We activate only one degree of freedom for force control at any given time, ensuring motion is always possible orthogonal to the direction of desired force. Since we utilize demonstrated forces to learn the constraint frame, we are able to compensate for forces not detected by methods that learn only from demonstrated kinematic motion, such as frictional forces between the end-effector and contact surface. We additionally propose novel extensions to the Dynamic Movement Primitive framework that encourage robust transition from free-space motion to in-contact motion in spite of environment uncertainty. We incorporate force feedback and a dynamically shifting goal to reduce forces applied to the environment and retain stable contact while enabling force control. Our methods exhibit low impact forces on contact and low steady-state tracking error.

preprint2022arXiv

Learning Visual Shape Control of Novel 3D Deformable Objects from Partial-View Point Clouds

If robots could reliably manipulate the shape of 3D deformable objects, they could find applications in fields ranging from home care to warehouse fulfillment to surgical assistance. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the object being manipulated and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn to define a visual servo controller that provides Cartesian pose changes to the robot end-effector causing the object to deform towards its target shape. Crucially, we demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training and outperforms comparison methods for both the generic shape control and the surgical task of retraction.

preprint2022arXiv

Occlusion-Robust Multi-Sensory Posture Estimation in Physical Human-Robot Interaction

3D posture estimation is important in analyzing and improving ergonomics in physical human-robot interaction and reducing the risk of musculoskeletal disorders. Vision-based posture estimation approaches are prone to sensor and model errors, as well as occlusion, while posture estimation solely from the interacting robot's trajectory suffers from ambiguous solutions. To benefit from the advantages of both approaches and improve upon their drawbacks, we introduce a low-cost, non-intrusive, and occlusion-robust multi-sensory 3D postural estimation algorithm in physical human-robot interaction. We use 2D postures from OpenPose over a single camera, and the trajectory of the interacting robot while the human performs a task. We model the problem as a partially-observable dynamical system and we infer the 3D posture via a particle filter. We present our work in teleoperation, but it can be generalized to other applications of physical human-robot interaction. We show that our multi-sensory system resolves human kinematic redundancy better than posture estimation solely using OpenPose or posture estimation solely using the robot's trajectory. This will increase the accuracy of estimated postures compared to the gold-standard motion capture postures. Moreover, our approach also performs better than other single sensory methods when postural assessment using RULA assessment tool.

preprint2021arXiv

In-Hand Object-Dynamics Inference using Tactile Fingertips

Having the ability to estimate an object's properties through interaction will enable robots to manipulate novel objects. Object's dynamics, specifically the friction and inertial parameters have only been estimated in a lab environment with precise and often external sensing. Could we infer an object's dynamics in the wild with only the robot's sensors? In this paper, we explore the estimation of dynamics of a grasped object in motion, with tactile force sensing at multiple fingertips. Our estimation approach does not rely on torque sensing to estimate the dynamics. To estimate friction, we develop a control scheme to actively interact with the object until slip is detected. To robustly perform the inertial estimation, we setup a factor graph that fuses all our sensor measurements on physically consistent manifolds and perform inference. We show that tactile fingertips enable in-hand dynamics estimation of low mass objects.

preprint2021arXiv

Optimizing Hospital Room Layout to Reduce the Risk of Patient Falls

Despite years of research into patient falls in hospital rooms, falls and related injuries remain a serious concern to patient safety. In this work, we formulate a gradient-free constrained optimization problem to generate and reconfigure the hospital room interior layout to minimize the risk of falls. We define a cost function built on a hospital room fall model that takes into account the supportive or hazardous effect of the patient's surrounding objects, as well as simulated patient trajectories inside the room. We define a constraint set that ensures the functionality of the generated room layouts in addition to conforming to architectural guidelines. We solve this problem efficiently using a variant of simulated annealing. We present results for two real-world hospital room types and demonstrate a significant improvement of 18% on average in patient fall risk when compared with a traditional hospital room layout and 41% when compared with randomly generated layouts.

preprint2020arXiv

Benchmarking In-Hand Manipulation

The purpose of this benchmark is to evaluate the planning and control aspects of robotic in-hand manipulation systems. The goal is to assess the system's ability to change the pose of a hand-held object by either using the fingers, environment or a combination of both. Given an object surface mesh from the YCB data-set, we provide examples of initial and goal states (i.e.\ static object poses and fingertip locations) for various in-hand manipulation tasks. We further propose metrics that measure the error in reaching the goal state from a specific initial state, which, when aggregated across all tasks, also serves as a measure of the system's in-hand manipulation capability. We provide supporting software, task examples, and evaluation results associated with the benchmark. All the supporting material is available at https://robot-learning.cs.utah.edu/project/benchmarking_in_hand_manipulation

preprint2020arXiv

Development of a Novel Computational Model for Evaluating Fall Risk in Patient Room Design

Objectives: The aims of this study are to identify factors in physical environments that contribute to patient falls in hospitals and to propose a computational model to evaluate patient room designs. Background: The existing fall risk assessment tools have an acceptable level of sensitivity and specificity, however, they only consider intrinsic factors and medications, making the prediction very limited in terms of how the physical environment contributes to fall risk. Methods: We provide a computational model for risk of fall based on physical-environment and patient-motion factors. We use a trajectory optimization approach for patient motion prediction. Results: We run the proposed model on four room designs as examples of various room design categories. Results show the capabilities of the proposed model in identifying risky locations within the room. Conclusions: Our study shows the potential capabilities of the proposed model. Due to lack of enough evidence for the examined factors, it is not possible at this point to gain robust confidence in the final evaluations. More studies using quantitative, relational, or causal designs are recommended to inform the proposed model for patient falls. Application: Developing a comprehensive fall risk model is a significant step in understanding and solving the problem of patient falls in hospitals. It can provide guidance for healthcare decision makers to optimize effective interventions to reduce risk of falls while promoting safe patient mobility in the hospital room environment. We can also use it in healthcare technologies such as assistive robots to provide informed assistance.

preprint2020arXiv

Learning Continuous 3D Reconstructions for Geometrically Aware Grasping

Deep learning has enabled remarkable improvements in grasp synthesis for previously unseen objects from partial object views. However, existing approaches lack the ability to explicitly reason about the full 3D geometry of the object when selecting a grasp, relying on indirect geometric reasoning derived when learning grasp success networks. This abandons explicit geometric reasoning, such as avoiding undesired robot object collisions. We propose to utilize a novel, learned 3D reconstruction to enable geometric awareness in a grasping system. We leverage the structure of the reconstruction network to learn a grasp success classifier which serves as the objective function for a continuous grasp optimization. We additionally explicitly constrain the optimization to avoid undesired contact, directly using the reconstruction. We examine the role of geometry in grasping both in the training of grasp metrics and through 96 robot grasping trials. Our results can be found on https://sites.google.com/view/reconstruction-grasp/.

preprint2020arXiv

Learning to Manipulate Object Collections Using Grounded State Representations

We propose a method for sim-to-real robot learning which exploits simulator state information in a way that scales to many objects. We first train a pair of encoder networks to capture multi-object state information in a latent space. One of these encoders is a CNN, which enables our system to operate on RGB images in the real world; the other is a graph neural network (GNN) state encoder, which directly consumes a set of raw object poses and enables more accurate reward calculation and value estimation. Once trained, we use these encoders in a reinforcement learning algorithm to train image-based policies that can manipulate many objects. We evaluate our method on the task of pushing a collection of objects to desired tabletop regions. Compared to methods which rely only on images or use fixed-length state encodings, our method achieves higher success rates, performs well in the real world without fine tuning, and generalizes to different numbers and types of objects not seen during training.

preprint2020arXiv

Multi-Fingered Active Grasp Learning

Learning-based approaches to grasp planning are preferred over analytical methods due to their ability to better generalize to new, partially observed objects. However, data collection remains one of the biggest bottlenecks for grasp learning methods, particularly for multi-fingered hands. The relatively high dimensional configuration space of the hands coupled with the diversity of objects common in daily life requires a significant number of samples to produce robust and confident grasp success classifiers. In this paper, we present the first active deep learning approach to grasping that searches over the grasp configuration space and classifier confidence in a unified manner. We base our approach on recent success in planning multi-fingered grasps as probabilistic inference with a learned neural network likelihood function. We embed this within a multi-armed bandit formulation of sample selection. We show that our active grasp learning approach uses fewer training samples to produce grasp success rates comparable with the passive supervised learning method trained with grasping data generated by an analytical planner. We additionally show that grasps generated by the active learner have greater qualitative and quantitative diversity in shape.

preprint2020arXiv

Multi-Fingered Grasp Planning via Inference in Deep Neural Networks

We propose a novel approach to multi-fingered grasp planning leveraging learned deep neural network models. We train a voxel-based 3D convolutional neural network to predict grasp success probability as a function of both visual information of an object and grasp configuration. We can then formulate grasp planning as inferring the grasp configuration which maximizes the probability of grasp success. In addition, we learn a prior over grasp configurations as a mixture density network conditioned on our voxel-based object representation. We show that this object conditional prior improves grasp inference when used with the learned grasp success prediction network when compared to a learned, object-agnostic prior, or an uninformed uniform prior. Our work is the first to directly plan high quality multi-fingered grasps in configuration space using a deep neural network without the need of an external planner. We validate our inference method performing multi-finger grasping on a physical robot. Our experimental results show that our planning method outperforms existing grasp planning methods for neural networks.

Tucker Hermans

What is connected

Connect this record

See the researcher in context

Building this map preview

17 published item(s)

Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification

Active Learning of Probabilistic Movement Primitives

Comparing Piezoresistive Substrates for Tactile Sensing in Dexterous Hands

Correcting Robot Plans with Natural Language Feedback

DefGraspSim: Physics-based simulation of grasp outcomes for 3D deformable objects

DULA and DEBA: Differentiable Ergonomic Risk Models for Postural Assessment and Optimization in Ergonomically Intelligent pHRI

Learning Task Constraints from Demonstration for Hybrid Force/Position Control

Learning Visual Shape Control of Novel 3D Deformable Objects from Partial-View Point Clouds

Occlusion-Robust Multi-Sensory Posture Estimation in Physical Human-Robot Interaction

In-Hand Object-Dynamics Inference using Tactile Fingertips

Optimizing Hospital Room Layout to Reduce the Risk of Patient Falls

Benchmarking In-Hand Manipulation

Development of a Novel Computational Model for Evaluating Fall Risk in Patient Room Design

Learning Continuous 3D Reconstructions for Geometrically Aware Grasping

Learning to Manipulate Object Collections Using Grounded State Representations

Multi-Fingered Active Grasp Learning

Multi-Fingered Grasp Planning via Inference in Deep Neural Networks