Source author record

Stefanie Tellex

Stefanie Tellex appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Robotics Artificial Intelligence Computer Vision Computation and Language Human-Computer Interaction Machine Learning

Catalog footprint

What is connected

12works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Beyond Task and Motion Planning: Hierarchical Robot Planning with General-Purpose Skills

Task and motion planning is a well-established approach for solving long-horizon robot planning problems. However, traditional methods assume that each task-level robot action, or skill, can be reduced to kinematic motion planning. We address the challenge of combining motion planning with closed-loop motor controllers that go beyond mere kinematic considerations. We propose a novel framework that integrates these policies into motion planning using Composable Interaction Primitives (CIPs), enabling the use of diverse, non-composable pre-learned skills in hierarchical robot planning. We validate our Task and Skill Planning (TASP) approach through real-world experiments on a bimanual manipulator and a mobile manipulator, demonstrating that CIPs allow diverse robots to combine motion planning with general-purpose skills to solve complex, long-horizon tasks.

preprint2026arXiv

When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning

Behavior Cloning (BC) has emerged as a highly effective paradigm for robot learning. However, BC lacks a self-guided mechanism for online improvement after demonstrations have been collected. Existing offline-to-online learning methods often cause policies to replace previously learned good actions due to a distribution mismatch between offline data and online learning. In this work, we propose Q2RL, Q-Estimation and Q-Gating from BC for Reinforcement Learning, an algorithm for efficient offline-to-online learning. Our method consists of two parts: (1) Q-Estimation extracts a Q-function from a BC policy using a few interaction steps with the environment, followed by online RL with (2) Q-Gating, which switches between BC and RL policy actions based on their respective Q-values to collect samples for RL policy training. Across manipulation tasks from D4RL and robomimic benchmarks, Q2RL outperforms SOTA offline-to-online learning baselines on success rate and time to convergence. Q2RL is efficient enough to be applied in an on-robot RL setting, learning robust policies for contact-rich and high precision manipulation tasks such as pipe assembly and kitting, in 1-2 hours of online interaction, achieving success rates of up to 100% and up to 3.75x improvement against the original BC policy. Code and video are available at https://pages.rai-inst.com/q2rl_website/

preprint2022arXiv

Generalizing to New Domains by Mapping Natural Language to Lifted LTL

Recent work on using natural language to specify commands to robots has grounded that language to LTL. However, mapping natural language task specifications to LTL task specifications using language models require probability distributions over finite vocabulary. Existing state-of-the-art methods have extended this finite vocabulary to include unseen terms from the input sequence to improve output generalization. However, novel out-of-vocabulary atomic propositions cannot be generated using these methods. To overcome this, we introduce an intermediate contextual query representation which can be learned from single positive task specification examples, associating a contextual query with an LTL template. We demonstrate that this intermediate representation allows for generalization over unseen object references, assuming accurate groundings are available. We compare our method of mapping natural language task specifications to intermediate contextual queries against state-of-the-art CopyNet models capable of translating natural language to LTL, by evaluating whether correct LTL for manipulation and navigation task specifications can be output, and show that our method outperforms the CopyNet model on unseen object references. We demonstrate that the grounded LTL our method outputs can be used for planning in a simulated OO-MDP environment. Finally, we discuss some common failure modes encountered when translating natural language task specifications to grounded LTL.

preprint2022arXiv

Multi-Resolution POMDP Planning for Multi-Object Search in 3D

Robots operating in households must find objects on shelves, under tables, and in cupboards. In such environments, it is crucial to search efficiently at 3D scale while coping with limited field of view and the complexity of searching for multiple objects. Principled approaches to object search frequently use Partially Observable Markov Decision Process (POMDP) as the underlying framework for computing search strategies, but constrain the search space in 2D. In this paper, we present a POMDP formulation for multi-object search in a 3D region with a frustum-shaped field-of-view. To efficiently solve this POMDP, we propose a multi-resolution planning algorithm based on online Monte-Carlo tree search. In this approach, we design a novel octree-based belief representation to capture uncertainty of the target objects at different resolution levels, then derive abstract POMDPs at lower resolutions with dramatically smaller state and observation spaces. Evaluation in a simulated 3D domain shows that our approach finds objects more efficiently and successfully compared to a set of baselines without resolution hierarchy in larger instances under the same computational requirement. We demonstrate our approach on a mobile robot to find objects placed at different heights in two 10m$^2 \times 2$m regions by moving its base and actuating its torso.

preprint2022arXiv

Towards Optimal Correlational Object Search

In realistic applications of object search, robots will need to locate target objects in complex environments while coping with unreliable sensors, especially for small or hard-to-detect objects. In such settings, correlational information can be valuable for planning efficiently. Previous approaches that consider correlational information typically resort to ad-hoc, greedy search strategies. We introduce the Correlational Object Search POMDP (COS-POMDP), which models correlations while preserving optimal solutions with a reduced state space. We propose a hierarchical planning algorithm to scale up COS-POMDPs for practical domains. Our evaluation, conducted with the AI2-THOR household simulator and the YOLOv5 object detector, shows that our method finds objects more successfully and efficiently compared to baselines,particularly for hard-to-detect objects such as srub brush and remote control.

preprint2021arXiv

Bootstrapping Motor Skill Learning with Motion Planning

Learning a robot motor skill from scratch is impractically slow; so much so that in practice, learning must be bootstrapped using a good skill policy obtained from human demonstration. However, relying on human demonstration necessarily degrades the autonomy of robots that must learn a wide variety of skills over their operational lifetimes. We propose using kinematic motion planning as a completely autonomous, sample efficient way to bootstrap motor skill learning for object manipulation. We demonstrate the use of motion planners to bootstrap motor skills in two complex object manipulation scenarios with different policy representations: opening a drawer with a dynamic movement primitive representation, and closing a microwave door with a deep neural network policy. We also show how our method can bootstrap a motor skill for the challenging dynamic task of learning to hit a ball off a tee, where a kinematic plan based on treating the scene as static is insufficient to solve the task, but sufficient to bootstrap a more dynamic policy. In all three cases, our method is competitive with human-demonstrated initialization, and significantly outperforms starting with a random policy. This approach enables robots to to efficiently and autonomously learn motor policies for dynamic tasks without human demonstration.

preprint2021arXiv

Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control

Robots need to learn skills that can not only generalize across similar problems but also be directed to a specific goal. Previous methods either train a new skill for every different goal or do not infer the specific target in the presence of multiple goals from visual data. We introduce an end-to-end method that represents targetable visuomotor skills as a goal-parameterized neural network policy. By training on an informative subset of available goals with the associated target parameters, we are able to learn a policy that can zero-shot generalize to previously unseen goals. We evaluate our method in a representative 2D simulation of a button-grid and on both button-pressing and peg-insertion tasks on two different physical arms. We demonstrate that our model trained on 33% of the possible goals is able to generalize to more than 90% of the targets in the scene for both simulation and robot experiments. We also successfully learn a mapping from target pixel coordinates to a robot policy to complete a specified goal.

preprint2020arXiv

Generating Handwriting via Decoupled Style Descriptors

Representing a space of handwriting stroke styles includes the challenge of representing both the style of each character and the overall style of the human writer. Existing VRNN approaches to representing handwriting often do not distinguish between these different style components, which can reduce model capability. Instead, we introduce the Decoupled Style Descriptor (DSD) model for handwriting, which factors both character- and writer-level styles and allows our model to represent an overall greater space of styles. This approach also increases flexibility: given a few examples, we can generate handwriting in new writer styles, and also now generate handwriting of new characters across writer styles. In experiments, our generated results were preferred over a state of the art baseline method 88% of the time, and in a writer identification task on 20 held-out writers, our DSDs achieved 89.38% accuracy from a single sample word. Overall, DSDs allows us to improve both the quality and flexibility over existing handwriting stroke generation approaches.

preprint2020arXiv

pomdp_py: A Framework to Build and Solve POMDP Problems

In this paper, we present pomdp_py, a general purpose Partially Observable Markov Decision Process (POMDP) library written in Python and Cython. Existing POMDP libraries often hinder accessibility and efficient prototyping due to the underlying programming language or interfaces, and require extra complexity in software toolchain to integrate with robotics systems. pomdp_py features simple and comprehensive interfaces capable of describing large discrete or continuous (PO)MDP problems. Here, we summarize the design principles and describe in detail the programming model and interfaces in pomdp_py. We also describe intuitive integration of this library with ROS (Robot Operating System), which enabled our torso-actuated robot to perform object search in 3D. Finally, we note directions to improve and extend this library for POMDP planning and beyond.

preprint2020arXiv

RoboMem: Giving Long Term Memory to Robots

Robots have the potential to improve health monitoring outcomes for the elderly by providing doctors, and caregivers with information about the person's behavior, health activities and their surrounding environment. Over the years, less work has been done to enable robots to preserve information for longer periods of time, on the order of months and years of data, and use this contextual information to answer queries. Time complexity to process this massive sensor data in a timely fashion, inability to anticipate the future queries in advance and imprecision involved in the results have been the main impediments in making progress in this area. We make a contribution by introducing RoboMem, a query answering system for health-care assistance of elderly over long term; continuous data feeds that intends to overcome the challenges of giving long term memory to robots. The design for our framework preprocesses the sensor data and stores this preprocessed data into the database. This data is updated in the database by going through successive refinements, improving its accuracy for responding to queries. If data in the database is not enough to answer a query, a small set of relevant frames (also obtained from the database) will be reprocessed to obtain the answer. [Our initial prototype of RoboMem stores 3.5MB of data in the database as compared to 535.8MB of actual video frames and with minimal data in the database it is able to fetch information fundamental to respond to queries in 0.0002 seconds on average].

preprint2020arXiv

Robot Object Retrieval with Contextual Natural Language Queries

Natural language object retrieval is a highly useful yet challenging task for robots in human-centric environments. Previous work has primarily focused on commands specifying the desired object's type such as "scissors" and/or visual attributes such as "red," thus limiting the robot to only known object classes. We develop a model to retrieve objects based on descriptions of their usage. The model takes in a language command containing a verb, for example "Hand me something to cut," and RGB images of candidate objects and selects the object that best satisfies the task specified by the verb. Our model directly predicts an object's appearance from the object's use specified by a verb phrase. We do not need to explicitly specify an object's class label. Our approach allows us to predict high level concepts like an object's utility based on the language query. Based on contextual information present in the language commands, our model can generalize to unseen object classes and unknown nouns in the commands. Our model correctly selects objects out of sets of five candidates to fulfill natural language commands, and achieves an average accuracy of 62.3% on a held-out test set of unseen ImageNet object classes and 53.0% on unseen object classes and unknown nouns. Our model also achieves an average accuracy of 54.7% on unseen YCB object classes, which have a different image distribution from ImageNet objects. We demonstrate our model on a KUKA LBR iiwa robot arm, enabling the robot to retrieve objects based on natural language descriptions of their usage. We also present a new dataset of 655 verb-object pairs denoting object usage over 50 verbs and 216 object classes.

preprint2020arXiv

Steps Towards Best Practices For Robot Videos

There are unwritten guidelines for how to make robot videos that researchers learn from their advisors and pass onto their students. We believe that it is important for the community to collaboratively discuss and develop a standard set of best practices when making robot. We suggest a starting set of maxims for best robot video practices, and highlight positive examples from the community and negative examples only from videos made by the authors of this article. In addition, we offer a checklist that we hope can act as an document that can be given to robotic researchers to inform them of how to make robot videos that truthfully characterize what a robot can and can not do. We consider this a first draft, and are looking for feedback from the community as we refine and grow our maxims and checklist.

Stefanie Tellex

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

Beyond Task and Motion Planning: Hierarchical Robot Planning with General-Purpose Skills

When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning

Generalizing to New Domains by Mapping Natural Language to Lifted LTL

Multi-Resolution POMDP Planning for Multi-Object Search in 3D

Towards Optimal Correlational Object Search

Bootstrapping Motor Skill Learning with Motion Planning

Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control

Generating Handwriting via Decoupled Style Descriptors

pomdp_py: A Framework to Build and Solve POMDP Problems

RoboMem: Giving Long Term Memory to Robots

Robot Object Retrieval with Contextual Natural Language Queries

Steps Towards Best Practices For Robot Videos