Source author record

Joao Carvalho

Joao Carvalho appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Robotics, math.OC, physics.soc-ph, Social and Information Networks, Systems and Control

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators


preprint · 2022 · arXiv

A Hierarchical Approach to Active Pose Estimation

Creating mobile robots that can find and manipulate objects in large environments is an active topic of research. These robots need not only to search for specific objects but also to estimate their poses, often relying on environment observations, which is even more difficult in the presence of occlusions. To tackle this problem, we propose a simple hierarchical approach to estimate the pose of a desired object. An Active Visual Search module operating on RGB images first obtains a rough estimate of the object's 2D pose, followed by a more computationally expensive Active Pose Estimation module using point cloud data. We empirically show that processing image features to obtain a richer observation speeds up the search and pose estimation computations, compared to a binary decision that indicates whether or not the object is in the current image.

preprint · 2022 · arXiv

An Analysis of Measure-Valued Derivatives for Policy Gradients

Reinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low-variance) and accurate (low-bias) gradient estimator is crucial to face increasingly complex tasks. Traditional policy gradient algorithms use the likelihood-ratio trick, which is known to produce unbiased but high-variance estimates. More modern approaches exploit the reparametrization trick, which gives lower-variance gradient estimates but requires differentiable value function approximators. In this work, we study a different type of stochastic gradient estimator: the Measure-Valued Derivative. This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators. We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it can reach comparable performance with methods based on the likelihood-ratio or reparametrization tricks, both in low- and high-dimensional action spaces. With this work, we want to show that the Measure-Valued Derivative estimator can be a useful alternative to other policy gradient estimators.
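To make the comparison in this abstract concrete, the sketch below contrasts the three estimators on a toy problem rather than in the paper's actor-critic setting. It estimates the derivative of E[f(x)] with respect to the mean of a Gaussian x ~ N(mu, sigma^2). The function name `gradient_estimates`, the test function f(x) = x^2 and all parameter values are assumptions made for illustration. The measure-valued form used here is the standard decomposition for a Gaussian mean: a constant 1/(sigma * sqrt(2*pi)) times the difference of f evaluated at mu + sigma*w and mu - sigma*w, with w drawn from a Rayleigh(1) distribution.

```python
import math
import random


def gradient_estimates(f, df, mu, sigma, n, rng):
    """Estimate d/dmu E[f(x)], x ~ N(mu, sigma^2), with three estimators.

    Returns a dict mapping estimator name to (sample mean, sample variance).
    `df` is the derivative of f and is only used by the reparametrization
    estimator; the other two never differentiate f.
    """
    lr, rp, mvd = [], [], []
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # constant of the MVD decomposition
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = mu + sigma * eps
        # Likelihood-ratio (score function): f(x) * d/dmu log N(x; mu, sigma^2)
        lr.append(f(x) * (x - mu) / sigma ** 2)
        # Reparametrization: differentiate through x = mu + sigma * eps
        rp.append(df(x))
        # Measure-valued derivative: difference of f under the positive and
        # negative parts of dN/dmu, sharing one Rayleigh(1) sample w
        w = math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        mvd.append(c * (f(mu + sigma * w) - f(mu - sigma * w)))

    def summarize(values):
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        return mean, var

    return {
        "likelihood_ratio": summarize(lr),
        "reparametrization": summarize(rp),
        "measure_valued": summarize(mvd),
    }


if __name__ == "__main__":
    # Toy check: for f(x) = x^2 the exact gradient is 2 * mu.
    results = gradient_estimates(lambda x: x * x, lambda x: 2.0 * x,
                                 mu=1.5, sigma=1.0, n=20000,
                                 rng=random.Random(0))
    for name, (mean, var) in results.items():
        print(name, round(mean, 3), round(var, 3))
```

In this toy case all three sample means should be close to 2 * mu, while the likelihood-ratio estimator shows a much larger per-sample variance. Only the reparametrization estimator needs `df`, which mirrors the abstract's point that the measure-valued derivative does not require a differentiable function approximator.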

preprint · 2022 · arXiv

Residual Robot Learning for Object-Centric Probabilistic Movement Primitives

It is desirable for future robots to quickly learn new tasks and adapt learned skills to constantly changing environments. To this end, Probabilistic Movement Primitives (ProMPs) have been shown to be a promising framework to learn generalizable trajectory generators from distributions over demonstrated trajectories. However, in practical applications that require high precision in the manipulation of objects, the accuracy of ProMPs is often insufficient, in particular when they are learned in Cartesian space from external observations and executed with limited controller gains. Therefore, we propose to combine ProMPs with the recently introduced Residual Reinforcement Learning (RRL) to account for corrections in both position and orientation during task execution. In particular, we learn a residual on top of a nominal ProMP trajectory with Soft Actor-Critic and incorporate the variability in the demonstrations as a decision variable to reduce the search space for RRL. As a proof of concept, we evaluate our proposed method on a 3D block insertion task with a 7-DoF Franka Emika Panda robot. Experimental results show that the robot successfully learns to complete the insertion, which was not possible with basic ProMPs alone.

preprint · 2020 · arXiv

A Nonparametric Off-Policy Policy Gradient

Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is especially observed in many widely popular policy gradient algorithms that perform updates using on-policy samples. The price of such inefficiency becomes evident in real-world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited. We address this issue by building on the general sample efficiency of off-policy algorithms. With nonparametric regression and density estimation methods we construct a nonparametric Bellman equation in a principled manner, which allows us to obtain closed-form estimates of the value function, and to analytically express the full policy gradient. We provide a theoretical analysis of our estimate to show that it is consistent under mild smoothness assumptions and empirically show that our approach has better sample efficiency than state-of-the-art policy gradient methods.

preprint · 2016 · arXiv

On the origin of burstiness in human behavior: The wikipedia edits case

A number of human activities exhibit a bursty pattern, namely periods of very high activity that are followed by rest periods. Records of this process generate time series of events whose inter-event times follow a probability distribution that displays a fat tail. The causes of this phenomenon are not yet clearly understood. In the present work we use Wikipedia's freely available editing records to tackle this question by measuring the level of burstiness, as well as the memory effect, of the editing tasks performed by different editors on different pages. Our main finding is that, even though the editing activity is conditioned by the circadian 24-hour cycle, the conditional probability of an activity of a given duration at a given time of the day is independent of the latter. This suggests that human activity is related to the high "cost" of starting an action as opposed to the much lower "cost" of continuing that action.
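The abstract does not say which statistics were used to quantify burstiness and memory. A common choice in the literature, assumed here purely for illustration, is the burstiness parameter B = (sigma - mean) / (sigma + mean) of the inter-event times together with the memory coefficient M, the Pearson correlation between consecutive inter-event times. The function names below are hypothetical and the sketch is not the paper's pipeline.

```python
import math


def inter_event_times(timestamps):
    """Gaps between consecutive events, after sorting the timestamps."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def burstiness(taus):
    """B = (std - mean) / (std + mean) of the inter-event times.

    B is -1 for a perfectly regular signal, near 0 for a Poisson process
    and approaches +1 for a heavy-tailed (bursty) one.
    """
    if not taus:
        raise ValueError("need at least one inter-event time")
    mean = sum(taus) / len(taus)
    std = math.sqrt(sum((t - mean) ** 2 for t in taus) / len(taus))
    if std + mean == 0:
        return 0.0  # all events coincide; convention chosen for this sketch
    return (std - mean) / (std + mean)


def memory(taus):
    """Pearson correlation between consecutive inter-event times."""
    if len(taus) < 2:
        raise ValueError("need at least two inter-event times")
    x, y = taus[:-1], taus[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation undefined; convention chosen for this sketch
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)
```

For an editor's sorted edit timestamps `ts`, calling `burstiness(inter_event_times(ts))` and `memory(inter_event_times(ts))` gives one (B, M) pair per editor or per page, which can then be compared across time-of-day windows.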

preprint · 2015 · arXiv

Distributed Verification of Structural Controllability for Linear Time-Invariant Systems

Motivated by the development and deployment of large-scale dynamical systems, often composed of geographically distributed smaller subsystems, we address the problem of verifying their controllability in a distributed manner. In this work we study controllability in the structural system theoretic sense, structural controllability. In other words, instead of focusing on a specific numerical system realization, we provide guarantees for equivalence classes of linear time-invariant systems on the basis of their structural sparsity patterns, i.e., location of zero/nonzero entries in the plant matrices. To this end, we first propose several necessary and/or sufficient conditions to ensure structural controllability of the overall system, on the basis of the structural patterns of the subsystems and their interconnections. The proposed verification criteria are shown to be efficiently implementable (i.e., with polynomial time complexity in the number of the state variables and inputs) in two important subclasses of interconnected dynamical systems: similar (i.e., every subsystem has the same structure), and serial (i.e., every subsystem outputs to at most one other subsystem). Secondly, we provide a distributed algorithm to verify structural controllability for interconnected dynamical systems. The proposed distributed algorithm is efficient and implementable at the subsystem level; the algorithm is iterative, based on communication among (physically) interconnected subsystems, and requires only local model and interconnection knowledge at each subsystem.
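As background for the structural notion used in this abstract, the sketch below implements the classical centralized test, not the distributed algorithm the paper proposes. Under the standard graph-theoretic characterization, a pattern pair (A, B) is structurally controllable when (i) every state vertex is reachable from some input vertex and (ii) the generic rank of [A B] equals the number of states, which is checked here with a maximum bipartite matching. The function names and the 0/1 pattern encoding are assumptions: `A[i][j]` nonzero means state j influences state i, and `B[i][k]` nonzero means input k acts on state i.

```python
from collections import deque


def all_states_reachable(A, B):
    """True if every state can be reached from some input in the system digraph."""
    n = len(A)
    seen = [any(B[i]) for i in range(n)]  # states driven directly by an input
    queue = deque(i for i in range(n) if seen[i])
    while queue:
        j = queue.popleft()
        for i in range(n):
            if A[i][j] and not seen[i]:  # edge j -> i
                seen[i] = True
                queue.append(i)
    return all(seen)


def generic_rank(A, B):
    """Size of a maximum matching between the rows and columns of [A B]."""
    n = len(A)
    cols = [[c for c, v in enumerate(list(A[i]) + list(B[i])) if v]
            for i in range(n)]
    match = {}  # column index -> row currently matched to it

    def augment(row, visited):
        # Try to match `row`, re-routing earlier rows along an augmenting path.
        for c in cols[row]:
            if c in visited:
                continue
            visited.add(c)
            if c not in match or augment(match[c], visited):
                match[c] = row
                return True
        return False

    return sum(augment(i, set()) for i in range(n))


def structurally_controllable(A, B):
    """Centralized check: input reachability plus full generic rank of [A B]."""
    return all_states_reachable(A, B) and generic_rank(A, B) == len(A)


if __name__ == "__main__":
    chain_A = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]  # x1 -> x2 -> x3
    star_A = [[0, 0, 0], [1, 0, 0], [1, 0, 0]]   # x1 -> x2 and x1 -> x3
    B = [[1], [0], [0]]                          # single input on x1
    print(structurally_controllable(chain_A, B))
    print(structurally_controllable(star_A, B))
```

The chain passes both conditions, while the star fails the rank condition because two states depend on the same single column. This check needs the full sparsity pattern in one place, which is the requirement the distributed approach described above removes.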
