Source author record

Santiago Paternain

Santiago Paternain appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.OC, Machine Learning, math.DS, Artificial Intelligence, eess.SY, Robotics, Systems and Control

Catalog footprint

What is connected

14 works

7 topics

4 close collaborators

preprint, 2022, arXiv

Navigation of a Quadratic Potential with Ellipsoidal Obstacles

Given a convex quadratic potential whose minimum is the agent's goal and a Euclidean space populated with ellipsoidal obstacles, one can construct a Rimon-Koditschek (RK) artificial potential to navigate. Its negative gradient attracts the agent toward the goal and repels the agent away from the boundary of the obstacles. This is a popular approach to navigation problems since it can be implemented with local spatial information that is acquired during operation time. However, navigation is only successful in situations where the obstacles are not too eccentric (flat). This paper proposes a modification to gradient dynamics that allows successful navigation of an environment with a quadratic cost and ellipsoidal obstacles regardless of their eccentricity. This is accomplished by altering gradient dynamics with a Hessian correction that is intended to imitate worlds with spherical obstacles in which RK potentials are known to work. The resulting dynamics simplify thanks to the quadratic form of the obstacles. Convergence to the goal and obstacle avoidance are established from almost every initial position (up to a set of measure zero) in the free space, with mild conditions on the location of the target. Results are corroborated empirically with numerical simulations.

preprint, 2022, arXiv

Safe Policies for Reinforcement Learning via Primal-Dual Methods

In this paper, we study the learning of safe policies in the setting of reinforcement learning problems. That is, we aim to control a Markov Decision Process (MDP) whose transition probabilities we do not know, but for which we have access to sample trajectories through experience. We define safety as the agent remaining in a desired safe set with high probability during the operation time. We therefore consider a constrained MDP where the constraints are probabilistic. Since there is no straightforward way to optimize the policy with respect to the probabilistic constraint in a reinforcement learning framework, we propose an ergodic relaxation of the problem. The advantages of the proposed relaxation are threefold. (i) The safety guarantees are maintained in the case of episodic tasks and they are kept up to a given time horizon for continuing tasks. (ii) The constrained optimization problem, despite its non-convexity, has an arbitrarily small duality gap if the parametrization of the policy is rich enough. (iii) The gradients of the Lagrangian associated with the safe-learning problem can be easily computed using standard policy gradient results and stochastic approximation tools. Leveraging these advantages, we establish that primal-dual algorithms are able to find policies that are safe and optimal. We test the proposed approach in a navigation task in a continuous domain. The numerical results show that our algorithm is capable of dynamically adapting the policy to the environment and the required safety levels.
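
A primal-dual scheme of the kind this abstract describes can be written schematically as follows. The notation is an assumption made for illustration and is not taken from the paper: $\theta$ denotes the policy parameters, $V_0(\theta)$ the expected return, $V_1(\theta)$ the relaxed safety value, $c$ the required safety level, and $\eta_\theta, \eta_\lambda$ step sizes. In practice both $V_1(\theta_k)$ and the gradient would be estimated from sampled trajectories, which the hat on the gradient is meant to indicate.

```latex
\begin{aligned}
&\max_{\theta}\; V_0(\theta) \quad \text{s.t.} \quad V_1(\theta) \ge c, \\
&\mathcal{L}(\theta,\lambda) = V_0(\theta) + \lambda\,\bigl(V_1(\theta) - c\bigr), \qquad \lambda \ge 0, \\
&\theta_{k+1} = \theta_k + \eta_\theta\, \widehat{\nabla}_\theta \mathcal{L}(\theta_k,\lambda_k), \\
&\lambda_{k+1} = \bigl[\lambda_k - \eta_\lambda\,\bigl(V_1(\theta_k) - c\bigr)\bigr]_+ .
\end{aligned}
```

The multiplier grows while the safety constraint is violated ($V_1(\theta_k) < c$) and shrinks toward zero once it is satisfied with slack, which is how the trade-off between reward and safety is adjusted during learning.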

preprint, 2021, arXiv

Sufficiently Accurate Model Learning for Planning

Data-driven models of dynamical systems help planners and controllers to provide more precise and accurate motions. Most model learning algorithms will try to minimize a loss function between the observed data and the model's predictions. This can be improved using prior knowledge about the task at hand, which can be encoded in the form of constraints. This turns the unconstrained model learning problem into a constrained one. These constraints allow models with finite capacity to focus their expressive power on important aspects of the system. This can lead to models that are better suited for certain tasks. This paper introduces the constrained Sufficiently Accurate model learning approach, provides examples of such problems, and presents a theorem on how close some approximate solutions can be. The approximate solution quality will depend on the function parameterization, loss and constraint function smoothness, and the number of samples in model learning.

preprint, 2021, arXiv

Towards Safe Continuing Task Reinforcement Learning

Safety is a critical feature of controller design for physical systems. When designing control policies, several approaches to guarantee this aspect of autonomy have been proposed, such as robust controllers or control barrier functions. However, these solutions strongly rely on the model of the system being available to the designer. As a parallel development, reinforcement learning provides model-agnostic control solutions but in general, it lacks the theoretical guarantees required for safety. Recent advances show that under mild conditions, control policies can be learned via reinforcement learning, which can be guaranteed to be safe by imposing these requirements as constraints of an optimization problem. However, to transfer from learning safety to learning safely, there are two hurdles that need to be overcome: (i) it has to be possible to learn the policy without having to re-initialize the system; and (ii) the rollouts of the system need to be in themselves safe. In this paper, we tackle the first issue, proposing an algorithm capable of operating in the continuing task setting without the need of restarts. We evaluate our approach in a numerical example, which shows the capabilities of the proposed approach in learning safe policies via safe exploration.

preprint, 2020, arXiv

Counterfactual Programming for Optimal Control

In recent years, considerable work has been done to tackle the issue of designing control laws based on observations to allow unknown dynamical systems to perform pre-specified tasks. At least as important for autonomy, however, is the issue of learning which tasks can be performed in the first place. This is particularly critical in situations where multiple (possibly conflicting) tasks and requirements are demanded from the agent, resulting in infeasible specifications. Such situations arise due to over-specification or dynamic operating conditions and are only aggravated when the dynamical system model is learned through simulations. Often, these issues are tackled using regularization and penalties tuned based on application-specific expert knowledge. Nevertheless, this solution becomes impractical for large-scale systems, unknown operating conditions, and/or in online settings where expert input would be needed during the system operation. Instead, this work enables agents to autonomously pose, tune, and solve optimal control problems by compromising between performance and specification costs. Leveraging duality theory, it puts forward a counterfactual optimization algorithm that directly determines the specification trade-off while solving the optimal control problem.

preprint, 2020, arXiv

Resilient Control: Compromising to Adapt

In optimal control problems, disturbances are typically dealt with using robust solutions, such as H-infinity or tube model predictive control, that plan control actions feasible for the worst-case disturbance. Yet, planning for every contingency can lead to over-conservative, poorly performing solutions or even, in extreme cases, to infeasibility. Resilience addresses these shortcomings by adapting the underlying control problem, e.g., by relaxing its specifications, to obtain a feasible, possibly still valuable trajectory. Despite their different aspects, robustness and resilience are often conflated in the context of dynamical systems and control. The goal of this paper is to formalize, in the context of optimal control, the concept of resilience understood as above, i.e., in terms of adaptation. To do so, we introduce a resilient formulation of optimal control by allowing disruption-dependent modifications of the requirements that induce the desired resilient behavior. We then propose a framework to design these behaviors automatically by trading off control performance and requirement violations. We analyze this resilience-by-compromise method to obtain inverse optimality results and quantify the effect of disturbances on the induced requirement relaxations. By proving that robustness and resilience optimize different objectives, we show that these are in fact distinct system properties. We conclude by illustrating the effect of resilience in different control problems.

preprint, 2020, arXiv

Sufficiently Accurate Model Learning

Modeling how a robot interacts with the environment around it is an important prerequisite for designing control and planning algorithms. In fact, the performance of controllers and planners is highly dependent on the quality of the model. One popular approach is to learn data-driven models in order to compensate for inaccurate physical measurements and to adapt to systems that evolve over time. In this paper, we investigate a method to regularize model learning techniques to provide better error characteristics for traditional control and planning algorithms. This work proposes learning "Sufficiently Accurate" models of dynamics using a primal-dual method that can explicitly enforce constraints on the error in pre-defined parts of the state-space. The result of this method is that the error characteristics of the learned model are more predictable and can be better utilized by planning and control algorithms. The characteristics of Sufficiently Accurate models are analyzed through experiments on a simulated ball paddle system.

preprint, 2020, arXiv

The empirical duality gap of constrained statistical learning

This paper is concerned with the study of constrained statistical learning problems, the unconstrained versions of which are at the core of virtually all of modern information processing. Accounting for constraints, however, is paramount to incorporate prior knowledge and impose desired structural and statistical properties on the solutions. Still, solving constrained statistical problems remains challenging and guarantees scarce, leaving them to be tackled using regularized formulations. Though practical and effective, selecting regularization parameters so as to satisfy requirements is challenging, if at all possible, due to the lack of a straightforward relation between parameters and constraints. In this work, we propose to directly tackle the constrained statistical problem, overcoming its infinite dimensionality, unknown distributions, and constraints by leveraging finite dimensional parameterizations, sample averages, and duality theory. Aside from making the problem tractable, these tools allow us to bound the empirical duality gap, i.e., the difference between our approximate tractable solutions and the actual solutions of the original statistical problem. We demonstrate the effectiveness and usefulness of this constrained formulation in a fair learning application.

preprint, 2019, arXiv

Distributed Constrained Online Learning

In this paper, we consider groups of agents in a network that select actions in order to satisfy a set of constraints that vary arbitrarily over time and minimize a time-varying function of which they have only local observations. The selection of actions, also called a strategy, is causal and decentralized, i.e., the dynamical system that determines the actions of a given agent depends only on the constraints at the current time and on its own actions and those of its neighbors. To determine such a strategy, we propose a decentralized saddle point algorithm and show that the corresponding global fit and regret are bounded by functions of the order of $\sqrt{T}$. Specifically, we define the global fit of a strategy as a vector that integrates over time the global constraint violations as seen by a given node. The fit is a performance loss associated with online operation as opposed to offline clairvoyant operation, which can always select an action, if one exists, that satisfies the constraints at all times. If this fit grows sublinearly with the time horizon, it suggests that the strategy approaches the feasible set of actions. Likewise, we define the regret of a strategy as the difference between its accumulated cost and that of the best fixed action that one could select knowing beforehand the time evolution of the objective function. Numerical examples support the theoretical conclusions.

preprint, 2016, arXiv

Navigation Functions for Convex Potentials in a Space with Convex Obstacles

Given a convex potential in a space with convex obstacles, an artificial potential is used to navigate to the minimum of the natural potential while avoiding collisions. The artificial potential combines the natural potential with potentials that repel the agent from the border of the obstacles. This is a popular approach to navigation problems because it can be implemented with spatially local information that is acquired during operation time. Artificial potentials can, however, have local minima that prevent navigation to the minimum of the natural potential. This paper derives conditions that guarantee artificial potentials have a single minimum that is arbitrarily close to the minimum of the natural potential. The qualitative implication is that artificial potentials succeed either when the condition number (the ratio of the maximum over the minimum eigenvalue) of the Hessian of the natural potential is not large and the obstacles are not too flat, or when the destination is not close to the border of an obstacle. Numerical analyses explore the practical value of these theoretical conclusions.
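
For reference, a Rimon-Koditschek-style artificial potential of the kind this abstract describes is commonly written as below. The symbols are assumptions for illustration: $f_0 \ge 0$ is the natural convex potential, vanishing at its minimum; $\beta_i(x)$ is positive in the free space and zero on the boundary of obstacle $i$; and $k > 0$ is a tuning parameter.

```latex
\varphi_k(x) = \frac{f_0(x)}{\bigl(f_0(x)^k + \beta(x)\bigr)^{1/k}},
\qquad
\beta(x) = \prod_{i} \beta_i(x),
\qquad
\dot{x} = -\nabla \varphi_k(x).
```

With this form, $\varphi_k$ equals 0 at the minimum of $f_0$ and 1 on every obstacle boundary, so following the negative gradient from the free space moves the agent away from the obstacles and toward the goal whenever no spurious local minimum intervenes.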

preprint, 2016, arXiv

Online Learning of Feasible Strategies in Unknown Environments

Define an environment as a set of convex constraint functions that vary arbitrarily over time and consider a cost function that is also convex and arbitrarily varying. Agents that operate in this environment intend to select actions that are feasible for all times while minimizing the cost's time average. Such an action is said to be optimal and can be computed offline if the cost and the environment are known a priori. An online policy is one that depends causally on the cost and the environment. To compare online policies to the optimal offline action, define the fit of a trajectory as a vector that integrates the constraint violations over time and its regret as the cost difference with the optimal action accumulated over time. Fit measures the extent to which an online policy succeeds in learning feasible actions while regret measures its success in learning optimal actions. This paper proposes the use of online policies computed from a saddle point controller which are shown to have fit and regret that are either bounded or grow at a sublinear rate. These properties provide an indication that the controller finds trajectories that are feasible and optimal in a relaxed sense. Concepts are illustrated throughout with the problem of a shepherd that wants to stay close to all sheep in a herd. Numerical experiments show that the saddle point controller allows the shepherd to do so.
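
The notions of fit and regret in this abstract can be illustrated with a small numerical sketch. This is a discrete-time toy version written for illustration, not the controller analyzed in the paper: the cost `f_t(x) = 0.5 * ||x - c_t||^2`, the fixed unit-ball constraint, the step size `eta`, and the function name `saddle_point_demo` are all assumptions.

```python
import numpy as np

def saddle_point_demo(T=2000, eta=0.05, seed=0):
    """Discrete-time primal-dual (saddle point) iteration on a toy problem."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)   # action
    lam = 0.0         # multiplier for the single constraint
    fit = 0.0         # accumulated constraint value, sum_t g(x_t)
    cost = 0.0        # accumulated cost of the online actions
    targets = np.empty((T, 2))
    for t in range(T):
        # Hypothetical time-varying cost f_t(x) = 0.5 * ||x - c_t||^2.
        c_t = np.array([2.0, 0.0]) + 0.1 * rng.standard_normal(2)
        targets[t] = c_t
        # Constraint g(x) = ||x||^2 - 1 <= 0 (kept fixed here for simplicity).
        g = x @ x - 1.0
        fit += g
        cost += 0.5 * np.sum((x - c_t) ** 2)
        # Primal descent on the Lagrangian f_t + lam * g, projected dual ascent.
        grad_x = (x - c_t) + lam * 2.0 * x
        x = x - eta * grad_x
        lam = max(0.0, lam + eta * g)
    # Best fixed feasible action in hindsight: for this quadratic cost it is
    # the projection of the mean target onto the unit ball.
    mean_c = targets.mean(axis=0)
    x_star = mean_c / max(1.0, np.linalg.norm(mean_c))
    regret = cost - 0.5 * np.sum((targets - x_star) ** 2)
    return x, fit, regret

x, fit, regret = saddle_point_demo()
print("final action:", x)
print("fit / T:", fit / 2000, "regret / T:", regret / 2000)
```

In this toy setting the time-averaged fit and regret are both small, which is the relaxed sense of feasibility and optimality the abstract refers to.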

preprint, 2016, arXiv

Prediction-Correction Interior-Point Method for Time-Varying Convex Optimization

In this paper, we develop an interior-point method for solving a class of convex optimization problems with time-varying objective and constraint functions. Using log-barrier penalty functions, we propose a continuous-time dynamical system for tracking the (time-varying) optimal solution with an asymptotically vanishing error. This dynamical system is composed of two terms: (i) a correction term consisting of a continuous-time version of Newton's method, and (ii) a prediction term able to track the drift of the optimal solution by taking into account the time-varying nature of the objective and constraint functions. Using appropriately chosen time-varying slack and barrier parameters, we ensure that the solution to this dynamical system globally asymptotically converges to the optimal solution at an exponential rate. We illustrate the applicability of the proposed method in two practical applications: a sparsity promoting least squares problem and a collision-free robot navigation problem.
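
The role of the prediction and correction terms can be seen in a minimal sketch. It assumes an unconstrained toy objective `f(x, t) = 0.5 * ||x - c(t)||^2` with a hypothetical moving minimizer `c(t)`, and it uses forward-Euler integration; the barrier and slack terms of the interior-point method in the abstract are deliberately left out, and the name `track_minimizer` and the gain `alpha` are illustrative.

```python
import numpy as np

def track_minimizer(alpha=5.0, dt=1e-3, t_end=3.0, use_prediction=True):
    """Euler-integrate x' = -H^{-1} (alpha * grad_x f + d/dt grad_x f)."""
    c = lambda t: np.array([np.cos(t), np.sin(t)])       # moving minimizer (hypothetical)
    c_dot = lambda t: np.array([-np.sin(t), np.cos(t)])  # its time derivative
    x = np.array([2.0, 2.0])
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        grad = x - c(t)        # gradient of f with respect to x
        hess = np.eye(2)       # Hessian of f with respect to x
        grad_t = -c_dot(t)     # time derivative of the gradient
        correction = alpha * grad                                 # Newton-like correction
        prediction = grad_t if use_prediction else np.zeros(2)    # tracks the drift
        x = x - dt * np.linalg.solve(hess, correction + prediction)
        t += dt
    return np.linalg.norm(x - c(t))  # tracking error at the final time

print("with prediction:   ", track_minimizer(use_prediction=True))
print("without prediction:", track_minimizer(use_prediction=False))
```

For this objective the error `e = x - c(t)` obeys `e' = -alpha * e` when the prediction term is included, so it decays exponentially; with the correction term alone, the iterate lags behind the moving minimizer by a persistent offset.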

preprint, 2016, arXiv

Stochastic Artificial Potentials for Online Safe Navigation

Consider a convex set from which we remove an arbitrary number of disjoint convex sets (the obstacles) and a convex function whose minimum is the agent's goal. We consider a local and stochastic approximation of the gradient of a Rimon-Koditschek navigation function where the attractive potential is the convex function that the agent is minimizing. In particular, we show that if the estimate available to the agent is unbiased, convergence to the desired destination with obstacle avoidance is guaranteed with probability one under the same geometrical conditions as in the deterministic case. Qualitatively, these conditions are that the ratio of the maximum over the minimum eigenvalue of the Hessian of the objective function is not too large and that the obstacles are not too flat or too close to the desired destination. Moreover, we show that for biased estimates a similar result holds under some assumptions on the bias. These assumptions are motivated by the study of the estimate of the gradient of a Rimon-Koditschek navigation function for sensor models that fit circles or ellipses around the obstacles. Numerical examples explore the practical value of these theoretical results.

preprint, 2015, arXiv

Interior Point Method for Dynamic Constrained Optimization in Continuous Time

This paper considers a class of convex optimization problems where both the objective function and the constraints have a continuously varying dependence on time. Our goal is to develop an algorithm to track the optimal solution as it continuously changes over time inside or on the boundary of the dynamic feasible set. We develop an interior point method that asymptotically succeeds in tracking this optimal point in nonstationary settings. The method utilizes a time-varying constraint slack and a prediction-correction structure that relies on time derivatives of functions and constraints and Newton steps in the spatial domain. Error-free tracking is guaranteed under customary assumptions on the optimization problems and time differentiability of objective and constraints. The effectiveness of the method is illustrated in a problem that involves multiple agents tracking multiple targets.
