Source author record

Michele Palladino

Michele Palladino appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

math.OC, eess.SY, Machine Learning, Systems and Control

Catalog footprint

What is connected

5 works

4 topics

4 close collaborators

preprint, 2022, arXiv

Convergence results for an averaged LQR problem with applications to reinforcement learning

In this paper, we deal with a Linear Quadratic Optimal Control problem with unknown dynamics. As a modeling assumption, we suppose that the knowledge that an agent has of the current system is represented by a probability distribution $π$ on the space of matrices. Furthermore, we assume that this probability measure is suitably updated to take into account the increased experience that the agent obtains while exploring the environment, approximating the underlying dynamics with increasing accuracy. Under these assumptions, we show that the optimal control obtained by solving the "average" Linear Quadratic Optimal Control problem with respect to a certain $π$ converges to the optimal control of the Linear Quadratic Optimal Control problem governed by the actual, underlying dynamics. This approach is closely related to model-based Reinforcement Learning algorithms where prior and posterior probability distributions describing the knowledge of the uncertain system are recursively updated. In the last section, we show a numerical test that confirms the theoretical results.
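The convergence statement in this abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: a scalar discrete-time system $x_{k+1} = a x_k + b u_k$ with only $a$ uncertain, open-loop controls, a distribution $π$ that is uniform on a finite grid around the true value, and the hypothetical helper names `lifted_maps` and `averaged_control`. Under those assumptions the averaged cost is quadratic in the control, so its minimizer solves a linear system, and shrinking the support of $π$ should bring that minimizer closer to the control that is optimal for the true dynamics.

```python
import numpy as np


def lifted_maps(a, b, horizon):
    """Return (phi, G) such that [x_1, ..., x_N] = phi * x0 + G @ u
    for the scalar system x_{k+1} = a * x_k + b * u_k."""
    phi = a ** np.arange(1, horizon + 1)
    G = np.zeros((horizon, horizon))
    for k in range(horizon):        # row k describes the state x_{k+1}
        for j in range(k + 1):      # u_j influences x_{k+1} only if j <= k
            G[k, j] = a ** (k - j) * b
    return phi, G


def averaged_control(a_values, weights, b, x0, horizon, q=1.0, r=1.0):
    """Minimize the pi-average of sum_k (q x_k^2 + r u_k^2) + q x_N^2
    over open-loop controls u, where pi puts weights[i] on a_values[i]."""
    H = np.zeros((horizon, horizon))
    g = np.zeros(horizon)
    for a, w in zip(a_values, weights):
        phi, G = lifted_maps(a, b, horizon)
        H += w * (G.T @ G)          # average of G^T G under pi
        g += w * (G.T @ phi)        # average of G^T phi under pi
    # First-order condition of the averaged quadratic cost.
    return -np.linalg.solve(q * H + r * np.eye(horizon), q * g * x0)


# Hypothetical "true" system and the control that is optimal for it.
a_true, b, x0, horizon = 0.9, 1.0, 1.0, 10
u_true = averaged_control([a_true], [1.0], b, x0, horizon)

# Let pi concentrate around a_true and track the distance between controls.
errors = []
for half_width in (0.3, 0.1, 0.03, 0.01):
    a_values = np.linspace(a_true - half_width, a_true + half_width, 201)
    weights = np.full(a_values.size, 1.0 / a_values.size)
    u_avg = averaged_control(a_values, weights, b, x0, horizon)
    errors.append(float(np.linalg.norm(u_avg - u_true)))
    print(f"half-width {half_width:5.2f}: ||u_avg - u_true|| = {errors[-1]:.2e}")
```

The grid-based uniform distribution is chosen only to keep the sketch deterministic; a sampled posterior, as in a model-based Reinforcement Learning loop, could be passed to `averaged_control` in the same way.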

preprint, 2021, arXiv

Stability-Constrained Markov Decision Processes Using MPC

In this paper, we consider solving discounted Markov Decision Processes (MDPs) under the constraint that the resulting policy is stabilizing. In practice, MDPs are solved based on some form of policy approximation. We leverage recent results proposing the use of Model Predictive Control (MPC) as a structured policy in the context of Reinforcement Learning, which makes it possible to introduce stability requirements directly inside the MPC-based policy. This restricts the solution of the MDP to stabilizing policies by construction. The stability theory for MPC is most mature for the undiscounted case. Hence, we first show in this paper that stable discounted MDPs can be reformulated as undiscounted ones. This observation entails that the MPC-based policy with stability requirements produces the optimal policy for the discounted MDP if that policy is stable, and the best stabilizing policy otherwise.

preprint, 2020, arXiv

Variational Problems for Tree Roots and Branches

This paper studies two classes of variational problems introduced in [7], related to the optimal shapes of tree roots and branches. Given a measure $μ$ describing the distribution of leaves, a sunlight functional $\mathcal{S}(μ)$ computes the total amount of light captured by the leaves. For a measure $μ$ describing the distribution of root hair cells, a harvest functional $\mathcal{H}(μ)$ computes the total amount of water and nutrients gathered by the roots. In both cases, we seek a measure $μ$ that maximizes these functionals subject to a ramified transportation cost, for transporting nutrients from the roots to the trunk or from the trunk to the leaves. Compared with [7], here we do not impose any a priori bound on the total mass of the optimal measure $μ$, and more careful a priori estimates are thus required. In the unconstrained optimization problem for branches, we prove that an optimal measure exists, with bounded support and bounded total mass. In the unconstrained problem for tree roots, we prove that an optimal measure exists, with bounded support but possibly unbounded total mass. The last section of the paper analyzes how the size of the optimal tree depends on the parameters defining the various functionals.

preprint, 2019, arXiv

A geometrically based criterion to avoid infimum-gaps in Optimal Control

In optimal control theory, the expression infimum gap means a strictly negative difference between the infimum value of a given minimum problem and the infimum value of a new problem obtained from the former by extending the original family V of controls to a larger family W. Now, for some classes of domain-extensions (like convex relaxation or impulsive embedding of unbounded control problems) the normality of an extended minimizer has been shown to be sufficient for the avoidance of an infimum gap. A natural issue is then the search for a general hypothesis under which the criterion 'normality implies no gap' holds true. We prove that, far from being a peculiarity of those specific extensions and from requiring the convexity of the extended dynamics, this criterion is valid provided the original family V of controls is abundant in the extended family W. Abundance, which is stronger than the mere C^0-density of the original trajectories in the set of extended trajectories, is a dynamical-topological notion introduced by J. Warga, and is here utilized in a 'non-convex' version which, moreover, is adapted to differential manifolds. To get the main result, which is based on set separation arguments, we prove an open mapping result valid for Quasi-Differential-Quotient (QDQ) approximating cones, a notion of 'tangent cone' obtained as a particular specification of H. Sussmann's Approximate-Generalized-Differential-Quotients (AGDQ) approximating cone.
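The definition at the start of this abstract can be written out in symbols. The notation below is an assumption introduced for illustration: $J$ stands for the cost being minimized (a symbol the abstract does not use), and the inclusion $V \subseteq W$ reflects the statement that W extends V.

```latex
% Enlarging the family of controls can only lower the infimum:
\[
  \inf_{w \in W} J(w) \;\le\; \inf_{v \in V} J(v).
\]
% An infimum gap is the case in which the inequality is strict:
\[
  \inf_{w \in W} J(w) \;-\; \inf_{v \in V} J(v) \;<\; 0.
\]
```

Read this way, 'normality implies no gap' asserts that the two infima coincide whenever an extended minimizer is normal.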

preprint, 2016, arXiv

A Stochastic Model of Optimal Debt Management and Bankruptcy

A problem of optimal debt management is modeled as a noncooperative game between a borrower and a pool of lenders, in infinite time horizon with exponential discount. The yearly income of the borrower is governed by a stochastic process. When the debt-to-income ratio $x(t)$ reaches a given size $x^*$, bankruptcy instantly occurs. The interest rate charged by the risk-neutral lenders is precisely determined in order to compensate for this possible loss of their investment. For a given bankruptcy threshold $x^*$, existence and properties of optimal feedback strategies for the borrower are studied, in a stochastic framework as well as in a limit deterministic setting. The paper also analyzes how the expected total cost to the borrower changes, depending on different values of $x^*$.