
Michael M. Zavlanos

Michael M. Zavlanos appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Machine Learning; Robotics; math.OC; Multiagent Systems; Systems and Control; Computer Science and Game Theory; Artificial Intelligence; Computational Engineering, Finance, and Science; Discrete Mathematics; eess.SY; Information Theory; math.IT; Methodology; physics.soc-ph; Social and Information Networks

Catalog footprint: 16 works, 15 topics, 4 close collaborators.


Preprint, 2022, arXiv

A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games

We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.
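
To make the CVaR estimation step concrete, here is a minimal sample-based CVaR estimator in Python. It is a sketch under stated assumptions, not the paper's momentum algorithm: it assumes `alpha` is the tail fraction (so `alpha = 0.2` averages the worst 20% of costs) and evaluates the Rockafellar and Uryasev formula CVaR_alpha = min_t { t + E[(C - t)^+] / alpha } over the empirical distribution of the observed costs. The function name `empirical_cvar` is hypothetical.

```python
def empirical_cvar(costs, alpha):
    """Empirical CVaR of a list of observed cost samples.

    Assumption: alpha is the tail fraction (0.1 = worst 10% of outcomes);
    some papers use the confidence level 1 - alpha instead.
    Uses the Rockafellar-Uryasev form
        CVaR_alpha = min_t  t + E[(C - t)^+] / alpha,
    whose minimum over an empirical distribution is attained at a sample value.
    """
    if not costs:
        raise ValueError("need at least one cost sample")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(costs)

    def ru_objective(t):
        # t plays the role of the value-at-risk threshold
        return t + sum(max(c - t, 0.0) for c in costs) / (alpha * n)

    # The objective is convex and piecewise linear in t with breakpoints at
    # the samples, so checking the sample values is enough.
    return min(ru_objective(t) for t in costs)
```

For costs 1 through 10 and `alpha = 0.2`, the estimate is 9.5, the mean of the two largest costs; with `alpha = 1` it reduces to the sample mean.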

Preprint, 2022, arXiv

Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health

In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presence of contexts that are only observable by the expert but not by the learner. Thus, such contexts are unobserved confounders (UCs) from the learner's perspective. Given a dataset generated by the expert that excludes the UCs, the goal for the learner is to identify the true minimum-risk arm with fewer online learning steps, while avoiding possible biased decisions due to the presence of UCs in the expert's data.

Preprint, 2022, arXiv

Risk-Averse No-Regret Learning in Online Convex Games

We consider an online stochastic game with risk-averse agents whose goal is to learn optimal decisions that minimize the risk of incurring significantly high costs. Specifically, we use the Conditional Value at Risk (CVaR) as a risk measure that the agents can estimate using bandit feedback in the form of the cost values of only their selected actions. Since the distributions of the cost functions depend on the actions of all agents, which are generally unobservable, these distributions are themselves unknown and, therefore, the CVaR values of the costs are difficult to compute. To address this challenge, we propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values that are estimated by appropriately sampling the cost functions. We show that this algorithm achieves sub-linear regret with high probability. We also propose two variants of this algorithm that improve performance. The first variant relies on a new sampling strategy that uses samples from the previous iteration to improve the estimation accuracy of the CVaR values. The second variant employs residual feedback that uses CVaR values from the previous iteration to reduce the variance of the CVaR gradient estimates. We theoretically analyze the convergence properties of these variants and illustrate their performance on an online market problem that we model as a Cournot game.
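
The one-point zeroth-order gradient estimate and the residual-feedback variant can be sketched as follows. This is a generic illustration, not the paper's algorithm: the CVaR estimate that the agents would query is replaced by an arbitrary black-box function `f`, the smoothing radius `delta` is a free parameter, and the helper names (`zo_gradient`, `average_estimate`) are hypothetical. The plain estimator uses (d/delta) * f(x + delta*u) * u with u uniform on the unit sphere; the residual-feedback estimator subtracts the value observed at the previous query, which keeps the estimate unbiased (because E[u] = 0) while typically lowering its variance.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def zo_gradient(f, x, delta, rng, prev_value=None):
    """Zeroth-order gradient estimate of f at x from a single function query.

    prev_value=None:  one-point estimator (d / delta) * f(x + delta*u) * u
    prev_value=value: residual feedback   (d / delta) * (f(x + delta*u) - value) * u
    Returns the estimate and the new query value, so the caller can reuse it.
    """
    d = len(x)
    u = sample_unit_sphere(d, rng)
    value = f([xi + delta * ui for xi, ui in zip(x, u)])
    baseline = 0.0 if prev_value is None else prev_value
    scale = d / delta * (value - baseline)
    return [scale * ui for ui in u], value


def average_estimate(f, x, delta, n_samples, residual, seed=0):
    """Average many estimates at a fixed x (only to check that they are unbiased)."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    prev = None
    for _ in range(n_samples):
        g, value = zo_gradient(f, x, delta, rng, prev if residual else None)
        total = [t + gi for t, gi in zip(total, g)]
        prev = value
    return [t / n_samples for t in total]
```

With f(z) = z0^2 + z1^2 at x = (1, -0.5), both averages should be close to the gradient (2, -1), since smoothing a quadratic over a ball only adds a constant. In an actual learning loop, x would move between steps and each query would cost one bandit evaluation.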

Preprint, 2022, arXiv

Temporal Logic Task Allocation in Heterogeneous Multi-Robot Systems

In this paper, we consider the problem of optimally allocating tasks, expressed as global Linear Temporal Logic (LTL) specifications, to teams of heterogeneous mobile robots. The robots are classified into different types that capture their different capabilities, and each task may require robots of multiple types. The specific robots assigned to each task are immaterial, as long as they are of the desired type. Given a discrete workspace, our goal is to design paths, i.e., sequences of discrete states, for the robots so that the LTL specification is satisfied. To obtain a scalable solution to this complex temporal logic task allocation problem, we propose a hierarchical approach that first allocates specific robots to tasks using the information about the tasks contained in the Nondeterministic Büchi Automaton (NBA) that captures the LTL specification, and then designs low-level executable plans for the robots that respect the high-level assignment. Specifically, we first prune and relax the NBA by removing all negative atomic propositions. This step is motivated by "lazy collision checking" methods in robotics and allows us to simplify the planning problem by checking constraint satisfaction only when needed. Then, we extract sequences of subtasks from the relaxed NBA along with their temporal orders, and formulate a Mixed Integer Linear Program (MILP) to allocate these subtasks to the robots. Finally, we define generalized multi-robot path planning problems to obtain low-level executable robot plans that satisfy both the high-level task allocation and the temporal constraints captured by the negative atomic propositions in the original NBA. We show that our method is complete for a subclass of LTL that covers a broad range of tasks and present numerical simulations demonstrating that it can generate paths with lower cost, considerably faster than existing methods.

Preprint, 2021, arXiv

Formal Verification of Stochastic Systems with ReLU Neural Network Controllers

In this work, we address the problem of formal safety verification for stochastic cyber-physical systems (CPS) equipped with ReLU neural network (NN) controllers. Our goal is to find the set of initial states from which, with a predetermined confidence, the system will not reach an unsafe configuration within a specified time horizon. Specifically, we consider discrete-time LTI systems with Gaussian noise, which we abstract by a suitable graph. Then, we formulate a Satisfiability Modulo Convex (SMC) problem to estimate upper bounds on the transition probabilities between nodes in the graph. Using this abstraction, we propose a method to compute tight bounds on the safety probabilities of nodes in this graph, despite possible over-approximations of the transition probabilities between these nodes. Additionally, using the proposed SMC formula, we devise a heuristic method to refine the abstraction of the system in order to further improve the estimated safety bounds. Finally, we corroborate the efficacy of the proposed method with simulation results on a robot navigation example and a comparison against a state-of-the-art verification scheme.

Preprint, 2021, arXiv

Physics-Based Learning for Robotic Environmental Sensing

We propose a physics-based method to learn environmental fields (EFs) using a mobile robot. Common purely data-driven methods require prohibitively many measurements to accurately learn such complex EFs. Alternatively, physics-based models provide global knowledge of EFs but require experimental validation, depend on uncertain parameters, and are intractable for mobile robots. To address these challenges, we propose a Bayesian framework to select the most likely physics-based models of EFs in real-time, from a pool of numerical solutions generated offline as a function of the uncertain parameters. Specifically, we focus on turbulent flow fields and utilize Gaussian processes (GPs) to construct statistical models for them, using the pool of numerical solutions to inform their prior mean. To incorporate flow measurements into these GPs, we control a custom-built mobile robot through a sequence of waypoints that maximize the information content of the measurements. We experimentally demonstrate that our proposed framework constructs a posterior distribution of the flow field that better approximates the real flow compared to the prior numerical solutions and purely data-driven methods.
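
A simplified sketch of Gaussian-process regression whose prior mean comes from a candidate physics-based solution, together with picking the candidate with the highest marginal likelihood, is shown below. Several assumptions differ from the described framework: inputs are one-dimensional rather than flow-field coordinates, the squared-exponential kernel and its hyperparameters (length scale 1, signal variance 1, noise variance 0.01) are fixed by hand, and the pool of numerical solutions is stood in for by closed-form functions. All function names are hypothetical, and ranking candidates by marginal likelihood is one plausible reading of "most likely", not necessarily the paper's criterion.

```python
import math


def rbf(a, b, length=1.0, signal_var=1.0):
    """Squared-exponential kernel (hyperparameters fixed by assumption)."""
    return signal_var * math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):  # L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_with_prior_mean(xs, ys, prior_mean, noise_var=1e-2):
    """Fit a GP around a candidate prior mean; return (log marginal likelihood, predictor)."""
    n = len(xs)
    K = [[rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    r = [y - prior_mean(x) for x, y in zip(xs, ys)]  # residual w.r.t. the prior mean
    alpha = cho_solve(L, r)
    log_ml = (-0.5 * sum(ri * ai for ri, ai in zip(r, alpha))
              - sum(math.log(L[i][i]) for i in range(n))
              - 0.5 * n * math.log(2.0 * math.pi))

    def predict(x_star):
        k_star = [rbf(x_star, x) for x in xs]
        return prior_mean(x_star) + sum(k * a for k, a in zip(k_star, alpha))

    return log_ml, predict


def select_model(xs, ys, candidates):
    """Return the name and posterior predictor of the candidate with the largest evidence."""
    scored = {name: gp_with_prior_mean(xs, ys, m) for name, m in candidates.items()}
    best = max(scored, key=lambda name: scored[name][0])
    return best, scored[best][1]
```

With synthetic measurements `ys = [math.sin(x) for x in xs]` and candidates `math.sin` and `math.cos`, the `sin` candidate has zero residual and wins (both candidates share the same covariance, so only the data-fit term differs), and its posterior mean reproduces `math.sin` at new inputs.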

Preprint, 2020, arXiv

Deep Learning for Robotic Mass Transport Cloaking

We consider the problem of mass transport cloaking using mobile robots. The robots move along a predefined curve that encloses a safe zone and carry sources that collectively counteract a chemical agent released in the environment. The goal is to steer the mass flux around a desired region so that it remains unaffected by the external concentration. We formulate the problem of controlling the robot positions and release rates as a PDE-constrained optimization, where the propagation of the chemical is modeled by the advection-diffusion (AD) PDE. We use a neural network (NN) to approximate the solution of the PDE. In particular, we propose a novel loss function for the NN that utilizes the variational form of the AD-PDE and allows us to reformulate the planning problem as an unsupervised model-based learning problem. Our loss function is discretization-free and highly parallelizable. Unlike passive cloaking methods that use metamaterials to steer the mass flux, our method is the first to use mobile robots to actively control the concentration levels and create safe zones independent of environmental conditions. We demonstrate the performance of our method in simulations.

Preprint, 2020, arXiv

Socially-Aware Robot Planning via Bandit Human Feedback

In this paper, we consider the problem of designing collision-free, dynamically feasible, and socially-aware trajectories for robots operating in environments populated by humans. We define trajectories to be socially-aware if they do not interfere with humans in any way that causes discomfort. Here, discomfort is defined broadly and, depending on specific individuals, it can result from the robot being too close to a human or from interfering with human sight or tasks. Moreover, we assume that human feedback is bandit feedback indicating a complaint or no complaint on the part of the robot trajectory that interferes with the humans, and it does not reveal any contextual information about the locations of the humans or the reason for a complaint. Finally, we assume that humans can move in the obstacle-free space and, as a result, human utility can change. We formulate this planning problem as an online optimization problem that minimizes the total number of human complaints incurred by the time-varying robot trajectory. As the human utility is unknown, we employ zeroth-order, or derivative-free, optimization methods to solve this problem, which we combine with off-the-shelf motion planners to satisfy the dynamic feasibility and collision-free specifications of the resulting trajectories. To the best of our knowledge, this is a new framework for socially-aware robot planning that is not restricted to avoiding collisions with humans but, instead, focuses on increasing the social value of the robot trajectories using only bandit human feedback.

Preprint, 2020, arXiv

STyLuS*: A Temporal Logic Optimal Control Synthesis Algorithm for Large-Scale Multi-Robot Systems

This paper proposes a new highly scalable and asymptotically optimal control synthesis algorithm from linear temporal logic specifications, called STyLuS* for large-Scale optimal Temporal Logic Synthesis, that is designed to solve complex temporal planning problems in large-scale multi-robot systems. Existing planning approaches with temporal logic specifications rely on graph search techniques applied to a product automaton constructed among the robots. In our previous work, we proposed a more tractable sampling-based algorithm that incrementally builds trees that approximate the state space and transitions of the synchronous product automaton and does not require sophisticated graph search techniques. Here, we extend our previous work by introducing bias in the sampling process, which is guided by transitions in the Büchi automaton that belong to the shortest path to the accepting states. This allows us to synthesize optimal motion plans from product automata with hundreds of orders of magnitude more states than those that existing optimal control synthesis methods or off-the-shelf model checkers can manipulate. We show that STyLuS* is probabilistically complete and asymptotically optimal and has an exponential convergence rate. This is the first time that convergence rate results are provided for sampling-based optimal control synthesis methods. We provide simulation results that show that STyLuS* can synthesize optimal motion plans for very large multi-robot systems, which is impossible using state-of-the-art methods.

Preprint, 2020, arXiv

Transfer Reinforcement Learning under Unobserved Contextual Information

In this paper, we study a transfer reinforcement learning problem where the state transitions and rewards are affected by the environmental context. Specifically, we consider a demonstrator agent that has access to a context-aware policy and can generate transition and reward data based on that policy. These data constitute the experience of the demonstrator. Then, the goal is to transfer this experience, excluding the underlying contextual information, to a learner agent that does not have access to the environmental context, so that they can learn a control policy using fewer samples. It is well known that disregarding the causal effect of the contextual information can introduce bias in the transition and reward models estimated by the learner, resulting in a suboptimal learned policy. To address this challenge, we develop a method to obtain causal bounds on the transition and reward functions using the demonstrator's data, which we then use to obtain causal bounds on the value functions. Using these value function bounds, we propose new Q-learning and UCB-Q learning algorithms that converge to the true value function without bias. We provide numerical experiments for robot motion planning problems that validate the proposed value function bounds and demonstrate that the proposed algorithms can effectively make use of the data from the demonstrator to accelerate the learning process of the learner.
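
One standard way to bound an interventional transition probability from confounded demonstrator data is the natural (Manski-style) bound P(s', a | s) <= P(s' | s, do(a)) <= P(s', a | s) + 1 - P(a | s). It holds because the data pin down outcomes only for the fraction P(a | s) of cases in which the demonstrator actually chose a; the remaining 1 - P(a | s) of probability mass could go either way. The sketch computes these bounds from logged tuples; whether the paper uses exactly these bounds is an assumption, and `natural_transition_bounds` is a hypothetical name.

```python
def natural_transition_bounds(data, s, a, s_next):
    """Bounds on P(s_next | s, do(a)) from confounded demonstrator data.

    data: list of (state, action, next_state) tuples logged by the demonstrator,
          whose actions may depend on a context the learner never observes.
    Returns (lower, upper) with
        lower = P(s_next, a | s)
        upper = P(s_next, a | s) + 1 - P(a | s)
    """
    in_state = [(act, nxt) for (st, act, nxt) in data if st == s]
    n = len(in_state)
    if n == 0:
        return 0.0, 1.0  # no data for this state: only the trivial bounds
    p_a = sum(1 for act, _ in in_state if act == a) / n
    p_joint = sum(1 for act, nxt in in_state if act == a and nxt == s_next) / n
    return p_joint, p_joint + (1.0 - p_a)
```

For the logged tuples `[(0, 1, 1), (0, 1, 1), (0, 1, 0), (0, 0, 0)]` (action 1 chosen three times in state 0), the bounds on reaching state 1 under action 1 are [0.5, 0.75]; the interval narrows as the demonstrator chooses the action more often.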

Preprint, 2019, arXiv

Distributed Constrained Online Learning

In this paper, we consider groups of agents in a network that select actions in order to satisfy a set of constraints that vary arbitrarily over time and minimize a time-varying function of which they have only local observations. The selection of actions, also called a strategy, is causal and decentralized, i.e., the dynamical system that determines the actions of a given agent depends only on the constraints at the current time and on its own actions and those of its neighbors. To determine such a strategy, we propose a decentralized saddle point algorithm and show that the corresponding global fit and regret are bounded by functions of the order of $\sqrt{T}$. Specifically, we define the global fit of a strategy as a vector that integrates over time the global constraint violations as seen by a given node. The fit is a performance loss associated with online operation as opposed to offline clairvoyant operation, which can always select an action, if one exists, that satisfies the constraints at all times. If this fit grows sublinearly with the time horizon, it suggests that the strategy approaches the feasible set of actions. Likewise, we define the regret of a strategy as the difference between its accumulated cost and that of the best fixed action that one could select knowing beforehand the time evolution of the objective function. Numerical examples support the theoretical conclusions.
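
The two performance measures defined here can be computed directly for a given action sequence. The sketch below is centralized and evaluates, rather than implements, the decentralized saddle point algorithm: it assumes scalar actions, constraints written as g_t(x) <= 0, and a finite candidate set for the best fixed action in hindsight. `regret_and_fit` is a hypothetical helper.

```python
def regret_and_fit(actions, costs, constraints, candidate_actions):
    """Regret and fit of a played action sequence.

    actions:           actions x_t played at t = 0..T-1
    costs:             cost functions f_t(x), one per time step
    constraints:       functions g_t(x) -> list of floats, feasible when every entry <= 0
    candidate_actions: finite set searched for the best fixed action in hindsight
    """
    played_cost = sum(f(x) for f, x in zip(costs, actions))
    # Best fixed action chosen with full knowledge of all future cost functions.
    best_fixed_cost = min(sum(f(c) for f in costs) for c in candidate_actions)
    regret = played_cost - best_fixed_cost
    # The fit integrates (sums) the constraint values over time;
    # a positive entry means accumulated violation of that constraint.
    m = len(constraints[0](actions[0]))
    fit = [0.0] * m
    for g, x in zip(constraints, actions):
        fit = [acc + gi for acc, gi in zip(fit, g(x))]
    return regret, fit
```

Playing x_t = 2 against costs (x - c_t)^2 with c_t alternating 0, 2, 0, 2 gives a total cost of 8, versus 4 for the best fixed action x = 1, so the regret is 4; with constraints x - 1 <= 0 and -x <= 0 the fit is [4, -8], where the positive first entry records accumulated violation of x <= 1.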

Preprint, 2016, arXiv

Simultaneous Intermittent Communication Control and Path Optimization in Networks of Mobile Robots

In this paper, we propose an intermittent communication framework for mobile robot networks. Specifically, we consider robots that move along the edges of a connected mobility graph and communicate only when they meet at the nodes of that graph, giving rise to a dynamic communication network. Our proposed distributed controllers ensure intermittent connectivity of the network and path optimization, simultaneously. We show that the intermittent connectivity requirement can be encapsulated by a global Linear Temporal Logic (LTL) formula. Then, we approximately decompose it into local LTL expressions, which are then assigned to the robots. To avoid conflicting robot behaviors that can occur due to this approximate decomposition, we develop a distributed conflict resolution scheme that generates non-conflicting discrete motion plans for every robot, based on the assigned local LTL expressions, whose composition satisfies the global LTL formula. By appropriately introducing delays in the execution of the generated motion plans, we also show that the proposed controllers can be executed asynchronously.

Preprint, 2013, arXiv

Mobile Jammers for Secrecy Rate Maximization in Cooperative Networks

We consider a source (Alice) trying to communicate with a destination (Bob), in a way that an unauthorized node (Eve) cannot infer, based on her observations, the information that is being transmitted. The communication is assisted by multiple multi-antenna cooperating nodes (helpers) who have the ability to move. While Alice transmits, the helpers transmit noise that is designed to affect the entire space except Bob. We consider the problem of selecting the helper weights and positions that maximize the system secrecy rate. It turns out that this optimization problem can be efficiently solved, leading to a novel decentralized helper motion control scheme. Simulations indicate that introducing helper mobility leads to considerable savings in terms of helper transmit power, as well as the total number of helpers required for secure communication.
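
The secrecy rate being maximized is commonly written, for a Gaussian wiretap channel, as the positive part of the difference between Bob's and Eve's achievable rates. The sketch below uses that standard expression with a simplified signal-to-interference-plus-noise model in which the helpers' jamming adds interference power at each receiver; it ignores the multi-antenna weight design and helper positions that the paper optimizes, and all parameter names are hypothetical.

```python
import math


def secrecy_rate(p_signal, gain_bob, gain_eve, noise_power,
                 jam_at_bob=0.0, jam_at_eve=0.0):
    """Secrecy rate in bits per channel use for a Gaussian wiretap channel.

    jam_at_bob / jam_at_eve: interference power that the helpers' noise
    creates at each receiver (a simplified, assumed model). The scheme in the
    text aims for jam_at_bob = 0 (noise kept away from Bob) and a large jam_at_eve.
    """
    sinr_bob = p_signal * gain_bob / (noise_power + jam_at_bob)
    sinr_eve = p_signal * gain_eve / (noise_power + jam_at_eve)
    return max(0.0, math.log2(1.0 + sinr_bob) - math.log2(1.0 + sinr_eve))
```

With signal power 15, unit channel gains and unit noise, Bob and Eve see the same SINR and the secrecy rate is 0; adding jamming power 4 at Eve only gives log2(16) - log2(4) = 2 bits per channel use.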

Preprint, 2012, arXiv

Spectral Design of Dynamic Networks via Local Operations

Motivated by the relationship between the eigenvalue spectrum of the Laplacian matrix of a network and the behavior of dynamical processes evolving in it, we propose a distributed iterative algorithm in which a group of $n$ autonomous agents self-organize the structure of their communication network in order to control the network's eigenvalue spectrum. In our algorithm, we assume that each agent has access only to a local (myopic) view of the network around it. In each iteration, agents in the network perform a decentralized decision process to determine the edge addition/deletion that minimizes a distance function defined in the space of eigenvalue spectra. This spectral distance presents interesting theoretical properties that allow an efficient distributed implementation of the decision process. Our iterative algorithm is stable by construction, i.e., it locally optimizes the network's eigenvalue spectrum, and is shown to perform extremely well in practice. We illustrate our results with nontrivial simulations in which we design networks matching the spectral properties of complex networks, such as small-world and power-law networks.
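
A centralized toy version of spectral design by single-edge edits can illustrate the idea. It is not the distributed algorithm described here: it assumes full knowledge of the graph, uses the Euclidean distance between sorted Laplacian spectra as a stand-in for the paper's spectral distance, computes eigenvalues with a small cyclic Jacobi routine, and greedily applies the edge addition or deletion that most reduces the distance until no single edit helps (a local optimum). Function names are hypothetical.

```python
import itertools
import math


def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected graph on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def sym_eigvals(M, sweeps=50, tol=1e-20):
    """Eigenvalues (ascending) of a small symmetric matrix via cyclic Jacobi rotations."""
    A = [row[:] for row in M]
    n = len(A)
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))


def spectral_distance(edges, n, target):
    """Assumed distance: Euclidean norm between sorted Laplacian spectra."""
    eig = sym_eigvals(laplacian(n, edges))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(eig, sorted(target))))


def greedy_spectral_design(n, edges, target, max_iters=100):
    """Apply the single edge addition/deletion that most reduces the distance, until none helps."""
    edges = set(edges)
    current = spectral_distance(edges, n, target)
    for _ in range(max_iters):
        best_edges, best_dist = None, current
        for e in itertools.combinations(range(n), 2):
            trial = edges ^ {e}  # toggle: add e if absent, delete it if present
            d = spectral_distance(trial, n, target)
            if d < best_dist - 1e-12:
                best_edges, best_dist = trial, d
        if best_edges is None:
            break  # local optimum: no single edit improves the spectrum
        edges, current = best_edges, best_dist
    return edges, current
```

Starting from the path 0-1-2-3 with the target spectrum of the 4-cycle (0, 2, 2, 4), the first greedy step adds edge (0, 3) and reaches distance zero. Edges are stored as `(i, j)` with `i < j`, matching `itertools.combinations`.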

Preprint, 2010, arXiv

Distributed Control of the Laplacian Spectral Moments of a Network

It is well-known that the eigenvalue spectrum of the Laplacian matrix of a network contains valuable information about the network structure and the behavior of many dynamical processes run on it. In this paper, we propose a fully decentralized algorithm that iteratively modifies the structure of a network of agents in order to control the moments of the Laplacian eigenvalue spectrum. Although the individual agents have knowledge of their local network structure only (i.e., myopic information), they are collectively able to aggregate this local information and decide on what links are most beneficial to be added or removed at each time step. Our approach relies on gossip algorithms to distributively compute the spectral moments of the Laplacian matrix, as well as ensure network connectivity in the presence of link deletions. We illustrate our approach in nontrivial computer simulations and show that a good final approximation of the spectral moments of the target Laplacian matrix is achieved for many cases of interest.
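
The k-th spectral moment of the Laplacian is m_k = (1/n) tr(L^k) = (1/n) sum_i lambda_i^k, i.e., the average of the diagonal entries (L^k)_ii, and each entry (L^k)_ii depends only on node i's k-hop neighborhood. The sketch computes those local terms (centrally, for brevity) and then averages them with randomized pairwise gossip. It is an illustration under these assumptions, not the paper's gossip protocol, and it does not model the connectivity-maintenance part; helper names are hypothetical.

```python
import random


def laplacian(n, edges):
    """Integer graph Laplacian L = D - A."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1
        L[j][i] -= 1
        L[i][i] += 1
        L[j][j] += 1
    return L


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def local_moment_terms(L, k):
    """Diagonal entries (L^k)_ii; entry i depends only on node i's k-hop neighborhood."""
    P = L
    for _ in range(k - 1):
        P = matmul(P, L)
    return [float(P[i][i]) for i in range(len(L))]


def pairwise_gossip(values, edges, rounds, seed=0):
    """Randomized pairwise gossip: the endpoints of a random edge both take their average.

    The sum is preserved at every step, so on a connected graph all entries
    converge to the network-wide mean, here the spectral moment m_k.
    """
    rng = random.Random(seed)
    x = list(values)
    edge_list = list(edges)
    for _ in range(rounds):
        i, j = rng.choice(edge_list)
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x
```

For the path on four nodes and k = 2, the local terms are 2, 6, 6, 2, and gossip drives every node to their mean 4, which matches (1/4) sum_i lambda_i^2 for the Laplacian eigenvalues 0, 2 - sqrt(2), 2, 2 + sqrt(2).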

Preprint, 2010, arXiv

Spectral Control of Mobile Robot Networks

The eigenvalue spectrum of the adjacency matrix of a network is closely related to the behavior of many dynamical processes run over the network. In the field of robotics, this spectrum has important implications in many problems that require some form of distributed coordination within a team of robots. In this paper, we propose a continuous-time control scheme that modifies the structure of a position-dependent network of mobile robots so that it achieves a desired set of adjacency eigenvalues. For this, we employ a novel abstraction of the eigenvalue spectrum by means of the adjacency matrix spectral moments. Since the eigenvalue spectrum is uniquely determined by its spectral moments, this abstraction provides a way to indirectly control the eigenvalues of the network. Our construction is based on artificial potentials that capture the distance of the network's spectral moments to their desired values. These potentials are minimized by a gradient-descent closed-loop system that, under certain convexity assumptions, ensures convergence of the network topology to one with the desired set of moments and, therefore, eigenvalues. We illustrate our approach in nontrivial computer simulations.
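
The claim that the spectrum is determined by its spectral moments can be checked with Newton's identities: the first n power sums p_k = tr(A^k) = n m_k fix the elementary symmetric polynomials of the eigenvalues and hence the characteristic polynomial. The sketch below uses exact fractions and normalized moments m_k = tr(A^k)/n (the normalization is an assumption); it illustrates the abstraction only, not the potential-based controller, and the function names are hypothetical.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def spectral_moments(A, count):
    """Normalized moments m_k = tr(A^k) / n for k = 1..count, as exact fractions."""
    n = len(A)
    moments, P = [], A
    for k in range(1, count + 1):
        if k > 1:
            P = matmul(P, A)
        moments.append(Fraction(sum(P[i][i] for i in range(n)), n))
    return moments


def char_poly_from_moments(moments, n):
    """Coefficients [1, c1, ..., cn] of det(lambda*I - A) from the first n moments.

    Newton's identities: k * e_k = sum_{i=1..k} (-1)^(i-1) * e_{k-i} * p_i,
    where p_i are power sums of the eigenvalues and e_k their elementary
    symmetric polynomials; the coefficient of lambda^(n-k) is (-1)^k * e_k.
    """
    p = [None] + [n * m for m in moments[:n]]  # p_k = sum_i lambda_i^k
    e = [Fraction(1)]
    for k in range(1, n + 1):
        s = sum((-1) ** (i - 1) * e[k - i] * p[i] for i in range(1, k + 1))
        e.append(s / k)
    return [(-1) ** k * e[k] for k in range(n + 1)]
```

For the triangle graph, the moments are 0, 2, 2 and the recovered characteristic polynomial is lambda^3 - 3 lambda - 2 = (lambda - 2)(lambda + 1)^2, giving the adjacency eigenvalues 2, -1, -1.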
