Researcher profile

Aivar Sootla

Aivar Sootla contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
8 works
0 followers
6 topics
4 close collaborators

Published work

8 published items

Preprint · 2022 · arXiv

Block Factor-width-two Matrices and Their Applications to Semidefinite and Sum-of-squares Optimization

Semidefinite and sum-of-squares (SOS) optimization are fundamental computational tools in many areas, including linear and nonlinear systems theory. However, the scale of problems that can be addressed reliably and efficiently is still limited. In this paper, we introduce a new notion of block factor-width-two matrices and build a new hierarchy of inner and outer approximations of the cone of positive semidefinite (PSD) matrices. This notion is a block extension of the standard factor-width-two matrices, and allows for an improved inner-approximation of the PSD cone. In the context of SOS optimization, this leads to a block extension of the scaled diagonally dominant sum-of-squares (SDSOS) polynomials. By varying a matrix partition, the notion of block factor-width-two matrices can balance a trade-off between the computation scalability and solution quality for solving semidefinite and SOS optimization problems. Numerical experiments on a range of large-scale instances confirm our theoretical findings.
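The construction in the abstract above can be illustrated with a minimal sketch, assuming the usual characterization of a block factor-width-two matrix as a sum of PSD matrices, each supported on one pair of blocks of a chosen partition. The partition `[2, 1, 3]`, the helper names and the random data are illustrative assumptions, not taken from the paper.

```python
import itertools
import numpy as np

def random_psd(n, rng):
    """Return a random n x n positive semidefinite matrix."""
    g = rng.standard_normal((n, n))
    return g @ g.T

def block_factor_width_two(block_sizes, rng):
    """Sum PSD matrices, each supported on one pair of blocks of the partition."""
    offsets = np.concatenate(([0], np.cumsum(block_sizes)))
    n = int(offsets[-1])
    z = np.zeros((n, n))
    for i, j in itertools.combinations(range(len(block_sizes)), 2):
        # Row/column indices covered by blocks i and j.
        idx = np.concatenate((
            np.arange(offsets[i], offsets[i + 1]),
            np.arange(offsets[j], offsets[j + 1]),
        ))
        z[np.ix_(idx, idx)] += random_psd(len(idx), rng)
    return z

rng = np.random.default_rng(0)
Z = block_factor_width_two([2, 1, 3], rng)   # hypothetical partition of a 6 x 6 matrix
min_eig = np.linalg.eigvalsh(Z).min()        # nonnegative up to rounding, so Z is PSD
```

A coarser partition (fewer, larger blocks) makes each summand larger and the inner approximation of the PSD cone tighter, which is the scalability versus quality trade-off the abstract refers to.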

Preprint · 2022 · arXiv

Reinforcement Learning in Presence of Discrete Markovian Context Evolution

We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications and we tackle it using a Bayesian approach and variational inference. We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best-suited for Markov process modeling. We then derive a context distillation procedure, which identifies and removes spurious contexts in an unsupervised fashion. We argue that the combination of these two components allows us to infer the number of contexts from data, thus dealing with the context cardinality assumption. We then find the representation of the optimal policy enabling efficient policy learning using off-the-shelf RL algorithms. Finally, we demonstrate empirically (using gym environments cart-pole swing-up, drone, intersection) that our approach succeeds where state-of-the-art methods of other frameworks fail and elaborate on the reasons for such failures.

Preprint · 2022 · arXiv

Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state-space and reshaping the objective. We show that Saute MDP satisfies the Bellman equation and moves us closer to solving Safe RL with constraints satisfied almost surely. We argue that Saute MDP allows viewing the Safe RL problem from a different perspective enabling new features. For instance, our approach has a plug-and-play nature, i.e., any RL algorithm can be "Sauteed". Additionally, state augmentation allows for policy generalization across safety constraints. We finally show that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance.
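The state-augmentation idea in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: it ignores discounting, and the class name, the penalty value `unsafe_reward` and the normalization of the budget are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class SafetyAugmentedStep:
    """Tracks a normalized safety budget alongside the environment state (illustrative)."""
    budget: float                  # total allowed cost, assumed positive
    unsafe_reward: float = -100.0  # hypothetical penalty once the budget is spent
    remaining: float = 1.0         # normalized remaining budget, starts at 1

    def step(self, state, reward, cost):
        """Return the augmented state and reshaped reward for one transition."""
        self.remaining -= cost / self.budget
        shaped = reward if self.remaining >= 0.0 else self.unsafe_reward
        return (state, self.remaining), shaped

aug = SafetyAugmentedStep(budget=2.0)
first = aug.step("s1", reward=1.0, cost=1.0)    # budget still non-negative
second = aug.step("s2", reward=1.0, cost=1.5)   # budget exhausted, reward replaced
```

Because the remaining budget is part of the state and the constraint lives in the reward, an unmodified RL algorithm can be trained on the augmented transitions, which is the plug-and-play property the abstract describes.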

Preprint · 2022 · arXiv

SEREN: Knowing When to Explore and When to Exploit

Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that maximise expected reward and "explorative" ones that sample unvisited states. To encourage exploration, recent approaches proposed adding stochasticity to actions, separating exploration and exploitation phases, or equating reduction in uncertainty with reward. However, these techniques do not necessarily offer entirely systematic approaches to making this trade-off. Here we introduce SElective Reinforcement Exploration Network (SEREN) that poses the exploration-exploitation trade-off as a game between an RL agent, Exploiter, which purely exploits known rewards, and another RL agent, Switcher, which chooses at which states to activate a pure exploration policy that is trained to minimise system uncertainty and override Exploiter. Using a form of policies known as impulse control, Switcher is able to determine the best set of states to switch to the exploration policy while Exploiter is free to execute its actions everywhere else. We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation. Through extensive empirical studies in both discrete (MiniGrid) and continuous (MuJoCo) control benchmarks, we show that SEREN can be readily combined with existing RL algorithms to yield significant improvement in performance relative to state-of-the-art algorithms.

Preprint · 2022 · arXiv

Structured Q-learning For Antibody Design

Optimizing combinatorial structures is core to many real-world problems, such as those encountered in life sciences. For example, one of the crucial steps involved in antibody design is to find an arrangement of amino acids in a protein sequence that improves its binding with a pathogen. Combinatorial optimization of antibodies is difficult due to extremely large search spaces and non-linear objectives. Even for modest antibody design problems, where proteins have a sequence length of eleven, we are faced with searching over 2.05 x 10^14 structures. Applying traditional Reinforcement Learning algorithms such as Q-learning to combinatorial optimization results in poor performance. We propose Structured Q-learning (SQL), an extension of Q-learning that incorporates structural priors for combinatorial optimization. Using a molecular docking simulator, we demonstrate that SQL finds high binding energy sequences and performs favourably against baselines on eight challenging antibody design tasks, including designing antibodies for SARS-COV.
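The search-space figure quoted in the abstract above can be reproduced with a one-line calculation, assuming each of the eleven positions can hold any of the 20 standard amino acids (the alphabet size is an assumption; the abstract states only the sequence length and the total).

```python
NUM_AMINO_ACIDS = 20   # assumption: the 20 standard amino acids
SEQUENCE_LENGTH = 11   # sequence length stated in the abstract

search_space = NUM_AMINO_ACIDS ** SEQUENCE_LENGTH
print(f"{search_space:.3e}")   # prints 2.048e+14, i.e. about 2.05 x 10^14
```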

Preprint · 2020 · arXiv

SAMBA: Safe Model-Based & Active Reinforcement Learning

In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. Our results show orders of magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
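For readers unfamiliar with conditional value-at-risk (CVaR), the quantity behind the constraints mentioned above, here is a minimal sample-based sketch. It is a simple empirical estimator, not SAMBA's constraint formulation; the function name, the convention that higher cost is worse, and the sample values are assumptions.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of sampled costs (higher cost = worse)."""
    costs = np.sort(np.asarray(costs, dtype=float))
    # Index where the upper tail starts; keep at least one sample in the tail.
    tail_start = min(int(np.floor(alpha * len(costs))), len(costs) - 1)
    return costs[tail_start:].mean()

sampled_costs = [1.0, 2.0, 3.0, 10.0]            # hypothetical rollout costs
cvar_half = empirical_cvar(sampled_costs, 0.5)   # mean of the worst half: (3 + 10) / 2
```

Unlike a constraint on the expected cost, a CVaR constraint bounds the average of the worst outcomes, so rare but severe violations are not averaged away.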

Preprint · 2019 · arXiv

Distributed Design for Decentralized Control using Chordal Decomposition and ADMM

We propose a distributed design method for decentralized control by exploiting the underlying sparsity properties of the problem. Our method is based on chordal decomposition of sparse block matrices and the alternating direction method of multipliers (ADMM). We first apply a classical parameterization technique to restrict the optimal decentralized control into a convex problem that inherits the sparsity pattern of the original problem. The parameterization relies on a notion of strongly decentralized stabilization, and sufficient conditions are discussed to guarantee this notion. Then, chordal decomposition allows us to decompose the convex restriction into a problem with partially coupled constraints, and the framework of ADMM enables us to solve the decomposed problem in a distributed fashion. Consequently, the subsystems only need to share their model data with their direct neighbours, not needing a central computation. Numerical experiments demonstrate the effectiveness of the proposed method.
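The ADMM pattern the abstract above relies on (local updates, a coupling step, a dual update) can be sketched on a toy consensus problem. This is not the paper's control design and involves no chordal decomposition: each hypothetical agent i minimizes 0.5 * (x_i - a_i)^2 subject to agreeing on a shared value z, and the averaging step is computed centrally here for brevity.

```python
import numpy as np

def consensus_admm(targets, rho=1.0, iterations=100):
    """Scaled-form consensus ADMM for min sum_i 0.5 * (x_i - targets[i])**2 s.t. x_i == z."""
    a = np.asarray(targets, dtype=float)
    x = np.zeros_like(a)   # local copies held by each agent
    u = np.zeros_like(a)   # scaled dual variables
    z = 0.0                # shared consensus variable
    for _ in range(iterations):
        x = (a + rho * (z - u)) / (1.0 + rho)   # closed-form local minimization
        z = float(np.mean(x + u))               # coupling (averaging) step
        u = u + x - z                           # dual update on the disagreement
    return z

agreed_value = consensus_admm([1.0, 2.0, 6.0])   # converges to the mean of the targets
```

Only the x-update touches an agent's private data `a`; the coupling step needs just the values `x + u`, which mirrors the abstract's point that subsystems exchange limited information rather than full models.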

Preprint · 2019 · arXiv

On the Existence of Block-Diagonal Solutions to Lyapunov and $\mathcal{H}_{\infty}$ Riccati Inequalities

In this paper, we describe sufficient conditions under which block-diagonal solutions to Lyapunov and $\mathcal{H}_{\infty}$ Riccati inequalities exist. In order to derive our results, we define a new type of comparison systems, which are positive and are computed using the state-space matrices of the original (possibly nonpositive) systems. Computing the comparison system involves only the calculation of $\mathcal{H}_{\infty}$ norms of its subsystems. We show that the stability of this comparison system implies the existence of block-diagonal solutions to Lyapunov and Riccati inequalities. Furthermore, our proof is constructive and the overall framework allows the computation of block-diagonal solutions to these matrix inequalities with linear algebra and linear programming. Numerical examples illustrate our theoretical results.
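Why positive comparison systems are useful can be seen from a well-known special case: a stable positive system (Metzler, Hurwitz state matrix) always admits a diagonal Lyapunov solution that can be computed with linear algebra alone. The sketch below shows that special case only, not the paper's comparison-system construction; the matrix `A` and the function name are illustrative assumptions.

```python
import numpy as np

def diagonal_lyapunov(a):
    """Diagonal P with A^T P + P A negative definite, for Metzler Hurwitz A."""
    ones = np.ones(a.shape[0])
    v = np.linalg.solve(a, -ones)     # A v = -1, so v > 0 when A is Metzler and Hurwitz
    w = np.linalg.solve(a.T, -ones)   # A^T w = -1, so w > 0 likewise
    return np.diag(w / v)

# Hypothetical stable positive system: nonnegative off-diagonal entries, Hurwitz.
A = np.array([[-2.0, 1.0],
              [0.5, -3.0]])
P = diagonal_lyapunov(A)
lyap = A.T @ P + P @ A                # symmetric; all eigenvalues should be negative
```

Checking `np.linalg.eigvalsh(lyap).max() < 0` confirms the Lyapunov inequality for this example.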