Source author record

Aivar Sootla

Aivar Sootla appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

math.OC; Systems and Control; Machine Learning; Artificial Intelligence; Quantitative Methods; math.DS; Computational Engineering, Finance, and Science; Distributed, Parallel, and Cluster Computing; eess.SY; Molecular Networks; Robotics

Catalog footprint

18 works

11 topics

4 close collaborators

preprint, 2022, arXiv

Block Factor-width-two Matrices and Their Applications to Semidefinite and Sum-of-squares Optimization

Semidefinite and sum-of-squares (SOS) optimization are fundamental computational tools in many areas, including linear and nonlinear systems theory. However, the scale of problems that can be addressed reliably and efficiently is still limited. In this paper, we introduce a new notion of block factor-width-two matrices and build a new hierarchy of inner and outer approximations of the cone of positive semidefinite (PSD) matrices. This notion is a block extension of the standard factor-width-two matrices, and allows for an improved inner approximation of the PSD cone. In the context of SOS optimization, this leads to a block extension of the scaled diagonally dominant sum-of-squares (SDSOS) polynomials. By varying a matrix partition, the notion of block factor-width-two matrices can balance a trade-off between computational scalability and solution quality for solving semidefinite and SOS optimization problems. Numerical experiments on a range of large-scale instances confirm our theoretical findings.
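
For orientation, the standard (non-block) notion mentioned in this abstract can be written as below, together with one natural way to state its block extension. The notation (the selection matrices E_{ij}, the blocks X_{ij}, and the partition \alpha) is chosen here for illustration and may differ from the paper's.

```latex
% Factor-width-two matrices: sums of PSD terms, each supported on a
% 2x2 principal submatrix. E_{ij} \in \mathbb{R}^{2 \times n} selects rows i and j.
\mathcal{FW}_2^n = \Big\{ A \in \mathbb{S}^n :
    A = \sum_{1 \le i < j \le n} E_{ij}^{\top} X_{ij} E_{ij},\quad
    X_{ij} \in \mathbb{S}^2_{+} \Big\}

% Assumed block version for a partition \alpha = \{k_1, \dots, k_p\} with
% k_1 + \dots + k_p = n: E_{ij}^{\alpha} selects the i-th and j-th blocks.
\mathcal{FW}_{\alpha,2}^n = \Big\{ A \in \mathbb{S}^n :
    A = \sum_{1 \le i < j \le p} (E_{ij}^{\alpha})^{\top} X_{ij} E_{ij}^{\alpha},\quad
    X_{ij} \in \mathbb{S}^{k_i + k_j}_{+} \Big\}
```

Every term in either sum is PSD, so both sets lie inside the PSD cone. In this reading, a coarser partition uses larger PSD blocks X_{ij}, which enlarges the set at a higher computational cost; that is the trade-off the abstract describes.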

preprint, 2022, arXiv

Reinforcement Learning in Presence of Discrete Markovian Context Evolution

We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications and we tackle it using a Bayesian approach and variational inference. We adapt a sticky Hierarchical Dirichlet Process (HDP) prior for model learning, which is arguably best-suited for Markov process modeling. We then derive a context distillation procedure, which identifies and removes spurious contexts in an unsupervised fashion. We argue that the combination of these two components allows us to infer the number of contexts from data, thus dealing with the context cardinality assumption. We then find the representation of the optimal policy enabling efficient policy learning using off-the-shelf RL algorithms. Finally, we demonstrate empirically (using gym environments cart-pole swing-up, drone, intersection) that our approach succeeds where state-of-the-art methods of other frameworks fail and elaborate on the reasons for such failures.

preprint, 2022, arXiv

Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state-space and reshaping the objective. We show that Saute MDP satisfies the Bellman equation and moves us closer to solving Safe RL with constraints satisfied almost surely. We argue that Saute MDP allows viewing the Safe RL problem from a different perspective enabling new features. For instance, our approach has a plug-and-play nature, i.e., any RL algorithm can be "Sauteed". Additionally, state augmentation allows for policy generalization across safety constraints. We finally show that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance.
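
The state-augmentation idea in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `AugmentedState`, `saute_step` and `toy_step`, the normalized budget that starts at 1.0, and the fixed penalty `unsafe_reward` are all hypothetical choices made here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class AugmentedState:
    state: float   # original environment state (a scalar in this toy setting)
    budget: float  # remaining normalized safety budget; 1.0 means untouched


def saute_step(
    aug: AugmentedState,
    action: float,
    step_fn: Callable[[float, float], Tuple[float, float, float]],
    safety_budget: float,
    unsafe_reward: float = -10.0,
) -> Tuple[AugmentedState, float]:
    """One transition of the augmented process (hypothetical helper)."""
    next_state, reward, cost = step_fn(aug.state, action)
    # Spend budget in proportion to the incurred safety cost.
    next_budget = aug.budget - cost / safety_budget
    # Reshape the objective: once the budget is overdrawn, return a penalty.
    shaped_reward = reward if next_budget >= 0.0 else unsafe_reward
    return AugmentedState(next_state, next_budget), shaped_reward


def toy_step(state: float, action: float) -> Tuple[float, float, float]:
    """Toy dynamics: move by `action`, earn `action`, pay |action| in cost."""
    return state + action, action, abs(action)
```

Because the constraint now lives in the state and the reward, any unconstrained RL algorithm can be run on the augmented process, which is the plug-and-play property the abstract refers to.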

preprint, 2022, arXiv

SEREN: Knowing When to Explore and When to Exploit

Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that maximise expected reward and "explorative" ones that sample unvisited states. To encourage exploration, recent approaches proposed adding stochasticity to actions, separating exploration and exploitation phases, or equating reduction in uncertainty with reward. However, these techniques do not necessarily offer entirely systematic approaches to making this trade-off. Here we introduce SElective Reinforcement Exploration Network (SEREN), which poses the exploration-exploitation trade-off as a game between two RL agents: Exploiter, which purely exploits known rewards, and Switcher, which chooses at which states to activate a pure exploration policy that is trained to minimise system uncertainty and override Exploiter. Using a form of policies known as impulse control, Switcher is able to determine the best set of states to switch to the exploration policy while Exploiter is free to execute its actions everywhere else. We prove that SEREN converges quickly and induces a natural schedule towards pure exploitation. Through extensive empirical studies in both discrete (MiniGrid) and continuous (MuJoCo) control benchmarks, we show that SEREN can be readily combined with existing RL algorithms to yield significant improvement in performance relative to state-of-the-art algorithms.

preprint, 2022, arXiv

Structured Q-learning For Antibody Design

Optimizing combinatorial structures is core to many real-world problems, such as those encountered in life sciences. For example, one of the crucial steps involved in antibody design is to find an arrangement of amino acids in a protein sequence that improves its binding with a pathogen. Combinatorial optimization of antibodies is difficult due to extremely large search spaces and non-linear objectives. Even for modest antibody design problems, where proteins have a sequence length of eleven, we are faced with searching over 2.05 x 10^14 structures. Applying traditional Reinforcement Learning algorithms such as Q-learning to combinatorial optimization results in poor performance. We propose Structured Q-learning (SQL), an extension of Q-learning that incorporates structural priors for combinatorial optimization. Using a molecular docking simulator, we demonstrate that SQL finds high binding energy sequences and performs favourably against baselines on eight challenging antibody design tasks, including designing antibodies for SARS-COV.

preprint, 2020, arXiv

SAMBA: Safe Model-Based & Active Reinforcement Learning

In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. Our results show orders of magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.

preprint, 2019, arXiv

Distributed Design for Decentralized Control using Chordal Decomposition and ADMM

We propose a distributed design method for decentralized control by exploiting the underlying sparsity properties of the problem. Our method is based on chordal decomposition of sparse block matrices and the alternating direction method of multipliers (ADMM). We first apply a classical parameterization technique to restrict the optimal decentralized control into a convex problem that inherits the sparsity pattern of the original problem. The parameterization relies on a notion of strongly decentralized stabilization, and sufficient conditions are discussed to guarantee this notion. Then, chordal decomposition allows us to decompose the convex restriction into a problem with partially coupled constraints, and the framework of ADMM enables us to solve the decomposed problem in a distributed fashion. Consequently, the subsystems only need to share their model data with their direct neighbours, not needing a central computation. Numerical experiments demonstrate the effectiveness of the proposed method.

preprint, 2019, arXiv

On the Existence of Block-Diagonal Solutions to Lyapunov and $\mathcal{H}_{\infty}$ Riccati Inequalities

In this paper, we describe sufficient conditions under which block-diagonal solutions to Lyapunov and $\mathcal{H}_{\infty}$ Riccati inequalities exist. In order to derive our results, we define a new type of comparison systems, which are positive and are computed using the state-space matrices of the original (possibly nonpositive) systems. Computing the comparison system involves only the calculation of $\mathcal{H}_{\infty}$ norms of its subsystems. We show that the stability of this comparison system implies the existence of block-diagonal solutions to Lyapunov and Riccati inequalities. Furthermore, our proof is constructive and the overall framework allows the computation of block-diagonal solutions to these matrix inequalities with linear algebra and linear programming. Numerical examples illustrate our theoretical results.

preprint, 2016, arXiv

Properties of Isostables and Basins of Attraction of Monotone Systems

In this paper, we investigate geometric properties of monotone systems by studying their isostables and basins of attraction. Isostables are boundaries of specific forward-invariant sets defined by the so-called Koopman operator, which provides a linear infinite-dimensional description of a nonlinear system. First, we study the spectral properties of the Koopman operator and the associated semigroup in the context of monotone systems. Our results generalize the celebrated Perron-Frobenius theorem to the nonlinear case and allow us to derive geometric properties of isostables and basins of attraction. Additionally, we show that under certain conditions we can characterize the bounds on the basins of attraction under parametric uncertainty in the vector field. We discuss computational approaches to estimate isostables and basins of attraction and illustrate the results on two- and four-state monotone systems.

preprint, 2016, arXiv

Shaping Pulses to Control Bistable Monotone Systems Using Koopman Operator

In this paper, we further develop a recently proposed control method to switch a bistable system between its steady states using temporal pulses. The motivation for using pulses comes from biomedical and biological applications (e.g. synthetic biology), where it is generally difficult to build feedback control systems due to technical limitations in sensing and actuation. The original framework was derived for monotone systems and all the extensions relied on monotone systems theory. In contrast, we introduce the concept of switching function which is related to eigenfunctions of the so-called Koopman operator subject to a fixed control pulse. Using the level sets of the switching function we can (i) compute the set of all pulses that drive the system toward the steady state in a synchronous way and (ii) estimate the time needed by the flow to reach an epsilon neighborhood of the target steady state. Additionally, we show that for monotone systems the switching function is also monotone in some sense, a property that can yield efficient algorithms to compute it. This observation recovers and further extends the results of the original framework, which we illustrate on numerical examples inspired by biological applications.

preprint, 2015, arXiv

On Monotonicity and Propagation of Order Properties

In this paper, a link between monotonicity of deterministic dynamical systems and propagation of order by Markov processes is established. Order propagation has received considerable attention in the literature; however, this notion is still not fully understood. The main contribution of this paper is a study of order propagation in the deterministic setting, which can potentially provide new techniques for analysis in the stochastic one. We take a close look at the propagation of the so-called increasing and increasing convex orders. Infinitesimal characterisations of these orders are derived, which resemble the well-known Kamke conditions for monotonicity. It is shown that increasing order is equivalent to the standard monotonicity, while the class of systems propagating the increasing convex order is equivalent to the class of monotone systems with convex vector fields. The paper is concluded by deriving a novel result on order propagating diffusion processes and an application of this result to biological processes.

preprint, 2015, arXiv

Shaping Pulses to Control Bistable Biological Systems

In this paper we study how to shape temporal pulses to switch a bistable system between its stable steady states. Our motivation for pulse-based control comes from applications in synthetic biology, where it is generally difficult to implement real-time feedback control systems due to technical limitations in sensors and actuators. We show that for monotone bistable systems, the estimation of the set of all pulses that switch the system reduces to the computation of one non-increasing curve. We provide an efficient algorithm to compute this curve and illustrate the results with a genetic bistable system commonly used in synthetic biology. We also extend these results to models with parametric uncertainty and provide a number of examples and counterexamples that demonstrate the power and limitations of the current theory. In order to show the full potential of the framework, we consider the problem of inducing oscillations in a monotone biochemical system using a combination of temporal pulses and event-based control. Our results provide an insight into the dynamics of bistable systems under external inputs and open up numerous directions for future investigation.
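
The "one non-increasing curve" can be illustrated on a toy example. The sketch below is not the paper's algorithm or model: it assumes a scalar bistable system dx/dt = x - x^3 + u (stable states near -1 and +1, unstable state at 0), a constant pulse of a given magnitude, and forward-Euler integration. The function name `min_switching_duration` is hypothetical.

```python
from typing import Optional


def min_switching_duration(
    magnitude: float, dt: float = 0.01, horizon: float = 50.0
) -> Optional[float]:
    """Shortest pulse duration that pushes x from -1 past the unstable state 0.

    Returns None if the pulse never switches the system within `horizon`.
    """
    x = -1.0  # start in the lower stable steady state
    max_steps = int(horizon / dt)
    for step in range(1, max_steps + 1):
        # Forward-Euler step of the toy dynamics while the pulse is on.
        x += dt * (x - x**3 + magnitude)
        if x > 0.0:
            # Past the unstable state: the unforced flow now goes to +1.
            return step * dt
    return None


# Sample the boundary of the switching set in the (magnitude, duration) plane.
curve = [(m, min_switching_duration(m)) for m in (0.5, 1.0, 2.0)]
```

In this toy system a stronger pulse never needs a longer duration, so the sampled boundary is non-increasing in the magnitude, and every pulse above the boundary also switches the system. Pulses that are too weak (for example a magnitude of 0.3) never switch it, whatever their duration.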

preprint, 2015, arXiv

Structured Projection-Based Model Reduction with Application to Stochastic Biochemical Networks

The Chemical Master Equation (CME) is well known to provide the highest resolution models of a biochemical reaction network. Unfortunately, even simulating the CME can be a challenging task. For this reason, simpler approximations to the CME have been proposed. In this work we focus on one such model, the Linear Noise Approximation (LNA). Specifically, we consider implications of a recently proposed LNA time-scale separation method. We show that the reduced order LNA converges to the full order model in the mean square sense. Using this as motivation we derive a network structure preserving reduction algorithm based on structured projections. We present convex optimisation algorithms that describe how such projections can be computed and we discuss when structured solutions exist. We also show that for a certain class of systems, structured projections can be found using basic linear algebra and no optimisation is necessary. The algorithms are then applied to a linearised stochastic LNA model of the yeast glycolysis pathway.

preprint, 2015, arXiv

Toggling a Genetic Switch Using Reinforcement Learning

In this paper, we consider the problem of optimal exogenous control of gene regulatory networks. Our approach consists in adapting an established reinforcement learning algorithm called the fitted Q iteration. This algorithm infers the control law directly from the measurements of the system's response to external control inputs without the use of a mathematical model of the system. The measurement data set can either be collected from wet-lab experiments or artificially created by computer simulations of dynamical models of the system. The algorithm is applicable to a wide range of biological systems due to its ability to deal with nonlinear and stochastic system dynamics. To illustrate the application of the algorithm to a gene regulatory network, the regulation of the toggle switch system is considered. The control objective of this problem is to drive the concentrations of two specific proteins to a target region in the state space.
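
A bare-bones version of fitted Q iteration is sketched below to show how a control law can be inferred from recorded transitions alone. It is an illustration under simplifying assumptions, not the paper's setup: the regression step is replaced by a lookup table that averages targets per state-action pair, and the two-state "toggle" data set and all names are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Transition = Tuple[Hashable, Hashable, float, Hashable]  # (s, a, reward, s_next)


def fitted_q_iteration(
    transitions: List[Transition],
    actions: Sequence[Hashable],
    gamma: float = 0.9,
    n_iterations: int = 50,
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Batch-mode Q-function estimation from a fixed set of transitions."""
    q: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)  # Q_0 = 0
    for _ in range(n_iterations):
        targets = defaultdict(list)
        for s, a, r, s_next in transitions:
            best_next = max(q[(s_next, b)] for b in actions)
            targets[(s, a)].append(r + gamma * best_next)
        # "Regression" step: here a table that averages the targets.
        # A real application would fit a function approximator instead.
        new_q: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
        for key, values in targets.items():
            new_q[key] = sum(values) / len(values)
        q = new_q
    return q


def greedy_action(q, state, actions):
    """Control law implied by the learned Q-function."""
    return max(actions, key=lambda a: q[(state, a)])


# Made-up data: a pulse flips the state; resting in "high" is rewarded.
ACTIONS = ("pulse", "rest")
DATA: List[Transition] = [
    ("low", "pulse", 0.0, "high"),
    ("low", "rest", 0.0, "low"),
    ("high", "rest", 1.0, "high"),
    ("high", "pulse", 0.0, "low"),
]
Q = fitted_q_iteration(DATA, ACTIONS)
```

On this data the greedy policy pulses in the "low" state and rests in the "high" state, without ever using a model of the dynamics.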

preprint, 2014, arXiv

Distributed Reconstruction of Nonlinear Networks: An ADMM Approach

In this paper, we present a distributed algorithm for the reconstruction of large-scale nonlinear networks. In particular, we focus on the identification from time-series data of the nonlinear functional forms and associated parameters of large-scale nonlinear networks. Recently, a nonlinear network reconstruction problem was formulated as a nonconvex optimisation problem based on the combination of a marginal likelihood maximisation procedure with sparsity inducing priors. Using a convex-concave procedure (CCCP), an iterative reweighted lasso algorithm was derived to solve the initial nonconvex optimisation problem. By exploiting the structure of the objective function of this reweighted lasso algorithm, a distributed algorithm can be designed. To this end, we apply the alternating direction method of multipliers (ADMM) to decompose the original problem into several subproblems. To illustrate the effectiveness of the proposed methods, we use our approach to identify networks of interconnected Kuramoto oscillators of different sizes (500 to 100,000 nodes).
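
How ADMM splits one problem into subproblems can be seen on a much simpler problem than the reweighted lasso of the paper. The sketch below uses the generic scaled-form consensus ADMM on scalar quadratics: each agent i holds a private number a_i and minimises (x_i - a_i)^2 / 2 subject to all x_i agreeing on a common value z. The function name and parameters are hypothetical.

```python
from typing import List


def consensus_admm(a: List[float], rho: float = 1.0, n_iterations: int = 100) -> float:
    """Scaled-form consensus ADMM for min sum_i (x_i - a_i)^2 / 2  s.t.  x_i = z."""
    n = len(a)
    z = 0.0
    u = [0.0] * n  # scaled dual variables, one per agent
    for _ in range(n_iterations):
        # Local step: each agent solves its own small problem independently.
        x = [(a[i] + rho * (z - u[i])) / (1.0 + rho) for i in range(n)]
        # Coordination step: agree on a common value.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: accumulate each agent's disagreement with the consensus.
        u = [u[i] + x[i] - z for i in range(n)]
    return z


consensus = consensus_admm([1.0, 2.0, 6.0])
```

The local steps need only each agent's own data and could run in parallel. For this objective the consensus value converges to the average of the a_i, here 3.0.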

preprint, 2014, arXiv

On Projection-Based Model Reduction of Biochemical Networks-- Part I: The Deterministic Case

This paper addresses the problem of model reduction for dynamical system models that describe biochemical reaction networks. Inherent in such models are properties such as stability, positivity and network structure. Ideally these properties should be preserved by model reduction procedures, although traditional projection-based approaches struggle to do this. We propose a projection-based model reduction algorithm which uses generalised block diagonal Gramians to preserve structure and positivity. Two algorithms are presented: the first provides more accurate reduced order models, while the second provides reduced order models that are easier to simulate. The results are illustrated through numerical examples.

preprint, 2014, arXiv

On Projection-Based Model Reduction of Biochemical Networks-- Part II: The Stochastic Case

In this paper, we consider the problem of model order reduction of stochastic biochemical networks. In particular, we reduce the order of (the number of equations in) the Linear Noise Approximation of the Chemical Master Equation, which is often used to describe biochemical networks. In contrast to other biochemical network reduction methods, the presented one is projection-based. Projection-based methods are powerful tools, but the cost of their use is the loss of physical interpretation of the nodes in the network. In order to alleviate this drawback, we employ structured projectors, which means that some nodes in the network will keep their physical interpretation. For many models in engineering, finding structured projectors is not always feasible; however, in the context of biochemical networks it is much more likely as the networks are often (almost) monotonic. To summarise, the method can serve as a trade-off between approximation quality and physical interpretation, which is illustrated on numerical examples.

preprint, 2013, arXiv

On Periodic Reference Tracking Using Batch-Mode Reinforcement Learning with Application to Gene Regulatory Network Control

In this paper, we consider the periodic reference tracking problem in the framework of batch-mode reinforcement learning, which studies methods for solving optimal control problems from the sole knowledge of a set of trajectories. In particular, we extend an existing batch-mode reinforcement learning algorithm, known as Fitted Q Iteration, to the periodic reference tracking problem. The presented periodic reference tracking algorithm explicitly exploits a priori knowledge of the future values of the reference trajectory and its periodicity. We discuss the properties of our approach and illustrate it on the problem of reference tracking for a synthetic biology gene regulatory network known as the generalised repressilator. This system can produce decaying but long-lived oscillations, which makes it an interesting system for the tracking problem. In our companion paper we also take a look at the regulation problem of the toggle switch system, where the main goal is to drive the system's states to a specific bounded region in the state space.
