Source author record

Toshimitsu Ushio

Toshimitsu Ushio appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Systems and Control; eess.SY; Formal Languages and Automata Theory; Computer Science and Game Theory; Machine Learning; Artificial Intelligence; Cryptography and Security; Logic in Computer Science

Catalog footprint

What is connected

11 works

8 topics

4 close collaborators

preprint, 2022, arXiv

Bounded Synthesis and Reinforcement Learning of Supervisors for Stochastic Discrete Event Systems with LTL Specifications

In this paper, we consider supervisory control of stochastic discrete event systems (SDESs) under linear temporal logic specifications. Applying bounded synthesis, we reduce supervisor synthesis to the problem of satisfying a safety condition. First, we consider the synthesis of a directed controller under the safety condition. We assign a negative reward to the unsafe states and introduce an expected return with a state-dependent discount factor. We compute a winning region and a directed controller with the maximum satisfaction probability using a dynamic programming method, where the expected return is used as a value function. Next, we construct a permissive supervisor via the optimal value function. We show that the supervisor achieves the maximum satisfaction probability and maximizes the reachable set within the winning region. Finally, for an unknown SDES, we propose a two-stage model-free reinforcement learning method for efficient learning of the winning region and the directed controllers with the maximum satisfaction probability. We also demonstrate the effectiveness of the proposed method by simulation.
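The idea of penalizing unsafe states and reading a safety probability off the value function can be illustrated on a toy model. The sketch below is not the paper's algorithm: it assumes a small acyclic MDP, a reward of -1 for entering an unsafe state, and no discounting (the abstract's state-dependent discount factor is not specified here, so it is omitted). The function name `max_safety_value`, the state names, and the transition probabilities are all hypothetical.

```python
def max_safety_value(mdp, unsafe, sweeps=100):
    """Value iteration with reward -1 for entering an unsafe state.

    mdp: {state: {action: [(prob, next_state), ...]}}.
    States that never appear as keys are treated as absorbing (value 0 if safe).
    Returns (value, policy); 1 + value[s] is the best probability of never
    entering an unsafe state when starting from s.
    """
    states = set(mdp)
    for actions in mdp.values():
        for outcomes in actions.values():
            states.update(next_state for _, next_state in outcomes)
    value = {s: 0.0 for s in states}
    policy = {}
    for _ in range(sweeps):
        for s, actions in mdp.items():
            best_action, best_q = None, float("-inf")
            for a, outcomes in actions.items():
                # Entering an unsafe state costs 1 and ends the run.
                q = sum(p * (-1.0 if s2 in unsafe else value[s2])
                        for p, s2 in outcomes)
                if q > best_q:
                    best_action, best_q = a, q
            value[s], policy[s] = best_q, best_action
    return value, policy


# Hypothetical three-step example: "bad" is unsafe, "safe" is a safe sink.
mdp = {
    "s0": {"a": [(1.0, "s1")], "b": [(0.5, "bad"), (0.5, "safe")]},
    "s1": {"a": [(0.9, "safe"), (0.1, "bad")]},
}
value, policy = max_safety_value(mdp, unsafe={"bad"})
safety_probability = 1.0 + value["s0"]
```

In this toy case the greedy policy picks action `a` in `s0`, because routing through `s1` risks the unsafe state with probability 0.1 rather than 0.5.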

preprint, 2022, arXiv

Deep Reinforcement Learning Based Networked Control with Network Delays for Signal Temporal Logic Specifications

We apply deep reinforcement learning (DRL) to the design of a networked controller with network delays that completes a temporal control task described by a signal temporal logic (STL) formula. STL is useful for handling specifications with bounded time intervals for a dynamical system. In general, an agent needs not only the current system state but also the past behavior of the system to determine a desired control action for satisfying the given STL formula. Additionally, we need to consider the effect of network delays on data transmissions. Thus, we propose an extended Markov decision process using past system states and control actions, called a $\tau d$-MDP, so that the agent can evaluate the satisfaction of the STL formula while accounting for the network delays. Thereafter, we apply a DRL algorithm to design a networked controller using the $\tau d$-MDP. Through simulations, we also demonstrate the learning performance of the proposed algorithm.
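One simple way to picture an extended state built from past system states and control actions is a pair of fixed-length buffers whose contents are concatenated into the agent's observation. The sketch below is only an illustration of that general idea, not the paper's $\tau d$-MDP construction; the class name `HistoryAugmentedState`, the buffer lengths, and the padding with initial values are assumptions.

```python
from collections import deque


class HistoryAugmentedState:
    """Keep the last `state_len` states and `action_len` actions (hypothetical helper)."""

    def __init__(self, state_len, action_len, init_state, init_action):
        # Buffers are pre-filled so the observation has a fixed size from step 0.
        self.states = deque([init_state] * state_len, maxlen=state_len)
        self.actions = deque([init_action] * action_len, maxlen=action_len)

    def push(self, state, action):
        # Appending to a full deque drops the oldest entry automatically.
        self.states.append(state)
        self.actions.append(action)

    def observation(self):
        # Oldest-to-newest states, followed by oldest-to-newest actions.
        return tuple(self.states) + tuple(self.actions)


history = HistoryAugmentedState(state_len=3, action_len=2,
                                init_state=0.0, init_action=0.0)
history.push(1.0, 0.5)
history.push(2.0, -0.5)
obs = history.observation()
```

In a learning setup the state window would typically be sized to cover the time horizon of the STL formula and the action window to cover the delay, but those choices depend on details not given in the abstract.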

preprint, 2022, arXiv

Learning-based Bounded Synthesis for Semi-MDPs with LTL Specifications

This letter proposes a learning-based bounded synthesis for a semi-Markov decision process (SMDP) with a linear temporal logic (LTL) specification. In the product of the SMDP and the deterministic $K$-co-Büchi automaton (d$K$cBA) converted from the LTL specification, we learn both the winning region for satisfying the LTL specification and the dynamics therein, based on reinforcement learning and Bayesian inference. Then, we synthesize an optimal policy satisfying the following two conditions. (1) It maximizes the probability of reaching the winning region. (2) It minimizes a long-term risk for the dwell time within the winning region. The long-term risk is minimized using the estimated dynamics and value iteration. We show that, if the discount factor is sufficiently close to one, the synthesized policy converges to the optimal policy as the amount of data obtained by exploration goes to infinity.

preprint, 2022, arXiv

Learning-based Symbolic Abstractions for Nonlinear Control Systems

Symbolic models or abstractions are known to be powerful tools for the control design of cyber-physical systems (CPSs) with logic specifications. In this paper, we investigate a novel learning-based approach to the construction of symbolic models for nonlinear control systems. In particular, the symbolic model is constructed based on learning the un-modeled part of the dynamics from training data based on state-space exploration, and the concept of an alternating simulation relation that represents behavioral relationships with respect to the original control system. Moreover, we aim at achieving safe exploration, meaning that the trajectory of the system is guaranteed to be in a safe region for all times while collecting the training data. In addition, we provide some techniques to reduce the computational load, in terms of memory and computation time, of constructing the symbolic models and the safety controller synthesis, so as to make our approach practical. Finally, a numerical simulation illustrates the effectiveness of the proposed approach.

preprint, 2021, arXiv

Stability analysis and control of decision-making of miners in blockchain

To maintain blockchain-based services while ensuring their security, an important issue is how to set the mining reward so that the number of miners participating in the mining increases. We propose a dynamical model of decision-making for miners using an evolutionary game approach and analyze the stability of the equilibrium points of the proposed model. The proposed model is described by a first-order differential equation, so it is simple, yet its theoretical analysis gives insight into the characteristics of the decision-making. Through the analysis of the equilibrium points, we show the transcritical bifurcations and hysteresis phenomena of the equilibrium points. We also design a controller that determines the mining reward based on the number of participating miners, in order to stabilize the state in which all miners participate in the mining. Numerical simulation shows that there is a trade-off in the choice of the design parameters.

preprint, 2020, arXiv

Learning self-triggered controllers with Gaussian processes

This paper investigates the design of self-triggered controllers for networked control systems (NCSs), where the dynamics of the plant are unknown a priori. To deal with the unknown transition dynamics, we employ Gaussian process (GP) regression to learn the dynamics of the plant. To design the self-triggered controller, we formulate an optimal control problem such that the optimal control and communication policies can be jointly designed based on the GP model of the plant. Moreover, we provide an overall implementation algorithm that jointly learns the dynamics of the plant and the self-triggered controller based on a reinforcement learning framework. Finally, a numerical simulation illustrates the effectiveness of the proposed approach.

preprint, 2020, arXiv

On-Line Permissive Supervisory Control of Discrete Event Systems for scLTL Specifications

We propose an on-line supervisory control scheme for discrete event systems (DESs), where a control specification is described by a fragment of linear temporal logic. On the product automaton of the DES and an acceptor for the specification, we define a ranking function that returns the minimum number of steps required to reach an accepting state from each state. In addition, we introduce a permissiveness function that indicates a time-varying permissive level. At each step during the on-line control scheme, the supervisor refers to the permissiveness function as well as the ranking function in order to guarantee the control specification while handling the tradeoff between its permissiveness and acceptance of the specification. The proposed scheme is demonstrated in a surveillance problem for a mobile robot.
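A ranking function of the kind described, giving the minimum number of steps from each state to an accepting state, can be computed by a breadth-first search over reversed transitions. The sketch below is a plain graph version under simplifying assumptions: it ignores event controllability and the permissiveness function, and the automaton encoding, state names, and the helper `ranking_function` are hypothetical rather than taken from the paper.

```python
from collections import deque


def ranking_function(transitions, accepting):
    """Return {state: minimum number of steps to reach an accepting state}.

    transitions: {(source_state, event): target_state}.
    States that cannot reach any accepting state are left out of the result.
    """
    # Reverse the transition relation so the search can run backward.
    predecessors = {}
    for (source, _event), target in transitions.items():
        predecessors.setdefault(target, set()).add(source)

    rank = {q: 0 for q in accepting}
    queue = deque(accepting)
    while queue:
        q = queue.popleft()
        for p in predecessors.get(q, ()):
            if p not in rank:  # first visit in BFS gives the shortest distance
                rank[p] = rank[q] + 1
                queue.append(p)
    return rank


# Hypothetical product automaton: q3 loops forever and never reaches q2.
transitions = {
    ("q0", "a"): "q1",
    ("q0", "b"): "q3",
    ("q1", "c"): "q2",
    ("q3", "d"): "q3",
}
rank = ranking_function(transitions, accepting={"q2"})
```

A supervisor could, for instance, disable events that lead to states missing from `rank`, since no accepting state is reachable from them; how the actual scheme trades this off against permissiveness is not captured here.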

preprint, 2020, arXiv

On-Line Synthesis of Permissive Supervisors for Partially Observed Discrete Event Systems under scLTL Constraints

We consider a supervisory control problem of a discrete event system (DES) under partial observation, where a control specification is given by a fragment of linear temporal logic. We design an on-line supervisor that dynamically computes its control action with the complete information of the product automaton of the DES and an acceptor for the specification. The concepts of controllability and observability are defined by means of a ranking function defined on the product automaton, which decreases its value if an accepting state of the product automaton is being approached. The proposed on-line control scheme leverages the ranking function and a permissiveness function, which represents a time-varying permissiveness level. As a result, the on-line supervisor achieves the specification, being aware of the tradeoff between its permissiveness and acceptance of the specification, if the product automaton is controllable and observable.

preprint, 2020, arXiv

Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Generalized Büchi Automata

This letter proposes a novel reinforcement learning method for the synthesis of a control policy satisfying a control specification described by a linear temporal logic formula. We assume that the controlled system is modeled by a Markov decision process (MDP). We convert the specification to a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets that accepts all infinite sequences satisfying the formula. The LDGBA is augmented so that it explicitly records the previous visits to accepting sets. We take a product of the augmented LDGBA and the MDP, based on which we define a reward function. The agent gets rewards whenever state transitions are in an accepting set that has not been visited for a certain number of steps. Consequently, sparsity of rewards is relaxed and optimal circulations among the accepting sets are learned. We show that the proposed method can learn an optimal policy when the discount factor is sufficiently close to one.
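The effect of recording visits to accepting sets can be sketched with a small tracker that pays a reward only when a transition hits an accepting set not yet seen in the current round. This is one plausible reading of the abstract, not the authors' augmented LDGBA: the class name `AcceptanceTracker`, the reward of 1.0, and the rule of resetting the record once every set has been visited are assumptions.

```python
class AcceptanceTracker:
    """Track which accepting sets were hit in the current round (hypothetical helper)."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.visited = [False] * num_sets

    def step(self, hit_sets):
        """hit_sets: indices of accepting sets that contain the current transition.

        Returns 1.0 if at least one of them is new in this round, else 0.0.
        """
        reward = 0.0
        for index in hit_sets:
            if not self.visited[index]:
                self.visited[index] = True
                reward = 1.0
        # Assumed reset rule: start a new round once every set has been visited.
        if all(self.visited):
            self.visited = [False] * self.num_sets
        return reward


tracker = AcceptanceTracker(num_sets=2)
hits_per_step = [{0}, {0}, set(), {1}, {0}]
rewards = [tracker.step(hits) for hits in hits_per_step]
```

Repeatedly hitting the same accepting set earns nothing until the other sets have been visited, which is what pushes a learner toward circulating among all of them.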

preprint, 2016, arXiv

Output Feedback Controller Design with Symbolic Observers for Cyber-physical Systems

In this paper, we design a symbolic output feedback controller of a cyber-physical system (CPS). The physical plant is modeled by an infinite transition system. We consider the situation in which a finite abstracted system of the physical plant, called a c-abstracted system, is given. There exists an approximate alternating simulation relation from the c-abstracted system to the physical plant. A desired behavior of the c-abstracted system is also given, and we have a symbolic state feedback controller of the physical plant. We consider the case where some states of the plant are not measured. Then, to estimate the states from abstracted outputs measured by sensors, we introduce a finite abstracted system of the physical plant, called an o-abstracted system, such that there exists an approximate simulation relation. The relation guarantees that an observer designed based on the state of the o-abstracted system estimates the current state of the plant. We construct a symbolic output feedback controller by composing these systems. Using a relation-based approach, we prove that the controlled system approximately exhibits the desired behavior.

preprint, 2013, arXiv

Game Theoretic Approach to the Stabilization of Heterogeneous Multiagent Systems Using Subsidy

We consider a multiagent system consisting of selfish and heterogeneous agents. Its behavior is modeled by multipopulation replicator dynamics, where payoff functions of populations are different from each other. In general, there exist several equilibrium points in the replicator dynamics. In order to stabilize a desirable equilibrium point, we introduce a controller called a government which controls the behaviors of agents by offering them subsidies. In previous work, it is assumed that the government determines the subsidies based on the populations the agents belong to. In general, however, the government cannot identify the members of each population. In this paper, we assume that the government observes the action of each agent and determines the subsidies based on the observed action profile. Then, we model the controlled behaviors of the agents using replicator dynamics with feedback. We derive a stabilization condition of the target equilibrium point in the replicator dynamics.
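For orientation, the standard form of multipopulation replicator dynamics can be written as below, followed by one assumed way a subsidy that depends only on the observed action could enter the payoffs. The notation ($x^p_i$, $u^p_i$, $s_i$) is illustrative and not taken from the paper, and the second equation is a sketch of the general idea rather than the authors' controlled model.

```latex
% Standard multipopulation replicator dynamics (illustrative notation):
% x^p_i = share of population p playing action i, u^p_i = payoff of that action.
\dot{x}^p_i = x^p_i \Bigl( u^p_i(x) - \sum_{j} x^p_j \, u^p_j(x) \Bigr)

% Assumed form of action-based subsidy feedback: s_i(x) depends on the action i
% and the observed action profile, but not on the population index p.
\dot{x}^p_i = x^p_i \Bigl( u^p_i(x) + s_i(x) - \sum_{j} x^p_j \bigl( u^p_j(x) + s_j(x) \bigr) \Bigr)
```

The point of the second form is that the subsidy term carries no population index, matching the assumption that the government cannot tell which population an agent belongs to.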
