Researcher profile

Akifumi Wachi

Akifumi Wachi contributes to research discovery and scholarly infrastructure.

Published work

3 published items

Preprint, 2024, arXiv

Learning-based Event-triggered MPC with Gaussian processes under terminal constraints

An event-triggered control strategy can significantly reduce the number of control task executions without sacrificing control performance. In this paper, we propose a novel learning-based approach to event-triggered model predictive control (MPC) for nonlinear control systems whose dynamics are unknown a priori. In particular, the optimal control problems (OCPs) are formulated based on predictive states learned by Gaussian process (GP) regression under a terminal constraint constructed by a symbolic abstraction. The event-triggered condition proposed in this paper is derived from recursive feasibility, so that the OCPs are solved only when the error between the predicted and actual states exceeds a certain threshold. Based on the event-triggered condition, we analyze the stability of the closed-loop system and show that finite-time convergence to the terminal set is achieved as the uncertainty of the GP model becomes smaller. Moreover, to reduce the uncertainty of the GP model and improve the efficiency of finding the optimal solution, we provide an overall learning-based event-triggered MPC algorithm based on an iterative task. Finally, we demonstrate the proposed approach through a tracking control problem.
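The triggering idea in this abstract (re-solve only when the prediction error exceeds a threshold) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the scalar system, the function name `event_trigger_steps`, the stand-in predictor `learned_model` (a fixed decay model, not a GP), and the threshold value. It does not reproduce the paper's condition, which is derived from recursive feasibility.

```python
def event_trigger_steps(actual_states, predict, threshold):
    """Return the time steps at which a re-solve would be triggered.

    predict(x0, n) is a hypothetical learned model giving the state
    predicted n steps after starting from x0.
    """
    trigger_steps = [0]  # the first problem is always solved at step 0
    anchor_step, anchor_state = 0, actual_states[0]
    for k in range(1, len(actual_states)):
        predicted = predict(anchor_state, k - anchor_step)
        # Trigger only when the prediction error exceeds the threshold.
        if abs(predicted - actual_states[k]) > threshold:
            trigger_steps.append(k)
            anchor_step, anchor_state = k, actual_states[k]
    return trigger_steps


# Hypothetical true dynamics x_{k+1} = 0.8 * x_k, unknown to the controller.
actual = [1.0]
for _ in range(5):
    actual.append(0.8 * actual[-1])

# Hypothetical learned model that slightly misjudges the decay rate.
def learned_model(x0, n):
    return x0 * 0.9 ** n

steps = event_trigger_steps(actual, learned_model, threshold=0.15)
print(steps)
```

In this toy run the problem is re-solved at only a subset of the six time steps, which is the saving that event triggering aims for. A more accurate model would trigger less often.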

Preprint, 2021, arXiv

Reinforcement Learning with External Knowledge by using Logical Neural Networks

Conventional deep reinforcement learning methods are sample-inefficient and usually require a large number of training trials before convergence. Since such methods operate on an unconstrained action set, they can lead to useless actions. A recent neuro-symbolic framework called Logical Neural Networks (LNNs) can simultaneously provide key properties of both neural networks and symbolic logic. An LNN functions as an end-to-end differentiable network that minimizes a novel contradiction loss to learn interpretable rules. In this paper, we utilize LNNs to define an inference graph using basic logical operations, such as AND and NOT, for faster convergence in reinforcement learning. Specifically, we propose an integrated method that enables model-free reinforcement learning from external knowledge sources in an LNN-based logically constrained framework, such as action shielding and action guiding. Our results empirically demonstrate that our method converges faster than a model-free reinforcement learning method that doesn't have such logical constraints.
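Action shielding, as mentioned in this abstract, can be sketched as filtering the action set with a logical rule before the greedy choice is made. The sketch below uses a plain Boolean function built from AND and NOT as a stand-in for a learned rule. It is not an LNN, and the action names, Q-values, and the helper `shielded_greedy_action` are hypothetical.

```python
def shielded_greedy_action(q_values, is_allowed):
    """Pick the highest-valued action among those the rule allows."""
    allowed = [a for a in q_values if is_allowed(a)]
    if not allowed:
        # Fall back to the full action set if the rule blocks everything.
        allowed = list(q_values)
    return max(allowed, key=lambda a: q_values[a])


def make_rule(has_key):
    # Hypothetical external knowledge: "open door" is useless without a key.
    # blocked = (action is "open door") AND (NOT has_key)
    return lambda action: not (action == "open door" and not has_key)


# Hypothetical Q-values from a partially trained agent.
q = {"open door": 2.0, "take key": 1.0, "wait": 0.5}

without_key = shielded_greedy_action(q, make_rule(has_key=False))
with_key = shielded_greedy_action(q, make_rule(has_key=True))
print(without_key, with_key)
```

The shield changes the chosen action only when the unconstrained greedy choice violates the rule, so the agent wastes fewer trials on actions the external knowledge already rules out.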

Preprint, 2020, arXiv

Safe Reinforcement Learning in Constrained Markov Decision Processes

Safe reinforcement learning has been a promising approach for optimizing the policy of an agent that operates in safety-critical applications. In this paper, we propose an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints. Specifically, we take a stepwise approach for optimizing safety and cumulative reward. In our method, the agent first learns safety constraints by expanding the safe region, and then optimizes the cumulative reward in the certified safe region. We provide theoretical guarantees on both the satisfaction of the safety constraint and the near-optimality of the cumulative reward under proper regularity assumptions. We demonstrate the effectiveness of SNO-MDP through two experiments: one uses synthetic data in a new, openly available environment named GP-SAFETY-GYM, and the other simulates Mars surface exploration by using real observation data.
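The stepwise approach described in this abstract (first expand the safe region, then optimize reward inside it) can be sketched on a toy one-dimensional world. This is a simplified illustration and not SNO-MDP itself: it assumes noiseless safety observations, a known Lipschitz constant in place of a GP model, and a single-step reward choice in place of policy optimization. The state layout, values, and function name are hypothetical.

```python
def expand_safe_region(safety, seed, lipschitz, threshold):
    """Grow the certified safe set outward from a known-safe seed state.

    A neighbor is certified when a pessimistic lower bound on its safety
    value, built from an already observed safe state, meets the threshold.
    """
    safe = {seed}
    changed = True
    while changed:
        changed = False
        for s in list(safe):
            for n in (s - 1, s + 1):
                if 0 <= n < len(safety) and n not in safe:
                    # Neighbors are one unit apart, so the bound drops by L.
                    if safety[s] - lipschitz >= threshold:
                        safe.add(n)
                        changed = True
    return safe


# Hypothetical safety and reward values on five states in a line.
safety = [0.9, 0.8, 0.6, 0.3, 0.1]
rewards = [0, 1, 5, 2, 10]

# Phase 1: learn where it is safe to go.
safe_set = expand_safe_region(safety, seed=0, lipschitz=0.3, threshold=0.2)

# Phase 2: optimize reward only inside the certified safe region.
best_state = max(safe_set, key=lambda s: rewards[s])
print(sorted(safe_set), best_state)
```

The highest-reward state overall lies outside the certified region, so the second phase settles for the best state that was certified safe in the first phase.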