Researcher profile

Taisuke Kobayashi

Taisuke Kobayashi contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
10 works
0 followers
5 topics
4 close collaborators

Published work

10 published items

preprint · 2026 · arXiv

Real-time Sampling-based Model Predictive Control based on Reverse Kullback-Leibler Divergence and Its Adaptive Acceleration

Sampling-based model predictive control (MPC) has the potential for use in a wide variety of robotic systems. However, its unstable updates and poor convergence render it unsuitable for real-time control of robotic systems. This study addresses this challenge with a novel approach from reverse Kullback-Leibler divergence, which has a mode-seeking property and is likely to find one of the locally optimal solutions early. Using this approach, a weighted maximum likelihood estimation with positive and negative weights is obtained and solved using the mirror descent (MD) algorithm. Negative weights eliminate unnecessary actions, but a practical implementation needs to be designed to avoid interference with positive and negative updates based on rejection sampling. In addition, Nesterov's acceleration method for the proposed MD is modified so that its heuristic step size adapts to the noise estimated in the update amounts. Real-time simulations show that the proposed method can solve a statistically wider variety of tasks than the conventional method. In addition, higher degrees-of-freedom tasks can be solved by the improved acceleration even with a CPU only. The real-world applicability of the proposed method is also demonstrated by optimizing the operability in a variable impedance control of a force-driven mobile robot. Video: https://youtu.be/D8bFMzct1XM

preprint · 2022 · arXiv

Adaptive and Multiple Time-scale Eligibility Traces for Online Deep Reinforcement Learning

Deep reinforcement learning (DRL) is one promising approach to teaching robots to perform complex tasks. Because methods that directly reuse the stored experience data cannot follow the change of the environment in robotic problems with a time-varying environment, online DRL is required. The eligibility traces method is well known as an online learning technique for improving sample efficiency in traditional reinforcement learning with linear regressors rather than DRL. The dependency between parameters of deep neural networks would destroy the eligibility traces, which is why they are not integrated with DRL. Although replacing the gradient with the most influential one rather than accumulating the gradients as the eligibility traces can alleviate this problem, the replacing operation reduces the number of reuses of previous experiences. To address these issues, this study proposes a new eligibility traces method that can be used even in DRL while maintaining high sample efficiency. When the accumulated gradients differ from those computed using the latest parameters, the proposed method takes into account the divergence between the past and latest parameters to adaptively decay the eligibility traces. Bregman divergences between outputs computed by the past and latest parameters are exploited due to the infeasible computational cost of the divergence between the past and latest parameters. In addition, a generalized method with multiple time-scale traces is designed for the first time. This design allows for the replacement of the most influential adaptively accumulated (decayed) eligibility traces.
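
For orientation, the classical technique this abstract starts from can be sketched as follows: TD(lambda) with accumulating eligibility traces and a linear value function. This is a minimal illustration of the traditional method only, not the adaptive or multiple time-scale traces proposed in the paper; the function name `td_lambda_step` and the default hyperparameters are assumptions made for this sketch.

```python
def td_lambda_step(w, z, x, x_next, reward, alpha=0.1, gamma=0.9, lam=0.8):
    """One online TD(lambda) update with accumulating traces.

    w: weights of a linear value function v(s) = w . x(s)
    z: eligibility trace vector (same length as w)
    x, x_next: feature vectors of the current and next state
    (All names and defaults are hypothetical, chosen for illustration.)
    """
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v  # TD error

    # For a linear value function the gradient w.r.t. w is simply x,
    # so the trace decays by gamma * lam and accumulates x.
    z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]

    # Every weight is updated in proportion to its trace.
    w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w, z


# Two consecutive transitions on a toy two-feature problem.
w, z = [0.0, 0.0], [0.0, 0.0]
w, z = td_lambda_step(w, z, x=[1.0, 0.0], x_next=[0.0, 1.0], reward=1.0)
w, z = td_lambda_step(w, z, x=[0.0, 1.0], x_next=[0.0, 0.0], reward=1.0)
```

In the second step the first weight is updated again through its decayed trace, which is how traces reuse earlier experience without storing it.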

preprint · 2022 · arXiv

Artificial Perception Meets Psychophysics, Revealing a Fundamental Law of Illusory Motion

Rotating Snakes is a visual illusion in which a stationary design is perceived to move dramatically. In the current study, the mechanism that generates perception of motion was analyzed using a combination of psychophysics experiments and deep neural network models that mimic human vision. We prepared three- and four-color illusion-like designs with a wide range of luminance and measured their strength of induced rotational motion. As a result, we discovered the fundamental law that the effect of the four-color snake rotation illusion was successfully enhanced by the combination of two perceptual motion vectors produced by the two three-color designs. In years to come, deep neural network technology will be one of the most effective tools not only for engineering applications but also for human perception research.

preprint · 2022 · arXiv

Motion Illusion-like Patterns Extracted from Photo and Art Images Using Predictive Deep Neural Networks

In our previous study, we successfully reproduced the illusory motion of the rotating snakes illusion using deep neural networks incorporating predictive coding theory. In the present study, we further examined the properties of the networks using a set of 1500 images, including ordinary static images of paintings and photographs and images of various types of motion illusions. Results showed that the networks clearly distinguished illusory images from the others and reproduced illusory motions for various types of illusions, similar to human perception. Notably, the networks occasionally detected anomalous motion vectors, even in ordinarily static images where humans were unable to perceive any illusory motion. Additionally, illusion-like designs with repeating patterns were generated using areas where anomalous vectors were detected, and psychophysical experiments were conducted, in which illusory motion perception in the generated designs was detected. The observed inaccuracy of the networks will provide useful information for further understanding information processing associated with human vision.

preprint · 2022 · arXiv

Optimistic Reinforcement Learning by Forward Kullback-Leibler Divergence Optimization

This paper addresses a new interpretation of the traditional optimization method in reinforcement learning (RL) as optimization problems using reverse Kullback-Leibler (KL) divergence, and derives a new optimization method using forward KL divergence instead of reverse KL divergence in the optimization problems. Although RL originally aims to maximize return indirectly through optimization of policy, the recent work by Levine has proposed a different derivation process with explicit consideration of optimality as a stochastic variable. This paper follows this concept and formulates the traditional learning laws for both value function and policy as the optimization problems with reverse KL divergence including optimality. Focusing on the asymmetry of KL divergence, the new optimization problems with forward KL divergence are derived. Remarkably, such new optimization problems can be regarded as optimistic RL. That optimism is intuitively specified by a hyperparameter converted from an uncertainty parameter. In addition, it can be enhanced when it is integrated with prioritized experience replay and eligibility traces, both of which accelerate learning. The effects of this expected optimism were investigated through learning tendencies on numerical simulations using PyBullet. As a result, moderate optimism accelerated learning and yielded higher rewards. In a realistic robotic simulation, the proposed method with the moderate optimism outperformed one of the state-of-the-art RL methods.
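
The asymmetry of KL divergence that this abstract relies on can be checked numerically with a small sketch. The two discrete distributions below are made up for illustration, and the convention that "forward" means KL(p || q) with p as the target and q as the model is an assumption of this sketch; the paper's own definitions should be consulted for its exact usage.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as probability lists.

    Terms with p_i == 0 contribute zero; q_i must be positive wherever
    p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical target distribution p and model distribution q.
p = [0.5, 0.4, 0.1]
q = [0.8, 0.1, 0.1]

forward = kl(p, q)  # expectation taken under the target p
reverse = kl(q, p)  # expectation taken under the model q

print(f"forward KL(p||q) = {forward:.4f}")
print(f"reverse KL(q||p) = {reverse:.4f}")
```

The two values differ because each direction weights the log-ratio by a different distribution, so minimizing one or the other leads to different optimization problems.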

preprint · 2021 · arXiv

Deep unfolding-based output feedback control design for linear systems with input saturation

In this paper, we propose a deep unfolding-based framework for the output feedback control of systems with input saturation. Although saturation commonly arises in several practical control systems, there is still a scarcity of effective design methodologies that can directly deal with the severe non-linearity of the saturation operator. In this paper, we aim to design an anti-windup controller for enlarging the region of stability of the closed-loop system by learning from the numerical simulations of the closed-loop system. The data-driven framework we propose in this paper is based on a deep-learning technique called Neural Ordinary Differential Equations. Within our framework, we first obtain a candidate controller by using the deep-learning technique, which is then tested by the existing theoretical results already established in the literature, thereby avoiding the computational challenge in the conventional design methodologies as well as theoretically guaranteeing the performance of the system. Our numerical simulation shows that the proposed framework can significantly outperform a conventional design methodology based on linear matrix inequalities.

preprint · 2021 · arXiv

Latent Representation in Human-Robot Interaction with Explicit Consideration of Periodic Dynamics

This paper presents a new data-driven framework for analyzing periodic physical human-robot interaction (pHRI) in latent state space. To advance human understanding and/or robot control during pHRI, a model representing pHRI is critical. Recent developments in deep learning technologies enable us to learn such a model from a dataset collected from the actual pHRI. Our framework is developed based on the variational recurrent neural network (VRNN), which can inherently handle time-series data such as that generated by pHRI. This paper modifies VRNN in order to include the latent dynamics from robot to human explicitly. In addition, to analyze periodic motions like walking, we integrate with VRNN a new recurrent network based on reservoir computing (RC), which has random and fixed connections between numerous neurons. By extending RC into the complex domain, periodic behavior can be represented as a phase rotation in the complex domain without amplitude decay. For verification of the proposed framework, a rope-rotation/swinging experiment was analyzed. The proposed framework, trained on the dataset collected from the experiment, achieved a latent state space in which the differences in periodic motions can be distinguished. Such a well-distinguished space yielded the best prediction accuracy of the human observations and the robot actions. The attached video is available on YouTube: https://youtu.be/umn0MVcIpsY

preprint · 2021 · arXiv

Optimization Algorithm for Feedback and Feedforward Policies towards Robot Control Robust to Sensing Failures

Model-free or learning-based control, in particular reinforcement learning (RL), is expected to be applied to complex robotic tasks. Traditional RL requires the policy being optimized to be state-dependent; that is, the policy is a kind of feedback (FB) controller. Because such an FB controller needs correct state observation, it is sensitive to sensing failures. To alleviate this drawback of FB controllers, feedback error learning integrates one of them with a feedforward (FF) controller. RL can be improved by dealing with the FB/FF policies, but to the best of our knowledge, a methodology for learning them in a unified manner has not been developed. In this paper, we propose a new optimization problem for optimizing both the FB/FF policies simultaneously. Inspired by control as inference, the optimization problem considers minimization/maximization of divergences between the trajectory predicted by the composed policy and a stochastic dynamics model, and optimal/non-optimal trajectories. By approximating the stochastic dynamics model using a variational method, we naturally derive a regularization between the FB/FF policies. In numerical simulations and a robot experiment, we verified that the proposed method can stably optimize the composed policy even with a learning law different from that of traditional RL. In addition, we demonstrated that the FF policy is robust to the sensing failures and can hold the optimal motion. The attached video is also available on YouTube: https://youtu.be/zLL4uXIRmrE

preprint · 2020 · arXiv

t-Soft Update of Target Network for Deep Reinforcement Learning

This paper proposes a new robust update rule of target network for deep reinforcement learning (DRL), to replace the conventional update rule, given as an exponential moving average. The target network is for smoothly generating the reference signals for a main network in DRL, thereby reducing learning variance. The problem with its conventional update rule is the fact that all the parameters are smoothly copied with the same speed from the main network, even when some of them are trying to update in the wrong direction. This behavior increases the risk of generating the wrong reference signals. Although slowing down the overall update speed is a naive way to mitigate wrong updates, it would decrease learning speed. To robustly update the parameters while keeping learning speed, a t-soft update method, which is inspired by the Student-t distribution, is derived with reference to the analogy between the exponential moving average and the normal distribution. Through the analysis of the derived t-soft update, we show that it inherits the properties of the Student-t distribution. Specifically, with the heavy-tailed property of the Student-t distribution, the t-soft update automatically excludes extreme updates that differ from past experiences. In addition, when the updates are similar to the past experiences, it can mitigate the learning delay by increasing the amount of updates. In PyBullet robotics simulations for DRL, an online actor-critic algorithm with the t-soft update outperformed the conventional methods in terms of the obtained return and/or its variance. From the training process by the t-soft update, we found that the t-soft update is globally consistent with the standard soft update, and the update rates are locally adjusted for acceleration or suppression.
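
The conventional rule that this abstract sets out to replace, an exponential moving average of the main network's parameters, can be sketched in a few lines. This shows only the standard soft update; the t-soft rule itself is not reproduced here. The function name `soft_update`, the flat parameter lists, and the rate `tau` are assumptions made for this sketch.

```python
def soft_update(target, main, tau=0.005):
    """Conventional soft update of target-network parameters.

    Every parameter moves toward the main network at the same rate tau,
    regardless of whether the main network's latest change was sensible.
    (Parameters are plain lists of floats in this illustration.)
    """
    return [(1.0 - tau) * t + tau * m for t, m in zip(target, main)]


# Toy parameters: the second main-network value is an extreme update.
target = [0.0, 0.0]
main = [1.0, 100.0]

target = soft_update(target, main, tau=0.1)
print(target)
```

Both parameters are pulled toward the main network by the same fraction, including the extreme one, which is the behavior the abstract identifies as a risk.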

preprint · 2020 · arXiv

TAdam: A Robust Stochastic Gradient Optimizer

Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in the robotics domain. To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed. We therefore propose a new stochastic gradient optimization method, whose robustness is directly built into the algorithm, using the robust Student-t distribution as its core idea. Adam, the popular optimization method, is modified with our method, and the resultant optimizer, called TAdam, is shown to effectively outperform Adam in terms of robustness against noise on diverse tasks, ranging from regression and classification to reinforcement learning problems. The implementation of our algorithm can be found at https://github.com/Mahoumaru/TAdam.git
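
To illustrate why a Student-t model is robust to outliers, the sketch below estimates a location parameter by iteratively reweighting samples with the standard Student-t weight (nu + 1) / (nu + z^2), where z is the standardized residual. This is a generic textbook-style illustration, not the TAdam update; the authors' implementation is in the repository linked above. The function name `student_t_mean`, the fixed scale, the degrees of freedom `nu`, and the data are all assumptions of this sketch.

```python
def student_t_mean(xs, nu=1.0, scale=1.0, iters=20):
    """Robust location estimate under a Student-t model with fixed scale.

    Each iteration reweights every sample by (nu + 1) / (nu + z^2), so
    samples far from the current estimate receive very small weights.
    """
    mu = sum(xs) / len(xs)  # start from the ordinary mean
    for _ in range(iters):
        weights = [(nu + 1.0) / (nu + ((x - mu) / scale) ** 2) for x in xs]
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return mu


# Hypothetical noisy observations with one gross outlier.
data = [1.0, 1.1, 0.9, 1.0, 50.0]

plain = sum(data) / len(data)
robust = student_t_mean(data)

print(f"ordinary mean: {plain:.3f}")
print(f"Student-t weighted mean: {robust:.3f}")
```

The ordinary mean is dragged far from the bulk of the data by the single outlier, while the reweighted estimate stays close to it.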