Source author record

Takayuki Osogami

Takayuki Osogami appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Science and Game Theory Neural and Evolutionary Computing Artificial Intelligence q-fin.RM

Catalog footprint

What is connected

9works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Distorted Distributional Policy Evaluation for Offline Reinforcement Learning

While Distributional Reinforcement Learning (DRL) methods have demonstrated strong performance in online settings, its success in offline scenarios remains limited. We hypothesize that a key limitation of existing offline DRL methods lies in their approach to uniformly underestimate return quantiles. This uniform pessimism can lead to overly conservative value estimates, ultimately hindering generalization and performance. To address this, we introduce a novel concept called quantile distortion, which enables non-uniform pessimism by adjusting the degree of conservatism based on the availability of supporting data. Our approach is grounded in theoretical analysis and empirically validated, demonstrating improved performance over uniform pessimism.

preprint2022arXiv

Biases in In Silico Evaluation of Molecular Optimization Methods and Bias-Reduced Evaluation Methodology

We are interested in in silico evaluation methodology for molecular optimization methods. Given a sample of molecules and their properties of our interest, we wish not only to train an agent that can find molecules optimized with respect to the target property but also to evaluate its performance. A common practice is to train a predictor of the target property on the sample and use it for both training and evaluating the agent. We show that this evaluator potentially suffers from two biases; one is due to misspecification of the predictor and the other to reusing the same sample for training and evaluation. We discuss bias reduction methods for each of the biases comprehensively, and empirically investigate their effectiveness.

preprint2022arXiv

Mechanism Learning for Trading Networks

We study the problem of designing mechanisms for trading networks that satisfy four desired properties: dominant-strategy incentive compatibility, efficiency, weak budget balance (WBB), and individual rationality (IR). Although there exist mechanisms that simultaneously satisfy these properties ex post for combinatorial auctions, we prove the impossibility that such mechanisms do not exist for a broad class of trading networks. We thus propose approaches for computing and learning the mechanisms that satisfy the four properties, in a Bayesian setting, where WBB and IR, respectively, are relaxed to ex ante and interim. For computational and sample efficiency, we introduce several techniques, including game theoretical analysis to reduce the input feature space. We empirically demonstrate that the proposed approaches successfully find the mechanisms with the four properties for those trading networks where the impossibility holds ex post.

preprint2022arXiv

Online Learning in Supply-Chain Games

We study a repeated game between a supplier and a retailer who want to maximize their respective profits without full knowledge of the problem parameters. After characterizing the uniqueness of the Stackelberg equilibrium of the stage game with complete information, we show that even with partial knowledge of the joint distribution of demand and production costs, natural learning dynamics guarantee convergence of the joint strategy profile of supplier and retailer to the Stackelberg equilibrium of the stage game. We also prove finite-time bounds on the supplier's regret and asymptotic bounds on the retailer's regret, where the specific rates depend on the type of knowledge preliminarily available to the players. In the special case when the supplier is not strategic (vertical integration), we prove optimal finite-time regret bounds on the retailer's regret (or, equivalently, the social welfare) when costs and demand are adversarially generated and the demand is censored.

preprint2021arXiv

Proofs and additional experiments on Second order techniques for learning time-series with structural breaks

We provide complete proofs of the lemmas about the properties of the regularized loss function that is used in the second order techniques for learning time-series with structural breaks in Osogami (2021). In addition, we show experimental results that support the validity of the techniques.

preprint2016arXiv

Learning binary or real-valued time-series via spike-timing dependent plasticity

A dynamic Boltzmann machine (DyBM) has been proposed as a model of a spiking neural network, and its learning rule of maximizing the log-likelihood of given time-series has been shown to exhibit key properties of spike-timing dependent plasticity (STDP), which had been postulated and experimentally confirmed in the field of neuroscience as a learning rule that refines the Hebbian rule. Here, we relax some of the constraints in the DyBM in a way that it becomes more suitable for computation and learning. We show that learning the DyBM can be considered as logistic regression for binary-valued time-series. We also show how the DyBM can learn real-valued data in the form of a Gaussian DyBM and discuss its relation to the vector autoregressive (VAR) model. The Gaussian DyBM extends the VAR by using additional explanatory variables, which correspond to the eligibility traces of the DyBM and capture long term dependency of the time-series. Numerical experiments show that the Gaussian DyBM significantly improves the predictive accuracy over VAR.

preprint2016arXiv

Regularized Dynamic Boltzmann Machine with Delay Pruning for Unsupervised Learning of Temporal Sequences

We introduce Delay Pruning, a simple yet powerful technique to regularize dynamic Boltzmann machines (DyBM). The recently introduced DyBM provides a particularly structured Boltzmann machine, as a generative model of a multi-dimensional time-series. This Boltzmann machine can have infinitely many layers of units but allows exact inference and learning based on its biologically motivated structure. DyBM uses the idea of conduction delays in the form of fixed length first-in first-out (FIFO) queues, with a neuron connected to another via this FIFO queue, and spikes from a pre-synaptic neuron travel along the queue to the post-synaptic neuron with a constant period of delay. Here, we present Delay Pruning as a mechanism to prune the lengths of the FIFO queues (making them zero) by setting some delay lengths to one with a fixed probability, and finally selecting the best performing model with fixed delays. The uniqueness of structure and a non-sampling based learning rule in DyBM, make the application of previously proposed regularization techniques like Dropout or DropConnect difficult, leading to poor generalization. First, we evaluate the performance of Delay Pruning to let DyBM learn a multidimensional temporal sequence generated by a Markov chain. Finally, we show the effectiveness of delay pruning in learning high dimensional sequences using the moving MNIST dataset, and compare it with Dropout and DropConnect methods.

preprint2015arXiv

Learning dynamic Boltzmann machines with spike-timing dependent plasticity

We propose a particularly structured Boltzmann machine, which we refer to as a dynamic Boltzmann machine (DyBM), as a stochastic model of a multi-dimensional time-series. The DyBM can have infinitely many layers of units but allows exact and efficient inference and learning when its parameters have a proposed structure. This proposed structure is motivated by postulates and observations, from biological neural networks, that the synaptic weight is strengthened or weakened, depending on the timing of spikes (i.e., spike-timing dependent plasticity or STDP). We show that the learning rule of updating the parameters of the DyBM in the direction of maximizing the likelihood of given time-series can be interpreted as STDP with long term potentiation and long term depression. The learning rule has a guarantee of convergence and can be performed in a distributed matter (i.e., local in space) with limited memory (i.e., local in time).

preprint2012arXiv

Iterated risk measures for risk-sensitive Markov decision processes with discounted cost

We demonstrate a limitation of discounted expected utility, a standard approach for representing the preference to risk when future cost is discounted. Specifically, we provide an example of the preference of a decision maker that appears to be rational but cannot be represented with any discounted expected utility. A straightforward modification to discounted expected utility leads to inconsistent decision making over time. We will show that an iterated risk measure can represent the preference that cannot be represented by any discounted expected utility and that the decisions based on the iterated risk measure are consistent over time.

Takayuki Osogami

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Distorted Distributional Policy Evaluation for Offline Reinforcement Learning

Biases in In Silico Evaluation of Molecular Optimization Methods and Bias-Reduced Evaluation Methodology

Mechanism Learning for Trading Networks

Online Learning in Supply-Chain Games

Proofs and additional experiments on Second order techniques for learning time-series with structural breaks

Learning binary or real-valued time-series via spike-timing dependent plasticity

Regularized Dynamic Boltzmann Machine with Delay Pruning for Unsupervised Learning of Temporal Sequences

Learning dynamic Boltzmann machines with spike-timing dependent plasticity

Iterated risk measures for risk-sensitive Markov decision processes with discounted cost