Researcher profile

Denis Belomestny

Denis Belomestny contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
12 topics
4 close collaborators

Published work

9 published item(s)

preprint, 2022, arXiv

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision processes: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits. Our method uses the quantile of a Q-value function posterior as an upper confidence bound on the optimal Q-value function. For Bayes-UCBVI, we prove a regret bound of order $\widetilde{O}(\sqrt{H^3SAT})$, where $H$ is the length of one episode, $S$ is the number of states, $A$ the number of actions, and $T$ the number of episodes. This matches the lower bound of $\Omega(\sqrt{H^3SAT})$ up to poly-$\log$ terms in $H,S,A,T$ for a large enough $T$. To the best of our knowledge, this is the first algorithm that obtains an optimal dependence on the horizon $H$ (and $S$) without the need for an involved Bernstein-like bonus or noise. Crucial to our analysis is a new fine-grained anti-concentration bound for a weighted Dirichlet sum that can be of independent interest. We then explain how Bayes-UCBVI can be easily extended beyond the tabular setting, exhibiting a strong link between our algorithm and the Bayesian bootstrap (Rubin, 1981).
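The quantile-as-index idea can be illustrated in the simpler bandit setting that the abstract starts from. The following is a minimal sketch of a Bayes-UCB style rule for Bernoulli arms, not the Bayes-UCBVI algorithm itself. The uniform Beta(1, 1) prior, the quantile level 1 - 1/t, the Monte Carlo estimate of the posterior quantile, and all function names are assumptions made for illustration.

```python
import random


def posterior_quantile(successes, failures, level, n_samples, rng):
    """Monte Carlo estimate of a quantile of the Beta(1 + s, 1 + f) posterior."""
    draws = sorted(
        rng.betavariate(1 + successes, 1 + failures) for _ in range(n_samples)
    )
    index = min(n_samples - 1, int(level * n_samples))
    return draws[index]


def bayes_ucb(true_means, horizon, seed=0, n_samples=100):
    """Play a Bernoulli bandit, always pulling the arm with the largest
    posterior quantile. Returns the number of pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for t in range(1, horizon + 1):
        level = 1.0 - 1.0 / t  # assumed schedule: the quantile level grows with t
        indices = [
            posterior_quantile(successes[a], failures[a], level, n_samples, rng)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: indices[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(bayes_ucb([0.2, 0.5, 0.8], horizon=500))
```

No explicit exploration bonus appears in the rule: optimism comes only from reading off an upper quantile of the posterior, which is the feature the abstract highlights.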

preprint, 2022, arXiv

Reinforced optimal control

Least squares Monte Carlo methods are a popular numerical approximation method for solving stochastic control problems. Based on dynamic programming, their key feature is the approximation of the conditional expectation of future rewards by linear least squares regression. Hence, the choice of basis functions is crucial for the accuracy of the method. Earlier work by some of us (Belomestny, Schoenmakers, Spokoiny, Zharkynbay, Commun. Math. Sci., 18(1):109-121, 2020; arXiv:1808.02341) proposes to reinforce the basis functions in the case of optimal stopping problems by already computed value functions for later times, thereby considerably improving the accuracy with limited additional computational cost. We extend the reinforced regression method to a general class of stochastic control problems, while considerably improving the method's efficiency, as demonstrated by substantial numerical examples as well as theoretical analysis.
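As a point of reference, the plain (non-reinforced) least squares Monte Carlo step described above can be sketched for an optimal stopping problem. The example below prices a Bermudan put in the Longstaff-Schwartz style with the fixed basis 1, x, x^2. It does not implement the reinforced basis of the paper. The model (geometric Brownian motion), all parameter values, and the function names are assumptions chosen for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def basis(x):
    """Fixed polynomial basis; the choice of basis drives the accuracy."""
    return [1.0, x, x * x]


def lsmc_put(s0=100.0, strike=100.0, rate=0.05, vol=0.2, maturity=1.0,
             n_dates=10, n_paths=2000, seed=0):
    rng = random.Random(seed)
    dt = maturity / n_dates
    disc = math.exp(-rate * dt)
    drift = (rate - 0.5 * vol * vol) * dt
    paths = []
    for _ in range(n_paths):
        s, path = s0, []
        for _ in range(n_dates):
            s *= math.exp(drift + vol * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            path.append(s)
        paths.append(path)
    # Cash flows if the option is held to the last exercise date.
    cash = [max(strike - p[-1], 0.0) for p in paths]
    for j in range(n_dates - 2, -1, -1):
        cash = [disc * c for c in cash]  # discount to exercise date j
        itm = [i for i in range(n_paths) if paths[i][j] < strike]
        if len(itm) < 3:
            continue
        feats = {i: basis(paths[i][j] / strike) for i in itm}
        gram = [[sum(feats[i][p] * feats[i][q] for i in itm) for q in range(3)]
                for p in range(3)]
        rhs = [sum(feats[i][p] * cash[i] for i in itm) for p in range(3)]
        coef = solve_linear(gram, rhs)  # least squares via normal equations
        for i in itm:
            continuation = sum(c * f for c, f in zip(coef, feats[i]))
            exercise = strike - paths[i][j]
            if exercise > continuation:
                cash[i] = exercise
    return disc * sum(cash) / n_paths


if __name__ == "__main__":
    print(lsmc_put())
```

Reinforcing the regression would mean appending an already computed later value function to `basis`, which is the modification the abstract refers to.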

preprint, 2022, arXiv

Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization

Policy-gradient methods in reinforcement learning (RL) are very general and widely applied in practice, but their performance suffers from the high variance of the gradient estimate. Several procedures have been proposed to reduce it, including actor-critic (AC) and advantage actor-critic (A2C) methods. Recently these approaches have gained a new perspective due to the introduction of deep RL: both new control variates (CV) and new sub-sampling procedures became available in the setting of complex models like neural networks. The vital part of CV-based methods is the goal functional for the training of the CV, the most popular one being the least-squares criterion of A2C. Despite its practical success, the criterion is not the only one possible. In this paper we investigate, for the first time, the performance of the criterion called Empirical Variance (EV). We observe in the experiments that the EV criterion not only performs no worse than A2C but can sometimes be considerably better. Apart from that, we also prove some theoretical guarantees of the actual variance reduction under very general assumptions and show that the A2C least-squares goal functional is an upper bound for the EV goal. Our experiments indicate that in terms of variance reduction EV-based methods are much better than A2C and allow stronger variance reduction.
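The difference between the two criteria can be seen in a toy problem with a single state and a constant baseline. The sketch below is an assumption-laden illustration, not the paper's setup: the policy is Gaussian with mean `theta`, the reward function is hypothetical, and the control variate is a scalar baseline `b` in the gradient estimate `(reward - b) * score`. The least-squares baseline is the sample mean of the rewards. The empirical-variance baseline minimizes the sample variance of the gradient estimate directly.

```python
import random


def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def compare_baselines(theta=0.5, n=5000, seed=0):
    """Return (b_ls, b_ev, var_ls, var_ev) for a one-state Gaussian policy."""
    rng = random.Random(seed)
    actions = [rng.gauss(theta, 1.0) for _ in range(n)]
    rewards = [-(a - 2.0) ** 2 for a in actions]  # hypothetical reward
    scores = [a - theta for a in actions]         # d/dtheta of log N(a; theta, 1)

    # Least-squares criterion: argmin_b sum (reward - b)^2 is the sample mean.
    b_ls = sum(rewards) / n

    # Empirical-variance criterion: argmin_b Var[(reward - b) * score].
    # Var[rs - b s] = Var[rs] - 2 b Cov[rs, s] + b^2 Var[s], hence
    # b = Cov[rs, s] / Var[s] with sample moments.
    rs = [r * s for r, s in zip(rewards, scores)]
    mean_rs = sum(rs) / n
    mean_s = sum(scores) / n
    cov = sum((x - mean_rs) * (s - mean_s) for x, s in zip(rs, scores)) / (n - 1)
    b_ev = cov / sample_variance(scores)

    def gradient_variance(b):
        return sample_variance([(r - b) * s for r, s in zip(rewards, scores)])

    return b_ls, b_ev, gradient_variance(b_ls), gradient_variance(b_ev)


if __name__ == "__main__":
    b_ls, b_ev, var_ls, var_ev = compare_baselines()
    print(f"LS baseline {b_ls:.3f}, gradient variance {var_ls:.3f}")
    print(f"EV baseline {b_ev:.3f}, gradient variance {var_ev:.3f}")
```

On the sample used for fitting, the EV baseline can never give a larger gradient variance than the least-squares one, because it minimizes exactly that quantity over all constant baselines.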

preprint, 2021, arXiv

From optimal martingales to randomized dual optimal stopping

In this article we study and classify optimal martingales in the dual formulation of optimal stopping problems. In this respect we distinguish between weakly optimal and surely optimal martingales. It is shown that the family of weakly optimal and surely optimal martingales may be quite large. On the other hand, it is shown that the Doob martingale, that is, the martingale part of the Snell envelope, is in a certain sense the most robust surely optimal martingale under random perturbations. This new insight leads to a novel randomized dual martingale minimization algorithm that doesn't require nested simulation. As a main feature, in a possibly large family of optimal martingales the algorithm efficiently selects a martingale that is as close as possible to the Doob martingale. As a result, one obtains the dual upper bound for the optimal stopping problem with low variance.
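The notion of a surely optimal martingale can be checked by hand on a small example. The sketch below uses a hypothetical five-step symmetric random walk with a discounted put-style payoff, computes the Snell envelope by backward induction, and evaluates the pathwise dual quantity max over j of (Z_j - M_j) on every path. It illustrates the dual formulation only and does not implement the randomized minimization algorithm of the paper. The payoff, the discount factor 0.9, and all names are assumptions.

```python
from functools import lru_cache
from itertools import product

J = 5  # number of time steps (assumed)


def payoff(j, x):
    """Hypothetical discounted payoff Z_j as a function of the walk position."""
    return 0.9 ** j * max(1.0 - x, 0.0)


@lru_cache(maxsize=None)
def snell(j, x):
    """Snell envelope Y_j(x) by backward induction."""
    if j == J:
        return payoff(j, x)
    return max(payoff(j, x), continuation(j, x))


def continuation(j, x):
    """E[Y_{j+1} | X_j = x] for the symmetric +-1 random walk."""
    return 0.5 * (snell(j + 1, x + 1) + snell(j + 1, x - 1))


def pathwise_dual(steps, use_doob):
    """max_j (Z_j - M_j) along one path, with M the Doob martingale or M = 0."""
    x, m = 0, 0.0
    best = payoff(0, 0)
    for j, step in enumerate(steps):
        cont = continuation(j, x)
        x += step
        if use_doob:
            m += snell(j + 1, x) - cont  # Doob martingale increment
        best = max(best, payoff(j + 1, x) - m)
    return best


paths = list(product((-1, 1), repeat=J))  # all 2**J equally likely paths
y0 = snell(0, 0)
doob_values = [pathwise_dual(p, use_doob=True) for p in paths]
zero_values = [pathwise_dual(p, use_doob=False) for p in paths]

if __name__ == "__main__":
    print("value of the stopping problem:", y0)
    print("dual bound with M = 0:", sum(zero_values) / len(zero_values))
    print("spread with Doob martingale:", max(doob_values) - min(doob_values))
```

With the Doob martingale every path returns the value `y0` up to rounding, so the dual estimator has zero variance. With the zero martingale the average is still an upper bound, but it is loose and varies from path to path.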

preprint, 2020, arXiv

Density deconvolution under general assumptions on the distribution of measurement errors

In this paper we study the problem of density deconvolution under general assumptions on the measurement error distribution. Typically deconvolution estimators are constructed using Fourier transform techniques, and it is assumed that the characteristic function of the measurement errors does not have zeros on the real line. This assumption is rather strong and is not fulfilled in many cases of interest. In this paper we develop a methodology for constructing optimal density deconvolution estimators in the general setting that covers vanishing and non-vanishing characteristic functions of the measurement errors. We derive upper bounds on the risk of the proposed estimators and provide sufficient conditions under which zeros of the corresponding characteristic function have no effect on estimation accuracy. Moreover, we show that the derived conditions are also necessary in some specific problem instances.

preprint, 2020, arXiv

Estimating TVP-VAR models with time invariant long-run multipliers

The main goal of this paper is to develop a methodology for estimating time-varying parameter vector auto-regression (TVP-VAR) models with a time-invariant long-run relationship between endogenous variables and changes in exogenous variables. We propose a Gibbs sampling scheme for estimation of model parameters as well as time-invariant long-run multiplier parameters. Further, we demonstrate the applicability of the proposed method by analyzing examples of the Norwegian and Russian economies based on data on real GDP, the real exchange rate and real oil prices. Our results show that incorporating the time-invariance constraint on the long-run multipliers in the TVP-VAR model helps to significantly improve the forecasting performance.

preprint, 2019, arXiv

Fourier transform MCMC, heavy tailed distributions and geometric ergodicity

Markov Chain Monte Carlo methods have become increasingly popular in applied mathematics as a tool for numerical integration with respect to complex and high-dimensional distributions. However, application of MCMC methods to heavy tailed distributions and distributions with analytically intractable densities turns out to be rather problematic. In this paper, we propose a novel approach towards the use of MCMC algorithms for distributions with analytically known Fourier transforms and, in particular, heavy tailed distributions. The main idea of the proposed approach is to use MCMC methods in the Fourier domain to sample from a density proportional to the absolute value of the underlying characteristic function. A subsequent application of Parseval's formula leads to an efficient algorithm for the computation of integrals with respect to the underlying density. We show that the resulting Markov chain in the Fourier domain may be geometrically ergodic even in the case of heavy tailed original distributions. We illustrate our approach by several numerical examples including multivariate elliptically contoured stable distributions.
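The main idea can be tried out in one dimension. The sketch below is a hedged illustration under several assumptions, not the paper's algorithm: the target is the standard Cauchy distribution with characteristic function exp(-|u|), the test function is g(x) = exp(-x^2/2) with known Fourier transform sqrt(2 pi) exp(-u^2/2), the normalizing constant of |phi| is taken as known (it equals 2 here), and the sampler is a plain random-walk Metropolis chain. With phi(u) defined as the integral of exp(iux) p(x), Parseval's formula gives the integral of g against p as 1/(2 pi) times the integral of g_hat(-u) phi(u) over u.

```python
import math
import random


def char_fn_cauchy(u):
    """Characteristic function of the standard Cauchy distribution."""
    return math.exp(-abs(u))


def fourier_of_test_function(u):
    """Fourier transform of g(x) = exp(-x^2 / 2): sqrt(2 pi) exp(-u^2 / 2)."""
    return math.sqrt(2.0 * math.pi) * math.exp(-0.5 * u * u)


def fourier_mcmc_integral(n_steps=20000, step=1.5, seed=0):
    """Estimate the integral of g against the Cauchy density via the Fourier domain."""
    rng = random.Random(seed)
    norm_const = 2.0  # integral of |phi(u)| over the real line (assumed known)
    u = 0.0
    total = 0.0
    for _ in range(n_steps):
        # Random-walk Metropolis targeting the density proportional to |phi(u)|.
        proposal = u + rng.uniform(-step, step)
        ratio = abs(char_fn_cauchy(proposal)) / abs(char_fn_cauchy(u))
        if rng.random() < ratio:
            u = proposal
        phi = char_fn_cauchy(u)
        # Parseval: integrand g_hat(-u) * phi(u), reweighted by 1 / |phi(u)|.
        total += fourier_of_test_function(-u) * phi / abs(phi)
    return norm_const / (2.0 * math.pi) * total / n_steps


if __name__ == "__main__":
    exact = math.exp(0.5) * math.erfc(1.0 / math.sqrt(2.0))
    print("Fourier-domain MCMC estimate:", fourier_mcmc_integral())
    print("closed-form value:", exact)
```

The chain moves in a space where the target decays exponentially, even though the Cauchy density itself has heavy tails. For this particular pair of functions the integral has the closed form exp(1/2) erfc(1/sqrt(2)), roughly 0.523, which gives a check on the estimate.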

preprint, 2010, arXiv

Spectral estimation of the fractional order of a Lévy process

We consider the problem of estimating the fractional order of a Lévy process from low frequency historical and options data. An estimation methodology is developed which allows us to treat both estimation and calibration problems in a unified way. The corresponding procedure consists of two steps: the estimation of a conditional characteristic function and the weighted least squares estimation of the fractional order in spectral domain. While the second step is identical for both calibration and estimation, the first one depends on the problem at hand. Minimax rates of convergence for the fractional order estimate are derived, the asymptotic normality is proved and a data-driven algorithm based on aggregation is proposed. The performance of the estimator in both estimation and calibration setups is illustrated by a simulation study.