Source author record

Piotr Miłoś

Piotr Miłoś appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

math.PR, math-ph, math.MP, Artificial Intelligence, cond-mat.stat-mech, Machine Learning

Catalog footprint

What is connected

15 works

6 topics

4 close collaborators

preprint 2026 arXiv

When Does Non-Uniform Replay Matter in Reinforcement Learning?

Modern off-policy reinforcement learning algorithms often rely on simple uniform replay sampling and it remains unclear when and why non-uniform replay improves over this strong baseline. Across diverse RL settings, we show that the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of replayed transitions per environment step; expected recency, how recent sampled transitions are; and the entropy of the replay sampling distribution. Our main contribution is clarifying when non-uniform replay is beneficial and providing practical guidance for replay design in modern off-policy RL. Namely, we find that non-uniform replay is most beneficial when replay volume is low, and that high-entropy sampling is important even at comparable expected recency. Motivated by these findings, we adopt a simple Truncated Geometric replay that biases sampling toward recent experience while preserving high entropy and incurring negligible computational overhead. Across large-scale parallel simulation, single-task, and multi-task settings, including three modern algorithms evaluated on five RL benchmark suites, this replay sampling strategy improves sample efficiency in low-volume regimes while remaining competitive when replay volume is high.
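The abstract above names a Truncated Geometric replay but does not give its parameterization. As a minimal sketch, assuming the sampler draws a transition's age `k` (0 = newest) with probability proportional to `(1 - p)**k` over a buffer of fixed size, the sampling step and two of the quantities the abstract discusses (expected recency and entropy) could look like this. The function names and the parameter `p` are hypothetical and not taken from the paper.

```python
import math
import random


def truncated_geometric_probs(buffer_size, p):
    """Probability of sampling the transition with age k (0 = newest).

    Assumption: weights decay as (1 - p)**k and are renormalized over the buffer.
    """
    q = 1.0 - p
    weights = [q ** k for k in range(buffer_size)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_age(buffer_size, p, rng=random):
    """Draw one age in O(1) by inverting the truncated geometric CDF."""
    q = 1.0 - p
    u = rng.random()  # uniform on [0, 1)
    k = int(math.log(1.0 - u * (1.0 - q ** buffer_size)) / math.log(q))
    return min(k, buffer_size - 1)  # guard against floating-point edge cases


def entropy(probs):
    """Shannon entropy (in nats) of the sampling distribution."""
    return -sum(pr * math.log(pr) for pr in probs if pr > 0.0)


def expected_age(probs):
    """Mean age of a sampled transition; lower means more recent."""
    return sum(k * pr for k, pr in enumerate(probs))
```

With a small `p` the distribution stays close to uniform (high entropy) while still favoring recent transitions. In a circular buffer, a sampled age would typically be mapped to a storage index relative to the latest write position.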

preprint 2022 arXiv

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers

In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially the ones based on language models, due to their relative inability to reason over huge volumes of premises in text form. This paper introduces Thor, a framework integrating language models and automated theorem provers to overcome this difficulty. In Thor, a class of methods called hammers that leverage the power of automated theorem provers are used for premise selection, while all other tasks are designated to language models. Thor increases a language model's success rate on the PISA dataset from $39\%$ to $57\%$, while solving $8.2\%$ of problems neither language models nor automated theorem provers are able to solve on their own. Furthermore, with a significantly smaller computational budget, Thor can achieve a success rate on the MiniF2F dataset that is on par with the best existing methods. Thor can be instantiated for the majority of popular interactive theorem provers via a straightforward protocol we provide.

preprint 2020 arXiv

Uncertainty-sensitive Learning and Planning with Ensembles

We propose a reinforcement learning framework for discrete environments in which an agent makes both strategic and tactical decisions. The former manifests itself through the use of value function, while the latter is powered by a tree search planner. These tools complement each other. The planning module performs a local \textit{what-if} analysis, which allows the agent to avoid tactical pitfalls and boost backups of the value function. The value function, being global in nature, compensates for inherent locality of the planner. In order to further solidify this synergy, we introduce an exploration mechanism with two distinctive components: uncertainty modelling and risk measurement. To model the uncertainty we use value function ensembles, and to reflect risk we propose several functionals that summarize the uncertainty implied by the ensemble. We show that our method performs well on hard exploration environments: Deep-sea, toy Montezuma's Revenge, and Sokoban. In all the cases, we obtain speed-up in learning and boost in performance.

preprint 2016 arXiv

Branching Brownian motion with absorption and the all-time minimum of branching Brownian motion with drift

We study a dyadic branching Brownian motion on the real line with absorption at 0, drift $μ\in \mathbb{R}$ and started from a single particle at position $x>0.$ When $μ$ is large enough so that the process has a positive probability of survival, we consider $K(t),$ the number of individuals absorbed at 0 by time $t$ and for $s\ge 0$ the functions $ω_s(x):= \mathbb{E}^x[s^{K(\infty)}].$ We show that $ω_s<\infty$ if and only if $s\in[0,s_0]$ for some $s_0>1$ and we study the properties of these functions. Furthermore, for $s=0, ω(x) := ω_0(x) =\mathbb{P}^x(K(\infty)=0)$ is the cumulative distribution function of the all-time minimum of the branching Brownian motion with drift started at 0 without absorption. We give three descriptions of the family $ω_s, s\in [0,s_0]$ through a single pair of functions, as the two extremal solutions of the Kolmogorov-Petrovskii-Piskunov (KPP) traveling wave equation on the half-line, through a martingale representation and as an explicit series expansion. We also obtain a precise result concerning the tail behavior of $K(\infty)$. In addition, in the regime where $K(\infty)>0$ almost surely, we show that $u(x,t) := \mathbb{P}^x(K(t)=0)$ suitably centered converges to the KPP critical travelling wave on the whole real line.

preprint 2016 arXiv

The random interchange process on the hypercube

We prove the occurrence of a phase transition accompanied by the emergence of cycles of diverging lengths in the random interchange process on the hypercube.

preprint 2015 arXiv

Delocalization of two-dimensional random surfaces with hard-core constraints

We study the fluctuations of random surfaces on a two-dimensional discrete torus. The random surfaces we consider are defined via a nearest-neighbor pair potential which we require to be twice continuously differentiable on a (possibly infinite) interval and infinity outside of this interval. No convexity assumption is made and we include the case of the so-called hammock potential, when the random surface is uniformly chosen from the set of all surfaces satisfying a Lipschitz constraint. Our main result is that these surfaces delocalize, having fluctuations whose variance is at least of order $\log n$, where $n$ is the side length of the torus. We also show that the expected maximum of such surfaces is of order at least $\log n$. The main tool in our analysis is an adaptation to the lattice setting of an algorithm of Richthammer, who developed a variant of a Mermin-Wagner-type argument applicable to hard-core constraints. We rely also on the reflection positivity of the random surface model. The result answers a question mentioned by Brascamp, Lieb and Lebowitz 1975 on the hammock potential and a question of Velenik 2006.

preprint 2013 arXiv

A note on the discrete Gaussian Free Field with disordered pinning on Z^d, d\geq 2

We study the discrete massless Gaussian Free Field on $\mathbb{Z}^d$, $d\geq2$, in the presence of a disordered square-well potential supported on a finite strip around zero. The disorder is introduced by reward/penalty interaction coefficients, which are given by i.i.d. random variables. Under minimal assumptions on the law of the environment, we prove that the quenched free energy associated to this model exists in $\mathbb{R}^+$, is deterministic, and strictly smaller than the annealed free energy whenever the latter is strictly positive.

preprint 2013 arXiv

A second note on the discrete Gaussian Free Field with disordered pinning on Z^d, d\geq 2

We study the discrete massless Gaussian Free Field on Z^d, d \geq 2, in the presence of a disordered square-well potential supported on a finite strip around zero. The disorder is introduced by reward/penalty interaction coefficients, which are given by i.i.d. random variables. In the previous note, we proved under minimal assumptions on the law of the environment, that the quenched free energy associated to this model exists in R^+, is deterministic, and strictly smaller than the annealed free energy whenever the latter is strictly positive. Here we consider Bernoulli reward/penalty coefficients b e_x + h with P(e_x=-1)=P(e_x=+1)=1/2 for all x in Z^d, and b > 0, h in R. We prove that in the plane (b,h), the quenched critical line (separating the phases of positive and zero free energy) lies strictly below the line h = 0, showing in particular that there exists a non trivial region where the field is localized though repulsed on average by the environment.

preprint 2013 arXiv

Exact representation of truncated variation of Brownian motion

In the recent papers [Lochowski:2011fk, Lochowski:2013yq, Lochowski:2013lr] the truncated variation has been introduced, characterized and studied in various stochastic settings. In this note we uncover an intimate link to the Skorokhod problem. Further, we exploit it to give an explicit representation of the truncated variation of a Brownian motion. More precisely, we prove that the inverse of this process is, up to a minor time shift, a Lévy subordinator with the exponent $\sqrt{2q}\tanh(c\sqrt{q/2})$. This also gives a representation of a solution of the two-sided Skorokhod problem for a Brownian motion.

preprint 2012 arXiv

On limit distributions of normalized truncated variation, upward truncated variation and downward truncated variation processes

In the paper we introduce the truncated variation, upward truncated variation and downward truncated variation. These are closely related to the total variation but are well-defined even if the latter is infinite. Our aim is to explore their feasibility to studies of stochastic processes. We concentrate on a Brownian motion with drift for which we prove the convergence of the above-mentioned quantities. For example, we study the truncated variation when the truncation parameter c tends to 0. We prove in this case that for "small" c's it is well-approximated by a deterministic process. Moreover we prove that the error in this approximation converges weakly (in functional sense) to a Brownian motion. We also prove a similar result for truncated variation processes when the time parameter is rescaled to infinity. We stress that our methodology is robust. A key to the proofs was a decomposition of the truncated variation (see Lemmas 11 and 12). It can be used for studies of any continuous processes. Some additional results like an analog of the Anscombe-Donsker theorem and the Laplace transform of time to given drawdown by c (and analogously drawup till time) are presented.

preprint 2012 arXiv

On truncated variation, upward truncated variation and downward truncated variation for diffusions

The truncated variation, $TV^c$, is a fairly new concept introduced in [5]. Roughly speaking, given a càdlàg function $f$, its truncated variation is "the total variation which does not pay attention to small changes of $f$, below some threshold $c>0$". The very basic consequence of such approach is that contrary to the total variation, $TV^c$ is always finite. This is appealing to the stochastic analysis where so-far large classes of processes, like semimartingales or diffusions, could not be studied with the total variation. Recently in [6], another characterization of $TV^c$ was found. Namely $TV^c$ is the smallest total variation of a function which approximates $f$ uniformly with accuracy $c/2$. Due to these properties we envisage that $TV^c$ might be a useful concept to the theory of processes. For this reason we determine some properties of $TV^c$ for some well-known processes. In course of our research we discover intimate connections with already known concepts of the stochastic processes theory. Firstly, for semimartingales we proved that $TV^c$ is of order $c^{-1}$ and the normalized truncated variation converges almost surely to the quadratic variation of the semimartingale as $c\searrow0$. Secondly, we studied the rate of this convergence. As this task was much more demanding we narrowed to the class of diffusions (with some mild additional assumptions). We obtained the weak convergence to a so-called Ocone martingale. These results can be viewed as some kind of large numbers theorem and the corresponding central limit theorem. All the results above were obtained in a functional setting, viz. we worked with processes describing the growth of the truncated variation in time. Moreover, in the same respect we also treated two closely related quantities - the so-called upward truncated variation and downward truncated variation.
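The verbal description of $TV^c$ in the abstract above can be written out as formulas. The following is a sketch under assumptions: the supremum form in the first line is the usual way this quantity is defined in the truncated variation literature and is not quoted on this page, while the second and third lines restate the characterization and the first-order limit given in the abstract. The interval notation $[a,b]$, the approximating function $g$ and the bracket $[X]_t$ for quadratic variation are notation introduced here; references [5] and [6] of the paper remain the authoritative statements.

```latex
% Assumed definition: increments smaller than the threshold c contribute nothing.
TV^c(f,[a,b]) = \sup_{n}\ \sup_{a \le t_0 < t_1 < \dots < t_n \le b}\
    \sum_{i=1}^{n} \max\bigl\{ |f(t_i) - f(t_{i-1})| - c,\ 0 \bigr\}

% Characterization stated in the abstract: smallest total variation among
% functions approximating f uniformly with accuracy c/2.
TV^c(f,[a,b]) = \inf\bigl\{ TV(g,[a,b]) : \|f - g\|_\infty \le c/2 \bigr\}

% First-order result stated in the abstract for a semimartingale X:
c \cdot TV^c(X,[0,t]) \longrightarrow [X]_t \quad \text{almost surely as } c \searrow 0
```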

preprint 2012 arXiv

Spatial CLT for the supercritical Ornstein-Uhlenbeck superprocess

In this paper we consider a superprocess being a measure-valued diffusion corresponding to the equation $u_{t}=Lu+αu-βu^{2}$, where $L$ is the infinitesimal operator of the \emph{Ornstein-Uhlenbeck process} and $β>0,\:α>0$. The latter condition implies that the process is \emph{supercritical,} i.e. its total mass grows exponentially. This system is known to fulfill a law of large numbers. In the paper we prove the corresponding \emph{central limit theorem}. The limit and the CLT normalization fall into three qualitatively different classes. In what we call the small growth rate case the situation resembles the classical CLT. The weak limit is Gaussian and the normalization is the square root of the size of the system. In the critical case the limit is still Gaussian, however the normalization requires an additional term. Finally, when the growth rate is large the situation is completely different. The limit is no longer Gaussian, the normalization is substantially larger than the classical one and the convergence holds in probability. These different regimes arise as a result of "competition" between spatial smoothing due to the particles' movement and the system's growth which is local.

preprint 2011 arXiv

CLT for Ornstein-Uhlenbeck branching particle system

In this paper we consider a branching particle system consisting of particles moving according to the Ornstein-Uhlenbeck process in $\mathbb{R}^d$ and undergoing a binary, supercritical branching with a constant rate $λ>0$. This system is known to fulfil a law of large numbers (under exponential scaling). In the paper we prove the corresponding central limit theorem. The limit and the CLT normalisation fall into three qualitatively different classes. In what we call the small branching rate case the situation resembles the classical one. The weak limit is Gaussian and normalisation is the square root of the size of the system. In the critical case the limit is still Gaussian, however the normalisation requires an additional term. Finally, when branching has large rate the situation is completely different. The limit is no longer Gaussian, the normalisation is substantially larger than the classical one and the convergence holds in probability. We prove also that the spatial fluctuations are asymptotically independent of the fluctuations of the total number of particles (which is a Galton-Watson process).

preprint 2011 arXiv

CLT for U-statistics of Ornstein-Uhlenbeck branching particle system with small branching rate

In this paper we consider a branching particle system consisting of particles moving according to the Ornstein-Uhlenbeck process in R^d and undergoing a binary, supercritical branching with a constant rate λ>0. This system is known to fulfil a law of large numbers (under exponential scaling). In the paper we prove the corresponding central limit theorem. Moreover, in the second part of the paper we consider U-statistics of the system, for which, under mild assumptions, we prove a law of large numbers and a central limit theorem. The limits are expressed in terms of multiple stochastic integrals with respect to a random Gaussian measure. The second order behaviour depends qualitatively on the growth rate of the system. In this paper we concentrate on the case when the growth rate is relatively small compared to the smoothing properties of the particles' movement.

preprint 2011 arXiv

U-statistics of Ornstein-Uhlenbeck branching particle system

We consider a branching particle system consisting of particles moving according to the Ornstein-Uhlenbeck process in $\mathbb{R}^d$ and undergoing a binary, supercritical branching with a constant rate $λ>0$. This system is known to fulfil a law of large numbers (under exponential scaling). Recently the question of the corresponding central limit theorem has been addressed. It turns out that the normalization and form of the limit in the CLT fall into three qualitatively different regimes, depending on the relation between the branching intensity and the parameters of the Ornstein-Uhlenbeck process. In the present paper we extend those results to $U$-statistics of the system proving a law of large numbers and a central limit theorem.
