Researcher profile

Zhenjie Ren

Zhenjie Ren contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
11 works
0 followers
9 topics
4 close collaborators

Published work

11 published items

preprint · 2026 · arXiv

Continuous-time q-learning for mean-field control with common noise, part-I: Theoretical foundations

This paper investigates the continuous-time counterpart of the Q-function for entropy-regularized mean-field control (MFC) with controlled common noise, coined the q-function by Jia and Zhou (2023) in the single-agent model. We first show that, under discretely sampled actions, the value function in the exploratory formulation converges to the one in the relaxed control formulation as the time grid refines. Leveraging the relaxed control formulation, we derive the exploratory Hamilton-Jacobi-Bellman (HJB) equation, in which the controlled common noise gives rise to an additional nonlinear functional of the policy, rendering the policy iteration intricate. Under a certain concavity condition, we establish the existence and uniqueness of the optimal one-step policy iteration via a first-order condition using the partial linear functional derivative with respect to the policy. The policy improvement at each iteration is verified by relating it to an entropy-regularized optimization problem over the space of policies. In the mean-field setting, we introduce the integrated q-function (Iq-function) defined on the state distribution and the policy, and we show that an optimal policy is identified as a two-layer fixed point of the argmax operator of the Iq-function. Finally, we provide the explicit characterization of an optimal policy as a Gaussian distribution in the general linear-quadratic (LQ) setting.

preprint · 2026 · arXiv

Continuous-time q-learning for mean-field control with common noise, part-II: q-learning algorithms

This paper is a continuation of Ren et al. (2026), aiming to devise q-learning algorithms for mean-field control (MFC) with controlled common noise. Based on the relaxed control formulation, we first establish the martingale condition of the value function and the Iq-function by evaluating along the conditional state distributions generated by all test policies. As the data in the relaxed control formulation are not observable in practice, we quantify the error incurred when they are replaced by the observable ones in the exploratory formulation under discretely sampled actions. This, together with the two-layer fixed point characterization of an optimal policy in Ren et al. (2026), allows us to propose several algorithms, including the Actor-Critic q-learning algorithm, in which the policy is updated in the Actor-step based on the iteration rule induced by the improved Iq-function, and the value function and Iq-function are updated in the Critic-step based on the martingale orthogonality condition using the data from the exploratory formulation. We also establish the convergence of the inner iterations in the Actor-step in an infinite-horizon linear-quadratic (LQ) framework. In two examples, within and beyond the LQ framework, our q-learning algorithms are implemented with satisfactory performance.

preprint · 2026 · arXiv

Discrete Flow Matching: Convergence Guarantees Under Minimal Assumptions

Flow Matching has recently emerged as a popular class of generative models for simulating a target distribution $\mu_1$ from samples drawn from a source distribution $\mu_0$. This framework relies on a fixed coupling between $\mu_0$ and $\mu_1$, and on a deterministic or stochastic bridge to define an interpolating process between the two distributions. The time marginals of this process can then be approximately sampled by estimating the transition rates, or more generally the generator, of its Markovian projection. This framework has recently been extended to the case of discrete source and target distributions, under the name Discrete Flow Matching (DFM). However, theoretical guarantees for such models remain scarce. In this paper, we study two DFM models on $\mathbb{Z}_m^d = \{0,\ldots,m-1\}^d$, sampled through time discretization, and derive associated non-asymptotic bounds for both of them. In contrast to previous work, we establish non-asymptotic bounds in Kullback--Leibler divergence for the early-stopped version of the target distribution. We also derive explicit convergence guarantees in total variation distance with respect to the true target distribution. Importantly, these bounds rely only on an approximation error assumption, relaxing standard score assumptions used in earlier works, while also yielding improved dependence on the vocabulary size $m$ and the dimension $d$.

preprint · 2026 · arXiv

Quantitative weak propagation of chaos for McKean--Vlasov branching diffusion processes

In this paper we study the weak propagation of chaos for McKean--Vlasov diffusions with branching, whose induced marginal measures are nonnegative finite measures but not necessarily probability measures. The flow of marginal measures satisfies a non-linear Fokker--Planck equation, along which we provide a functional Itô's formula. We then consider a functional of the terminal marginal measure of the branching process, whose conditional value is the solution to a Kolmogorov backward master equation. By using Itô's formula and based on the estimates of second-order linear and intrinsic functional derivatives of the value function, we finally derive a quantitative weak convergence rate for the empirical measures of the branching diffusion processes with a finite population.

preprint · 2022 · arXiv

Entropic turnpike estimates for the kinetic Schrödinger problem

We investigate the kinetic Schrödinger problem, obtained by considering Langevin dynamics instead of Brownian motion in Schrödinger's thought experiment. Under a quasilinearity assumption we establish exponential entropic turnpike estimates for the corresponding Schrödinger bridges and exponentially fast convergence of the entropic cost to the sum of the marginal entropies in the long-time regime, which provides as a corollary an entropic Talagrand inequality. In order to do so, we profit from recent advances in the understanding of classical Schrödinger bridges and adaptations of the Bakry-Émery formalism to the kinetic setting. Our quantitative results are complemented by basic structural results such as a dual representation of the entropic cost and the existence of Schrödinger potentials.

preprint · 2022 · arXiv

Nonlinear predictable representation and $L^1$-solutions of backward SDEs and second-order backward SDEs

The theory of backward SDEs extends the predictable representation property of Brownian motion to the nonlinear framework, thus providing a path-dependent analog of fully nonlinear parabolic PDEs. In this paper, we consider backward SDEs, their reflected version, and their second-order extension, in the context where the final data and the generator satisfy an $L^1$-type integrability condition. Our main objective is to provide the corresponding existence and uniqueness results for general Lipschitz generators. The uniqueness holds in the so-called Doob class of processes, simultaneously under an appropriate class of measures. We emphasize that the previous literature only deals with backward SDEs, and requires either that the generator is separable in $(y,z)$, see Peng [Pen97], or strictly sublinear in the gradient variable $z$, see [BDHPS03], or that the final data satisfies an $L\ln L$-integrability condition, see [HT18]. We bypass these conditions by defining $L^1$-integrability under the nonlinear expectation operator induced by the previously mentioned class of measures.

preprint · 2022 · arXiv

On path-dependent multidimensional forward-backward SDEs

This paper extends the results of Ma, Wu, Zhang, Zhang [11] to the context of path-dependent multidimensional forward-backward stochastic differential equations (FBSDE). By path-dependent we mean that the coefficients of the forward-backward SDE at time t can depend on the whole path of the forward process up to time t. Such a situation appears when solving path-dependent stochastic control problems by means of variational calculus. At the heart of our analysis is the construction of a decoupling random field on the path space. We first prove the existence and uniqueness of a decoupling field on a small time interval. Then, by introducing the characteristic BSDE, we show that a global decoupling field can be constructed by patching local solutions together as long as the solution of the characteristic BSDE remains bounded. Finally, we provide a stability result for path-dependent forward-backward SDEs.

preprint · 2022 · arXiv

Principal-agent problem with multiple principals

We consider a moral hazard problem with multiple principals in a continuous-time model. The agent can work for only one principal at a given time, and therefore faces an optimal switching problem. Using a randomized formulation, we manage to represent the agent's value function and his optimal effort by an Itô process. This representation further helps to solve the principals' problem when there are infinitely many principals, in the sense of a mean field game. Finally, the mean field formulation is justified by a propagation of chaos argument.

preprint · 2022 · arXiv

Random horizon principal-agent problems

We consider a general formulation of the random horizon Principal-Agent problem with a continuous payment and a lump-sum payment at termination. In the European version of the problem, the random horizon is chosen solely by the principal, and the agent has no possible action other than exerting effort on the dynamics of the output process. We also consider the American version of the contract, which covers Sannikov's seminal model, where the agent can also quit by optimally choosing the termination time of the contract. Our main result reduces such non-zero-sum stochastic differential games to appropriate stochastic control problems which may be solved by standard methods of stochastic control theory. This reduction is obtained by following Sannikov's approach, further developed by Cvitanic, Possamai, and Touzi. We first introduce an appropriate class of contracts for which the agent's optimal effort is immediately characterized by the standard verification argument in stochastic control theory. We then show that this class of contracts is dense in an appropriate sense, so that the optimization over this restricted family of contracts represents no loss of generality. The result is obtained by using the recent well-posedness result for random horizon second-order backward SDEs.

preprint · 2022 · arXiv

Second order backward SDE with random terminal time

Backward stochastic differential equations extend the martingale representation theorem to the nonlinear setting. This can be seen as the path-dependent counterpart of the extension from the heat equation to fully nonlinear parabolic equations in the Markov setting. This paper extends such a nonlinear representation to the context where the random variable of interest is measurable with respect to the information at a finite stopping time. We provide a complete well-posedness theory which covers the semilinear case (backward SDE), the semilinear case with obstacle (reflected backward SDE), and the fully nonlinear case (second order backward SDE).

preprint · 2020 · arXiv

Game on Random Environment, Mean-field Langevin System and Neural Networks

In this paper we study a type of game regularized by the relative entropy, where the players' strategies are coupled through a random environment variable. Besides the existence and uniqueness of equilibria of such games, we prove that the marginal laws of the corresponding mean-field Langevin systems converge towards the games' equilibria in different settings. As an application, dynamic games can be treated as games on a random environment when one treats the time horizon as the environment. In practice, our results can be applied to analysing the stochastic gradient descent algorithm for deep neural networks in the context of supervised learning, as well as for generative adversarial networks.