Researcher profile

Xun Yu Zhou

Xun Yu Zhou contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
14 works
0 followers
13 topics
4 close collaborators

Published work

14 published items

preprint | 2026 | arXiv

Amortized Guidance for Image Inpainting with Pretrained Diffusion Models

We study image inpainting with generative diffusion models. Existing methods typically either train dedicated task-specific models, or adapt a pretrained diffusion model separately for each masked image at deployment. We introduce a middle-ground model, termed Amortized Inpainting with Diffusion (AID), which keeps a pretrained diffusion backbone fixed, trains a small reusable guidance module offline, and then reuses it across masked images without per-instance optimization. We formulate it as a deterministic guidance problem with a supervised terminal objective. To make this problem learnable in high dimensions, we derive an auxiliary Gaussian formulation and prove that solving this randomized problem recovers the optimal deterministic guidance field. This bridge yields a principled continuous-time actor-critic algorithm for learning the guidance module in a fully data-driven manner. Empirically, on AFHQv2 and FFHQ under the pixel EDM pipeline and on ImageNet under the latent EDM2 pipeline, AID consistently improves the quality-speed trade-off over strong fixed-backbone and amortized inpainting baselines across multiple mask types, while adding less than one percent trainable overhead.

preprint | 2026 | arXiv

Continuous-time q-learning for mean-field control with common noise, part-I: Theoretical foundations

This paper investigates the continuous-time counterpart of the Q-function for entropy-regularized mean-field control (MFC) with controlled common noise, termed the q-function by Jia and Zhou (2023) in the single-agent model. We first show that, under discretely sampled actions, the value function in the exploratory formulation converges to the one in the relaxed control formulation as the time grid refines. Leveraging the relaxed control formulation, we derive the exploratory Hamilton-Jacobi-Bellman (HJB) equation, in which the controlled common noise gives rise to an additional nonlinear functional of the policy, rendering the policy iteration intricate. Under a certain concavity condition, we establish the existence and uniqueness of the optimal one-step policy iteration via a first-order condition using the partial linear functional derivative with respect to the policy. The policy improvement at each iteration is verified by relating it to an entropy-regularized optimization problem over the space of policies. In the mean-field setting, we introduce the integrated q-function (Iq-function) defined on the state distribution and the policy, and show that an optimal policy is identified as a two-layer fixed point of the argmax operator of the Iq-function. Finally, we provide an explicit characterization of an optimal policy as a Gaussian distribution in the general linear-quadratic (LQ) setting.

preprint | 2026 | arXiv

Continuous-time q-learning for mean-field control with common noise, part-II: q-learning algorithms

This paper continues the work of Ren et al. (2026), aiming to devise q-learning algorithms for mean-field control (MFC) with controlled common noise. Based on the relaxed control formulation, we first establish the martingale condition of the value function and the Iq-function by evaluating along the conditional state distributions generated by all test policies. As the data in the relaxed control formulation are not observable in practice, we quantify the error incurred when they are replaced by the observable ones in the exploratory formulation under discretely sampled actions. This, together with the two-layer fixed point characterization of an optimal policy in Ren et al. (2026), allows us to propose several algorithms including the Actor-Critic q-learning algorithm, in which the policy is updated in the Actor-step based on the iteration rule induced by the improved Iq-function, and the value function and Iq-function are updated in the Critic-step based on the martingale orthogonality condition using the data from the exploratory formulation. We also establish the convergence of the inner iterations in the Actor-step in an infinite-horizon linear-quadratic (LQ) framework. In two examples, within and beyond the LQ framework, our q-learning algorithms are implemented with satisfactory performance.

preprint | 2026 | arXiv

Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian

Diffusion models have achieved remarkable success in generating samples from unknown data distributions. Most popular stochastic differential equation-based diffusion models perturb the target distribution by adding Gaussian noise, transforming it into a simple prior, and then use denoising score matching, a consequence of Tweedie's formula, to learn the score function and generate clean samples from noise. However, non-Gaussian diffusion models with state-dependent diffusion coefficient have been largely underexplored, as have the corresponding Tweedie's formulae. In this work, we extend Tweedie's formula to important non-Gaussian processes, including geometric Brownian motion (GBM), squared Bessel (BESQ) processes, and Cox-Ingersoll-Ross (CIR) processes, thereby yielding the corresponding denoising score-matching objectives. We then apply the derived formulae to image and financial time series generation using GBM- and CIR-based diffusion models, and to empirical Bayes estimation under the BESQ setting. The reported experimental results demonstrate the potential of non-Gaussian models.
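
For orientation, the classical Gaussian case that this abstract takes as its starting point can be checked numerically. The sketch below is not taken from the paper: it assumes a scalar Gaussian prior x0 ~ N(mu, tau2) and additive noise y = x0 + sigma * eps, and the function names are hypothetical. Under those assumptions the marginal score is known in closed form, so Tweedie's formula E[x0 | y] = y + sigma2 * score(y) can be compared with the conjugate-Gaussian posterior mean.

```python
def tweedie_denoise(y, mu, tau2, sigma2):
    """Posterior mean E[x0 | y] via Tweedie's formula, assuming y = x0 + sigma * eps."""
    # Assumption: x0 ~ N(mu, tau2), so the marginal of y is N(mu, tau2 + sigma2)
    # and its score (derivative of the log-density) is linear in y.
    score = -(y - mu) / (tau2 + sigma2)
    return y + sigma2 * score


def posterior_mean(y, mu, tau2, sigma2):
    """Conjugate-Gaussian posterior mean, used as an independent check."""
    return mu + tau2 / (tau2 + sigma2) * (y - mu)


if __name__ == "__main__":
    # Both expressions should agree up to floating-point rounding.
    print(tweedie_denoise(2.0, 0.5, 1.0, 0.25))
    print(posterior_mean(2.0, 0.5, 1.0, 0.25))
```

The non-Gaussian formulae for GBM, BESQ and CIR processes described in the abstract are not reproduced here; this check only illustrates the Gaussian baseline they extend.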

preprint | 2022 | arXiv

$g$-Expectation of Distributions

We define $g$-expectation of a distribution as the infimum of the $g$-expectations of all the terminal random variables sharing that distribution. We present two special cases for nonlinear $g$ where the $g$-expectation of distributions can be explicitly derived. As a related problem, we introduce the notion of law-invariant $g$-expectation and provide its sufficient conditions. Examples of application in financial dynamic portfolio choice are supplied.

preprint | 2022 | arXiv

Choquet regularization for reinforcement learning

We propose Choquet regularizers to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) in which we replace the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton-Jacobi-Bellman equation of the problem, and solve it explicitly in the linear-quadratic (LQ) case by statically maximizing a mean-variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as $ε$-greedy, exponential, uniform and Gaussian.

preprint | 2022 | arXiv

Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach

We propose a unified framework to study policy evaluation (PE) and the associated temporal difference (TD) methods for reinforcement learning in continuous time and space. We show that PE is equivalent to maintaining the martingale condition of a process. From this perspective, we find that the mean-square TD error approximates the quadratic variation of the martingale and thus is not a suitable objective for PE. We present two methods to use the martingale characterization for designing PE algorithms. The first one minimizes a "martingale loss function", whose solution is proved to be the best approximation of the true value function in the mean-square sense. This method interprets the classical gradient Monte-Carlo algorithm. The second method is based on a system of equations called the "martingale orthogonality conditions" with test functions. Solving these equations in different ways recovers various classical TD algorithms, such as TD($λ$), LSTD, and GTD. Different choices of test functions determine in what sense the resulting solutions approximate the true value function. Moreover, we prove that any convergent time-discretized algorithm converges to its continuous-time counterpart as the mesh size goes to zero, and we provide the convergence rate. We demonstrate the theoretical results and corresponding algorithms with numerical experiments and applications.
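
As a rough illustration of the first method, the sketch below fits a one-parameter value function by minimizing a time-discretized squared gap between the realized reward-to-go and the parametrized value along simulated paths. It is not the authors' code. The toy model is an assumption chosen so that the answer is known: the state is x0 + sigma * W_t, the running reward is the state itself, there is no terminal payoff, and the value function is parametrized as theta * x * (T - t), for which the true parameter is theta = 1. The function name and all parameter defaults are hypothetical.

```python
import random


def fit_martingale_loss(n_paths=2000, n_steps=50, T=1.0, x0=1.0, sigma=0.5, seed=0):
    """Least-squares minimizer of a discretized martingale-type loss (toy model)."""
    rng = random.Random(seed)
    dt = T / n_steps
    num = 0.0
    den = 0.0
    for _path in range(n_paths):
        # Simulate the state on the grid t_k = k * dt (Euler scheme, assumed model).
        xs = [x0]
        for _step in range(n_steps - 1):
            xs.append(xs[-1] + sigma * dt ** 0.5 * rng.gauss(0.0, 1.0))
        # Walk backward so the reward-to-go from t_k accumulates in one pass.
        reward_to_go = 0.0
        for k in range(n_steps - 1, -1, -1):
            reward_to_go += xs[k] * dt          # left-point Riemann sum of the reward
            phi = xs[k] * (T - k * dt)          # feature: J_theta(t, x) = theta * phi
            num += reward_to_go * phi
            den += phi * phi
    # The loss is quadratic in theta, so its minimizer is the normal-equation ratio.
    return num / den


if __name__ == "__main__":
    print(fit_martingale_loss())  # should be close to the true value theta = 1
```

Because the parametrization is linear in theta, the minimizer is available in closed form here; with a nonlinear parametrization one would instead take gradient steps on the same loss, which is the sense in which the abstract relates this method to gradient Monte-Carlo.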

preprint | 2021 | arXiv

Simulated annealing from continuum to discretization: a convergence analysis via the Eyring-Kramers law

We study the convergence rate of continuous-time simulated annealing $(X_t; \, t \ge 0)$ and its discretization $(x_k; \, k =0,1, \ldots)$ for approximating the global optimum of a given function $f$. We prove that the tail probability $\mathbb{P}(f(X_t) > \min f +δ)$ (resp. $\mathbb{P}(f(x_k) > \min f +δ)$) decays polynomially in time (resp. in cumulative step size), and provide an explicit rate as a function of the model parameters. Our argument applies the recent development on functional inequalities for the Gibbs measure at low temperatures -- the Eyring-Kramers law. In the discrete setting, we obtain a condition on the step size to ensure the convergence.
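
A minimal sketch of what a discretized annealing iteration of this kind can look like is given below. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Euler-type update x_{k+1} = x_k - step * grad_f(x_k) + sqrt(2 * step * temperature_k) * noise, the logarithmic cooling schedule, the constant step size, the tilted double-well objective, and the names `annealed_langevin` and `energy`.

```python
import math
import random


def f(x):
    """Assumed test objective: a tilted double well whose global minimum is near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad_f(x):
    """Derivative of the assumed objective f."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def annealed_langevin(x0=1.0, n_steps=20000, step=0.01, energy=0.5, seed=0):
    """Gradient descent with Gaussian noise whose temperature decays over time."""
    rng = random.Random(seed)
    x = x0
    for k in range(n_steps):
        t = (k + 1) * step                       # cumulative step size
        temperature = energy / math.log(2.0 + t) # assumed logarithmic cooling
        noise = math.sqrt(2.0 * step * temperature) * rng.gauss(0.0, 1.0)
        x = x - step * grad_f(x) + noise
    return x


if __name__ == "__main__":
    x_final = annealed_langevin()
    print(x_final, f(x_final))
```

The run starts in the shallower well on purpose; whether and how fast the iterate settles near the global minimum depends on the cooling constant and the step size, which is the kind of dependence the abstract quantifies.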

preprint | 2021 | arXiv

When to Quit Gambling, if You Must!

We develop an approach to solve the casino gambling model of Barberis (2012), in which a gambler whose preferences are specified by the cumulative prospect theory (CPT) must decide when to stop gambling by a prescribed deadline. We assume that the gambler can assist their decision using an independent randomization, and explain why it is a reasonable assumption. The problem is inherently time-inconsistent due to the probability weighting in CPT, and we study both precommitted and naive stopping strategies. We turn the original problem into a computationally tractable mathematical program, based on which we derive an optimal precommitted rule which is randomized and Markovian. The analytical treatment enables us to make several predictions regarding a gambler's behavior, including that with randomization they may enter the casino even when allowed to play only once, that whether they will play longer once they are granted more bets depends on whether they are in a gain or at a loss, and that it is prevalent that a naive gambler never stops at a loss.

preprint | 2020 | arXiv

Consistent Investment of Sophisticated Rank-Dependent Utility Agents in Continuous Time

We study portfolio selection in a complete continuous-time market where the preference is dictated by the rank-dependent utility. As such a model is inherently time inconsistent due to the underlying probability weighting, we study the investment behavior of sophisticated consistent planners who seek (subgame perfect) intra-personal equilibrium strategies. We provide sufficient conditions under which an equilibrium strategy is a replicating portfolio of a final wealth. We derive this final wealth profile explicitly, which turns out to be in the same form as in the classical Merton model with the market price of risk process properly scaled by a deterministic function in time. We present this scaling function explicitly through the solution to a highly nonlinear and singular ordinary differential equation, whose existence of solutions is established. Finally, we give a necessary and sufficient condition for the scaling function to be smaller than 1 corresponding to an effective reduction in risk premium due to probability weighting.

preprint | 2020 | arXiv

Variance Contracts

We study the design of an optimal insurance contract in which the insured maximizes her expected utility and the insurer limits the variance of his risk exposure while maintaining the principle of indemnity and charging the premium according to the expected value principle. We derive the optimal policy semi-analytically, which is coinsurance above a deductible when the variance bound is binding. This policy automatically satisfies the incentive-compatible condition, which is crucial to rule out ex post moral hazard. We also find that the deductible is absent if and only if the contract pricing is actuarially fair. Focusing on the actuarially fair case, we carry out comparative statics on the effects of the insured's initial wealth and the variance bound on insurance demand. Our results indicate that the expected coverage is always larger for a wealthier insured, implying that the underlying insurance is a normal good, which supports certain recent empirical findings. Moreover, as the variance constraint tightens, the insured who is prudent cedes fewer losses, while the insurer is exposed to less tail risk.

preprint | 2013 | arXiv

Optimal stopping under probability distortion

We formulate an optimal stopping problem for a geometric Brownian motion where the probability scale is distorted by a general nonlinear function. The problem is inherently time inconsistent due to the Choquet integration involved. We develop a new approach, based on a reformulation of the problem where one optimally chooses the probability distribution or quantile function of the stopped state. An optimal stopping time can then be recovered from the obtained distribution/quantile function, either in a straightforward way for several important cases or in general via the Skorokhod embedding. This approach enables us to solve the problem in a fairly general manner with different shapes of the payoff and probability distortion functions. We also discuss economic interpretations of the results. In particular, we justify several liquidation strategies widely adopted in stock trading, including those of "buy and hold", "cut loss or take profit", "cut loss and let profit run" and "sell on a percentage of historical high".

preprint | 2009 | arXiv

Continuous-Time Markowitz's Model with Transaction Costs

A continuous-time Markowitz's mean-variance portfolio selection problem is studied in a market with one stock, one bond, and proportional transaction costs. This is a singular stochastic control problem, inherently in a finite time horizon. With a series of transformations, the problem is turned into a so-called double obstacle problem, a well studied problem in physics and partial differential equation literature, featuring two time-varying free boundaries. The two boundaries, which define the buy, sell, and no-trade regions, are proved to be smooth in time. This in turn characterizes the optimal strategy, via a Skorokhod problem, as one that tries to keep a certain adjusted bond-stock position within the no-trade region. Several features of the optimal strategy are revealed that are remarkably different from its no-transaction-cost counterpart. It is shown that there exists a critical length in time, which is dependent on the stock excess return as well as the transaction fees but independent of the investment target and the stock volatility, so that an expected terminal return may not be achievable if the planning horizon is shorter than that critical length (while in the absence of transaction costs any expected return can be reached in an arbitrary period of time). It is further demonstrated that anyone following the optimal strategy should not buy the stock beyond the point when the time to maturity is shorter than the aforementioned critical length. Moreover, the investor would be less likely to buy the stock and more likely to sell the stock when the maturity date is getting closer. These features, while consistent with the widely accepted investment wisdom, suggest that the planning horizon is an integral part of the investment opportunities.

preprint | 2007 | arXiv

A Convex Stochastic Optimization Problem Arising from Portfolio Selection

A continuous-time financial portfolio selection model with expected utility maximization typically boils down to solving a (static) convex stochastic optimization problem in terms of the terminal wealth, with a budget constraint. In the literature the latter is solved by assuming a priori that the problem is well-posed (i.e., the supremum value is finite) and a Lagrange multiplier exists (and as a consequence the optimal solution is attainable). In this paper it is first shown, via various counter-examples, that neither of these two assumptions needs to hold, and an optimal solution does not necessarily exist. These anomalies in turn have important interpretations in and impacts on the portfolio selection modeling and solutions. Relations among the non-existence of the Lagrange multiplier, the ill-posedness of the problem, and the non-attainability of an optimal solution are then investigated. Finally, explicit and easily verifiable conditions are derived which lead to finding the unique optimal solution.