Source author record

Xun Yu Zhou

Xun Yu Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.OC, math.PR, Machine Learning, q-fin.PM, q-fin.MF, Multiagent Systems, q-fin.RM, Artificial Intelligence, Computer Vision, eess.SY, math.DS, math.NA, math.ST, Statistics Theory

Catalog footprint

What is connected

17 works

14 topics

4 close collaborators


preprint 2026 arXiv

Amortized Guidance for Image Inpainting with Pretrained Diffusion Models

We study image inpainting with generative diffusion models. Existing methods typically either train dedicated task-specific models, or adapt a pretrained diffusion model separately for each masked image at deployment. We introduce a middle-ground model, termed Amortized Inpainting with Diffusion (AID), which keeps a pretrained diffusion backbone fixed, trains a small reusable guidance module offline, and then reuses it across masked images without per-instance optimization. We formulate it as a deterministic guidance problem with a supervised terminal objective. To make this problem learnable in high dimensions, we derive an auxiliary Gaussian formulation and prove that solving this randomized problem recovers the optimal deterministic guidance field. This bridge yields a principled continuous-time actor--critic algorithm for learning the guidance module in a fully data-driven manner. Empirically, on AFHQv2 and FFHQ under the pixel EDM pipeline and on ImageNet under the latent EDM2 pipeline, AID consistently improves the quality--speed trade-off over strong fixed-backbone and amortized inpainting baselines across multiple mask types, while adding less than one percent trainable overhead.

preprint 2026 arXiv

Continuous-time q-learning for mean-field control with common noise, part-I: Theoretical foundations

This paper investigates the continuous-time counterpart of the Q-function for entropy-regularized mean-field control (MFC) with controlled common noise, termed the q-function by Jia and Zhou (2023) in the single-agent model. We first show that, under discretely sampled actions, the value function in the exploratory formulation converges to the one in the relaxed control formulation as the time grid refines. Leveraging the relaxed control formulation, we derive the exploratory Hamilton-Jacobi-Bellman (HJB) equation, in which the controlled common noise gives rise to an additional nonlinear functional of policy, rendering the policy iteration intricate. Under a certain concavity condition, we establish the existence and uniqueness of the optimal one-step policy iteration via a first-order condition using the partial linear functional derivative with respect to policy. The policy improvement at each iteration is verified by relating it to an entropy-regularized optimization problem over the space of policies. In the mean-field setting, we introduce the integrated q-function (Iq-function) defined on the state distribution and the policy, and show that an optimal policy is identified as a two-layer fixed point of the argmax operator of the Iq-function. Finally, we provide the explicit characterization of an optimal policy as a Gaussian distribution in the general linear-quadratic (LQ) setting.

preprint 2026 arXiv

Continuous-time q-learning for mean-field control with common noise, part-II: q-learning algorithms

This paper continues the work of Ren et al. (2026) and devises q-learning algorithms for mean-field control (MFC) with controlled common noise. Based on the relaxed control formulation, we first establish the martingale condition of the value function and the Iq-function by evaluating along the conditional state distributions generated by all test policies. As the data in the relaxed control formulation are not observable in practice, we quantify the error incurred when they are replaced by the observable ones in the exploratory formulation under discretely sampled actions. This, together with the two-layer fixed point characterization of an optimal policy in Ren et al. (2026), allows us to propose several algorithms including the Actor-Critic q-learning algorithm, in which the policy is updated in the Actor-step based on the iteration rule induced by the improved Iq-function, and the value function and Iq-function are updated in the Critic-step based on the martingale orthogonality condition using the data from the exploratory formulation. We also establish the convergence of the inner iterations in the Actor-step in an infinite-horizon linear-quadratic (LQ) framework. In two examples, within and beyond the LQ framework, our q-learning algorithms are implemented with satisfactory performance.

preprint 2026 arXiv

Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian

Diffusion models have achieved remarkable success in generating samples from unknown data distributions. Most popular stochastic differential equation-based diffusion models perturb the target distribution by adding Gaussian noise, transforming it into a simple prior, and then use denoising score matching, a consequence of Tweedie's formula, to learn the score function and generate clean samples from noise. However, non-Gaussian diffusion models with state-dependent diffusion coefficient have been largely underexplored, as have the corresponding Tweedie's formulae. In this work, we extend Tweedie's formula to important non-Gaussian processes, including geometric Brownian motion (GBM), squared Bessel (BESQ) processes, and Cox-Ingersoll-Ross (CIR) processes, thereby yielding the corresponding denoising score-matching objectives. We then apply the derived formulae to image and financial time series generation using GBM- and CIR-based diffusion models, and to empirical Bayes estimation under the BESQ setting. The reported experimental results demonstrate the potential of non-Gaussian models.
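
For orientation, the classical Gaussian case that this abstract takes as its starting point can be written out as below. The notation ($X$ for the clean sample, $Y$ for the noisy one, $\sigma$ for the noise level, $p_\sigma$ for the density of $Y$, $s_\theta$ for a parametric score model) is assumed here for illustration and is not taken from the paper; the non-Gaussian formulae derived in the paper are not reproduced.

```latex
\begin{align*}
% Gaussian perturbation of the data
Y &= X + \sigma Z, \qquad Z \sim \mathcal{N}(0, I) \text{ independent of } X, \qquad Y \sim p_\sigma, \\
% Tweedie's formula: posterior mean of the clean sample in terms of the score of p_sigma
\mathbb{E}[X \mid Y = y] &= y + \sigma^2 \, \nabla_y \log p_\sigma(y), \\
% Denoising score matching: the minimizer over all functions is the score of p_sigma
\min_\theta \; & \mathbb{E} \left\| s_\theta(Y) + \frac{Y - X}{\sigma^2} \right\|^2 .
\end{align*}
```

The third line follows from the second because $\nabla_y \log p(y \mid x) = -(y - x)/\sigma^2$ for the Gaussian kernel, so regressing onto $-(Y - X)/\sigma^2$ recovers $\nabla_y \log p_\sigma$ in the mean-square sense.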

preprint 2022 arXiv

$g$-Expectation of Distributions

We define $g$-expectation of a distribution as the infimum of the $g$-expectations of all the terminal random variables sharing that distribution. We present two special cases for nonlinear $g$ where the $g$-expectation of distributions can be explicitly derived. As a related problem, we introduce the notion of law-invariant $g$-expectation and provide its sufficient conditions. Examples of application in financial dynamic portfolio choice are supplied.

preprint 2022 arXiv

Choquet regularization for reinforcement learning

We propose Choquet regularizers to measure and manage the level of exploration for reinforcement learning (RL), and reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)) in which we replace the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton--Jacobi--Bellman equation of the problem, and solve it explicitly in the linear--quadratic (LQ) case by statically maximizing a mean--variance constrained Choquet regularizer. Under the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers, and conversely identify the Choquet regularizers that generate a number of broadly used exploratory samplers such as $ε$-greedy, exponential, uniform and Gaussian.

preprint 2022 arXiv

Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach

We propose a unified framework to study policy evaluation (PE) and the associated temporal difference (TD) methods for reinforcement learning in continuous time and space. We show that PE is equivalent to maintaining the martingale condition of a process. From this perspective, we find that the mean--square TD error approximates the quadratic variation of the martingale and thus is not a suitable objective for PE. We present two methods to use the martingale characterization for designing PE algorithms. The first one minimizes a "martingale loss function", whose solution is proved to be the best approximation of the true value function in the mean--square sense. This method interprets the classical gradient Monte-Carlo algorithm. The second method is based on a system of equations called the "martingale orthogonality conditions" with test functions. Solving these equations in different ways recovers various classical TD algorithms, such as TD($λ$), LSTD, and GTD. Different choices of test functions determine in what sense the resulting solutions approximate the true value function. Moreover, we prove that any convergent time-discretized algorithm converges to its continuous-time counterpart as the mesh size goes to zero, and we provide the convergence rate. We demonstrate the theoretical results and corresponding algorithms with numerical experiments and applications.
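
As a rough illustration of the first method in this abstract, the sketch below evaluates a time-discretized "martingale loss" on a single sampled trajectory: the squared gap between the realized reward-to-go and a candidate value function, integrated over time. The function name `martingale_loss`, its arguments, the absence of discounting, and the left-endpoint Riemann sum are all assumptions made for this example, not the paper's implementation.

```python
def martingale_loss(times, states, rewards, terminal_payoff, value_fn):
    """Discretized martingale loss on one trajectory (illustrative sketch).

    times:   grid t_0 < t_1 < ... < t_K (t_K is the horizon)
    states:  observed states at each grid point, len K + 1
    rewards: running reward rates at t_0, ..., t_{K-1}, len K
    terminal_payoff: function of the terminal state
    value_fn: candidate value function J(t, x)
    """
    n_intervals = len(times) - 1
    # Start the reward-to-go at the terminal payoff and accumulate backwards.
    reward_to_go = terminal_payoff(states[-1])
    loss = 0.0
    for k in range(n_intervals - 1, -1, -1):
        dt = times[k + 1] - times[k]
        reward_to_go += rewards[k] * dt
        gap = reward_to_go - value_fn(times[k], states[k])
        loss += 0.5 * gap * gap * dt
    return loss


# Toy check: constant reward rate 1 on [0, 1], no terminal payoff.
times = [0.0, 0.5, 1.0]
states = [0.0, 0.0, 0.0]
rewards = [1.0, 1.0]
exact = martingale_loss(times, states, rewards, lambda x: 0.0, lambda t, x: 1.0 - t)
zero = martingale_loss(times, states, rewards, lambda x: 0.0, lambda t, x: 0.0)
```

In practice the loss would be averaged over many trajectories and minimized over the parameters of `value_fn`, which amounts to regressing the value function on Monte Carlo returns.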

preprint 2021 arXiv

Simulated annealing from continuum to discretization: a convergence analysis via the Eyring--Kramers law

We study the convergence rate of continuous-time simulated annealing $(X_t; \, t \ge 0)$ and its discretization $(x_k; \, k =0,1, \ldots)$ for approximating the global optimum of a given function $f$. We prove that the tail probability $\mathbb{P}(f(X_t) > \min f +δ)$ (resp. $\mathbb{P}(f(x_k) > \min f +δ)$) decays polynomially in time (resp. in cumulative step size), and provide an explicit rate as a function of the model parameters. Our argument applies recent developments on functional inequalities for the Gibbs measure at low temperatures -- the Eyring-Kramers law. In the discrete setting, we obtain a condition on the step size to ensure convergence.
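
To make the discretized scheme concrete, here is a minimal one-dimensional sketch. It assumes the annealing process is a Langevin-type diffusion, so that each step is a gradient step plus Gaussian noise scaled by the current temperature; the logarithmic cooling schedule and every name (`simulated_annealing`, `log_cooling`, `step`, `c`) are illustrative assumptions, not the paper's exact scheme or its step-size condition.

```python
import math
import random


def log_cooling(t, c=1.0):
    """Assumed cooling schedule: temperature decays like c / log(e + t)."""
    return c / math.log(math.e + t)


def simulated_annealing(grad_f, x0, step, n_steps, temperature, seed=0):
    """Euler-type discretization x_{k+1} = x_k - step * f'(x_k) + sqrt(2 * step * tau_k) * xi_k."""
    rng = random.Random(seed)
    x = x0
    for k in range(n_steps):
        tau = temperature(k * step)  # temperature at the cumulative step size
        xi = rng.gauss(0.0, 1.0)     # standard Gaussian noise
        x = x - step * grad_f(x) + math.sqrt(2.0 * step * tau) * xi
    return x


# With zero temperature the scheme reduces to plain gradient descent.
x_gd = simulated_annealing(lambda x: 2.0 * (x - 1.0), 0.0, 0.1, 200, lambda t: 0.0)

# Noisy run on the double-well f(x) = (x**2 - 1)**2 + 0.3 * x.
x_sa = simulated_annealing(
    lambda x: 4.0 * x * (x * x - 1.0) + 0.3, 1.0, 0.01, 5000, log_cooling
)
```

The zero-temperature run is included only as a sanity check; the noisy run shows how the schedule enters through `tau`.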

preprint 2021 arXiv

When to Quit Gambling, if You Must!

We develop an approach to solve the casino gambling model of Barberis (2012), in which a gambler whose preferences are specified by the cumulative prospect theory (CPT) must decide when to stop gambling by a prescribed deadline. We assume that the gambler can assist their decision using an independent randomization, and explain why it is a reasonable assumption. The problem is inherently time-inconsistent due to the probability weighting in CPT, and we study both precommitted and naive stopping strategies. We turn the original problem into a computationally tractable mathematical program, based on which we derive an optimal precommitted rule which is randomized and Markovian. The analytical treatment enables us to make several predictions regarding a gambler's behavior, including that with randomization they may enter the casino even when allowed to play only once, that whether they will play longer once they are granted more bets depends on whether they are in a gain or at a loss, and that it is prevalent that a naive gambler never stops loss.

preprint 2020 arXiv

Consistent Investment of Sophisticated Rank-Dependent Utility Agents in Continuous Time

We study portfolio selection in a complete continuous-time market where the preference is dictated by the rank-dependent utility. As such a model is inherently time inconsistent due to the underlying probability weighting, we study the investment behavior of sophisticated consistent planners who seek (subgame perfect) intra-personal equilibrium strategies. We provide sufficient conditions under which an equilibrium strategy is a replicating portfolio of a final wealth. We derive this final wealth profile explicitly, which turns out to be in the same form as in the classical Merton model with the market price of risk process properly scaled by a deterministic function in time. We present this scaling function explicitly through the solution to a highly nonlinear and singular ordinary differential equation, whose existence of solutions is established. Finally, we give a necessary and sufficient condition for the scaling function to be smaller than 1 corresponding to an effective reduction in risk premium due to probability weighting.

preprint 2020 arXiv

Variance Contracts

We study the design of an optimal insurance contract in which the insured maximizes her expected utility and the insurer limits the variance of his risk exposure while maintaining the principle of indemnity and charging the premium according to the expected value principle. We derive the optimal policy semi-analytically, which is coinsurance above a deductible when the variance bound is binding. This policy automatically satisfies the incentive-compatible condition, which is crucial to rule out ex post moral hazard. We also find that the deductible is absent if and only if the contract pricing is actuarially fair. Focusing on the actuarially fair case, we carry out comparative statics on the effects of the insured's initial wealth and the variance bound on insurance demand. Our results indicate that the expected coverage is always larger for a wealthier insured, implying that the underlying insurance is a normal good, which supports certain recent empirical findings. Moreover, as the variance constraint tightens, the insured who is prudent cedes less losses, while the insurer is exposed to less tail risk.

preprint 2015 arXiv

Time-Inconsistent Stochastic Linear--Quadratic Control: Characterization and Uniqueness of Equilibrium

In this paper, we continue our study on a general time-inconsistent stochastic linear--quadratic (LQ) control problem originally formulated in [6]. We derive a necessary and sufficient condition for equilibrium controls via a flow of forward--backward stochastic differential equations. When the state is one dimensional and the coefficients in the problem are all deterministic, we prove that the explicit equilibrium control constructed in \cite{HJZ} is indeed unique. Our proof is based on the derived equivalent condition for equilibria as well as a stochastic version of the Lebesgue differentiation theorem. Finally, we show that the equilibrium strategy is unique for a mean--variance portfolio selection model in a complete financial market where the risk-free rate is a deterministic function of time but all the other market parameters are possibly stochastic processes.

preprint 2013 arXiv

Optimal stopping under probability distortion

We formulate an optimal stopping problem for a geometric Brownian motion where the probability scale is distorted by a general nonlinear function. The problem is inherently time inconsistent due to the Choquet integration involved. We develop a new approach, based on a reformulation of the problem where one optimally chooses the probability distribution or quantile function of the stopped state. An optimal stopping time can then be recovered from the obtained distribution/quantile function, either in a straightforward way for several important cases or in general via the Skorokhod embedding. This approach enables us to solve the problem in a fairly general manner with different shapes of the payoff and probability distortion functions. We also discuss economic interpretations of the results. In particular, we justify several liquidation strategies widely adopted in stock trading, including those of "buy and hold", "cut loss or take profit", "cut loss and let profit run" and "sell on a percentage of historical high".

preprint 2012 arXiv

A Note on Indefinite Stochastic Riccati Equations

An indefinite stochastic Riccati Equation is a matrix-valued, highly nonlinear backward stochastic differential equation together with an algebraic, matrix positive definiteness constraint. We introduce a new approach to solve a class of such equations (including the existence of solutions) driven by one-dimensional Brownian motion. The idea is to replace the original equation by a system of BSDEs (without involving any algebraic constraint) whose existence of solutions automatically enforces the original algebraic constraint to be satisfied.

preprint 2011 arXiv

Time-Inconsistent Stochastic Linear--Quadratic Control

In this paper, we formulate a general time-inconsistent stochastic linear--quadratic (LQ) control problem. The time-inconsistency arises from the presence of a quadratic term of the expected state as well as a state-dependent term in the objective functional. We define an equilibrium, instead of optimal, solution within the class of open-loop controls, and derive a sufficient condition for equilibrium controls via a flow of forward--backward stochastic differential equations. When the state is one dimensional and the coefficients in the problem are all deterministic, we find an explicit equilibrium control. As an application, we then consider a mean-variance portfolio selection model in a complete financial market where the risk-free rate is a deterministic function of time but all the other market parameters are possibly stochastic processes. Applying the general sufficient condition, we obtain explicit equilibrium strategies when the risk premium is both deterministic and stochastic.

preprint 2009 arXiv

Continuous-Time Markowitz's Model with Transaction Costs

A continuous-time Markowitz's mean-variance portfolio selection problem is studied in a market with one stock, one bond, and proportional transaction costs. This is a singular stochastic control problem, inherently in a finite time horizon. With a series of transformations, the problem is turned into a so-called double obstacle problem, a well-studied problem in the physics and partial differential equation literature, featuring two time-varying free boundaries. The two boundaries, which define the buy, sell, and no-trade regions, are proved to be smooth in time. This in turn characterizes the optimal strategy, via a Skorokhod problem, as one that tries to keep a certain adjusted bond-stock position within the no-trade region. Several features of the optimal strategy are revealed that are remarkably different from its no-transaction-cost counterpart. It is shown that there exists a critical length in time, which is dependent on the stock excess return as well as the transaction fees but independent of the investment target and the stock volatility, so that an expected terminal return may not be achievable if the planning horizon is shorter than that critical length (while in the absence of transaction costs any expected return can be reached in an arbitrary period of time). It is further demonstrated that anyone following the optimal strategy should not buy the stock beyond the point when the time to maturity is shorter than the aforementioned critical length. Moreover, the investor would be less likely to buy the stock and more likely to sell the stock when the maturity date is getting closer. These features, while consistent with the widely accepted investment wisdom, suggest that the planning horizon is an integral part of the investment opportunities.

preprint 2007 arXiv

A Convex Stochastic Optimization Problem Arising from Portfolio Selection

A continuous-time financial portfolio selection model with expected utility maximization typically boils down to solving a (static) convex stochastic optimization problem in terms of the terminal wealth, with a budget constraint. In the literature the latter is solved by assuming a priori that the problem is well-posed (i.e., the supremum value is finite) and a Lagrange multiplier exists (and as a consequence the optimal solution is attainable). In this paper it is first shown, via various counter-examples, that neither of these two assumptions needs to hold, and an optimal solution does not necessarily exist. These anomalies in turn have important interpretations in and impacts on the portfolio selection modeling and solutions. Relations among the non-existence of the Lagrange multiplier, the ill-posedness of the problem, and the non-attainability of an optimal solution are then investigated. Finally, explicit and easily verifiable conditions are derived which lead to finding the unique optimal solution.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint