Source author record

Yongxin Chen

Yongxin Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Systems and Control, math.OC, eess.SY, Machine Learning, math-ph, math.MP, cond-mat.stat-mech, math.PR, Robotics, math.FA, Information Theory, math.IT, Artificial Intelligence, Computation and Language, Computer Vision, math.ST, Statistics Theory

Catalog footprint

What is connected

45 works

17 topics

4 close collaborators


preprint · 2026 · arXiv

ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models

Action-conditioned world models (ACWMs) have shown strong promise for video prediction and decision-making. However, existing benchmarks are largely restricted to egocentric navigation or narrow, task-specific robotics datasets, offering only limited coverage of the rich physical interactions required for generalized world understanding. We introduce ACWM-Phys, a new benchmark for evaluating action-conditioned prediction under diverse physical dynamics in a clean, controllable simulation environment with a carefully designed action space. ACWM-Phys contains training and evaluation data spanning rigid-body dynamics, kinematics, deformable-object interactions, and particle dynamics. To evaluate both interpolation and generalization, we design in-distribution and out-of-distribution protocols with controlled shifts in interaction patterns or scene configurations. By building the benchmark in a fully controllable simulator, ACWM-Phys enables precise data collection, reproducible evaluation, and systematic analysis of model capabilities for physically grounded world modeling. Through systematic experiments on ACWM-DiT, we find that OoD generalization depends not only on the physical regime but also on effective task complexity: models generalize well on visually simple, low-dimensional interactions with clear geometric structure, but suffer larger drops on deformable contacts, high-dimensional control, and complex articulated motion. This suggests that the model still relies heavily on visual appearance patterns instead of fully learning the underlying physics. Ablations show that cross-attention improves high-dimensional action conditioning, causal VAEs outperform frame-wise encoders, and larger action spaces are harder to model but can improve generalization by providing richer control signals. These findings guide the design of physically grounded world models.

preprint · 2026 · arXiv

Compositional Diffusion with Guided Search for Long-Horizon Planning

Generative models have emerged as powerful tools for planning, with compositional approaches offering particular promise for modeling long-horizon task distributions by composing together local, modular generative models. This compositional paradigm spans diverse domains, from multi-step manipulation planning to panoramic image synthesis to long video generation. However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this mode averaging problem by embedding search directly within the diffusion denoising process. Our method explores diverse combinations of local modes through population-based sampling, prunes infeasible candidates using likelihood-based filtering, and enforces global consistency through iterative resampling between overlapping segments. CDGS matches oracle performance on seven robot manipulation tasks, outperforming baselines that lack compositionality or require long-horizon training data. The approach generalizes across domains, enabling coherent text-guided panoramic images and long videos through effective local-to-global message passing. More details: https://cdgsearch.github.io/

preprint · 2026 · arXiv

Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous diffusion language model (DLM), and construct RePlaid by aligning the architecture of Plaid with modern discrete DLMs. In this unified setting, we establish the first scaling law for continuous DLMs that rivals discrete DLMs: RePlaid exhibits a compute gap of only $20\times$ compared to autoregressive models, outperforms Duo while using fewer parameters, and outperforms MDLM in the over-trained regime. We benchmark RePlaid against recent continuous DLMs: on OpenWebText, RePlaid achieves a new state-of-the-art PPL bound of $22.1$ among continuous DLMs and superior generation quality. These results suggest that continuous diffusion, when trained via likelihood, is a highly competitive and scalable alternative to discrete DLMs. Moreover, we offer theoretical insights to understand the advantage of likelihood-based training. We show that optimizing the noise schedule to minimize the ELBO's variance naturally yields linear cross-entropy (information loss) over time. This evenly distributes denoising difficulty without any case-specific time reparameterization. In addition, we find that optimizing embeddings via likelihood creates structured geometries and drives the most significant likelihood gain.

preprint · 2026 · arXiv

Efficient Adjoint Matching for Fine-tuning Diffusion Models

Reward fine-tuning has become a common approach for aligning pretrained diffusion and flow models with human preferences in text-to-image generation. Among reward-gradient-based methods, Adjoint Matching (AM) provides a principled formulation by casting reward fine-tuning as a stochastic optimal control (SOC) problem. However, AM incurs a substantial computational cost: it requires (i) stochastic simulation of full generative trajectories under memoryless dynamics, resulting in a large number of function evaluations, and (ii) backward ODE simulation of the adjoint state along each sampled trajectory. In this work, we observe that both bottlenecks are closely tied to the non-trivial base drift inherited from the pretrained model. Motivated by this observation, we propose Efficient Adjoint Matching (EAM), which substantially improves training efficiency by reformulating the SOC problem with a linear base drift and a correspondingly modified terminal cost. This reformulation removes both sources of inefficiency: it enables training-time sampling with a few-step deterministic ODE solver and yields a closed-form adjoint solution that eliminates backward adjoint simulation. On standard text-to-image reward fine-tuning benchmarks, EAM converges up to 4x faster than AM and matches or surpasses it across various metrics including PickScore, ImageReward, HPSv2.1, CLIPScore and Aesthetics.

preprint · 2025 · arXiv

Efficient Iterative Proximal Variational Inference Motion Planning

We cast motion planning under uncertainty as a stochastic optimal control problem, where the optimal posterior distribution has an explicit form. To approximate this posterior, this work frames an optimization problem in the space of Gaussian distributions by solving a Variational Inference (VI) problem in the path distribution space. For linear-Gaussian stochastic dynamics, a proximal algorithm is proposed to solve for an optimal Gaussian proposal iteratively. The computational bottleneck is evaluating the gradients with respect to the proposal over a dense trajectory. To tackle this issue, the sparse planning factor graph and Gaussian Belief Propagation (GBP) are exploited, allowing for parallel computation of these gradients on Graphics Processing Units (GPUs). We term the novel paradigm Parallel Gaussian Variational Inference Motion Planning (P-GVIMP). Building on the efficient algorithm for linear Gaussian systems, we then propose an iterative paradigm based on Statistical Linear Regression (SLR) techniques to solve planning problems for nonlinear stochastic systems, where P-GVIMP serves as a sub-routine for the linearized time-varying system at each iteration. The proposed framework is validated on various robotic systems, demonstrating significant speedups achieved by leveraging parallel computation and successful planning solutions for nonlinear systems under uncertainty. An open-source implementation is available at https://github.com/hzyu17/VIMP.

preprint · 2023 · arXiv

Generative Skill Chaining: Long-Horizon Skill Planning with Diffusion Models

Long-horizon tasks, usually characterized by complex subtask dependencies, present a significant challenge in manipulation planning. Skill chaining is a practical approach to solving unseen tasks by combining learned skill priors. However, such methods are myopic if sequenced greedily and face scalability issues with search-based planning strategies. To address these challenges, we introduce Generative Skill Chaining (GSC), a probabilistic framework that learns skill-centric diffusion models and composes their learned distributions to generate long-horizon plans during inference. GSC samples from all skill models in parallel to efficiently solve unseen tasks while enforcing geometric constraints. We evaluate the method on various long-horizon tasks and demonstrate its capability in reasoning about action dependencies, constraint handling, and generalization, along with its ability to replan in the face of perturbations. We show results in simulation and on a real robot to validate the efficiency and scalability of GSC, highlighting its potential for advancing long-horizon task planning. More details are available at: https://generative-skill-chaining.github.io/

preprint · 2022 · arXiv

A Proximal Algorithm for Sampling from Non-smooth Potentials

In this work, we examine sampling problems with non-smooth potentials. We propose a novel Markov chain Monte Carlo algorithm for sampling from non-smooth potentials. We provide a non-asymptotic analysis of our algorithm and establish a polynomial-time complexity $\tilde{\mathcal{O}}(d\varepsilon^{-1})$ to obtain $\varepsilon$ total variation distance to the target density, better than most existing results under the same assumptions. Our method is based on the proximal bundle method and an alternating sampling framework. This framework requires the so-called restricted Gaussian oracle, which can be viewed as a sampling counterpart of the proximal mapping in convex optimization. One key contribution of this work is a fast algorithm that realizes the restricted Gaussian oracle for any convex non-smooth potential with bounded Lipschitz constant.
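
The alternating sampling framework mentioned in this abstract can be illustrated with a toy sketch. The example below is not the paper's algorithm: it assumes a smooth one-dimensional Gaussian target, f(x) = x**2 / 2, chosen only because the restricted Gaussian oracle then has a closed form. The function name, the step size `eta`, and the seed are hypothetical choices for illustration.

```python
import math
import random


def proximal_sampler_gaussian(n_steps, eta=1.0, x0=0.0, seed=0):
    """Alternating sampler for the 1-D target proportional to exp(-x**2 / 2).

    Assumption: f(x) = x**2 / 2, so the restricted Gaussian oracle (RGO)
    is available in closed form. For a non-smooth f the RGO would have to
    be implemented differently.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        # Step 1: forward Gaussian step, y ~ N(x, eta).
        y = rng.gauss(x, math.sqrt(eta))
        # Step 2 (RGO): sample x from the density proportional to
        # exp(-f(x) - (x - y)**2 / (2 * eta)), which for this f is
        # N(y / (1 + eta), eta / (1 + eta)).
        x = rng.gauss(y / (1.0 + eta), math.sqrt(eta / (1.0 + eta)))
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = proximal_sampler_gaussian(20000)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

The chain leaves the standard normal invariant, so the sample mean and variance should be close to 0 and 1 for a long enough run.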

preprint · 2022 · arXiv

Data-Driven Optimal Control via Linear Transfer Operators: A Convex Approach

This paper is concerned with data-driven optimal control of nonlinear systems. We present a convex formulation of the optimal control problem (OCP) with a discounted cost function. We consider OCPs with both positive and negative discount factors. The convex approach relies on lifting nonlinear system dynamics in the space of densities using the linear Perron-Frobenius (P-F) operator. This lifting leads to an infinite-dimensional convex optimization formulation of the optimal control problem. The data-driven approximation of the optimization problem relies on the approximation of the Koopman operator using polynomial basis functions. We write the approximate finite-dimensional optimization problem as a polynomial optimization which is then solved efficiently using a sum-of-squares-based optimization framework. Simulation results are presented to demonstrate the efficacy of the developed data-driven optimal control framework.

preprint · 2022 · arXiv

Improved analysis for a proximal algorithm for sampling

We study the proximal sampler of Lee, Shen, and Tian (2021) and obtain new convergence guarantees under weaker assumptions than strong log-concavity: namely, our results hold for (1) weakly log-concave targets, and (2) targets satisfying isoperimetric assumptions which allow for non-log-concavity. We demonstrate our results by obtaining new state-of-the-art sampling guarantees for several classes of target distributions. We also strengthen the connection between the proximal sampler and the proximal method in optimization by interpreting the proximal sampler as an entropically regularized Wasserstein proximal method, and the proximal point method as the limit of the proximal sampler with vanishing noise.

preprint · 2022 · arXiv

Inertialess Gyrating Engines

A typical model for a gyrating engine consists of an inertial wheel powered by an energy source that generates an angle-dependent torque. Examples of such engines include a pendulum with an externally applied torque, Stirling engines, and the Brownian gyrating engine. Variations in the torque are averaged out by the inertia of the system to produce limit cycle oscillations. While torque generating mechanisms are also ubiquitous in the biological world, where they typically feed on chemical gradients, inertia is not a property that one naturally associates with such processes. In the present work, seeking ways to dispense with the need for inertial effects, we study an inertia-less concept where the combined effect of coupled torque-producing components averages out variations in the ambient potential and helps overcome dissipative forces to allow sustained operation for vanishingly small inertia. We exemplify this inertia-less concept through analysis of two of the aforementioned engines, the Stirling engine and the Brownian gyrating engine. An analogous principle may be sought in biomolecular processes as well as in modern-day technological engines, where for the latter, the coupled torque-producing components reduce vibrations that stem from the variability of the generated torque.

preprint · 2022 · arXiv

Path Integral Sampler: a stochastic control approach for sampling

We present Path Integral Sampler (PIS), a novel algorithm to draw samples from unnormalized probability density functions. The PIS is built on the Schrödinger bridge problem which aims to recover the most likely evolution of a diffusion process given its initial distribution and terminal distribution. The PIS draws samples from the initial distribution and then propagates the samples through the Schrödinger bridge to reach the terminal distribution. Applying the Girsanov theorem, with a simple prior diffusion, we formulate the PIS as a stochastic optimal control problem whose running cost is the control energy and terminal cost is chosen according to the target distribution. By modeling the control as a neural network, we establish a sampling algorithm that can be trained end-to-end. We provide theoretical justification of the sampling quality of PIS in terms of Wasserstein distance when sub-optimal control is used. Moreover, path integral theory is used to compute importance weights of the samples to compensate for the bias induced by the sub-optimality of the controller and time-discretization. We experimentally demonstrate the advantages of PIS compared with other state-of-the-art sampling methods on a variety of tasks.

preprint · 2022 · arXiv

Signed Graph Neural Networks: A Frequency Perspective

Graph convolutional networks (GCNs) and their variants are designed for unsigned graphs containing only positive links. Many existing GCNs have been derived from the spectral domain analysis of signals lying over (unsigned) graphs and in each convolution layer they perform low-pass filtering of the input features followed by a learnable linear transformation. Their extension to signed graphs with positive as well as negative links poses multiple issues including computational irregularities and ambiguous frequency interpretation, making the design of computationally efficient low-pass filters challenging. In this paper, we address these issues via spectral analysis of signed graphs and propose two different signed graph neural networks: one keeps only low-frequency information and the other also retains high-frequency information. We further introduce the magnetic signed Laplacian and use its eigendecomposition for spectral analysis of directed signed graphs. We test our methods on node classification and link sign prediction tasks on signed graphs and achieve state-of-the-art performance.

preprint · 2022 · arXiv

Stochastic thermodynamic engines under time-varying temperature profile

In the present paper, we study the power output and efficiency of overdamped stochastic thermodynamic engines that are in contact with a heat bath having a temperature that varies periodically with time. This is in contrast to most of the existing literature that considers the Carnot paradigm of alternating contact with heat baths having different fixed temperatures, hot and cold. Specifically, we consider a periodic and bounded but otherwise arbitrary temperature profile and derive explicit bounds on the power and efficiency achievable by a suitable controlling potential that couples the thermodynamic engine to the external world. Standing assumptions in our analysis are bounds on the norm of the gradient of effective potentials -- in the absence of any such constraint, the physically questionable conclusion of arbitrarily large power can be drawn.

preprint · 2022 · arXiv

Thermodynamic engine powered by anisotropic fluctuations

The purpose of this work is to present the concept of an autonomous Stirling-like engine powered by anisotropy of thermodynamic fluctuations. Specifically, simultaneous contact of a thermodynamic system with two heat baths along coupled degrees of freedom generates torque and circulatory currents -- an arrangement referred to as a Brownian gyrator. The embodiment that constitutes the engine includes an inertial wheel to sustain rotary motion and average out the generated fluctuating torque, ultimately delivering power to an external load. We detail an electrical model for such an engine that consists of two resistors at different temperatures and three reactive elements in the form of variable capacitors. The resistors generate Johnson-Nyquist current fluctuations that power the engine, while the capacitors generate driving forces via a coupling of their dielectric material with the inertial wheel. A proof-of-concept is established via stability analysis to ensure the existence of a stable periodic orbit generating sustained power output. We conclude by drawing a connection to the dynamics of a damped pendulum with constant torque and to those of a macroscopic Stirling engine. The sought insights aim at nano-engines and biological processes that are similarly powered by anisotropy in temperature and chemical potentials.

preprint · 2022 · arXiv

Underdamped stochastic thermodynamic engines in contact with a heat bath with arbitrary temperature profile

We study thermodynamic processes in contact with a heat bath that may have an arbitrary time-varying periodic temperature profile. Within the framework of stochastic thermodynamics, and for models of thermodynamic engines in the idealized case of underdamped particles in the low-friction regime, we derive explicit bounds as well as optimal control protocols that draw maximum power and achieve maximum efficiency at any specified level of power.

preprint · 2021 · arXiv

Feedback Particle Filter for Collective Inference

The purpose of this paper is to describe the feedback particle filter algorithm for problems where there are a large number ($M$) of non-interacting agents (targets) with a large number ($M$) of non-agent specific observations (measurements) that originate from these agents. In its basic form, the problem is characterized by data association uncertainty whereby the association between the observations and agents must be deduced in addition to the agent state. In this paper, the large-$M$ limit is interpreted as a problem of collective inference. This viewpoint is used to derive the equation for the empirical distribution of the hidden agent states. A feedback particle filter (FPF) algorithm for this problem is presented and illustrated via numerical simulations. Results are presented for the Euclidean and the finite state-space cases, both in continuous-time settings. The classical FPF algorithm is shown to be the special case (with $M=1$) of these more general results. The simulations help show that the algorithm well approximates the empirical distribution of the hidden states for large $M$.

preprint · 2021 · arXiv

Harvesting energy from a periodic heat bath

The context of the present paper is stochastic thermodynamics - an approach to nonequilibrium thermodynamics rooted within the broader framework of stochastic control. In contrast to the classical paradigm of Carnot engines, we herein propose to consider thermodynamic processes with periodic continuously varying temperature of a heat bath and study questions of maximal power and efficiency for two idealized cases, overdamped (first-order) and underdamped (second-order) stochastic models. We highlight properties of optimal periodic control, derive and numerically validate approximate formulae for the optimal performance (power and efficiency).

preprint · 2021 · arXiv

On the relation between information and power in stochastic thermodynamic engines

The common saying that information is power takes a rigorous form in stochastic thermodynamics, where a quantitative equivalence between the two helps explain the paradox of Maxwell's demon in its ability to reduce entropy. In the present paper, we build on earlier work on the interplay between the relative cost and benefits of information in producing work in cyclic operation of thermodynamic engines (by Sandberg et al., 2014). Specifically, we study the general case of overdamped particles in a time-varying potential (control action) in feedback that utilizes continuous measurements (nonlinear filtering) of a thermodynamic ensemble, to produce suitable adaptations of the second law of thermodynamics that involve information.

preprint · 2021 · arXiv

Optimal steering to invariant distributions for networks flows

We derive novel results on the ergodic theory of irreducible, aperiodic Markov chains. We show how to optimally steer the network flow to a stationary distribution over a finite or infinite time horizon. Optimality is with respect to an entropic distance between distributions on feasible paths. When the prior is reversible, it is shown that solutions to this discrete time and space steering problem are reversible as well. A notion of temperature is defined for Boltzmann distributions on networks, and problems analogous to cooling (in this case, for evolutions in discrete space and time) are discussed.

preprint · 2020 · arXiv

A convex data-driven approach for nonlinear control synthesis

We consider a class of nonlinear control synthesis problems where the underlying mathematical models are not explicitly known. We propose a data-driven approach to stabilize the systems when only sample trajectories of the dynamics are accessible. Our method is founded on the density function based almost everywhere stability certificate that is dual to the Lyapunov function for dynamic systems. Unlike Lyapunov based methods, density functions lead to a convex formulation for a joint search of the control strategy and the stability certificate. This type of convex problem can be solved efficiently by invoking the machinery of the sum of squares (SOS). For the data-driven part, we exploit the fact that the duality results in the stability theory of the dynamical system can be understood using linear Perron-Frobenius and Koopman operators. This connection allows us to use data-driven methods developed to approximate these operators combined with the SOS techniques for the convex formulation of control synthesis. The efficacy of the proposed approach is demonstrated through several examples.

preprint · 2020 · arXiv

Improving Robustness via Risk Averse Distributional Reinforcement Learning

One major obstacle that precludes the success of reinforcement learning in real-world applications is the lack of robustness, either to model uncertainties or external disturbances, of the trained policies. Robustness is critical when the policies are trained in simulation instead of real-world environments. In this work, we propose a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation. Our algorithm is based on the recently developed distributional RL framework. We incorporate the CVaR risk measure in sample-based distributional policy gradients (SDPG) for learning risk-averse policies to achieve robustness against a range of system disturbances. We validate the robustness of risk-aware SDPG on multiple environments.
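
As a rough illustration of the CVaR risk measure named in this abstract, the sketch below estimates it from a set of sampled returns as the mean of the worst alpha-fraction of samples. This is a generic empirical estimator, not the paper's SDPG training code; the function name `empirical_cvar` and the tail-size rule (rounding up) are assumptions made for this example.

```python
import math


def empirical_cvar(returns, alpha):
    """Estimate CVaR at level alpha from sampled returns.

    Assumption: higher return is better, so the risky tail is the
    lowest ceil(alpha * n) samples, and CVaR is their average.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)  # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / k


if __name__ == "__main__":
    sampled_returns = [-10.0, 0.0, 5.0, 9.0]
    # Average of the worst half of the samples.
    print(empirical_cvar(sampled_returns, 0.5))
    # With alpha = 1 the estimate reduces to the plain mean.
    print(empirical_cvar(sampled_returns, 1.0))
```

A risk-averse objective of this kind penalizes policies whose bad outcomes are very bad, even when their average return is high.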

preprint · 2020 · arXiv

Incremental inference of collective graphical models

We consider incremental inference problems from aggregate data for collective dynamics. In particular, we address the problem of estimating the aggregate marginals of a Markov chain from noisy aggregate observations in an incremental (online) fashion. We propose a sliding window Sinkhorn belief propagation (SW-SBP) algorithm that utilizes a sliding window filter of the most recent noisy aggregate observations along with encoded information from discarded observations. Our algorithm is built upon the recently proposed multi-marginal optimal transport based SBP algorithm that leverages standard belief propagation and the Sinkhorn algorithm to solve inference problems from aggregate data. We demonstrate the performance of our algorithm on applications such as inferring population flow from aggregate observations.

preprint · 2020 · arXiv

Maximal power output of a stochastic thermodynamic engine

Classical thermodynamics aimed to quantify the efficiency of thermodynamic engines by bounding the maximal amount of mechanical energy produced compared to the amount of heat required. While this was accomplished early on, by Carnot and Clausius, the more practical problem of quantifying limits on the power that can be delivered remained elusive because quasistatic processes require infinitely slow cycling, resulting in a vanishing power output. Recent insights, drawn from stochastic models, appear to bridge the gap between theory and practice in that they lead to physically meaningful expressions for the dissipation cost in operating a thermodynamic engine over a finite time window. Building on this framework of stochastic thermodynamics, we derive bounds on the maximal power that can be drawn by cycling an overdamped ensemble of particles via a time-varying potential while alternating contact with heat baths of different temperatures ($T_c$ cold, and $T_h$ hot). Specifically, assuming a suitable bound $M$ on the spatial gradient of the controlling potential, we show that the maximal achievable power is bounded by $\frac{M}{8}(\frac{T_h}{T_c}-1)$. Moreover, we show that this bound can be reached to within a factor of $(\frac{T_h}{T_c}-1)/(\frac{T_h}{T_c}+1)$ by operating the cyclic thermodynamic process with a quadratic potential.
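
The two expressions quoted in this abstract are easy to evaluate numerically. The sketch below simply plugs values into them; the function names are hypothetical, and the units and normalization of $M$ are assumed to follow the paper, which this page does not specify.

```python
def power_upper_bound(M, T_h, T_c):
    """Evaluate (M / 8) * (T_h / T_c - 1), the bound stated in the abstract."""
    if not (T_h > T_c > 0):
        raise ValueError("expected T_h > T_c > 0")
    return (M / 8.0) * (T_h / T_c - 1.0)


def quadratic_potential_factor(T_h, T_c):
    """Evaluate (T_h / T_c - 1) / (T_h / T_c + 1), the factor stated in the abstract."""
    if not (T_h > T_c > 0):
        raise ValueError("expected T_h > T_c > 0")
    ratio = T_h / T_c
    return (ratio - 1.0) / (ratio + 1.0)


if __name__ == "__main__":
    # Illustrative numbers only: gradient bound M = 8, hot bath twice as hot.
    print(power_upper_bound(8.0, 600.0, 300.0))
    print(quadratic_potential_factor(600.0, 300.0))
```

For a temperature ratio of 2 the bound equals M / 8 and the factor equals 1/3; the factor approaches 1 as the temperature ratio grows.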

preprint · 2020 · arXiv

Multi-marginal optimal transport and probabilistic graphical models

We study multi-marginal optimal transport problems from a probabilistic graphical model perspective. We point out an elegant connection between the two when the underlying cost for optimal transport allows a graph structure. In particular, an entropy regularized multi-marginal optimal transport is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. This relation on the one hand extends the optimal transport as well as the probabilistic graphical model theories, and on the other hand leads to fast algorithms for multi-marginal optimal transport by leveraging the well-developed algorithms in Bayesian inference. Several numerical examples are provided to highlight the results.
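
For orientation, the simplest instance of entropy-regularized optimal transport is the two-marginal case, which the classical Sinkhorn iteration solves by alternately rescaling rows and columns of a Gibbs kernel. The sketch below covers only that two-marginal case, not the multi-marginal, graph-structured algorithm of the paper; the function name, the regularization `eps`, and the iteration count are assumptions for illustration.

```python
import math


def sinkhorn_plan(a, b, cost, eps=0.5, n_iters=200):
    """Entropy-regularized transport plan between histograms a and b.

    a, b: lists of positive weights, each summing to 1.
    cost: nested list with cost[i][j] the cost of moving mass from i to j.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / eps).
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    a = [0.5, 0.5]
    b = [0.25, 0.75]
    cost = [[0.0, 1.0], [1.0, 0.0]]
    plan = sinkhorn_plan(a, b, cost)
    print([sum(row) for row in plan])        # row sums, close to a
    print([sum(col) for col in zip(*plan)])  # column sums, close to b
```

Smaller values of `eps` bring the plan closer to the unregularized optimum but slow the iteration down.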

preprint · 2020 · arXiv

On Spectral Properties of Signed Laplacians with Connections to Eventual Positivity

Signed graphs have appeared in a broad variety of applications, ranging from social networks to biological networks, from distributed control and computation to power systems. In this paper, we investigate spectral properties of signed Laplacians for undirected signed graphs. We find conditions on the negative weights under which a signed Laplacian is positive semidefinite via the Kron reduction and multiport network theory. For signed Laplacians that are indefinite, we characterize their inertias with the same framework. Furthermore, we build connections between signed Laplacians, generalized M-matrices, and eventually exponentially positive matrices.

preprint · 2020 · arXiv

Probabilistic Kernel Support Vector Machines

We propose a probabilistic enhancement of standard kernel Support Vector Machines for binary classification, in order to address the case when, along with given data sets, a description of uncertainty (e.g., error bounds) may be available on each datum. In the present paper, we specifically consider Gaussian distributions to model uncertainty. Thereby, our data consist of pairs $(x_i,\Sigma_i)$, $i\in\{1,\ldots,N\}$, along with an indicator $y_i\in\{-1,1\}$ to declare membership in one of two categories for each pair. These pairs may be viewed to represent the mean and covariance, respectively, of random vectors $\xi_i$ taking values in a suitable linear space (typically $\mathbb R^n$). Thus, our setting may also be viewed as a modification of Support Vector Machines to classify distributions, albeit, at present, only Gaussian ones. We outline the formalism that allows computing suitable classifiers via a natural modification of the standard "kernel trick." The main contribution of this work is to point out a suitable kernel function for applying Support Vector techniques to the setting of uncertain data for which a detailed uncertainty description is also available (herein, "Gaussian points").

preprint · 2020 · arXiv

Regularized transport between singular covariance matrices

We consider the problem of steering a linear stochastic system between two end-point degenerate Gaussian distributions in finite time. This accounts for those situations in which some but not all of the state entries are uncertain at the initial, t = 0, and final time, t = T. This problem entails non-trivial technical challenges as the singularity of terminal state-covariance causes the control to grow unbounded at the final time T. Consequently, the entropic interpolation (Schroedinger Bridge) is provided by a diffusion process which is not finite-energy, thereby placing this case outside of most of the current theory. In this paper, we show that a feasible interpolation can be derived as a limiting case of earlier results for non-degenerate cases, and that it can be expressed in closed form. Moreover, we show that such interpolation belongs to the same reciprocal class as the uncontrolled evolution. By doing so we also highlight a time-symmetry of the problem, contrasting dual formulations in the forward and reverse time-directions, where in each the control grows unbounded as time approaches the end-point (in the forward and reverse time-direction, respectively).

preprint · 2020 · arXiv

Sample-based Distributional Policy Gradient

Distributional reinforcement learning (DRL) is a recent reinforcement learning framework whose success has been supported by various empirical studies. It relies on the key idea of replacing the expected return with the return distribution, which captures the intrinsic randomness of the long-term rewards. Most of the existing literature on DRL focuses on problems with discrete action spaces and value-based methods. In this work, motivated by applications in robotics with continuous action space control settings, we propose the sample-based distributional policy gradient (SDPG) algorithm. It models the return distribution using samples via a reparameterization technique widely used in generative modeling and inference. We compare SDPG with the state-of-the-art policy gradient method in DRL, distributed distributional deterministic policy gradients (D4PG), which has demonstrated state-of-the-art performance. We apply SDPG and D4PG to multiple OpenAI Gym environments and observe that our algorithm shows better sample efficiency as well as higher reward for most tasks.

preprint 2016 arXiv

Matrix Optimal Mass Transport: A Quantum Mechanical Approach

In this paper, we describe a possible generalization of the Wasserstein 2-metric, originally defined on the space of scalar probability densities, to the space of Hermitian matrices with trace one, and to the space of matrix-valued probability densities. Our approach follows a computational fluid dynamical formulation of the Wasserstein-2 metric and utilizes certain results from the quantum mechanics of open systems, in particular the Lindblad equation. It allows determining the gradient flow for the quantum entropy relative to this matricial Wasserstein metric. This may have implications to some key issues in quantum information theory.

preprint 2016 arXiv

Optimal steering of a linear stochastic system to a final probability distribution, Part III

The subject of this work has its roots in the so-called Schroedinger Bridge Problem (SBP), which asks for the most likely distribution of Brownian particles in their passage between observed empirical marginal distributions at two distinct points in time. Renewed interest in this problem was sparked by a reformulation in the language of stochastic control. In earlier works, presented as Part I and Part II, we explored a generalization of the original SBP that amounts to optimal steering of linear stochastic dynamical systems between state-distributions, at two points in time, under full state feedback. In these works the cost was quadratic in the control input. The purpose of the present work is to detail the technical steps in extending the framework to the case where a quadratic cost in the state is also present. In the zero-noise limit, we obtain the solution of a (deterministic) mass transport problem with general quadratic cost.

preprint 2016 arXiv

Regularization and Interpolation of Positive Matrices

We consider certain matricial analogues of optimal mass transport of positive definite matrices of equal trace. The framework is motivated by the need to devise a suitable geometry for interpolating positive definite matrices in ways that allow controlling the apparent tradeoff between "aligning up their eigenstructure" and "scaling the corresponding eigenvalues". Indeed, motivation for this work is provided by power spectral analysis of multivariate time series, where linear interpolation between matrix-valued power spectra generates push-pop artifacts. Push-pop of the power distribution is objectionable as it corresponds to an unrealistic response of scatterers.

preprint 2016 arXiv

Robust transport over networks

We consider transport over a strongly connected, directed graph. The scheduling amounts to selecting transition probabilities for a discrete-time Markov evolution which is designed to be consistent with certain initial and final marginals. The random evolution is selected to be closest to a prior measure on paths in the relative entropy sense, i.e., a Schroedinger bridge between the two marginals. This is an atypical stochastic control problem where the control consists in suitably modifying the transition mechanism. The prior can incorporate cost of traversing edges or allocate equal probability to all paths of equal length connecting any two given nodes, i.e., a uniform measure on paths. This latter choice relies on the so-called Ruelle-Bowen random walk and gives rise to a scheduling that tends to utilize all paths as uniformly as the topology allows. Thus, when the Ruelle-Bowen law is taken as prior, the transportation plan tends to lessen congestion and ensure a level of robustness. We show that the Ruelle-Bowen law is itself a Schroedinger bridge albeit with a prior that is not a probability measure. The paradigm of Schroedinger bridges as a mechanism for scheduling transport on networks can be adapted to graphs that are not strongly connected as well as to weighted graphs. The latter leads to transportation plans that effect a compromise between robustness and transportation cost.
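The uniform-on-paths prior mentioned above can be illustrated with a minimal sketch. It assumes the standard construction of the Ruelle-Bowen random walk from the Perron eigenvalue and right eigenvector of the adjacency matrix; the function names and the 3-node example graph are hypothetical and not taken from the paper.

```python
import numpy as np

def ruelle_bowen_transition(adjacency):
    """Transition matrix of the Ruelle-Bowen walk on a strongly connected digraph."""
    A = np.asarray(adjacency, dtype=float)
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)          # Perron root: eigenvalue with largest real part
    lam = eigvals[k].real
    v = np.abs(eigvecs[:, k].real)       # Perron vector, taken entrywise positive
    # P[i, j] = A[i, j] * v[j] / (lam * v[i]); each row sums to one because A v = lam v
    return A * v[None, :] / (lam * v[:, None])

def path_probability(P, path):
    """Probability of following `path`, given that the walk starts at path[0]."""
    return float(np.prod([P[i, j] for i, j in zip(path[:-1], path[1:])]))

# Hypothetical strongly connected digraph with edges 0->0, 0->1, 1->2, 2->0, 2->1.
A_example = np.array([[1, 1, 0],
                      [0, 0, 1],
                      [1, 1, 0]])
P_example = ruelle_bowen_transition(A_example)

# Two different 3-step paths from node 0 to node 1 receive the same probability.
p_a = path_probability(P_example, [0, 0, 0, 1])
p_b = path_probability(P_example, [0, 1, 2, 1])
```

Under this construction the probability of a path depends only on its length and its two end nodes, which is the sense in which all paths of equal length between two given nodes are equally likely.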

preprint 2016 arXiv

Stochastic control, entropic interpolation and gradient flows on Wasserstein product spaces

Since the early nineties, it has been observed that the Schroedinger bridge problem can be formulated as a stochastic control problem with atypical boundary constraints. This in turn has a fluid dynamic counterpart where the flow of probability densities represents an entropic interpolation between the given initial and final marginals. In the zero noise limit, such entropic interpolation converges in a suitable sense to the displacement interpolation of optimal mass transport (OMT). We consider two absolutely continuous curves in Wasserstein space ${\cal W}_2$ and study the evolution of the relative entropy on ${\cal W}_2\times {\cal W}_2$ on a finite time interval. Thus, this study differs from previous work in OMT theory concerning relative entropy from a fixed (often equilibrium) distribution (density). We derive a gradient flow on Wasserstein product space. We find the remarkable property that fluxes in the two components are opposite. Plugging the "steepest descent" into the evolution of the relative entropy, we get what appears to be a new formula: the two flows approach each other at a faster rate than that of two solutions of the same Fokker-Planck equation. We then study the evolution of relative entropy in the case of uncontrolled-controlled diffusions. In two special cases of the Schroedinger bridge problem, we show that such relative entropy may be monotonically decreasing or monotonically increasing.

preprint 2015 arXiv

Entropic and displacement interpolation: a computational approach using the Hilbert metric

Monge-Kantorovich optimal mass transport (OMT) provides a blueprint for geometries in the space of positive densities -- it quantifies the cost of transporting a mass distribution into another. In particular, it provides natural options for interpolation of distributions (displacement interpolation) and for modeling flows. As such it has been the cornerstone of recent developments in physics, probability theory, image processing, time-series analysis, and several other fields. In spite of extensive work and theoretical developments, the computation of OMT for large-scale problems has remained a challenging task. An alternative framework for interpolating distributions, rooted in statistical mechanics and large deviations, is that of Schroedinger bridges (entropic interpolation). This may be seen as a stochastic regularization of OMT and can be cast as the stochastic control problem of steering the probability density of the state-vector of a dynamical system between two marginals. In this approach, however, the actual computation of flows had hardly received any attention. In recent work on Schroedinger bridges for Markov chains and quantum evolutions, we noted that the solution can be efficiently obtained from the fixed point of a map which is contractive in the Hilbert metric. Thus, the purpose of this paper is to show that a similar approach can be taken in the context of diffusion processes, which i) leads to a new proof of a classical result on Schroedinger bridges and ii) provides an efficient computational scheme for both Schroedinger bridges and OMT. We illustrate this new computational approach by obtaining interpolation of densities in representative examples such as interpolation of images.
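The fixed-point idea can be sketched in a discretized setting. The code below is an illustration only: it assumes a finite grid, a strictly positive Gaussian kernel standing in for the diffusion's transition density, and the familiar alternating rescaling of the two marginal constraints. The grid, kernel width, marginals and function names are hypothetical choices, not the paper's scheme for diffusions.

```python
import numpy as np

def hilbert_metric(x, y):
    """Hilbert projective distance between two entrywise-positive vectors."""
    ratio = x / y
    return float(np.log(ratio.max() / ratio.min()))

def schroedinger_bridge_coupling(K, p0, p1, iters=1000):
    """Alternately rescale a positive kernel K until its marginals match p0 and p1."""
    phi1 = np.ones_like(p1)
    for _ in range(iters):
        phi0_hat = p0 / (K @ phi1)       # enforce the initial marginal
        phi1 = p1 / (K.T @ phi0_hat)     # enforce the final marginal
    # Joint law of the (initial, final) states: diag(phi0_hat) K diag(phi1)
    return phi0_hat[:, None] * K * phi1[None, :]

# Hypothetical example: 30 grid points on [0, 1] and two Gaussian-shaped marginals.
x = np.linspace(0.0, 1.0, 30)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.1))
p0 = np.exp(-(x - 0.2) ** 2 / (2 * 0.1 ** 2))
p0 /= p0.sum()
p1 = np.exp(-(x - 0.8) ** 2 / (2 * 0.1 ** 2))
p1 /= p1.sum()
coupling = schroedinger_bridge_coupling(K, p0, p1)
```

The helper `hilbert_metric` is included so that the non-expansiveness of multiplication by a positive kernel can be checked numerically, for example by comparing `hilbert_metric(K @ u, K @ v)` with `hilbert_metric(u, v)` for positive vectors `u` and `v`.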

preprint 2015 arXiv

Fast cooling for a system of stochastic oscillators

We study feedback control of coupled nonlinear stochastic oscillators in a force field. We first consider the problem of asymptotically driving the system to a desired steady state corresponding to reduced thermal noise. Among the feedback controls achieving the desired asymptotic transfer, we find that the most efficient one from an energy point of view is characterized by time-reversibility. We also extend the theory of Schrödinger bridges to this model, thereby steering the system in finite time and with minimum effort to a target steady-state distribution. The system can then be maintained in this state through the optimal steady-state feedback control. The solution, in the finite-horizon case, involves a space-time harmonic function $φ$, and $-\log φ$ plays the role of an artificial, time-varying potential in which the desired evolution occurs. This framework appears extremely general and flexible and can be viewed as a considerable generalization of existing active control strategies such as macromolecular cooling. In the case of a quadratic potential, the results assume a form particularly attractive from the algorithmic viewpoint as the optimal control can be computed via deterministic matricial differential equations. An example involving inertial particles illustrates both transient and steady state optimal feedback control.

preprint 2015 arXiv

Optimal control of the state statistics for a linear stochastic system

We consider a variant of the classical linear quadratic Gaussian regulator (LQG) in which penalties on the endpoint state are replaced by the specification of the terminal state distribution. The resulting theory considerably differs from LQG as well as from formulations that bound the probability of violating state constraints. We develop results for optimal state-feedback control in the two cases where i) steering of the state distribution is to take place over a finite window of time with minimum energy, and ii) the goal is to maintain the state at a stationary distribution over an infinite horizon with minimum power. For both problems the distribution of noise and state are Gaussian. In the first case, we show that provided the system is controllable, the state can be steered to any terminal Gaussian distribution over any specified finite time-interval. In the second case, we characterize explicitly the covariance of admissible stationary state distributions that can be maintained with constant state-feedback control. The conditions for optimality are expressed in terms of a system of dynamically coupled Riccati equations in the finite horizon case and in terms of algebraic conditions for the stationary case. In the case where the noise and control share identical input channels, the Riccati equations for finite-horizon steering become homogeneous and can be solved in closed form. The present paper is largely based on our recent work in arxiv.org/abs/1408.2222, arxiv.org/abs/1410.3447 and presents an overview of certain key results.

preprint 2015 arXiv

Optimal mass transport over bridges

We present an overview of our recent work on implementable solutions to the Schroedinger bridge problem and their potential application to optimal transport and various generalizations.

preprint 2015 arXiv

Optimal transport over a linear dynamical system

We consider the problem of steering an initial probability density for the state vector of a linear system to a final one, in finite time, using minimum energy control. In the case where the dynamics correspond to an integrator ($\dot x(t) = u(t)$) this amounts to a Monge-Kantorovich Optimal Mass Transport (OMT) problem. In general, we show that the problem can again be reduced to solving an OMT problem and that it has a unique solution. In parallel, we study the optimal steering of the state-density of a linear stochastic system with white noise disturbance; this is known to correspond to a Schrödinger bridge. As the white noise intensity tends to zero, the flow of densities converges to that of the deterministic dynamics and can serve as a way to compute the solution of its deterministic counterpart. The solution can be expressed in closed-form for Gaussian initial and final state densities in both cases.
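For the integrator case with Gaussian end-point densities, the closed form can be illustrated with the classical expressions for the squared Wasserstein-2 distance and the linear optimal map between two Gaussians. This is a sketch under that assumption only: it does not cover the general linear dynamics treated in the paper, and the function names are hypothetical.

```python
import numpy as np

def sqrtm_psd(S):
    """Symmetric square root of a positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_w2_squared(m0, S0, m1, S1):
    """Squared Wasserstein-2 distance between N(m0, S0) and N(m1, S1)."""
    r = sqrtm_psd(S0)
    cross = sqrtm_psd(r @ S1 @ r)
    return float(np.sum((m1 - m0) ** 2) + np.trace(S0 + S1 - 2.0 * cross))

def gaussian_ot_map(S0, S1):
    """Symmetric matrix T with T S0 T = S1; requires S0 to be positive definite."""
    r = sqrtm_psd(S0)
    r_inv = np.linalg.inv(r)
    return r_inv @ sqrtm_psd(r @ S1 @ r) @ r_inv

# Example: N(0, I) to N((3, 4), 4 I) in two dimensions.
cost = gaussian_w2_squared(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), 4.0 * np.eye(2))
```

In this example the mean shift contributes 25 and the covariance term contributes 2, so `cost` evaluates to 27 up to rounding.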

preprint 2015 arXiv

Steering state statistics with output feedback

Consider a linear stochastic system whose initial state is a random vector with a specified Gaussian distribution. Such a distribution may represent a collection of particles abiding by the specified system dynamics. In recent publications, we have shown that, provided the system is controllable, it is always possible to steer the state covariance to any specified terminal Gaussian distribution using state feedback. The purpose of the present work is to show that, in the case where only partial state observation is available, a necessary and sufficient condition for being able to steer the system to a specified terminal Gaussian distribution for the state vector is that the terminal state covariance be greater (in the positive-definite sense) than the error covariance of a corresponding Kalman filter.
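The stated condition can be checked numerically along the following lines. The sketch assumes a continuous-time model in which process noise enters through a matrix `B1` and the measurement is dy = C x dt + dv with noise covariance `R`; these symbols, the forward-Euler integration and the function names are assumptions made for illustration and are not specified in the abstract.

```python
import numpy as np

def kalman_error_covariance(A, B1, C, R, P0, T, steps=10000):
    """Forward-Euler integration of the Kalman-Bucy Riccati equation up to time T."""
    P = np.array(P0, dtype=float)
    dt = T / steps
    R_inv = np.linalg.inv(R)
    for _ in range(steps):
        dP = A @ P + P @ A.T + B1 @ B1.T - P @ C.T @ R_inv @ C @ P
        P = P + dt * dP
    return P

def is_reachable(Sigma_T, P_T, tol=1e-9):
    """True if Sigma_T - P_T is positive definite (smallest eigenvalue above tol)."""
    gap = Sigma_T - P_T
    return bool(np.min(np.linalg.eigvalsh(0.5 * (gap + gap.T))) > tol)

# Hypothetical scalar example: dx = dw, dy = x dt + dv, known initial state (P0 = 0).
one = np.array([[1.0]])
P_T = kalman_error_covariance(np.array([[0.0]]), one, one, one, np.array([[0.0]]), T=1.0)
```

For this scalar example the Riccati equation reduces to dP/dt = 1 - P^2 with P(0) = 0, whose solution is P(t) = tanh(t), so a terminal variance is admissible under the stated condition only if it exceeds roughly tanh(1).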

preprint 2015 arXiv

The role of the time-arrow in mean-square estimation of stochastic processes

The purpose of this paper is to explain a certain dichotomy between the information that the past and future values of a multivariate stochastic process carry about the present. More specifically, vector-valued, second-order stochastic processes may be deterministic in one time-direction and not the other. This phenomenon, which is absent in scalar-valued processes, is deeply rooted in the geometry of the shift-operator. The exposition and the examples we discuss are based on the work of Douglas, Shapiro and Shields on cyclic vectors of the backward shift and relate to classical ideas going back to Wiener and Kolmogorov. We focus on rank-one stochastic processes for which we present a characterization of all regular processes that are deterministic in the reverse time-direction. The paper builds on examples and the goal is to provide pertinent insights to a control engineering audience.

preprint 2014 arXiv

On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint

We take a new look at the relation between the optimal transport problem and the Schrödinger bridge problem from the stochastic control perspective. We show that the connections are richer and deeper than described in existing literature. In particular: a) We give an elementary derivation of the Benamou-Brenier fluid dynamics version of the optimal transport problem; b) We provide a new fluid dynamics version of the Schrödinger bridge problem; c) We observe that the latter provides an important connection with optimal transport without zero noise limits; d) We propose and solve a fluid dynamic version of optimal transport with prior; e) We can then view optimal transport with prior as the zero noise limit of Schrödinger bridges when the prior is any Markovian evolution. In particular, we work out the Gaussian case. A numerical example of the latter convergence involving Brownian particles is also provided.

preprint 2014 arXiv

Optimal steering of a linear stochastic system to a final probability distribution

We consider the problem of steering a linear dynamical system with full state observation from an initial Gaussian distribution in state-space to a final one with minimum energy control. The system is stochastically driven through the control channels; an example of such a system is that of an inertial particle experiencing random "white noise" forcing. We show that a target probability distribution can always be achieved in finite time. The optimal control is given in state-feedback form and is computed explicitly by solving a pair of differential Lyapunov equations that are coupled through their boundary values. This result, given its attractive algorithmic nature, appears to have several potential applications such as to active control of nanomechanical systems and molecular cooling. The problem of steering a diffusion process between end-point marginals has a long history (Schrödinger bridges) and therefore, the present case of steering a linear stochastic system constitutes a Schrödinger bridge for possibly degenerate diffusions. Our results, however, provide the first implementable form of the optimal control for a general Gauss-Markov process. Illustrative examples of the optimal evolution and control for inertial particles and a stochastic oscillator are provided. A final result extends the property of Schrödinger bridges as the most likely random evolution between given marginals to the present context of linear stochastic systems.

preprint 2014 arXiv

Optimal steering of a linear stochastic system to a final probability distribution, part II

We consider the problem of minimum energy steering of a linear stochastic system to a final prescribed distribution over a finite horizon and to maintain a stationary distribution over an infinite horizon. We present sufficient conditions for optimality in terms of a system of dynamically coupled Riccati equations in the finite horizon case and algebraic in the stationary case. We then address the question of feasibility for both problems. For the finite-horizon case, provided the system is controllable, we prove that without any restriction on the directionality of the stochastic disturbance it is always possible to steer the state to any arbitrary Gaussian distribution over any specified finite time-interval. For the stationary infinite horizon case, it is not always possible to maintain the state at an arbitrary Gaussian distribution through constant state-feedback. It is shown that covariances of admissible stationary Gaussian distributions are characterized by a certain Lyapunov-like equation. We finally present an alternative to solving the system of coupled Riccati equations, by expressing the optimal controls in the form of solutions to (convex) semi-definite programs for both cases. We conclude with an example to steer the state covariance of the distribution of inertial particles to an admissible stationary Gaussian distribution over a finite interval, to be maintained at that stationary distribution thereafter by constant-gain state-feedback control.
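As a small illustration of the stationary case, the sketch below computes the stationary state covariance produced by a given constant feedback gain. It assumes the model dx = (A x + B u) dt + B1 dw with u = -K x and uses the standard Lyapunov equation for a stable closed loop. It does not reproduce the paper's characterization of admissible covariances, and the symbols and function name are assumptions.

```python
import numpy as np

def stationary_covariance(A, B, B1, K):
    """Solve (A - B K) S + S (A - B K)^T + B1 B1^T = 0 for the stationary covariance S."""
    A_cl = A - B @ K
    if np.max(np.linalg.eigvals(A_cl).real) >= 0:
        raise ValueError("closed loop is not stable; no stationary covariance exists")
    n = A.shape[0]
    I = np.eye(n)
    # Row-major vectorization: vec(A_cl S) = kron(A_cl, I) vec(S), vec(S A_cl^T) = kron(I, A_cl) vec(S)
    L = np.kron(A_cl, I) + np.kron(I, A_cl)
    Q = B1 @ B1.T
    S = np.linalg.solve(L, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (S + S.T)   # symmetrize against rounding error

# Hypothetical inertial particle: position and velocity, noise and control in the velocity channel.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 2.0]])
Sigma = stationary_covariance(A, B, B, K)
```

With this gain the closed loop has a double eigenvalue at -1, and solving the equation by hand gives a diagonal covariance with both entries equal to 0.25.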

preprint 2014 arXiv

Optimal steering of inertial particles diffusing anisotropically with losses

Exploiting a fluid dynamic formulation for which a probabilistic counterpart might not be available, we extend the theory of Schroedinger bridges to the case of inertial particles with losses and a general, possibly singular diffusion coefficient. We find that, as for the case of a constant diffusion coefficient matrix, the optimal control law is obtained by solving a system of two PDEs involving adjoint operators and coupled through their boundary values. In the linear case with quadratic loss function, the system turns into two matrix Riccati equations with coupled split boundary conditions. An alternative formulation of the control problem as a semidefinite programming problem allows computation of suboptimal solutions. This is illustrated in one example of inertial particles subject to constant-rate killing.

preprint 2014 arXiv

Stochastic bridges of linear systems

We study a generalization of the Brownian bridge as a stochastic process that models the position and velocity of inertial particles between the two end-points of a time interval. The particles experience random acceleration and are assumed to have known states at the boundary. Thus, the movement of the particles can be modeled as an Ornstein-Uhlenbeck process conditioned on position and velocity measurements at the two end-points. It is shown that optimal stochastic control provides a stochastic differential equation (SDE) that generates such a bridge as a degenerate diffusion process. Generalizations to higher order linear diffusions are considered.
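The simplest instance of such a bridge, the scalar Brownian bridge, can be simulated with a short sketch. It assumes the classical SDE dx = (x_T - x)/(T - t) dt + dw and an Euler-Maruyama discretization; the position-and-velocity bridge studied in the paper is not reproduced here, and the function name and parameters are hypothetical.

```python
import numpy as np

def simulate_brownian_bridge(x0, xT, T=1.0, steps=1000, rng=None):
    """Euler-Maruyama simulation of dx = (xT - x) / (T - t) dt + dw on [0, T]."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / steps
    t = np.linspace(0.0, T, steps + 1)
    x = np.empty(steps + 1)
    x[0] = x0
    for k in range(steps):
        drift = (xT - x[k]) / (T - t[k])   # pull toward the end-point, stronger as t -> T
        x[k + 1] = x[k] + drift * dt + np.sqrt(dt) * rng.standard_normal()
    return t, x

t_grid, path = simulate_brownian_bridge(x0=0.0, xT=2.0, rng=np.random.default_rng(0))
```

In this discretization the last step lands on the end-point up to a single noise increment of standard deviation sqrt(dt), so the terminal error shrinks as the step count grows.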
