Researcher profile

Yongxin Chen

Yongxin Chen contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
28 works
0 followers
17 topics
4 close collaborators

Published work

28 published items

Preprint · 2026 · arXiv

ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models

Action-conditioned world models (ACWMs) have shown strong promise for video prediction and decision-making. However, existing benchmarks are largely restricted to egocentric navigation or narrow, task-specific robotics datasets, offering only limited coverage of the rich physical interactions required for generalized world understanding. We introduce ACWM-Phys, a new benchmark for evaluating action-conditioned prediction under diverse physical dynamics in a clean, controllable simulation environment with a carefully designed action space. ACWM-Phys contains training and evaluation data spanning rigid-body dynamics, kinematics, deformable-object interactions, and particle dynamics. To evaluate both interpolation and generalization, we design in-distribution and out-of-distribution protocols with controlled shifts in interaction patterns or scene configurations. By building the benchmark in a fully controllable simulator, ACWM-Phys enables precise data collection, reproducible evaluation, and systematic analysis of model capabilities for physically grounded world modeling. Through systematic experiments on ACWM-DiT, we find that OoD generalization depends not only on the physical regime but also on effective task complexity: models generalize well on visually simple, low-dimensional interactions with clear geometric structure, but suffer larger drops on deformable contacts, high-dimensional control, and complex articulated motion. This suggests that the model still relies heavily on visual appearance patterns instead of fully learning the underlying physics. Ablations show that cross-attention improves high-dimensional action conditioning, causal VAEs outperform frame-wise encoders, and larger action spaces are harder to model but can improve generalization by providing richer control signals. These findings guide the design of physically grounded world models.

Preprint · 2026 · arXiv

Compositional Diffusion with Guided Search for Long-Horizon Planning

Generative models have emerged as powerful tools for planning, with compositional approaches offering particular promise for modeling long-horizon task distributions by composing together local, modular generative models. This compositional paradigm spans diverse domains, from multi-step manipulation planning to panoramic image synthesis to long video generation. However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this mode averaging problem by embedding search directly within the diffusion denoising process. Our method explores diverse combinations of local modes through population-based sampling, prunes infeasible candidates using likelihood-based filtering, and enforces global consistency through iterative resampling between overlapping segments. CDGS matches oracle performance on seven robot manipulation tasks, outperforming baselines that lack compositionality or require long-horizon training data. The approach generalizes across domains, enabling coherent text-guided panoramic images and long videos through effective local-to-global message passing. More details: https://cdgsearch.github.io/

Preprint · 2026 · arXiv

Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous diffusion language model (DLM), and construct RePlaid by aligning the architecture of Plaid with modern discrete DLMs. In this unified setting, we establish the first scaling law for continuous DLMs that rivals discrete DLMs: RePlaid exhibits a compute gap of only $20\times$ compared to autoregressive models, outperforms Duo while using fewer parameters, and outperforms MDLM in the over-trained regime. We benchmark RePlaid against recent continuous DLMs: on OpenWebText, RePlaid achieves a new state-of-the-art PPL bound of $22.1$ among continuous DLMs and superior generation quality. These results suggest that continuous diffusion, when trained via likelihood, is a highly competitive and scalable alternative to discrete DLMs. Moreover, we offer theoretical insights to understand the advantage of likelihood-based training. We show that optimizing the noise schedule to minimize the ELBO's variance naturally yields linear cross-entropy (information loss) over time. This evenly distributes denoising difficulty without any case-specific time reparameterization. In addition, we find that optimizing embeddings via likelihood creates structured geometries and drives the most significant likelihood gain.

Preprint · 2026 · arXiv

Efficient Adjoint Matching for Fine-tuning Diffusion Models

Reward fine-tuning has become a common approach for aligning pretrained diffusion and flow models with human preferences in text-to-image generation. Among reward-gradient-based methods, Adjoint Matching (AM) provides a principled formulation by casting reward fine-tuning as a stochastic optimal control (SOC) problem. However, AM incurs a substantial computational cost: it requires (i) stochastic simulation of full generative trajectories under memoryless dynamics, resulting in a large number of function evaluations, and (ii) backward ODE simulation of the adjoint state along each sampled trajectory. In this work, we observe that both bottlenecks are closely tied to the non-trivial base drift inherited from the pretrained model. Motivated by this observation, we propose Efficient Adjoint Matching (EAM), which substantially improves training efficiency by reformulating the SOC problem with a linear base drift and a correspondingly modified terminal cost. This reformulation removes both sources of inefficiency: it enables training-time sampling with a few-step deterministic ODE solver and yields a closed-form adjoint solution that eliminates backward adjoint simulation. On standard text-to-image reward fine-tuning benchmarks, EAM converges up to 4× faster than AM and matches or surpasses it across various metrics, including PickScore, ImageReward, HPSv2.1, CLIPScore and Aesthetics.

Preprint · 2025 · arXiv

Efficient Iterative Proximal Variational Inference Motion Planning

We cast motion planning under uncertainty as a stochastic optimal control problem, where the optimal posterior distribution has an explicit form. To approximate this posterior, this work frames an optimization problem in the space of Gaussian distributions by solving a variational inference (VI) problem in the path distribution space. For linear-Gaussian stochastic dynamics, a proximal algorithm is proposed to solve for an optimal Gaussian proposal iteratively. The computational bottleneck is evaluating the gradients with respect to the proposal over a dense trajectory. To tackle this issue, the sparse planning factor graph and Gaussian Belief Propagation (GBP) are exploited, allowing for parallel computation of these gradients on Graphics Processing Units (GPUs). We term this novel paradigm Parallel Gaussian Variational Inference Motion Planning (P-GVIMP). Building on the efficient algorithm for linear Gaussian systems, we then propose an iterative paradigm based on Statistical Linear Regression (SLR) techniques to solve planning problems for nonlinear stochastic systems, where P-GVIMP serves as a subroutine for the linearized time-varying system at each iteration. The proposed framework is validated on various robotic systems, demonstrating significant speedups achieved by leveraging parallel computation and successful planning solutions for nonlinear systems under uncertainty. An open-source implementation is available at https://github.com/hzyu17/VIMP.

Preprint · 2023 · arXiv

Generative Skill Chaining: Long-Horizon Skill Planning with Diffusion Models

Long-horizon tasks, usually characterized by complex subtask dependencies, present a significant challenge in manipulation planning. Skill chaining is a practical approach to solving unseen tasks by combining learned skill priors. However, such methods are myopic if sequenced greedily and face scalability issues with search-based planning strategies. To address these challenges, we introduce Generative Skill Chaining (GSC), a probabilistic framework that learns skill-centric diffusion models and composes their learned distributions to generate long-horizon plans during inference. GSC samples from all skill models in parallel to efficiently solve unseen tasks while enforcing geometric constraints. We evaluate the method on various long-horizon tasks and demonstrate its capability in reasoning about action dependencies, constraint handling, and generalization, along with its ability to replan in the face of perturbations. We show results in simulation and on a real robot to validate the efficiency and scalability of GSC, highlighting its potential for advancing long-horizon task planning. More details are available at: https://generative-skill-chaining.github.io/

Preprint · 2022 · arXiv

A Proximal Algorithm for Sampling from Non-smooth Potentials

In this work, we examine sampling problems with non-smooth potentials. We propose a novel Markov chain Monte Carlo algorithm for sampling from non-smooth potentials. We provide a non-asymptotic analysis of our algorithm and establish a polynomial-time complexity $\tilde{\mathcal{O}}(d\varepsilon^{-1})$ to obtain $\varepsilon$ total variation distance to the target density, better than most existing results under the same assumptions. Our method is based on the proximal bundle method and an alternating sampling framework. This framework requires the so-called restricted Gaussian oracle, which can be viewed as a sampling counterpart of the proximal mapping in convex optimization. One key contribution of this work is a fast algorithm that realizes the restricted Gaussian oracle for any convex non-smooth potential with bounded Lipschitz constant.

Preprint · 2022 · arXiv

Data-Driven Optimal Control via Linear Transfer Operators: A Convex Approach

This paper is concerned with data-driven optimal control of nonlinear systems. We present a convex formulation of the optimal control problem (OCP) with a discounted cost function. We consider the OCP with both positive and negative discount factors. The convex approach relies on lifting nonlinear system dynamics into the space of densities using the linear Perron-Frobenius (P-F) operator. This lifting leads to an infinite-dimensional convex optimization formulation of the optimal control problem. The data-driven approximation of the optimization problem relies on approximating the Koopman operator using polynomial basis functions. We write the approximate finite-dimensional optimization problem as a polynomial optimization problem, which is then solved efficiently using a sum-of-squares-based optimization framework. Simulation results are presented to demonstrate the efficacy of the developed data-driven optimal control framework.
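
To make the Koopman-approximation step concrete, here is a minimal Python sketch of an EDMD-style least-squares estimate with a polynomial dictionary. The monomial basis {1, x, x²}, the toy map x_{k+1} = 0.5 x_k, and the helper names (`basis`, `solve`, `edmd`) are illustrative assumptions, not the paper's implementation; the subsequent sum-of-squares step is not shown.

```python
import random


def basis(x):
    # Hypothetical polynomial dictionary psi(x) = [1, x, x^2].
    return [1.0, x, x * x]


def solve(a, b):
    # Solve a @ X = b for X by Gauss-Jordan elimination with partial pivoting.
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edmd(xs, ys):
    # Least-squares Koopman matrix K with psi(y_k) ~= psi(x_k) K (row-vector convention):
    # K = G^{-1} A, where G = Psi_X^T Psi_X and A = Psi_X^T Psi_Y.
    px = [basis(x) for x in xs]
    py = [basis(y) for y in ys]
    k = len(px[0])
    g = [[sum(r[i] * r[j] for r in px) for j in range(k)] for i in range(k)]
    a = [[sum(rx[i] * ry[j] for rx, ry in zip(px, py)) for j in range(k)] for i in range(k)]
    return solve(g, a)


# Toy snapshot pairs from the linear map x_{k+1} = 0.5 x_k (an assumption for illustration).
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
K = edmd(xs, [0.5 * x for x in xs])
```

Because the toy map is linear, each monomial is mapped exactly back into the dictionary (for example, x² is mapped to 0.25 x²), so `K` should recover diag(1, 0.5, 0.25) up to rounding.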

Preprint · 2022 · arXiv

Improved analysis for a proximal algorithm for sampling

We study the proximal sampler of Lee, Shen, and Tian (2021) and obtain new convergence guarantees under weaker assumptions than strong log-concavity: namely, our results hold for (1) weakly log-concave targets, and (2) targets satisfying isoperimetric assumptions which allow for non-log-concavity. We demonstrate our results by obtaining new state-of-the-art sampling guarantees for several classes of target distributions. We also strengthen the connection between the proximal sampler and the proximal method in optimization by interpreting the proximal sampler as an entropically regularized Wasserstein proximal method, and the proximal point method as the limit of the proximal sampler with vanishing noise.
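
The proximal sampler alternates a forward Gaussian step y ~ N(x, η) with a backward step that samples x from a density proportional to π(x) exp(−(x − y)²/(2η)), which is the restricted Gaussian oracle. The sketch below assumes a one-dimensional standard Gaussian target π = N(0, 1), chosen only because the backward step then has the closed form N(y/(1 + η), η/(1 + η)); general or non-smooth targets need their own oracle. The function name and default parameters are hypothetical.

```python
import math
import random


def proximal_sampler_gaussian(n_iter, eta, x0=5.0, seed=0):
    """Proximal sampler for the 1-D target pi = N(0, 1), i.e. potential f(x) = x^2 / 2."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_iter):
        # Forward step: y | x ~ N(x, eta).
        y = rng.gauss(x, math.sqrt(eta))
        # Backward step (restricted Gaussian oracle): x | y has density proportional to
        # exp(-x^2 / 2 - (x - y)^2 / (2 eta)), i.e. N(y / (1 + eta), eta / (1 + eta)) here.
        x = rng.gauss(y / (1.0 + eta), math.sqrt(eta / (1.0 + eta)))
        samples.append(x)
    return samples


chain = proximal_sampler_gaussian(20000, eta=0.5)[100:]  # drop a short burn-in
mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
```

The empirical `mean` and `var` of the chain should be close to the target's 0 and 1.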

Preprint · 2022 · arXiv

Inertialess Gyrating Engines

A typical model for a gyrating engine consists of an inertial wheel powered by an energy source that generates an angle-dependent torque. Examples of such engines include a pendulum with an externally applied torque, Stirling engines, and the Brownian gyrating engine. Variations in the torque are averaged out by the inertia of the system to produce limit-cycle oscillations. While torque-generating mechanisms are also ubiquitous in the biological world, where they typically feed on chemical gradients, inertia is not a property that one naturally associates with such processes. In the present work, seeking ways to dispense with the need for inertial effects, we study an inertialess concept where the combined effect of coupled torque-producing components averages out variations in the ambient potential and helps overcome dissipative forces to allow sustained operation with vanishingly small inertia. We exemplify this inertialess concept through analysis of two of the aforementioned engines, the Stirling engine and the Brownian gyrating engine. An analogous principle may be sought in biomolecular processes as well as in modern-day technological engines, where, for the latter, the coupled torque-producing components reduce vibrations that stem from the variability of the generated torque.

Preprint · 2022 · arXiv

Path Integral Sampler: a stochastic control approach for sampling

We present the Path Integral Sampler (PIS), a novel algorithm to draw samples from unnormalized probability density functions. PIS is built on the Schrödinger bridge problem, which aims to recover the most likely evolution of a diffusion process given its initial and terminal distributions. PIS draws samples from the initial distribution and then propagates them through the Schrödinger bridge to reach the terminal distribution. Applying the Girsanov theorem with a simple prior diffusion, we formulate PIS as a stochastic optimal control problem whose running cost is the control energy and whose terminal cost is chosen according to the target distribution. By modeling the control as a neural network, we establish a sampling algorithm that can be trained end-to-end. We provide theoretical justification of the sampling quality of PIS in terms of Wasserstein distance when sub-optimal control is used. Moreover, path integral theory is used to compute importance weights of the samples to compensate for the bias induced by the sub-optimality of the controller and by time discretization. We experimentally demonstrate the advantages of PIS compared with other state-of-the-art sampling methods on a variety of tasks.

Preprint · 2022 · arXiv

Signed Graph Neural Networks: A Frequency Perspective

Graph convolutional networks (GCNs) and their variants are designed for unsigned graphs containing only positive links. Many existing GCNs have been derived from spectral-domain analysis of signals lying over (unsigned) graphs, and in each convolution layer they perform low-pass filtering of the input features followed by a learnable linear transformation. Their extension to signed graphs with positive as well as negative links raises multiple issues, including computational irregularities and ambiguous frequency interpretation, making the design of computationally efficient low-pass filters challenging. In this paper, we address these issues via spectral analysis of signed graphs and propose two different signed graph neural networks: one keeps only low-frequency information, and the other also retains high-frequency information. We further introduce the magnetic signed Laplacian and use its eigendecomposition for spectral analysis of directed signed graphs. We test our methods on node classification and link sign prediction tasks on signed graphs and achieve state-of-the-art performance.

Preprint · 2022 · arXiv

Stochastic thermodynamic engines under time-varying temperature profile

In the present paper, we study the power output and efficiency of overdamped stochastic thermodynamic engines that are in contact with a heat bath whose temperature varies periodically with time. This is in contrast to most of the existing literature, which considers the Carnot paradigm of alternating contact with heat baths having different fixed temperatures, hot and cold. Specifically, we consider a periodic and bounded but otherwise arbitrary temperature profile and derive explicit bounds on the power and efficiency achievable by a suitable controlling potential that couples the thermodynamic engine to the external world. Standing assumptions in our analysis are bounds on the norm of the gradient of effective potentials; in the absence of any such constraint, one would draw the physically questionable conclusion that arbitrarily large power is attainable.

Preprint · 2022 · arXiv

Thermodynamic engine powered by anisotropic fluctuations

The purpose of this work is to present the concept of an autonomous Stirling-like engine powered by anisotropy of thermodynamic fluctuations. Specifically, simultaneous contact of a thermodynamic system with two heat baths along coupled degrees of freedom generates torque and circulatory currents, an arrangement referred to as a Brownian gyrator. The embodiment that constitutes the engine includes an inertial wheel to sustain rotary motion and average out the generated fluctuating torque, ultimately delivering power to an external load. We detail an electrical model for such an engine that consists of two resistors at different temperatures and three reactive elements in the form of variable capacitors. The resistors generate Johnson-Nyquist current fluctuations that power the engine, while the capacitors generate driving forces via a coupling of their dielectric material with the inertial wheel. A proof of concept is established via stability analysis to ensure the existence of a stable periodic orbit generating sustained power output. We conclude by drawing a connection to the dynamics of a damped pendulum with constant torque and to those of a macroscopic Stirling engine. The insights sought here are aimed at nano-engines and biological processes that are similarly powered by anisotropy in temperature and chemical potentials.

Preprint · 2022 · arXiv

Underdamped stochastic thermodynamic engines in contact with a heat bath with arbitrary temperature profile

We study thermodynamic processes in contact with a heat bath that may have an arbitrary time-varying periodic temperature profile. Within the framework of stochastic thermodynamics, and for models of thermodynamic engines in the idealized case of underdamped particles in the low-friction regime, we derive explicit bounds as well as optimal control protocols that draw maximum power and achieve maximum efficiency at any specified level of power.

Preprint · 2021 · arXiv

Feedback Particle Filter for Collective Inference

The purpose of this paper is to describe the feedback particle filter algorithm for problems where there are a large number ($M$) of non-interacting agents (targets) with a large number ($M$) of non-agent-specific observations (measurements) that originate from these agents. In its basic form, the problem is characterized by data association uncertainty, whereby the association between the observations and agents must be deduced in addition to the agent state. In this paper, the large-$M$ limit is interpreted as a problem of collective inference. This viewpoint is used to derive the equation for the empirical distribution of the hidden agent states. A feedback particle filter (FPF) algorithm for this problem is presented and illustrated via numerical simulations. Results are presented for the Euclidean and the finite state-space cases, both in continuous-time settings. The classical FPF algorithm is shown to be the special case (with $M=1$) of these more general results. The simulations show that the algorithm approximates the empirical distribution of the hidden states well for large $M$.

Preprint · 2021 · arXiv

Harvesting energy from a periodic heat bath

The context of the present paper is stochastic thermodynamics, an approach to nonequilibrium thermodynamics rooted within the broader framework of stochastic control. In contrast to the classical paradigm of Carnot engines, we herein propose to consider thermodynamic processes with a periodic, continuously varying heat-bath temperature and study questions of maximal power and efficiency for two idealized cases: overdamped (first-order) and underdamped (second-order) stochastic models. We highlight properties of optimal periodic control, and derive and numerically validate approximate formulae for the optimal performance (power and efficiency).

Preprint · 2021 · arXiv

On the relation between information and power in stochastic thermodynamic engines

The common saying that information is power takes a rigorous form in stochastic thermodynamics, where a quantitative equivalence between the two helps explain the paradox of Maxwell's demon in its ability to reduce entropy. In the present paper, we build on earlier work on the interplay between the relative cost and benefits of information in producing work in cyclic operation of thermodynamic engines (Sandberg et al. 2014). Specifically, we study the general case of overdamped particles in a time-varying potential (control action) in feedback that utilizes continuous measurements (nonlinear filtering) of a thermodynamic ensemble, to produce suitable adaptations of the second law of thermodynamics that involve information.

Preprint · 2021 · arXiv

Optimal steering to invariant distributions for networks flows

We derive novel results on the ergodic theory of irreducible, aperiodic Markov chains. We show how to optimally steer the network flow to a stationary distribution over a finite or infinite time horizon. Optimality is with respect to an entropic distance between distributions on feasible paths. When the prior is reversible, it is shown that solutions to this discrete time and space steering problem are reversible as well. A notion of temperature is defined for Boltzmann distributions on networks, and problems analogous to cooling (in this case, for evolutions in discrete space and time) are discussed.

Preprint · 2020 · arXiv

A convex data-driven approach for nonlinear control synthesis

We consider a class of nonlinear control synthesis problems where the underlying mathematical models are not explicitly known. We propose a data-driven approach to stabilize the systems when only sample trajectories of the dynamics are accessible. Our method is founded on the density-function-based almost-everywhere stability certificate that is dual to the Lyapunov function for dynamical systems. Unlike Lyapunov-based methods, density functions lead to a convex formulation for a joint search of the control strategy and the stability certificate. This type of convex problem can be solved efficiently by invoking the machinery of sum-of-squares (SOS) optimization. For the data-driven part, we exploit the fact that the duality results in the stability theory of dynamical systems can be understood using the linear Perron-Frobenius and Koopman operators. This connection allows us to use data-driven methods developed to approximate these operators, combined with SOS techniques, for the convex formulation of control synthesis. The efficacy of the proposed approach is demonstrated through several examples.

Preprint · 2020 · arXiv

Improving Robustness via Risk Averse Distributional Reinforcement Learning

One major obstacle that precludes the success of reinforcement learning in real-world applications is the lack of robustness of trained policies, either to model uncertainties or to external disturbances. Robustness is critical when policies are trained in simulation instead of real-world environments. In this work, we propose a risk-aware algorithm to learn robust policies in order to bridge the gap between simulation training and real-world implementation. Our algorithm is based on the recently proposed distributional RL framework. We incorporate the CVaR risk measure into sample-based distributional policy gradients (SDPG) to learn risk-averse policies that are robust against a range of system disturbances. We validate the robustness of risk-aware SDPG on multiple environments.
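
As a minimal sketch of the risk measure involved, the empirical CVaR at level α of a set of sampled returns (for example, samples of the return distribution as produced in SDPG) can be taken as the mean of the worst α-fraction of those samples. The lower-tail convention and the function name `cvar` are assumptions for illustration; how the paper folds CVaR into the policy gradient is not shown.

```python
import math


def cvar(returns, alpha):
    """Empirical CVaR at level alpha: mean of the worst (lowest) alpha-fraction of sampled returns."""
    if not returns:
        raise ValueError("need at least one sampled return")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))  # number of lower-tail samples kept
    return sum(ordered[:k]) / k


# Hypothetical sampled returns for one state-action pair.
returns = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
tail_value = cvar(returns, 0.5)  # mean of the five lowest returns
```

A risk-averse objective would favor `cvar(returns, alpha)` with small α over the plain sample mean, which is what `alpha = 1.0` reduces to.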

Preprint · 2020 · arXiv

Incremental inference of collective graphical models

We consider incremental inference problems from aggregate data for collective dynamics. In particular, we address the problem of estimating the aggregate marginals of a Markov chain from noisy aggregate observations in an incremental (online) fashion. We propose a sliding window Sinkhorn belief propagation (SW-SBP) algorithm that utilizes a sliding window filter of the most recent noisy aggregate observations along with encoded information from discarded observations. Our algorithm is built upon the recently proposed multi-marginal optimal transport based SBP algorithm that leverages standard belief propagation and Sinkhorn algorithm to solve inference problems from aggregate data. We demonstrate the performance of our algorithm on applications such as inferring population flow from aggregate observations.

Preprint · 2020 · arXiv

Maximal power output of a stochastic thermodynamic engine

Classical thermodynamics aimed to quantify the efficiency of thermodynamic engines by bounding the maximal amount of mechanical energy produced compared to the amount of heat required. While this was accomplished early on, by Carnot and Clausius, the more practical problem of quantifying limits on the power that can be delivered remained elusive, because quasistatic processes require infinitely slow cycling, resulting in vanishing power output. Recent insights, drawn from stochastic models, appear to bridge the gap between theory and practice in that they lead to physically meaningful expressions for the dissipation cost of operating a thermodynamic engine over a finite time window. Building on this framework of stochastic thermodynamics, we derive bounds on the maximal power that can be drawn by cycling an overdamped ensemble of particles via a time-varying potential while alternating contact with heat baths of different temperatures ($T_c$ cold and $T_h$ hot). Specifically, assuming a suitable bound $M$ on the spatial gradient of the controlling potential, we show that the maximal achievable power is bounded by $\frac{M}{8}\left(\frac{T_h}{T_c}-1\right)$. Moreover, we show that this bound can be reached to within a factor of $\left(\frac{T_h}{T_c}-1\right)/\left(\frac{T_h}{T_c}+1\right)$ by operating the cyclic thermodynamic process with a quadratic potential.
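
The two closed-form expressions above can be evaluated directly. This is a minimal sketch with hypothetical function names and numbers; it reads "within a factor" as the quadratic-potential protocol attaining at least the bound times (r − 1)/(r + 1), with r = T_h/T_c, and leaves units to the paper's normalization.

```python
def max_power_bound(m, t_hot, t_cold):
    """Upper bound (M / 8) * (T_h / T_c - 1) on the achievable power."""
    return m / 8.0 * (t_hot / t_cold - 1.0)


def quadratic_guarantee(m, t_hot, t_cold):
    """Power attainable with a quadratic potential under the reading above:
    the bound times (r - 1) / (r + 1), where r = T_h / T_c."""
    r = t_hot / t_cold
    return max_power_bound(m, t_hot, t_cold) * (r - 1.0) / (r + 1.0)


# Hypothetical numbers: M = 8, T_h = 300, T_c = 100 (so r = 3).
bound = max_power_bound(8.0, 300.0, 100.0)
achieved = quadratic_guarantee(8.0, 300.0, 100.0)
```

With r = 3 the factor is 2/4 = 0.5, so the quadratic potential secures half of the bound; the factor approaches 1 as T_h/T_c grows.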

Preprint · 2020 · arXiv

Multi-marginal optimal transport and probabilistic graphical models

We study multi-marginal optimal transport problems from a probabilistic graphical model perspective. We point out an elegant connection between the two when the underlying cost for optimal transport allows a graph structure. In particular, an entropy regularized multi-marginal optimal transport is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. This relation on the one hand extends the optimal transport as well as the probabilistic graphical model theories, and on the other hand leads to fast algorithms for multi-marginal optimal transport by leveraging the well-developed algorithms in Bayesian inference. Several numerical examples are provided to highlight the results.
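
For orientation, the two-marginal case of entropic optimal transport is solved by Sinkhorn's alternating scaling iterations; the observation above is that, for multi-marginal costs with graph structure, such scalings can be organized around Bayesian inference algorithms. The sketch below covers only the two-marginal case, with an assumed squared-distance cost on three points, an assumed regularization `eps`, and a fixed iteration count in place of a convergence test.

```python
import math


def sinkhorn(cost, mu, nu, eps=1.0, n_iter=500):
    """Entropic OT between discrete marginals mu and nu via Sinkhorn scaling.

    Returns the plan P = diag(u) K diag(v) with K_ij = exp(-cost_ij / eps)."""
    n, m = len(mu), len(nu)
    kern = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale so row sums match mu, then so column sums match nu.
        u = [mu[i] / sum(kern[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(kern[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kern[i][j] * v[j] for j in range(m)] for i in range(n)]


# Illustrative three-point example with squared-distance cost.
cost = [[float((i - j) ** 2) for j in range(3)] for i in range(3)]
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
plan = sinkhorn(cost, mu, nu)
```

The returned plan has row sums close to `mu` and column sums close to `nu`, with more mass placed on low-cost pairs as `eps` decreases.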

Preprint · 2020 · arXiv

On Spectral Properties of Signed Laplacians with Connections to Eventual Positivity

Signed graphs have appeared in a broad variety of applications, ranging from social networks to biological networks, from distributed control and computation to power systems. In this paper, we investigate spectral properties of signed Laplacians for undirected signed graphs. We find conditions on the negative weights under which a signed Laplacian is positive semidefinite via the Kron reduction and multiport network theory. For signed Laplacians that are indefinite, we characterize their inertias with the same framework. Furthermore, we build connections between signed Laplacians, generalized M-matrices, and eventually exponentially positive matrices.
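
A small illustration, assuming the convention L = D − W where W holds signed edge weights and D_ii = Σ_j w_ij uses the signed weights: since xᵀLx = Σ over edges of w_ij (x_i − x_j)², a single negative edge can make L indefinite. In the hypothetical three-node example below (a positive path 0-1-2 with unit weights plus a negative edge of weight −w between nodes 0 and 2), a standard rank-one argument gives L positive semidefinite exactly when w ≤ 1/R_eff = 0.5, where R_eff = 2 is the effective resistance between nodes 0 and 2 in the positive subgraph. The vector x = (0, 0.5, 1) exposes the sign change.

```python
def signed_laplacian(n, edges):
    """L = D - W for an undirected signed graph; edges are (i, j, w) with signed weight w,
    and D_ii = sum_j w_ij uses the signed weights (assumed convention)."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return lap


def quadratic_form(lap, x):
    """x^T L x."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


def edge_energy(edges, x):
    """Sum over edges of w_ij * (x_i - x_j)^2, which equals x^T L x."""
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)


def example_edges(w):
    # Positive path 0-1-2 with unit weights plus a negative edge of weight -w between 0 and 2.
    return [(0, 1, 1.0), (1, 2, 1.0), (0, 2, -w)]


x = [0.0, 0.5, 1.0]
# x^T L x equals 0.5 - w up to rounding: nonnegative for w <= 0.5, negative for w > 0.5.
values = {w: quadratic_form(signed_laplacian(3, example_edges(w)), x) for w in (0.3, 0.8)}
```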

Preprint · 2020 · arXiv

Probabilistic Kernel Support Vector Machines

We propose a probabilistic enhancement of standard kernel Support Vector Machines for binary classification, in order to address the case when, along with given data sets, a description of uncertainty (e.g., error bounds) may be available on each datum. In the present paper, we specifically consider Gaussian distributions to model uncertainty. Thereby, our data consist of pairs $(x_i,\Sigma_i)$, $i\in\{1,\ldots,N\}$, along with an indicator $y_i\in\{-1,1\}$ to declare membership in one of two categories for each pair. These pairs may be viewed as representing the mean and covariance, respectively, of random vectors $\xi_i$ taking values in a suitable linear space (typically $\mathbb R^n$). Thus, our setting may also be viewed as a modification of Support Vector Machines to classify distributions, albeit, at present, only Gaussian ones. We outline the formalism that allows computing suitable classifiers via a natural modification of the standard "kernel trick." The main contribution of this work is to point out a suitable kernel function for applying Support Vector techniques to the setting of uncertain data for which a detailed uncertainty description is also available (herein, "Gaussian points").
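
One natural way to compare two "Gaussian points" is the expected value of a standard RBF kernel under both distributions, which has a closed form. Whether this matches the kernel proposed in the paper is not stated here; the sketch also assumes diagonal covariances (the paper allows general Σ_i), and the function names are hypothetical. For independent ξ₁ ~ N(m₁, diag(s₁)) and ξ₂ ~ N(m₂, diag(s₂)), each coordinate contributes √(σ²/(σ² + v)) · exp(−(m₁ − m₂)²/(2(σ² + v))) with v = s₁ + s₂, and the expression reduces to the ordinary RBF kernel when all variances are zero.

```python
import math
import random


def expected_rbf_kernel(m1, s1, m2, s2, sigma=1.0):
    """E[exp(-||xi1 - xi2||^2 / (2 sigma^2))] for independent
    xi1 ~ N(m1, diag(s1)) and xi2 ~ N(m2, diag(s2)); s1, s2 are variances."""
    value = 1.0
    for a, va, b, vb in zip(m1, s1, m2, s2):
        t = sigma ** 2 + va + vb  # RBF width plus variance of the coordinate difference
        value *= math.sqrt(sigma ** 2 / t) * math.exp(-((a - b) ** 2) / (2.0 * t))
    return value


def monte_carlo_rbf(m1, s1, m2, s2, sigma=1.0, n=100000, seed=0):
    """Sampling estimate of the same expectation, used only as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        sq = 0.0
        for a, va, b, vb in zip(m1, s1, m2, s2):
            d = rng.gauss(a, math.sqrt(va)) - rng.gauss(b, math.sqrt(vb))
            sq += d * d
        total += math.exp(-sq / (2.0 * sigma ** 2))
    return total / n
```

Larger variances flatten the kernel, so uncertain data points are treated as more similar than their means alone would suggest.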

Preprint · 2020 · arXiv

Regularized transport between singular covariance matrices

We consider the problem of steering a linear stochastic system between two end-point degenerate Gaussian distributions in finite time. This accounts for situations in which some but not all of the state entries are uncertain at the initial time $t = 0$ and the final time $t = T$. This problem entails non-trivial technical challenges, as the singularity of the terminal state covariance causes the control to grow unbounded at the final time $T$. Consequently, the entropic interpolation (Schroedinger bridge) is provided by a diffusion process that is not finite-energy, placing this case outside most of the current theory. In this paper, we show that a feasible interpolation can be derived as a limiting case of earlier results for non-degenerate cases, and that it can be expressed in closed form. Moreover, we show that such an interpolation belongs to the same reciprocal class as the uncontrolled evolution. In doing so, we also highlight a time symmetry of the problem, contrasting dual formulations in the forward and reverse time directions, where in each the control grows unbounded as time approaches the end point (in the forward and reverse time directions, respectively).

Preprint · 2020 · arXiv

Sample-based Distributional Policy Gradient

Distributional reinforcement learning (DRL) is a recent reinforcement learning framework whose success has been supported by various empirical studies. It relies on the key idea of replacing the expected return with the return distribution, which captures the intrinsic randomness of the long-term rewards. Most of the existing literature on DRL focuses on problems with discrete action spaces and value-based methods. In this work, motivated by robotics applications with continuous action spaces, we propose the sample-based distributional policy gradient (SDPG) algorithm. It models the return distribution using samples via a reparameterization technique widely used in generative modeling and inference. We compare SDPG with distributed distributional deterministic policy gradients (D4PG), a state-of-the-art policy gradient method in DRL. We apply SDPG and D4PG to multiple OpenAI Gym environments and observe that our algorithm shows better sample efficiency as well as higher reward for most tasks.