Researcher profile

Dmitri Goloubentsev

Dmitri Goloubentsev contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Topics: Machine Learning, math.OC, q-fin.CP, q-fin.MF, q-fin.RM

Trust snapshot

Trust 11 (Unverified), Verification L1, Unclaimed author

1 work, 0 followers, 5 topics, 1 close collaborator

Preprint, 2026, arXiv

SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation

Many real-world problems require sequential decisions under uncertainty: when to inject or withdraw gas from storage, how to rebalance a pension portfolio each month, what temperature profile to run through a pharmaceutical reactor chain. Dynamic programming solves small instances exactly but scales exponentially in state dimensions. Black-box reinforcement learning handles high-dimensional states but trains slowly and produces no sensitivities. We introduce SNAPO (Smooth Neural Adjoint Policy Optimization), a framework that embeds a neural policy inside a known, differentiable simulator, replaces hard constraints with smooth approximations, and computes exact gradients of the objective with respect to all policy parameters and all inputs in a single adjoint pass. We demonstrate SNAPO on three domains: natural gas storage (training in under a minute, 365 forward curve sensitivities at no additional cost per sensitivity), pension fund asset-liability management (6.5x-200x sensitivity speedup over bump-and-revalue, scaling with the number of risk factors), and pharmaceutical manufacturing (cross-unit sensitivities through a 4-unit process chain, with 20 ICH Q8 regulatory sensitivities from 5 adjoint passes in 74.5 milliseconds). All sensitivities are produced by the same backward pass that trains the policy, at a cost proportional to one reverse pass regardless of how many sensitivities are computed.
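The mechanism the abstract describes (a policy embedded in a known simulator, a hard constraint replaced by a smooth one, and all sensitivities obtained from one reverse pass) can be illustrated with a toy storage problem. The sketch below is not the authors' implementation: it assumes a single tanh unit as a stand-in for the neural policy, a softplus-based smooth clip as one possible constraint approximation, and a hand-written adjoint in plain Python. All names and constants (`K`, `RATE`, `simulate`, `value_and_adjoints`) are hypothetical.

```python
import math

def softplus(y):
    # Numerically stable log(1 + exp(y)).
    return max(y, 0.0) + math.log1p(math.exp(-abs(y)))

def sigmoid(y):
    # Numerically stable logistic function (the derivative of softplus).
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

K = 20.0     # sharpness of the smooth clip (hypothetical value)
RATE = 0.2   # maximum daily injection or withdrawal (hypothetical value)

def smooth_clip(x):
    # Smooth stand-in for clip(x, 0, 1): keeps inventory roughly within capacity.
    return (softplus(K * x) - softplus(K * (x - 1.0))) / K

def smooth_clip_grad(x):
    return sigmoid(K * x) - sigmoid(K * (x - 1.0))

def simulate(w, prices, s0=0.5, tape=None):
    """Forward pass: total cash from operating storage under a one-unit tanh policy."""
    s, total = s0, 0.0
    for f in prices:
        h = math.tanh(w[0] + w[1] * s + w[2] * f)  # policy output in (-1, 1)
        x = s - RATE * h                            # unconstrained next inventory
        s_next = smooth_clip(x)
        total += (s - s_next) * f                   # cash from the volume withdrawn
        if tape is not None:
            tape.append((s, h, x, s_next))
        s = s_next
    return total

def value_and_adjoints(w, prices, s0=0.5):
    """One forward and one reverse pass: dJ/dw and dJ/dprices together."""
    tape = []
    total = simulate(w, prices, s0, tape)
    w_bar = [0.0, 0.0, 0.0]
    f_bar = [0.0] * len(prices)
    s_bar = 0.0                                     # adjoint of inventory after the last day
    for t in reversed(range(len(prices))):
        s, h, x, s_next = tape[t]
        f = prices[t]
        f_bar[t] += s - s_next                      # direct effect of price on cash flow
        x_bar = (s_bar - f) * smooth_clip_grad(x)   # back through s_next = smooth_clip(x)
        z_bar = -RATE * x_bar * (1.0 - h * h)       # back through x = s - RATE * tanh(z)
        w_bar[0] += z_bar
        w_bar[1] += z_bar * s
        w_bar[2] += z_bar * f
        f_bar[t] += z_bar * w[2]
        s_bar = x_bar + f + z_bar * w[1]            # adjoint of inventory at start of day t
    return total, w_bar, f_bar

# Usage: a plain gradient-ascent loop on the policy weights. The price
# sensitivities f_bar come out of the same reverse pass as w_bar.
prices = [2.0 + 0.5 * math.sin(0.3 * t) for t in range(30)]
w = [0.1, 0.5, -0.3]
for _ in range(50):
    total, w_bar, f_bar = value_and_adjoints(w, prices)
    w = [wi + 0.05 * gi for wi, gi in zip(w, w_bar)]
```

In this sketch the reverse loop visits each day once, so its cost does not grow with the number of prices whose sensitivities are requested. A bump-and-revalue check would instead need at least one extra simulation per bumped price.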

Dmitri Goloubentsev
