Source author record

Lars Ruthotto

Lars Ruthotto appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning math.NA Numerical Analysis Computer Vision math.DS math.PR Mathematical Software

Catalog footprint

What is connected

9works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Differentiating through Stochastic Differential Equations: A Primer

Dynamical systems are essential to model various phenomena in physics, finance, economics, and are also of current interest in machine learning. A central modeling task is investigating parameter sensitivity, whether tuning atmospheric coefficients, computing financial Greeks, or optimizing neural networks. These sensitivities are mathematically expressed as derivatives of an objective function with respect to parameters of interest and are rarely available analytically, necessitating numerical methods for approximating them. While the literature for differentiation of deterministic systems is well-covered, the treatment of stochastic systems, such as stochastic differential equations (SDEs), in most curricula is less comprehensive than the subtleties arising from the interplay of noise and discretization require. This paper provides a primer on numerical differentiation of SDEs organized as a two-tale narrative. Tale 1 demonstrates differentiating through discretized SDEs, known the discretize-optimize approach, is reliable for both Itô and Stratonovich calculus. Tale 2 examines the optimize-discretize approach, investigating the continuous limit of backward equations from Tale 1 corresponding to the desired gradients. Our aim is to equip readers with a clear guide on the numerical differentiation of SDEs: computing gradients correctly in both Itô and Stratonovich settings, understanding when discretize-optimize and optimize-discretize agree or diverge, and developing intuition for reasoning about stochastic differentiation beyond the cases explicitly covered.

preprint2024arXiv

Differential Equations for Continuous-Time Deep Learning

This short, self-contained article seeks to introduce and survey continuous-time deep learning approaches that are based on neural ordinary differential equations (neural ODEs). It primarily targets readers familiar with ordinary and partial differential equations and their analysis who are curious to see their role in machine learning. Using three examples from machine learning and applied mathematics, we will see how neural ODEs can provide new insights into deep learning and a foundation for more efficient algorithms.

preprint2022arXiv

A Neural Network Approach for High-Dimensional Optimal Control Applied to Multi-Agent Path Finding

We propose a neural network approach that yields approximate solutions for high-dimensional optimal control problems and demonstrate its effectiveness using examples from multi-agent path finding. Our approach yields controls in a feedback form, where the policy function is given by a neural network (NN). Specifically, we fuse the Hamilton-Jacobi-Bellman (HJB) and Pontryagin Maximum Principle (PMP) approaches by parameterizing the value function with an NN. Our approach enables us to obtain approximately optimal controls in real-time without having to solve an optimization problem. Once the policy function is trained, generating a control at a given space-time location takes milliseconds; in contrast, efficient nonlinear programming methods typically perform the same task in seconds. We train the NN offline using the objective function of the control problem and penalty terms that enforce the HJB equations. Therefore, our training algorithm does not involve data generated by another algorithm. By training on a distribution of initial states, we ensure the controls' optimality on a large portion of the state-space. Our grid-free approach scales efficiently to dimensions where grids become impractical or infeasible. We apply our approach to several multi-agent collision-avoidance problems in up to 150 dimensions. Furthermore, we empirically observe that the number of parameters in our approach scales linearly with the dimension of the control problem, thereby mitigating the curse of dimensionality.

preprint2021arXiv

A Neural Network Approach Applied to Multi-Agent Optimal Control

We propose a neural network approach for solving high-dimensional optimal control problems. In particular, we focus on multi-agent control problems with obstacle and collision avoidance. These problems immediately become high-dimensional, even for moderate phase-space dimensions per agent. Our approach fuses the Pontryagin Maximum Principle and Hamilton-Jacobi-Bellman (HJB) approaches and parameterizes the value function with a neural network. Our approach yields controls in a feedback form for quick calculation and robustness to moderate disturbances to the system. We train our model using the objective function and optimality conditions of the control problem. Therefore, our training algorithm neither involves a data generation phase nor solutions from another algorithm. Our model uses empirically effective HJB penalizers for efficient training. By training on a distribution of initial states, we ensure the controls' optimality is achieved on a large portion of the state-space. Our approach is grid-free and scales efficiently to dimensions where grids become impractical or infeasible. We demonstrate our approach's effectiveness on a 150-dimensional multi-agent problem with obstacles.

preprint2020arXiv

A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems

Mean field games (MFG) and mean field control (MFC) are critical classes of multi-agent models for efficient analysis of massive populations of interacting agents. Their areas of application span topics in economics, finance, game theory, industrial engineering, crowd motion, and more. In this paper, we provide a flexible machine learning framework for the numerical solution of potential MFG and MFC models. State-of-the-art numerical methods for solving such problems utilize spatial discretization that leads to a curse-of-dimensionality. We approximately solve high-dimensional problems by combining Lagrangian and Eulerian viewpoints and leveraging recent advances from machine learning. More precisely, we work with a Lagrangian formulation of the problem and enforce the underlying Hamilton-Jacobi-Bellman (HJB) equation that is derived from the Eulerian formulation. Finally, a tailored neural network parameterization of the MFG/MFC solution helps us avoid any spatial discretization. Our numerical results include the approximate solution of 100-dimensional instances of optimal transport and crowd motion problems on a standard work station and a validation using an Eulerian solver in two dimensions. These results open the door to much-anticipated applications of MFG and MFC models that were beyond reach with existing numerical methods.

preprint2020arXiv

Discretize-Optimize vs. Optimize-Discretize for Time-Series Regression and Continuous Normalizing Flows

We compare the discretize-optimize (Disc-Opt) and optimize-discretize (Opt-Disc) approaches for time-series regression and continuous normalizing flows (CNFs) using neural ODEs. Neural ODEs are ordinary differential equations (ODEs) with neural network components. Training a neural ODE is an optimal control problem where the weights are the controls and the hidden features are the states. Every training iteration involves solving an ODE forward and another backward in time, which can require large amounts of computation, time, and memory. Comparing the Opt-Disc and Disc-Opt approaches in image classification tasks, Gholami et al. (2019) suggest that Disc-Opt is preferable due to the guaranteed accuracy of gradients. In this paper, we extend the comparison to neural ODEs for time-series regression and CNFs. Unlike in classification, meaningful models in these tasks must also satisfy additional requirements beyond accurate final-time output, e.g., the invertibility of the CNF. Through our numerical experiments, we demonstrate that with careful numerical treatment, Disc-Opt methods can achieve similar performance as Opt-Disc at inference with drastically reduced training costs. Disc-Opt reduced costs in six out of seven separate problems with training time reduction ranging from 39% to 97%, and in one case, Disc-Opt reduced training from nine days to less than one day.

preprint2020arXiv

LeanConvNets: Low-cost Yet Effective Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have become indispensable for solving machine learning tasks in speech recognition, computer vision, and other areas that involve high-dimensional data. A CNN filters the input feature using a network containing spatial convolution operators with compactly supported stencils. In practice, the input data and the hidden features consist of a large number of channels, which in most CNNs are fully coupled by the convolution operators. This coupling leads to immense computational cost in the training and prediction phase. In this paper, we introduce LeanConvNets that are derived by sparsifying fully-coupled operators in existing CNNs. Our goal is to improve the efficiency of CNNs by reducing the number of weights, floating point operations and latency times, with minimal loss of accuracy. Our lean convolution operators involve tuning parameters that controls the trade-off between the network's accuracy and computational costs. These convolutions can be used in a wide range of existing networks, and we exemplify their use in residual networks (ResNets). Using a range of benchmark problems from image classification and semantic segmentation, we demonstrate that the resulting LeanConvNet's accuracy is close to state-of-the-art networks while being computationally less expensive. In our tests, the lean versions of ResNet in most cases outperform comparable reduced architectures such as MobileNets and ShuffleNets.

preprint2016arXiv

Efficient Numerical Optimization For Susceptibility Artifact Correction Of EPI-MRI

We present two efficient numerical methods for susceptibility artifact correction applicable in Echo Planar Imaging (EPI), an ultra fast Magnetic Resonance Imaging (MRI) technique widely used in clinical applications. Both methods address a major practical drawback of EPI, the so-called susceptibility artifacts, which consist of geometrical transformations and intensity modulations. We consider a tailored variational image registration problem that is based on a physical distortion model and aims at minimizing the distance of two oppositely distorted images subject to invertibility constraints. We follow a discretize-then-optimize approach and present a novel face-staggered discretization yielding a separable structure in the discretized distance function and the invertibility constraints. The presence of a smoothness regularizer renders the overall optimization problem non-separable, but we present two optimization schemes that exploit the partial separability. First, we derive a block-Jacobi preconditioner to be used in a Gauss-Newton-PCG method. Second, we consider a splitting of the separable and non-separable part and solve the resulting problem using the Alternating Direction Method of Multipliers (ADMM). We provide a detailed convergence proof for ADMM for this non-convex optimization problem. Both schemes are of essentially linear complexity and are suitable for parallel computing. A considerable advantage of the proposed schemes over established methods is the reduced time-to-solution. In our numerical experiment using high-resolution 3D imaging data, our parallel implementation of the ADMM method solves a 3D problem with more than 5 million degrees of freedom in less than 50 seconds on a standard laptop, which is a considerable improvement over existing methods.

preprint2016arXiv

jInv -- a flexible Julia package for PDE parameter estimation

Estimating parameters of Partial Differential Equations (PDEs) from noisy and indirect measurements often requires solving ill-posed inverse problems. These so called parameter estimation or inverse medium problems arise in a variety of applications such as geophysical, medical imaging, and nondestructive testing. Their solution is computationally intense since the underlying PDEs need to be solved numerous times until the reconstruction of the parameters is sufficiently accurate. Typically, the computational demand grows significantly when more measurements are available, which poses severe challenges to inversion algorithms as measurement devices become more powerful. In this paper we present jInv, a flexible framework and open source software that provides parallel algorithms for solving parameter estimation problems with many measurements. Being written in the expressive programming language Julia, jInv is portable, easy to understand and extend, cross-platform tested, and well-documented. It provides novel parallelization schemes that exploit the inherent structure of many parameter estimation problems and can be used to solve multiphysics inversion problems as is demonstrated using numerical experiments motivated by geophysical imaging.

Lars Ruthotto

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Differentiating through Stochastic Differential Equations: A Primer

Differential Equations for Continuous-Time Deep Learning

A Neural Network Approach for High-Dimensional Optimal Control Applied to Multi-Agent Path Finding

A Neural Network Approach Applied to Multi-Agent Optimal Control

A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems

Discretize-Optimize vs. Optimize-Discretize for Time-Series Regression and Continuous Normalizing Flows

LeanConvNets: Low-cost Yet Effective Convolutional Neural Networks

Efficient Numerical Optimization For Susceptibility Artifact Correction Of EPI-MRI

jInv -- a flexible Julia package for PDE parameter estimation