Source author record

Arnulf Jentzen

Arnulf Jentzen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.NA, math.PR, Numerical Analysis, Machine Learning, math.OC, math.AP, math.ST, Statistics Theory, Neural and Evolutionary Computing, Artificial Intelligence, math.LO

Catalog footprint

What is connected

34 works

11 topics

4 close collaborators

preprint 2026 arXiv

SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures

We study gradient flows for loss landscapes of fully connected feedforward neural networks with commonly used continuously differentiable activation functions such as the logistic, hyperbolic tangent, softplus or GELU function. We prove that the gradient flow either converges to a critical point or diverges to infinity while the loss converges to an asymptotic critical value. Moreover, we prove the existence of a threshold $\varepsilon>0$ such that the loss value of any gradient flow initialized at most $\varepsilon$ above the optimal level converges to it. For polynomial target functions and sufficiently big architecture and data set, we prove that the optimal loss value is zero and can only be realized asymptotically. From this setting, we deduce our main result that any gradient flow with sufficiently good initialization diverges to infinity. Our proof heavily relies on the geometry of o-minimal structures. We confirm these theoretical findings with numerical experiments and extend our investigation to more realistic scenarios, where we observe an analogous behavior.
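The phenomenon described above, a loss that approaches its infimum only while the parameters run off to infinity, can be illustrated with a one-parameter toy model. The sketch below is not taken from the paper: the model `sigmoid(w)` with target value 1, the explicit Euler discretization, and all names and step sizes are assumptions chosen for illustration.

```python
import math

def sigmoid(w: float) -> float:
    return 1.0 / (1.0 + math.exp(-w))

def loss(w: float) -> float:
    # Squared error of the one-parameter model sigmoid(w) against the target 1.
    # The infimum 0 is approached only as w tends to infinity.
    return (1.0 - sigmoid(w)) ** 2

def grad(w: float) -> float:
    s = sigmoid(w)
    # d/dw (1 - s)^2 = -2 (1 - s) * s'  with  s' = s (1 - s)
    return -2.0 * (1.0 - s) ** 2 * s

def euler_gradient_flow(w0: float, step: float, n_steps: int) -> list:
    # Explicit Euler discretization of the gradient flow w'(t) = -grad(w(t)).
    path = [w0]
    for _ in range(n_steps):
        path.append(path[-1] - step * grad(path[-1]))
    return path

path = euler_gradient_flow(w0=0.0, step=0.5, n_steps=2000)
print(path[-1], loss(path[-1]))
```

In this toy setting the parameter keeps increasing (roughly logarithmically in time) while the loss keeps shrinking towards zero without ever reaching it.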

preprint 2023 arXiv

The necessity of depth for artificial neural networks to approximate certain classes of smooth and bounded functions without the curse of dimensionality

In this article we study high-dimensional approximation capacities of shallow and deep artificial neural networks (ANNs) with the rectified linear unit (ReLU) activation. In particular, it is a key contribution of this work to reveal that for all $a,b\in\mathbb{R}$ with $b-a\geq 7$ we have that the functions $[a,b]^d\ni x=(x_1,\dots,x_d)\mapsto\prod_{i=1}^d x_i\in\mathbb{R}$ for $d\in\mathbb{N}$ as well as the functions $[a,b]^d\ni x =(x_1,\dots, x_d)\mapsto\sin(\prod_{i=1}^d x_i) \in \mathbb{R} $ for $ d \in \mathbb{N} $ can neither be approximated without the curse of dimensionality by means of shallow ANNs nor insufficiently deep ANNs with ReLU activation but can be approximated without the curse of dimensionality by sufficiently deep ANNs with ReLU activation. We show that the product functions and the sine of the product functions are polynomially tractable approximation problems among the approximating class of deep ReLU ANNs with the number of hidden layers being allowed to grow in the dimension $ d \in \mathbb{N} $. We establish the above outlined statements not only for the product functions and the sine of the product functions but also for other classes of target functions, in particular, for classes of uniformly globally bounded $ C^{ \infty } $-functions with compact support on any $[a,b]^d$ with $a\in\mathbb{R}$, $b\in(a,\infty)$. Roughly speaking, in this work we lay open that simple approximation problems such as approximating the sine or cosine of products cannot be solved in standard implementation frameworks by shallow or insufficiently deep ANNs with ReLU activation in polynomial time, but can be approximated by sufficiently deep ReLU ANNs with the number of parameters growing at most polynomially.

preprint 2022 arXiv

Deep learning approximations for non-local nonlinear PDEs with Neumann boundary conditions

Nonlinear partial differential equations (PDEs) are used to model dynamical processes in a large number of scientific fields, ranging from finance to biology. In many applications standard local models are not sufficient to accurately account for certain non-local phenomena such as, e.g., interactions at a distance. In order to properly capture these phenomena non-local nonlinear PDE models are frequently employed in the literature. In this article we propose two numerical methods based on machine learning and on Picard iterations, respectively, to approximately solve non-local nonlinear PDEs. The proposed machine learning-based method is an extended variant of a deep learning-based splitting-up type approximation method previously introduced in the literature and utilizes neural networks to provide approximate solutions on a subset of the spatial domain of the solution. The Picard iterations-based method is an extended variant of the so-called full history recursive multilevel Picard approximation scheme previously introduced in the literature and provides an approximate solution for a single point of the domain. Both methods are mesh-free and allow non-local nonlinear PDEs with Neumann boundary conditions to be solved in high dimensions. In the two methods, the numerical difficulties arising due to the dimensionality of the PDEs are avoided by (i) using the correspondence between the expected trajectory of reflected stochastic processes and the solution of PDEs (given by the Feynman-Kac formula) and by (ii) using a plain vanilla Monte Carlo integration to handle the non-local term. We evaluate the performance of the two methods on five different PDEs arising in physics and biology. In all cases, the methods yield good results in up to 10 dimensions with short run times. Our work extends recently developed methods to overcome the curse of dimensionality in solving PDEs.

preprint 2022 arXiv

Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions

In this paper, we analyze the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points in the case where the target function is affine and one-dimensional. In particular, we show that there exist no local maxima and clarify the structure of saddle points. Moreover, we prove that non-global local minima can only be caused by `dead' ReLU neurons. In particular, they do not appear in the case of leaky ReLU or quadratic activation. Our approach is of a combinatorial nature and builds on a careful analysis of the different types of hidden neurons that can occur.

preprint 2022 arXiv

Normalized gradient flow optimization in the training of ReLU artificial neural networks

The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry. Roughly speaking, ANNs can be regarded as iterated compositions between affine linear functions and certain fixed nonlinear functions, which are usually multidimensional versions of a one-dimensional so-called activation function. The most popular choice of such a one-dimensional activation function is the rectified linear unit (ReLU) activation function which maps a real number to its positive part $ \mathbb{R} \ni x \mapsto \max\{ x, 0 \} \in \mathbb{R} $. In this article we propose and analyze a modified variant of the standard training procedure of such ReLU ANNs in the sense that we propose to restrict the negative gradient flow dynamics to a large submanifold of the ANN parameter space, which is a strict $ C^{ \infty } $-submanifold of the entire ANN parameter space that seems to enjoy better regularity properties than the entire ANN parameter space but which is also sufficiently large and sufficiently high dimensional so that it can represent all ANN realization functions that can be represented through the entire ANN parameter space. In the special situation of shallow ANNs with just one-dimensional ANN layers we also prove for every Lipschitz continuous target function that every gradient flow trajectory on this large submanifold of the ANN parameter space is globally bounded. For the standard gradient flow on the entire ANN parameter space with Lipschitz continuous target functions it remains an open problem of research to prove or disprove the global boundedness of gradient flow trajectories even in the situation of shallow ANNs with just one-dimensional ANN layers.

preprint 2022 arXiv

On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumption that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, under the assumption that the target function (describing the relationship between input data and the output data) is piecewise polynomial, and under the assumption that the risk function of the considered supervised learning problem admits at least one regular global minimum. In addition, in the special situation of shallow ANNs with just one hidden layer and one-dimensional input we also verify this assumption by proving in the training of such shallow ANNs that for every Lipschitz continuous target function there exists a global minimum in the risk landscape. Finally, in the training of deep ANNs with ReLU activation we also study solutions of gradient flow (GF) differential equations and we prove that every non-divergent GF trajectory converges with a polynomial rate of convergence to a critical point (in the sense of limiting Fréchet subdifferentiability). Our mathematical convergence analysis builds up on ideas from our previous article Eberle et al., on tools from real algebraic geometry such as the concept of semi-algebraic functions and generalized Kurdyka-Lojasiewicz inequalities, on tools from functional analysis such as the Arzelà-Ascoli theorem, on tools from nonsmooth analysis such as the concept of limiting Fréchet subgradients, as well as on the fact that the set of realization functions of shallow ReLU ANNs with fixed architecture forms a closed subset of the set of continuous functions revealed by Petersen et al.

preprint 2022 arXiv

On the existence of infinitely many realization functions of non-global local minima in the training of artificial neural networks with ReLU activation

Gradient descent (GD) type optimization schemes are the standard instruments to train fully connected feedforward artificial neural networks (ANNs) with rectified linear unit (ReLU) activation and can be considered as temporal discretizations of solutions of gradient flow (GF) differential equations. It has recently been proved that the risk of every bounded GF trajectory converges in the training of ANNs with one hidden layer and ReLU activation to the risk of a critical point. Taking this into account it is one of the key research issues in the mathematical convergence analysis of GF trajectories and GD type optimization schemes, respectively, to study sufficient and necessary conditions for critical points of the risk function and, thereby, to obtain an understanding about the appearance of critical points in dependence of the problem parameters such as the target function. In the first main result of this work we prove in the training of ANNs with one hidden layer and ReLU activation that for every $ a, b \in \mathbb{R} $ with $ a < b $ and every arbitrarily large $ \delta > 0 $ we have that there exists a Lipschitz continuous target function $ f \colon [a,b] \to \mathbb{R} $ such that for every number $ H > 1 $ of neurons on the hidden layer we have that the risk function has uncountably many different realization functions of non-global local minimum points whose risks are strictly larger than the sum of the risk of the global minimum points and the arbitrarily large $ \delta $. In the second main result of this work we show in the training of ANNs with one hidden layer and ReLU activation in the special situation where there is only one neuron on the hidden layer and where the target function is continuous and piecewise polynomial that there exist at most finitely many different realization functions of critical points.

preprint 2021 arXiv

A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions

Gradient descent optimization algorithms are the standard ingredients that are used to train artificial neural networks (ANNs). Even though a huge number of numerical simulations indicate that gradient descent optimization methods do indeed converge in the training of ANNs, until today there is no rigorous theoretical analysis which proves (or disproves) this conjecture. In particular, even in the case of the most basic variant of gradient descent optimization algorithms, the plain vanilla gradient descent method, it remains an open problem to prove or disprove the conjecture that gradient descent converges in the training of ANNs. In this article we solve this problem in the special situation where the target function under consideration is a constant function. More specifically, in the case of constant target functions we prove in the training of rectified fully-connected feedforward ANNs with one hidden layer that the risk function of the gradient descent method does indeed converge to zero. Our mathematical analysis strongly exploits the property that the rectifier function is the activation function used in the considered ANNs. A key contribution of this work is to explicitly specify a Lyapunov function for the gradient flow system of the ANN parameters. This Lyapunov function is the central tool in our convergence proof of the gradient descent method.
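The setting of this abstract can be tried out numerically with a small sketch: plain gradient descent on a one-hidden-layer ReLU network fitted to a constant target. Everything concrete here is an assumption made for illustration and not the paper's setup: the input grid on [0, 1] standing in for the input distribution, the width, the Gaussian initialization, the learning rate, and all function names.

```python
import random

def relu(z):
    return z if z > 0.0 else 0.0

def realization(params, x):
    # One hidden layer: x -> c + sum_i v[i] * relu(w[i] * x + b[i])
    w, b, v, c = params
    return c + sum(v[i] * relu(w[i] * x + b[i]) for i in range(len(w)))

def risk(params, xs, target):
    # Mean squared error on the grid xs (a stand-in for the true risk).
    return sum((realization(params, x) - target) ** 2 for x in xs) / len(xs)

def gd_step(params, xs, target, lr):
    w, b, v, c = params
    width, n = len(w), len(xs)
    gw, gb, gv, gc = [0.0] * width, [0.0] * width, [0.0] * width, 0.0
    for x in xs:
        r = 2.0 * (realization(params, x) - target) / n
        gc += r
        for i in range(width):
            pre = w[i] * x + b[i]
            if pre > 0.0:  # ReLU derivative is taken as 0 for pre <= 0
                gw[i] += r * v[i] * x
                gb[i] += r * v[i]
                gv[i] += r * pre
    return (
        [w[i] - lr * gw[i] for i in range(width)],
        [b[i] - lr * gb[i] for i in range(width)],
        [v[i] - lr * gv[i] for i in range(width)],
        c - lr * gc,
    )

rng = random.Random(0)  # fixed seed so the run is reproducible
width = 3
params = (
    [rng.gauss(0.0, 1.0) for _ in range(width)],
    [rng.gauss(0.0, 1.0) for _ in range(width)],
    [rng.gauss(0.0, 1.0) for _ in range(width)],
    0.0,
)
xs = [k / 100 for k in range(101)]
target = 1.0  # constant target function

initial_risk = risk(params, xs, target)
for _ in range(3000):
    params = gd_step(params, xs, target, lr=0.005)
final_risk = risk(params, xs, target)
print(initial_risk, final_risk)
```

A single run like this only illustrates the behaviour; it does not replace the Lyapunov-function argument the abstract refers to.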

preprint 2021 arXiv

Full history recursive multilevel Picard approximations for ordinary differential equations with expectations

We consider ordinary differential equations (ODEs) which involve expectations of a random variable. These ODEs are special cases of McKean-Vlasov stochastic differential equations (SDEs). A plain vanilla Monte Carlo approximation method for such ODEs requires a computational cost of order $\varepsilon^{-3}$ to achieve a root-mean-square error of size $\varepsilon$. In this work we adapt recently introduced full history recursive multilevel Picard (MLP) algorithms to reduce this computational complexity. Our main result shows for every $\delta>0$ that the proposed MLP approximation algorithm requires only a computational effort of order $\varepsilon^{-(2+\delta)}$ to achieve a root-mean-square error of size $\varepsilon$.
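To get a feeling for the two complexity orders, one can plug in concrete numbers. The values of $\varepsilon$ and $\delta$ below are arbitrary illustrative choices, and the constants hidden in the orders are ignored, so this is only a rough comparison.

```latex
\[
\varepsilon = 10^{-3}, \quad \delta = \tfrac{1}{10}:
\qquad
\varepsilon^{-3} = 10^{9},
\qquad
\varepsilon^{-(2+\delta)} = 10^{6.3} \approx 2.0 \times 10^{6}.
\]
```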

preprint 2021 arXiv

Lower bounds for artificial neural network approximations: A proof that shallow neural networks fail to overcome the curse of dimensionality

Artificial neural networks (ANNs) have become a very powerful tool in the approximation of high-dimensional functions. In particular, deep ANNs, consisting of a large number of hidden layers, have been used very successfully in a series of practically relevant computational problems involving high-dimensional input data, ranging from classification tasks in supervised learning to optimal decision problems in reinforcement learning. There are also a number of mathematical results in the scientific literature which study the approximation capacities of ANNs in the context of high-dimensional target functions. In particular, there is a series of mathematical results in the scientific literature which show that sufficiently deep ANNs have the capacity to overcome the curse of dimensionality in the approximation of certain target function classes in the sense that the number of parameters of the approximating ANNs grows at most polynomially in the dimension $d \in \mathbb{N}$ of the target functions under consideration. In the proofs of several such high-dimensional approximation results it is crucial that the involved ANNs are sufficiently deep and consist of a sufficiently large number of hidden layers which grows in the dimension of the considered target functions. It is the topic of this work to look in more detail at the depth of the involved ANNs in the approximation of high-dimensional target functions. In particular, the main result of this work proves that there exists a concretely specified sequence of functions which can be approximated without the curse of dimensionality by sufficiently deep ANNs but which cannot be approximated without the curse of dimensionality if the involved ANNs are shallow or not deep enough.

preprint 2020 arXiv

Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation

In spite of the accomplishments of deep learning based algorithms in numerous applications and very broad corresponding research interest, at the moment there is still no rigorous understanding of the reasons why such algorithms produce useful results in certain situations. A thorough mathematical analysis of deep learning based algorithms seems to be crucial in order to improve our understanding and to make their implementation more effective and efficient. In this article we provide a mathematically rigorous full error analysis of deep learning based empirical risk minimisation with quadratic loss function in the probabilistically strong sense, where the underlying deep neural networks are trained using stochastic gradient descent with random initialisation. The convergence speed we obtain is presumably far from optimal and suffers under the curse of dimensionality. To the best of our knowledge, we establish, however, the first full error analysis in the scientific literature for a deep learning based algorithm in the probabilistically strong sense and, moreover, the first full error analysis in the scientific literature for a deep learning based algorithm where stochastic gradient descent with random initialisation is the employed optimisation method.

preprint 2020 arXiv

Overcoming the curse of dimensionality in the numerical approximation of high-dimensional semilinear elliptic partial differential equations

Recently, so-called full-history recursive multilevel Picard (MLP) approximation schemes have been introduced and shown to overcome the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations (PDEs) with Lipschitz nonlinearities. The key contribution of this article is to introduce and analyze a new variant of MLP approximation schemes for certain semilinear elliptic PDEs with Lipschitz nonlinearities and to prove that the proposed approximation schemes overcome the curse of dimensionality in the numerical approximation of such semilinear elliptic PDEs.

preprint 2020 arXiv

Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations

It has long been known that high-dimensional linear parabolic partial differential equations (PDEs) can be approximated by Monte Carlo methods with a computational effort which grows polynomially both in the dimension and in the reciprocal of the prescribed accuracy. In other words, linear PDEs do not suffer from the curse of dimensionality. For general semilinear PDEs with Lipschitz coefficients, however, it remained an open question whether these suffer from the curse of dimensionality. In this paper we partially solve this open problem. More precisely, we prove in the case of semilinear heat equations with gradient-independent and globally Lipschitz continuous nonlinearities that the computational effort of a variant of the recently introduced multilevel Picard approximations grows polynomially both in the dimension and in the reciprocal of the required accuracy.
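The statement about linear PDEs can be made concrete for the heat equation $u_t = \Delta u$ with initial condition $u(0,x) = g(x)$, whose solution has the probabilistic representation $u(t,x) = \mathbb{E}[g(x + \sqrt{2t}\,Z)]$ with a standard normal vector $Z$. The sketch below is a minimal illustration under assumed choices that are not from the paper: $d = 100$, $g(x) = \|x\|^2$ (for which $u(t,x) = \|x\|^2 + 2dt$ is known in closed form), the sample size, and the function names.

```python
import math
import random

def heat_mc(g, x, t, n_samples, rng):
    # Monte Carlo estimate of u(t, x) = E[g(x + sqrt(2 t) Z)], Z ~ N(0, I_d).
    # The cost is n_samples * d evaluations, i.e. linear in the dimension d.
    d = len(x)
    scale = math.sqrt(2.0 * t)
    total = 0.0
    for _ in range(n_samples):
        y = [x[i] + scale * rng.gauss(0.0, 1.0) for i in range(d)]
        total += g(y)
    return total / n_samples

def squared_norm(y):
    return sum(c * c for c in y)

d = 100
x0 = [1.0] * d
t = 0.5

estimate = heat_mc(squared_norm, x0, t, n_samples=2000, rng=random.Random(0))
exact = squared_norm(x0) + 2.0 * d * t  # closed form for g(x) = |x|^2
print(estimate, exact)
```

The semilinear case treated in the paper needs the multilevel Picard construction; this plain Monte Carlo average only covers the linear baseline mentioned in the first sentence of the abstract.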

preprint 2020 arXiv

Strong convergence rates on the whole probability space for space-time discrete numerical approximation schemes for stochastic Burgers equations

The main result of this article establishes strong convergence rates on the whole probability space for explicit space-time discrete numerical approximations for a class of stochastic evolution equations with possibly non-globally monotone coefficients such as stochastic Burgers equations with additive trace-class noise. The key idea in the proof of our main result is (i) to bring the classical Alekseev-Gröbner formula from deterministic analysis into play and (ii) to employ uniform exponential moment estimates for the numerical approximations.

preprint 2020 arXiv

Weak error analysis for stochastic gradient descent optimization algorithms

Stochastic gradient descent (SGD) type optimization schemes are fundamental ingredients in a large number of machine learning based algorithms. In particular, SGD type optimization schemes are frequently employed in applications involving natural language processing, object and face recognition, fraud detection, computational advertisement, and numerical approximations of partial differential equations. In mathematical convergence results for SGD type optimization schemes there are usually two types of error criteria studied in the scientific literature, that is, the error in the strong sense and the error with respect to the objective function. In applications one is often not only interested in the size of the error with respect to the objective function but also in the size of the error with respect to a test function which is possibly different from the objective function. The analysis of the size of this error is the subject of this article. In particular, the main result of this article proves under suitable assumptions that the size of this error decays at the same speed as in the special case where the test function coincides with the objective function.

preprint 2019 arXiv

Convergence in Hölder norms with applications to Monte Carlo methods in infinite dimensions

We show that if a sequence of piecewise affine linear processes converges in the strong sense with a positive rate to a stochastic process which is strongly Hölder continuous in time, then this sequence converges in the strong sense even with respect to much stronger Hölder norms and the convergence rate is essentially reduced by the Hölder exponent. Our first application hereof establishes pathwise convergence rates for spectral Galerkin approximations of stochastic partial differential equations. Our second application derives strong convergence rates of multilevel Monte Carlo approximations of expectations of Banach space valued stochastic processes.

preprint 2019 arXiv

Deep neural network approximations for Monte Carlo algorithms

Recently, it has been proposed in the literature to employ deep neural networks (DNNs) together with stochastic gradient descent methods to approximate solutions of PDEs. There are also a few results in the literature which prove that DNNs can approximate solutions of certain PDEs without the curse of dimensionality in the sense that the number of real parameters used to describe the DNN grows at most polynomially both in the PDE dimension and the reciprocal of the prescribed approximation accuracy. One key argument in most of these results is, first, to use a Monte Carlo approximation scheme which can approximate the solution of the PDE under consideration at a fixed space-time point without the curse of dimensionality and, thereafter, to prove that DNNs are flexible enough to mimic the behaviour of the used approximation scheme. Having this in mind, one could aim for a general abstract result which shows under suitable assumptions that if a certain function can be approximated by any kind of (Monte Carlo) approximation scheme without the curse of dimensionality, then this function can also be approximated with DNNs without the curse of dimensionality. It is a key contribution of this article to make a first step towards this direction. In particular, the main result of this paper, essentially, shows that if a function can be approximated by means of some suitable discrete approximation scheme without the curse of dimensionality and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality. As an application of this result we establish that solutions of suitable Kolmogorov PDEs can be approximated with DNNs without the curse of dimensionality.

preprint 2019 arXiv

Overcoming the curse of dimensionality in the numerical approximation of Allen-Cahn partial differential equations via truncated full-history recursive multilevel Picard approximations

One of the most challenging problems in applied mathematics is the approximate solution of nonlinear partial differential equations (PDEs) in high dimensions. Standard deterministic approximation methods like finite differences or finite elements suffer from the curse of dimensionality in the sense that the computational effort grows exponentially in the dimension. In this work we overcome this difficulty in the case of reaction-diffusion type PDEs with a locally Lipschitz continuous coercive nonlinearity (such as Allen-Cahn PDEs) by introducing and analyzing truncated variants of the recently introduced full-history recursive multilevel Picard approximation schemes.

preprint 2018 arXiv

Solving high-dimensional partial differential equations using deep learning

Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the "curse of dimensionality". This paper introduces a deep learning-based approach that can handle general high-dimensional parabolic PDEs. To this end, the PDEs are reformulated using backward stochastic differential equations and the gradient of the unknown solution is approximated by neural networks, very much in the spirit of deep reinforcement learning with the gradient acting as the policy function. Numerical results on examples including the nonlinear Black-Scholes equation, the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation suggest that the proposed algorithm is quite effective in high dimensions, in terms of both accuracy and cost. This opens up new possibilities in economics, finance, operational research, and physics, by considering all participating agents, assets, resources, or particles together at the same time, instead of making ad hoc assumptions on their inter-relationships.

preprint 2017 arXiv

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations

We propose a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, by making an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE. The policy function is then approximated by a neural network, as is done in deep reinforcement learning. Numerical results using TensorFlow illustrate the efficiency and accuracy of the proposed algorithms for several 100-dimensional nonlinear PDEs from physics and finance such as the Allen-Cahn equation, the Hamilton-Jacobi-Bellman equation, and a nonlinear pricing model for financial derivatives.
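The PDE-BSDE connection behind this algorithm can be written out as follows. This is a sketch of the standard nonlinear Feynman-Kac formulation; the symbols $\mu$, $\sigma$, $f$, $g$, $\xi$ and the time grid $t_0 < t_1 < \dots < t_N = T$ are notation assumed here for illustration and may differ from the paper's.

```latex
% Semilinear parabolic PDE with terminal condition u(T, x) = g(x):
\[
\frac{\partial u}{\partial t}
+ \tfrac{1}{2} \operatorname{Tr}\!\bigl( \sigma \sigma^{\top} \operatorname{Hess}_x u \bigr)
+ \nabla u \cdot \mu
+ f\bigl( t, x, u, \sigma^{\top} \nabla u \bigr) = 0 .
\]
% Forward process and BSDE, with Y_t = u(t, X_t) and Z_t = \sigma^{\top}(t, X_t) \nabla u(t, X_t):
\[
X_t = \xi + \int_0^t \mu(s, X_s) \, ds + \int_0^t \sigma(s, X_s) \, dW_s ,
\qquad
Y_t = g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s) \, ds - \int_t^T Z_s^{\top} \, dW_s .
\]
% Forward time discretization; a neural network approximates Z at each time step
% (the "policy"), and the parameters are trained to minimize the terminal mismatch:
\[
Y_{t_{n+1}} \approx Y_{t_n} - f\bigl( t_n, X_{t_n}, Y_{t_n}, Z_{t_n} \bigr) \, (t_{n+1} - t_n)
+ Z_{t_n}^{\top} \bigl( W_{t_{n+1}} - W_{t_n} \bigr),
\qquad
\text{loss} = \mathbb{E}\bigl[ \, | g(X_{t_N}) - Y_{t_N} |^2 \, \bigr] .
\]
```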

preprint 2017 arXiv

Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations

High-dimensional partial differential equations (PDEs) appear in a number of models from the financial industry, such as in derivative pricing models, credit valuation adjustment (CVA) models, or portfolio optimization models. The PDEs in such applications are high-dimensional as the dimension corresponds to the number of financial assets in a portfolio. Moreover, such PDEs are often fully nonlinear due to the need to incorporate certain nonlinear phenomena in the model such as default risks, transaction costs, volatility uncertainty (Knightian uncertainty), or trading constraints. Such high-dimensional fully nonlinear PDEs are exceedingly difficult to solve as the computational effort for standard approximation methods grows exponentially with the dimension. In this work we propose a new method for solving high-dimensional fully nonlinear second-order PDEs. Our method can in particular be used to sample from high-dimensional nonlinear expectations. The method is based on (i) a connection between fully nonlinear second-order PDEs and second-order backward stochastic differential equations (2BSDEs), (ii) a merged formulation of the PDE and the 2BSDE problem, (iii) a temporal forward discretization of the 2BSDE and a spatial approximation via deep neural nets, and (iv) a stochastic gradient descent-type optimization procedure. Numerical results obtained using TensorFlow in Python illustrate the efficiency and the accuracy of the method in the cases of a $100$-dimensional Black-Scholes-Barenblatt equation, a $100$-dimensional Hamilton-Jacobi-Bellman equation, and a nonlinear expectation of a $100$-dimensional $G$-Brownian motion.

preprint 2017 arXiv

On stochastic differential equations with arbitrarily slow convergence rates for strong approximation in two space dimensions

In the recent article [Jentzen, A., Müller-Gronbach, T., and Yaroslavtseva, L., Commun. Math. Sci., 14(6), 1477--1500, 2016] it has been established that for every arbitrarily slow convergence speed and every natural number $d \in \{4,5,\ldots\}$ there exist $d$-dimensional stochastic differential equations (SDEs) with infinitely often differentiable and globally bounded coefficients such that no approximation method based on finitely many observations of the driving Brownian motion can converge in absolute mean to the solution faster than the given speed of convergence. In this paper we strengthen the above result by proving that this slow convergence phenomenon also arises in two ($d=2$) and three ($d=3$) space dimensions.

preprint 2017 arXiv

Strong convergence for explicit space-time discrete numerical approximation methods for stochastic Burgers equations

In this paper we propose and analyze explicit space-time discrete numerical approximations for additive space-time white noise driven stochastic partial differential equations (SPDEs) with non-globally monotone nonlinearities such as the stochastic Burgers equation with space-time white noise. The main result of this paper proves that the proposed explicit space-time discrete approximation method converges strongly to the solution process of the stochastic Burgers equation with space-time white noise. To the best of our knowledge, the main result of this work is the first result in the literature which establishes strong convergence for a space-time discrete approximation method in the case of the stochastic Burgers equations with space-time white noise.

preprint 2017 arXiv

Strong convergence of full-discrete nonlinearity-truncated accelerated exponential Euler-type approximations for stochastic Kuramoto-Sivashinsky equations

This article introduces and analyzes a new explicit, easily implementable, and full-discrete accelerated exponential Euler-type approximation scheme for additive space-time white noise driven stochastic partial differential equations (SPDEs) with possibly non-globally monotone nonlinearities such as stochastic Kuramoto-Sivashinsky equations. The main result of this article proves that the proposed approximation scheme converges strongly and numerically weakly to the solution process of such an SPDE. Key ingredients in the proof of our convergence result are a suitable generalized coercivity-type condition, the specific design of the accelerated exponential Euler-type approximation scheme, and an application of Fernique's theorem.

preprint 2016 arXiv

Exponential integrability properties of numerical approximation processes for nonlinear stochastic differential equations

Exponential integrability properties of numerical approximations are a key tool for establishing positive rates of strong and numerically weak convergence for a large class of nonlinear stochastic differential equations. It turns out that well-known numerical approximation processes such as Euler-Maruyama approximations, linear-implicit Euler approximations, and some tamed Euler approximations from the literature rarely preserve exponential integrability properties of the exact solution. The main contribution of this article is to identify a class of stopped increment-tamed Euler approximations which preserve exponential integrability properties of the exact solution under minor additional assumptions on the involved functions.

preprint 2016 arXiv

Weak convergence rates for numerical approximations of stochastic partial differential equations with nonlinear diffusion coefficients in UMD Banach spaces

Strong convergence rates for numerical approximations of semilinear stochastic partial differential equations (SPDEs) with smooth and regular nonlinearities are well understood in the literature. Weak convergence rates for numerical approximations of such SPDEs have been investigated for about two decades and are still not yet fully understood. In particular, no essentially sharp weak convergence rates are known for temporal or spatial numerical approximations of space-time white noise driven SPDEs with nonlinear multiplication operators in the diffusion coefficients. In this article we overcome this problem by establishing essentially sharp weak convergence rates for exponential Euler approximations of semilinear SPDEs with nonlinear multiplication operators in the diffusion coefficients. Key ingredients of our approach are applications of the mild Itô type formula in UMD Banach spaces with type 2.

preprint 2015 arXiv

Loss of regularity for Kolmogorov equations

The celebrated Hörmander condition is a sufficient (and nearly necessary) condition for a second-order linear Kolmogorov partial differential equation (PDE) with smooth coefficients to be hypoelliptic. As a consequence, the solutions of Kolmogorov PDEs are smooth at all positive times if the coefficients of the PDE are smooth and satisfy Hörmander's condition even if the initial function is only continuous but not differentiable. First-order linear Kolmogorov PDEs with smooth coefficients do not have this smoothing effect but at least preserve regularity in the sense that solutions are smooth if their initial functions are smooth. In this article, we consider the intermediate regime of nonhypoelliptic second-order Kolmogorov PDEs with smooth coefficients. The main observation of this article is that there exist counterexamples to regularity preservation in that case. More precisely, we give an example of a second-order linear Kolmogorov PDE with globally bounded and smooth coefficients and a smooth initial function with compact support such that the unique globally bounded viscosity solution of the PDE is not even locally Hölder continuous. From the perspective of probability theory, the existence of this example PDE has the consequence that there exists a stochastic differential equation (SDE) with globally bounded and smooth coefficients and a smooth function with compact support which is mapped by the corresponding transition semigroup to a function which is not locally Hölder continuous. In other words, degenerate noise can have a roughening effect. A further implication of this loss of regularity phenomenon is that numerical approximations may converge without any arbitrarily small polynomial rate of convergence to the true solution of the SDE. 
More precisely, we prove for an example SDE with globally bounded and smooth coefficients that the standard Euler approximations converge to the exact solution of the SDE in the strong and numerically weak sense, but at a rate that is slower than any power law.

preprint 2014 arXiv

Strong convergence rates and temporal regularity for Cox-Ingersoll-Ross processes and Bessel processes with accessible boundaries

Cox-Ingersoll-Ross (CIR) processes are widely used in financial modeling such as in the Heston model for the approximative pricing of financial derivatives. Moreover, CIR processes are mathematically interesting due to the irregular square root function in the diffusion coefficient. In the literature, positive strong convergence rates for numerical approximations of CIR processes have been established in the case of an inaccessible boundary point. Since calibrations of the Heston model frequently result in parameters such that the boundary is accessible, we focus on this interesting case. Our main result shows for every $p \in (0, \infty)$ that the drift-implicit square-root Euler approximations proposed in Alfonsi (2005) converge in the strong $L^p$-distance with a positive rate for half of the parameter regime in which the boundary point is accessible. A key step in our proof is temporal regularity of Bessel processes. More precisely, we prove for every $p \in (0, \infty)$ that Bessel processes are temporally $1/2$-Hölder continuous in $L^p$.
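
As an illustration of the scheme named in this abstract, the following is a minimal sketch of a drift-implicit square-root Euler step for the CIR equation $dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t$. It assumes the usual construction: apply Itô's formula to $Y = \sqrt{X}$, discretize the drift of $Y$ implicitly, and solve the resulting quadratic equation for the positive root. The function name, the parameter names `kappa`, `theta`, `sigma`, and the restriction $4\kappa\theta > \sigma^2$ used to keep the root positive are assumptions of this sketch, not statements taken from the paper.

```python
import math
import random


def cir_drift_implicit_sqrt_euler(x0, kappa, theta, sigma, T, n_steps, rng):
    """Simulate one CIR path via a drift-implicit Euler step for Y = sqrt(X).

    Y solves dY = (alpha / Y - beta * Y) dt + (sigma / 2) dW with
    alpha = (4*kappa*theta - sigma^2) / 8 and beta = kappa / 2.
    """
    alpha = (4.0 * kappa * theta - sigma ** 2) / 8.0
    beta = kappa / 2.0
    if alpha <= 0.0:
        # Assumption of this sketch: the positive root below needs alpha > 0.
        raise ValueError("this sketch requires 4*kappa*theta > sigma**2")
    h = T / n_steps
    y = math.sqrt(x0)
    path = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))
        b = y + 0.5 * sigma * dw
        # Implicit step: y_new = b + (alpha / y_new - beta * y_new) * h,
        # i.e. (1 + beta*h) * y_new^2 - b * y_new - alpha*h = 0; take the positive root.
        a = 1.0 + beta * h
        y = (b + math.sqrt(b * b + 4.0 * a * alpha * h)) / (2.0 * a)
        path.append(y * y)  # map back to X = Y^2
    return path
```

For example, `cir_drift_implicit_sqrt_euler(0.04, 2.0, 0.09, 0.7, 1.0, 200, random.Random(0))` uses hypothetical parameters with $2\kappa\theta < \sigma^2$, so the boundary is accessible, while $4\kappa\theta > \sigma^2$ keeps every step of the sketch well defined and strictly positive.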

preprint 2013 arXiv

Divergence of the multilevel Monte Carlo Euler method for nonlinear stochastic differential equations

The Euler-Maruyama scheme is known to diverge strongly and numerically weakly when applied to nonlinear stochastic differential equations (SDEs) with superlinearly growing and globally one-sided Lipschitz continuous drift coefficients. Classical Monte Carlo simulations do, however, not suffer from this divergence behavior of Euler's method because this divergence behavior happens on rare events. Indeed, for such nonlinear SDEs the classical Monte Carlo Euler method has been shown to converge by exploiting that the Euler approximations diverge only on events whose probabilities decay to zero very rapidly. Significantly more efficient than the classical Monte Carlo Euler method is the recently introduced multilevel Monte Carlo Euler method. The main observation of this article is that this multilevel Monte Carlo Euler method does - in contrast to classical Monte Carlo methods - not converge in general in the case of such nonlinear SDEs. More precisely, we establish divergence of the multilevel Monte Carlo Euler method for a family of SDEs with superlinearly growing and globally one-sided Lipschitz continuous drift coefficients. In particular, the multilevel Monte Carlo Euler method diverges for these nonlinear SDEs on an event that is not at all rare but has probability one. As a consequence for applications, we recommend not to use the multilevel Monte Carlo Euler method for SDEs with superlinearly growing nonlinearities. Instead we propose to combine the multilevel Monte Carlo method with a slightly modified Euler method. More precisely, we show that the multilevel Monte Carlo method combined with a tamed Euler method converges for nonlinear SDEs with globally one-sided Lipschitz continuous drift coefficients and preserves its strikingly higher order convergence rate from the Lipschitz case.
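
The combination recommended at the end of this abstract can be sketched as follows: a multilevel Monte Carlo estimator whose fine and coarse paths on each level share the same Brownian increments and are advanced with a tamed Euler step. This is a minimal sketch under stated assumptions. The taming factor $1/(1 + h|\mu(x)|)$, the level structure with $2^l$ time steps on level $l$, the caller-supplied sample counts, and all function names are illustrative choices, not the parameter choices analyzed in the paper.

```python
import math
import random


def tamed_step(x, mu, sigma, h, dw):
    """One tamed Euler step: the drift increment is damped so its size stays below 1."""
    d = mu(x)
    return x + h * d / (1.0 + h * abs(d)) + sigma(x) * dw


def coupled_endpoints(mu, sigma, x0, T, n_fine, rng):
    """Return endpoints of a fine path (n_fine steps) and a coarse path
    (n_fine / 2 steps) driven by the same Brownian increments."""
    h = T / n_fine
    xf = xc = x0
    for _ in range(n_fine // 2):
        dw1 = rng.gauss(0.0, math.sqrt(h))
        dw2 = rng.gauss(0.0, math.sqrt(h))
        xf = tamed_step(xf, mu, sigma, h, dw1)
        xf = tamed_step(xf, mu, sigma, h, dw2)
        # The coarse path takes one step of size 2h with the summed increment.
        xc = tamed_step(xc, mu, sigma, 2.0 * h, dw1 + dw2)
    return xf, xc


def mlmc_tamed_euler(f, mu, sigma, x0, T, levels, samples_per_level, rng):
    """Estimate E[f(X_T)] as a telescoping sum over levels 0..levels."""
    # Level 0: a single tamed Euler step of size T.
    total = 0.0
    for _ in range(samples_per_level[0]):
        dw = rng.gauss(0.0, math.sqrt(T))
        total += f(tamed_step(x0, mu, sigma, T, dw))
    estimate = total / samples_per_level[0]
    # Levels 1..levels: averaged corrections f(fine) - f(coarse).
    for level in range(1, levels + 1):
        total = 0.0
        for _ in range(samples_per_level[level]):
            xf, xc = coupled_endpoints(mu, sigma, x0, T, 2 ** level, rng)
            total += f(xf) - f(xc)
        estimate += total / samples_per_level[level]
    return estimate
```

A call such as `mlmc_tamed_euler(lambda x: x * x, lambda x: -x ** 3, lambda x: 1.0, 1.0, 1.0, 4, [400, 200, 100, 50, 25], random.Random(0))` uses a hypothetical cubic drift and hypothetical sample counts. Because every tamed step moves the state by less than 1 plus the noise increment, the sketch cannot blow up the way an untamed Euler step can for superlinearly growing drifts.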

preprint 2012 arXiv

Efficient simulation of nonlinear parabolic SPDEs with additive noise

Recently, in a paper by Jentzen and Kloeden [Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 465 (2009) 649-667], a new method for simulating nearly linear stochastic partial differential equations (SPDEs) with additive noise has been introduced. The key idea was to use suitable linear functionals of the noise process in the numerical scheme which allow a higher approximation order to be obtained. Following this approach, a new simplified version of the scheme in the above named reference is proposed and analyzed in this article. The main advantage of the convergence result given here is the higher convergence order for nonlinear parabolic SPDEs with additive noise, although the used numerical scheme is very simple to simulate and implement.

preprint 2012 arXiv

Strong convergence of an explicit numerical method for SDEs with nonglobally Lipschitz continuous coefficients

On the one hand, the explicit Euler scheme fails to converge strongly to the exact solution of a stochastic differential equation (SDE) with a superlinearly growing and globally one-sided Lipschitz continuous drift coefficient. On the other hand, the implicit Euler scheme is known to converge strongly to the exact solution of such an SDE. Implementations of the implicit Euler scheme, however, require additional computational effort. In this article we therefore propose an explicit and easily implementable numerical method for such an SDE and show that this method converges strongly with the standard order one-half to the exact solution of the SDE. Simulations reveal that this explicit strongly convergent numerical scheme is considerably faster than the implicit Euler scheme.
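
The abstract does not write out the proposed method. The sketch below assumes a drift-tamed Euler step, in which the drift increment $h\,\mu(x)$ is divided by $1 + h\,|\mu(x)|$ so that it can never exceed 1 in absolute value, while the diffusion term is left as in the explicit Euler scheme. The function name and the scalar setting are assumptions made for illustration.

```python
import math
import random


def tamed_euler_endpoint(mu, sigma, x0, T, n_steps, rng):
    """Simulate X_T for dX = mu(X) dt + sigma(X) dW with a drift-tamed Euler scheme."""
    h = T / n_steps
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))
        drift = mu(x)
        # Taming: |h * drift / (1 + h * |drift|)| < 1 for every x, so a
        # superlinearly growing drift cannot produce an arbitrarily large step.
        x = x + h * drift / (1.0 + h * abs(drift)) + sigma(x) * dw
    return x
```

With the hypothetical cubic drift `mu = lambda x: -x ** 3`, zero noise, `x0 = 10.0`, `T = 1.0` and `n_steps = 10`, a plain explicit Euler step would jump from 10 to -90 and then grow without bound, whereas the tamed step moves by less than 1 per step and stays between 0 and 10. The scheme remains fully explicit, so no nonlinear equation has to be solved per step as in the implicit Euler scheme.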

preprint 2011 arXiv

Convergence of the stochastic Euler scheme for locally Lipschitz coefficients

Stochastic differential equations are often simulated with the Monte Carlo Euler method. Convergence of this method is well understood in the case of globally Lipschitz continuous coefficients of the stochastic differential equation. The important case of superlinearly growing coefficients, however, has remained an open question. The main difficulty is that numerically weak convergence fails to hold in many cases of superlinearly growing coefficients. In this paper we overcome this difficulty and establish convergence of the Monte Carlo Euler method for a large class of one-dimensional stochastic differential equations whose drift functions have at most polynomial growth.
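
For orientation, the Monte Carlo Euler method discussed in this abstract can be sketched as follows: simulate independent explicit Euler paths and average a test function of their endpoints to approximate $E[f(X_T)]$. The function name, the signature, and the scalar setting are assumptions of this sketch.

```python
import math
import random


def monte_carlo_euler(f, mu, sigma, x0, T, n_steps, n_samples, rng):
    """Approximate E[f(X_T)] for dX = mu(X) dt + sigma(X) dW by averaging
    f over the endpoints of independent explicit Euler paths."""
    h = T / n_steps
    total = 0.0
    for _ in range(n_samples):
        x = x0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(h))
            x = x + h * mu(x) + sigma(x) * dw  # explicit Euler step
        total += f(x)
    return total / n_samples
```

A hypothetical call is `monte_carlo_euler(lambda x: x * x, lambda x: -x, lambda x: 1.0, 0.0, 1.0, 50, 200, random.Random(0))`, which uses a linear drift. For superlinearly growing drifts, individual Euler paths can become extremely large on rare events, which is the difficulty the abstract refers to, so this sketch should be read as a description of the estimator rather than as a robust implementation for that case.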

preprint 2011 arXiv

Regularity analysis for stochastic partial differential equations with nonlinear multiplicative trace class noise

In this article spatial and temporal regularity of the solution process of a stochastic partial differential equation (SPDE) of evolutionary type with nonlinear multiplicative trace class noise is analyzed.

preprint 2010 arXiv

Taylor expansions of solutions of stochastic partial differential equations with additive noise

The solution of a parabolic stochastic partial differential equation (SPDE) driven by an infinite-dimensional Brownian motion is in general not a semi-martingale anymore and does in general not satisfy an Itô formula like the solution of a finite-dimensional stochastic ordinary differential equation (SODE). In particular, it is not possible to derive stochastic Taylor expansions as for the solution of a SODE using an iterated application of the Itô formula. Consequently, until recently, only low order numerical approximation results for such a SPDE have been available. Here, the fact that the solution of a SPDE driven by additive noise can be interpreted in the mild sense with integrals involving the exponential of the dominant linear operator in the SPDE provides an alternative approach for deriving stochastic Taylor expansions for the solution of such a SPDE. Essentially, the exponential factor has a mollifying effect and ensures that all integrals take values in the Hilbert space under consideration. The iteration of such integrals allows us to derive stochastic Taylor expansions of arbitrarily high order, which are robust in the sense that they also hold for other types of driving noise processes such as fractional Brownian motion. Combinatorial concepts of trees and woods provide a compact formulation of the Taylor expansions.

