Source author record

Stephan Wojtowytsch

Stephan Wojtowytsch appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · math.AP · math.DG · math.FA

Catalog footprint

What is connected

9 works

4 topics

4 close collaborators

preprint · 2022 · arXiv

Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality

In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite neuron, infinite data limit, we prove that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively. We furthermore show that the weight decay regularizer grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.

preprint · 2022 · arXiv

Qualitative neural network approximation over R and C: Elementary proofs for analytic and polynomial activation

In this article, we prove approximation theorems in classes of deep and shallow neural networks with analytic activation functions by elementary arguments. We prove for both real and complex networks with non-polynomial activation that the closure of the class of neural networks coincides with the closure of the space of polynomials. The closure can further be characterized by the Stone-Weierstrass theorem (in the real case) and Mergelyan's theorem (in the complex case). In the real case, we further prove approximation results for networks with higher-dimensional harmonic activation and orthogonally projected linear maps. We further show that fully connected and residual networks of large depth with polynomial activation functions can approximate any polynomial under certain width requirements. All proofs are entirely elementary.

preprint · 2020 · arXiv

Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.

preprint · 2020 · arXiv

On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics

We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width. The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under bounds on the natural path-norm. Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable generalization properties. Functions in these spaces can be approximated by multi-layer neural networks with dimension-independent convergence rates. The key to this work is a new way of representing functions in some form of expectations, motivated by multi-layer neural networks. This representation allows us to define a new class of continuous models for machine learning. We show that the gradient flow defined this way is the natural continuous analog of the gradient descent dynamics for the associated multi-layer neural networks. We show that the path-norm increases at most polynomially under this continuous gradient flow dynamics.

preprint · 2020 · arXiv

On the Convergence of Gradient Descent Training for Two-layer ReLU-networks in the Mean Field Regime

We describe a necessary and sufficient condition for the convergence to minimum Bayes risk (MBR) when training two-layer ReLU-networks by gradient descent in the mean field regime with omni-directional initial parameter distribution. This article extends recent results of Chizat and Bach to ReLU-activated networks and to the situation in which there are no parameters which exactly achieve MBR. The condition does not depend on the initialization of parameters and concerns only the weak convergence of the realization of the neural network, not its parameter distribution.

preprint · 2020 · arXiv

On the motion of curved dislocations in three dimensions: Simplified linearized elasticity

It is shown that in core-radius cutoff regularized simplified elasticity (where the elastic energy depends quadratically on the full displacement gradient rather than its symmetrized version), the force on a dislocation curve by the negative gradient of the elastic energy asymptotically approaches the mean curvature of the curve as the cutoff radius converges to zero. Rigorous error bounds in Hölder spaces are provided. As an application, convergence of dislocations moving by the gradient flow of the elastic energy to dislocations moving by the gradient flow of the arclength functional, when the motion law is given by an $H^1$-type dissipation, and convergence to curve shortening flow in co-dimension $2$ for the usual $L^2$-dissipation is established. In the second scenario, existence and regularity are assumed while the $H^1$-gradient flow is treated in full generality (for short time). The methods developed here are a blueprint for the more physical setting of linearized isotropic elasticity.

preprint · 2016 · arXiv

Helfrich's Energy and Constrained Minimisation

For every $g\in\mathbb{N}_0$ and $ε>0$, we construct a smooth genus $g$ surface embedded into the unit ball with area $8π$ and Willmore energy smaller than $8π+ ε$. From this we deduce that a minimising sequence for Willmore's energy in the class of genus $g$ surfaces embedded in the unit ball with area $8π$ converges to a doubly covered sphere for all $g\in\mathbb{N}_0$. We obtain the same result for certain Canham-Helfrich energies with $χ_K\leq 0$ without genus constraint and show that Canham-Helfrich energies with $χ_K>0$ are not bounded from below in the class of smooth surfaces with area $S$ embedded into a domain $Ω\Subset \mathbb{R}^3$. Furthermore, we prove that the class of connected surfaces embedded in a domain $Ω\Subset\mathbb{R}^3$ with uniformly bounded Willmore energy and area is compact under varifold convergence.

preprint · 2016 · arXiv

Phase field models for thin elastic structures with topological constraint

This article is concerned with the problem of minimising the Willmore energy in the class of connected surfaces with prescribed area which are confined to a small container. We propose a phase field approximation based on De Giorgi's diffuse Willmore functional to this variational problem. Our main contribution is a penalisation term which ensures connectedness in the sharp interface limit. The penalisation of disconnectedness is based on a geodesic distance chosen to be small between two points that lie on the same connected component of the transition layer of the phase field. We prove that in two dimensions, sequences of phase fields with uniformly bounded diffuse Willmore energy and diffuse area converge uniformly to the zeros of a double-well potential away from the support of a limiting measure. In three dimensions, we show that they converge $\mathcal{H}^1$-almost everywhere on curves. This enables us to show $Γ$-convergence to a sharp interface problem that only allows for connected structures. The results also imply Hausdorff convergence of the level sets in two dimensions and a similar result in three dimensions. We furthermore present numerical evidence of the effectiveness of our model. The implementation relies on a coupling of Dijkstra's algorithm in order to compute the topological penalty to a finite element approach for the Willmore term.

preprint · 2013 · arXiv

On the Alexandrov Topology of sub-Lorentzian Manifolds

It is commonly known that in Riemannian and sub-Riemannian Geometry, the metric tensor on a manifold defines a distance function. In Lorentzian Geometry, instead of a distance function it provides causal relations and the Lorentzian time-separation function. Both lead to the definition of the Alexandrov topology, which is linked to the property of strong causality of a space-time. We studied three possible ways to define the Alexandrov topology on sub-Lorentzian manifolds, which usually give different topologies, but agree in the Lorentzian case. We investigated their relationships to each other and the manifold's original topology and their link to causality.
