Researcher profile

Stephan Wojtowytsch

Stephan Wojtowytsch contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2022arXiv

Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality

In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite neuron, infinite data limit, we prove that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively. We furthermore show that the weight decay regularizer grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural networks with two hidden layers can approximate the target function without encountering the curse of dimensionality.

preprint2022arXiv

Qualitative neural network approximation over R and C: Elementary proofs for analytic and polynomial activation

In this article, we prove approximation theorems in classes of deep and shallow neural networks with analytic activation functions by elementary arguments. We prove for both real and complex networks with non-polynomial activation that the closure of the class of neural networks coincides with the closure of the space of polynomials. The closure can further be characterized by the Stone-Weierstrass theorem (in the real case) and Mergelyan's theorem (in the complex case). In the real case, we further prove approximation results for networks with higher-dimensional harmonic activation and orthogonally projected linear maps. We further show that fully connected and residual networks of large depth with polynomial activation functions can approximate any polynomial under certain width requirements. All proofs are entirely elementary.

preprint2020arXiv

Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.

preprint2020arXiv

On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics

We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width. The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under bounds on the natural path-norm. Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable generalization properties. Functions in these spaces can be approximated by multi-layer neural networks with dimension-independent convergence rates. The key to this work is a new way of representing functions in some form of expectations, motivated by multi-layer neural networks. This representation allows us to define a new class of continuous models for machine learning. We show that the gradient flow defined this way is the natural continuous analog of the gradient descent dynamics for the associated multi-layer neural networks. We show that the path-norm increases at most polynomially under this continuous gradient flow dynamics.

preprint2020arXiv

On the Convergence of Gradient Descent Training for Two-layer ReLU-networks in the Mean Field Regime

We describe a necessary and sufficient condition for the convergence to minimum Bayes risk when training two-layer ReLU-networks by gradient descent in the mean field regime with omni-directional initial parameter distribution. This article extends recent results of Chizat and Bach to ReLU-activated networks and to the situation in which there are no parameters which exactly achieve MBR. The condition does not depend on the initalization of parameters and concerns only the weak convergence of the realization of the neural network, not its parameter distribution.

preprint2020arXiv

On the motion of curved dislocations in three dimensions: Simplified linearized elasticity

It is shown that in core-radius cutoff regularized simplified elasticity (where the elastic energy depends quadratically on the full displacement gradient rather than its symmetrized version), the force on a dislocation curve by the negative gradient of the elastic energy asymptotically approaches the mean curvature of the curve as the cutoff radius converges to zero. Rigorous error bounds in Hölder spaces are provided. As an application, convergence of dislocations moving by the gradient flow of the elastic energy to dislocations moving by the gradient flow of the arclength functional, when the motion law is given by an $H^1$-type dissipation, and convergence to curve shortening flow in co-dimension $2$ for the usual $L^2$-dissipation is established. In the second scenario, existence and regularity are assumed while the $H^1$-gradient flow is treated in full generality (for short time). The methods developed here are a blueprint for the more physical setting of linearized isotropic elasticity.