Source author record

Philipp Grohs

Philipp Grohs appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.FA, Machine Learning, math.NA, Numerical Analysis, math.DG, Computer Vision, Information Theory, math.AP, math.CA, math.IT, math.OC, math.PR, Neural and Evolutionary Computing

Catalog footprint

What is connected

21 works

13 topics

4 close collaborators

preprint, 2026, arXiv

Robust and Fast Training via Per-Sample Clipping

We propose a robust gradient estimator based on per-sample gradient clipping and analyze its properties both theoretically and empirically. We show that the resulting method, per-sample clipped SGD (PS-Clip-SGD), achieves optimal in-expectation convergence rates for non-convex optimization problems under heavy-tailed gradient noise. Moreover, we establish high-probability convergence guarantees that match the in-expectation rates up to polylogarithmic factors in the failure probability. We complement our theoretical results with multiple numerical experiments. In particular, we demonstrate that PS-Clip-SGD outperforms both vanilla SGD with momentum and standard gradient clipping when training AlexNet on the CIFAR-100 dataset, even after accounting for the additional computational time caused by per-sample clipping. We also empirically show that, in the presence of gradient accumulation, applying clipping at the mini-batch level can improve training performance while incurring virtually no additional computational cost. This finding is particularly interesting, as it contradicts the common practice of applying clipping only after all accumulation steps have been completed.
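To make the distinction between per-sample and mini-batch clipping concrete, here is a minimal sketch in plain Python. It assumes the usual norm-clipping rule (rescale a gradient so its Euclidean norm is at most `tau`); the function names, the threshold `tau` and the toy gradients are illustrative assumptions, not the estimator or the hyperparameters used in the paper.

```python
import math


def clip(vec, tau):
    """Rescale vec so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= tau:
        return list(vec)
    scale = tau / norm
    return [x * scale for x in vec]


def mean(vectors):
    """Coordinate-wise average of a list of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def per_sample_clipped_mean(grads, tau):
    """Clip every per-sample gradient first, then average."""
    return mean([clip(g, tau) for g in grads])


def batch_clipped_mean(grads, tau):
    """Average first, then clip the mini-batch gradient once."""
    return clip(mean(grads), tau)


# Toy mini-batch (assumed data): two small gradients and one heavy-tailed outlier.
grads = [[0.1, 0.0], [0.0, 0.2], [100.0, 0.0]]
tau = 1.0

per_sample = per_sample_clipped_mean(grads, tau)
batch_level = batch_clipped_mean(grads, tau)
print("per-sample clipping:", per_sample)
print("mini-batch clipping:", batch_level)
```

In this toy batch the outlier is capped at norm `tau` before averaging, so the two small gradients still contribute at their original scale. Clipping the averaged gradient instead leaves a direction that is almost entirely determined by the outlier.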

preprint, 2022, arXiv

Injectivity of Gabor phase retrieval from lattice measurements

We establish novel uniqueness results for the Gabor phase retrieval problem: if $\mathcal{G} : L^2(\mathbb{R}) \to L^2(\mathbb{R}^2)$ denotes the Gabor transform then every $f \in L^4[-\tfrac{c}{2},\tfrac{c}{2}]$ is determined up to a global phase by the values $|\mathcal{G}f(x,ω)|$ where $(x,ω)$ are points on the lattice $b^{-1}\mathbb{Z} \times (2c)^{-1}\mathbb{Z}$ and $b>0$ is an arbitrary positive constant. This for the first time shows that compactly-supported, complex-valued functions can be uniquely reconstructed from lattice samples of their spectrogram. Moreover, by making use of recent developments related to sampling in shift-invariant spaces by Gröchenig, Romero and Stöckler, we prove analogous uniqueness results for functions in shift-invariant spaces with Gaussian generator. Generalizations to nonuniform sampling are also presented. Finally, we compare our results to the situation where the considered signals are assumed to be real-valued.

preprint, 2022, arXiv

On foundational discretization barriers in STFT phase retrieval

We prove that there exists no window function $g \in L^2(\mathbb{R})$ and no lattice $\mathcal{L} \subset \mathbb{R}^2$ such that every $f \in L^2(\mathbb{R})$ is determined up to a global phase by spectrogram samples $|V_gf(\mathcal{L})|$ where $V_gf$ denotes the short-time Fourier transform of $f$ with respect to $g$. Consequently, the forward operator $f \mapsto |V_gf(\mathcal{L})|$ mapping a square-integrable function to its spectrogram samples on a lattice is never injective on the quotient space $L^2(\mathbb{R}) / {\sim}$ with $f \sim h$ identifying two functions which agree up to a multiplicative constant of modulus one. We will further elaborate this result and point out that under mild conditions on the lattice $\mathcal{L}$, functions which produce identical spectrogram samples but do not agree up to a unimodular constant can be chosen to be real-valued. The derived results highlight that in the discretization of the STFT phase retrieval problem from lattice measurements, a prior restriction of the underlying signal space to a proper subspace of $L^2(\mathbb{R})$ is inevitable.

preprint, 2021, arXiv

Lower bounds for artificial neural network approximations: A proof that shallow neural networks fail to overcome the curse of dimensionality

Artificial neural networks (ANNs) have become a very powerful tool in the approximation of high-dimensional functions. Notably, deep ANNs, consisting of a large number of hidden layers, have been used very successfully in a series of practically relevant computational problems involving high-dimensional input data, ranging from classification tasks in supervised learning to optimal decision problems in reinforcement learning. There are also a number of mathematical results in the scientific literature which study the approximation capacities of ANNs in the context of high-dimensional target functions. In particular, there is a series of mathematical results which show that sufficiently deep ANNs have the capacity to overcome the curse of dimensionality in the approximation of certain target function classes, in the sense that the number of parameters of the approximating ANNs grows at most polynomially in the dimension $d \in \mathbb{N}$ of the target functions under consideration. In the proofs of several such high-dimensional approximation results it is crucial that the involved ANNs are sufficiently deep and consist of a sufficiently large number of hidden layers which grows in the dimension of the considered target functions. The topic of this work is to look in more detail at the depth of the involved ANNs in the approximation of high-dimensional target functions. In particular, the main result of this work proves that there exists a concretely specified sequence of functions which can be approximated without the curse of dimensionality by sufficiently deep ANNs but which cannot be approximated without the curse of dimensionality if the involved ANNs are shallow or not deep enough.

preprint, 2020, arXiv

Approximations with deep neural networks in Sobolev time-space

Solutions of evolution equations generally lie in certain Bochner-Sobolev spaces, in which the solution may have regularity and integrability properties in the time variable that differ from those in the space variables. Therefore, in this paper, we develop a framework showing that deep neural networks can approximate Sobolev-regular functions with respect to Bochner-Sobolev spaces. In our work we use the so-called Rectified Cubic Unit (ReCU) as an activation function in our networks, which allows us to deduce approximation results for the neural networks while avoiding issues caused by the non-regularity of the most commonly used Rectified Linear Unit (ReLU) activation function.

preprint, 2020, arXiv

Deep neural network approximation for high-dimensional elliptic PDEs with boundary conditions

In recent work it has been established that deep neural networks are capable of approximating solutions to a large class of parabolic partial differential equations without incurring the curse of dimension. However, all this work has been restricted to problems formulated on the whole Euclidean domain. On the other hand, most problems in engineering and the sciences are formulated on finite domains and subjected to boundary conditions. The present paper considers an important such model problem, namely the Poisson equation on a domain $D\subset \mathbb{R}^d$ subject to Dirichlet boundary conditions. It is shown that deep neural networks are capable of representing solutions of that problem without incurring the curse of dimension. The proofs are based on a probabilistic representation of the solution to the Poisson equation as well as a suitable sampling method.
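The abstract does not say which probabilistic representation or sampling method is used. As one hedged illustration of the general idea, the sketch below applies the walk-on-spheres method to the special case of the Laplace equation (zero right-hand side) on the unit ball, where $u(x) = \mathbb{E}[g(X_\tau)]$ for a Brownian motion $X$ started at $x$ and stopped when it hits the boundary. The domain, the boundary data `g`, the tolerance, the sample size and all function names are assumptions chosen for the example.

```python
import math
import random


def uniform_on_sphere(dim, rng):
    """Draw a uniformly distributed unit vector by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def walk_on_spheres(x0, g, rng, tol=1e-3, max_steps=10_000):
    """One sample of g at the (approximate) Brownian exit point from the unit ball."""
    x = list(x0)
    for _ in range(max_steps):
        dist = 1.0 - math.sqrt(sum(c * c for c in x))  # distance to the boundary
        if dist < tol:
            break
        # Jump to a uniform point on the largest sphere around x inside the ball.
        direction = uniform_on_sphere(len(x), rng)
        x = [c + dist * e for c, e in zip(x, direction)]
    norm = math.sqrt(sum(c * c for c in x))
    return g([c / norm for c in x])  # evaluate g at the nearest boundary point


def estimate_solution(x0, g, n_walks, rng):
    """Monte Carlo average over independent walks."""
    return sum(walk_on_spheres(x0, g, rng) for _ in range(n_walks)) / n_walks


def g(x):
    """Assumed boundary data; u(x) = x[0] is harmonic, so the exact solution is known."""
    return x[0]


rng = random.Random(0)
x0 = [0.3, 0.2, 0.1]
estimate = estimate_solution(x0, g, n_walks=5000, rng=rng)
print("estimate:", estimate, "exact:", x0[0])
```

The cost of one walk depends on the tolerance and the geometry of the domain rather than on a spatial grid, which is why sampling schemes of this type are attractive in high dimensions.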

preprint, 2020, arXiv

Gabor phase retrieval is severely ill-posed

The problem of reconstructing a function from the magnitudes of its frame coefficients has recently been shown to be never uniformly stable in infinite-dimensional spaces [5]. This result also holds for frames that are possibly continuous [2]. On the other hand, the problem is always stable in finite-dimensional settings. A prominent example of such a phase retrieval problem is the recovery of a signal from the modulus of its Gabor transform. In this paper, we study Gabor phase retrieval and ask how the stability degrades on a natural family of finite-dimensional subspaces of the signal domain $L^2(\mathbb{R})$. We prove that the stability constant scales at least quadratically exponentially in the dimension of the subspaces. Our construction also shows that typical priors such as sparsity or smoothness promoting penalties do not constitute regularization terms for phase retrieval.

preprint, 2020, arXiv

Phase Retrieval: Uniqueness and Stability

The problem of phase retrieval, i.e., the problem of recovering a function from the magnitudes of its Fourier transform, naturally arises in various fields of physics, such as astronomy, radar, speech recognition, quantum mechanics and, perhaps most prominently, diffraction imaging. The mathematical study of phase retrieval problems possesses a long history with a number of beautiful and deep results drawing from different mathematical fields, such as harmonic analysis, complex analysis, or Riemannian geometry. The present paper aims to present a summary of some of these results with an emphasis on recent activities. In particular we aim to summarize our current understanding of uniqueness and stability properties of phase retrieval problems.

preprint, 2020, arXiv

Phase Transitions in Rate Distortion Theory and Deep Learning

Rate distortion theory is concerned with optimally encoding a given signal class $\mathcal{S}$ using a budget of $R$ bits, as $R\to\infty$. We say that $\mathcal{S}$ can be compressed at rate $s$ if we can achieve an error of $\mathcal{O}(R^{-s})$ for encoding $\mathcal{S}$; the supremal compression rate is denoted $s^\ast(\mathcal{S})$. Given a fixed coding scheme, there usually are elements of $\mathcal{S}$ that are compressed at a higher rate than $s^\ast(\mathcal{S})$ by the given coding scheme; we study the size of this set of signals. We show that for certain "nice" signal classes $\mathcal{S}$, a phase transition occurs: We construct a probability measure $\mathbb{P}$ on $\mathcal{S}$ such that for every coding scheme $\mathcal{C}$ and any $s >s^\ast(\mathcal{S})$, the set of signals encoded with error $\mathcal{O}(R^{-s})$ by $\mathcal{C}$ forms a $\mathbb{P}$-null-set. In particular our results apply to balls in Besov and Sobolev spaces that embed compactly into $L^2(Ω)$ for a bounded Lipschitz domain $Ω$. As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are generically sharp. We also provide quantitative and non-asymptotic bounds on the probability that a random $f\in\mathcal{S}$ can be encoded to within accuracy $\varepsilon$ using $R$ bits. This result is applied to the problem of approximately representing $f\in\mathcal{S}$ to within accuracy $\varepsilon$ by a (quantized) neural network that is constrained to have at most $W$ nonzero weights and is generated by an arbitrary "learning" procedure. We show that for any $s >s^\ast(\mathcal{S})$ there are constants $c,C$ such that, no matter how we choose the "learning" procedure, the probability of success is bounded from above by $\min\big\{1,2^{C\cdot W\lceil\log_2(1+W)\rceil^2 -c\cdot\varepsilon^{-1/s}}\big\}$.
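The upper bound at the end of the abstract can be evaluated numerically to see when it says anything at all. The sketch below is a direct transcription of $\min\{1, 2^{C W \lceil \log_2(1+W) \rceil^2 - c\,\varepsilon^{-1/s}}\}$; the abstract only asserts that constants $c, C$ exist, so the values used here (and the choices of `W`, `eps` and `s`) are purely hypothetical.

```python
import math


def success_probability_bound(W, eps, s, c, C):
    """Evaluate min(1, 2 ** (C * W * ceil(log2(1 + W)) ** 2 - c * eps ** (-1 / s)))."""
    exponent = C * W * math.ceil(math.log2(1 + W)) ** 2 - c * eps ** (-1.0 / s)
    if exponent >= 0:
        return 1.0  # the bound is vacuous in this regime
    return 2.0 ** exponent


# Hypothetical constants; the abstract does not give values for c and C.
c, C = 1.0, 1.0

small_network = success_probability_bound(W=1, eps=0.25, s=1.0, c=c, C=C)
large_network = success_probability_bound(W=1000, eps=0.25, s=1.0, c=c, C=C)
print("W = 1:   ", small_network)
print("W = 1000:", large_network)
```

The bound is informative only when $c\,\varepsilon^{-1/s}$ exceeds $C W \lceil \log_2(1+W) \rceil^2$. Read the other way around, the number of nonzero weights $W$ has to grow roughly like $\varepsilon^{-1/s}$, up to logarithmic factors, before success is not ruled out by this estimate.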

preprint, 2020, arXiv

Ruled Laguerre minimal surfaces

A Laguerre minimal surface is an immersed surface in the Euclidean space being an extremal of the functional $\int (H^2/K - 1)\, dA$. In the present paper, we prove that the only ruled Laguerre minimal surfaces are up to isometry the surfaces $R(u,v) = (Au, Bu, Cu + D\cos 2u) + v\,(\sin u, \cos u, 0)$, where $A, B, C, D$ are fixed real numbers. To achieve invariance under Laguerre transformations, we also derive all Laguerre minimal surfaces that are enveloped by a family of cones. The methodology is based on the isotropic model of Laguerre geometry. In this model a Laguerre minimal surface enveloped by a family of cones corresponds to a graph of a biharmonic function carrying a family of isotropic circles. We classify such functions by showing that the top view of the family of circles is a pencil.
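The parametrization stated in the abstract is easy to evaluate directly, which also makes the ruled structure visible: for fixed $u$, varying $v$ traces a straight line with direction $(\sin u, \cos u, 0)$. The following is a small sketch; the function name and the sample values of $A, B, C, D$ are assumptions for illustration.

```python
import math


def ruled_surface(u, v, A, B, C, D):
    """Point R(u, v) = (A u, B u, C u + D cos 2u) + v (sin u, cos u, 0)."""
    base = (A * u, B * u, C * u + D * math.cos(2.0 * u))  # directrix curve
    direction = (math.sin(u), math.cos(u), 0.0)           # ruling direction
    return tuple(b + v * d for b, d in zip(base, direction))


# Assumed sample constants; any fixed real numbers are admissible.
A, B, C, D = 1.0, 0.5, -0.25, 2.0
u = 0.7

p0 = ruled_surface(u, 0.0, A, B, C, D)
p1 = ruled_surface(u, 1.0, A, B, C, D)
p2 = ruled_surface(u, 2.0, A, B, C, D)

# Points with the same u are collinear: p1 is the midpoint of p0 and p2.
midpoint = tuple((a + b) / 2.0 for a, b in zip(p0, p2))
print(p1)
print(midpoint)
```

Since the ruling direction has vanishing third component, every ruling is parallel to the plane spanned by the first two coordinate axes.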

preprint, 2019, arXiv

Deep neural network approximations for Monte Carlo algorithms

Recently, it has been proposed in the literature to employ deep neural networks (DNNs) together with stochastic gradient descent methods to approximate solutions of PDEs. There are also a few results in the literature which prove that DNNs can approximate solutions of certain PDEs without the curse of dimensionality in the sense that the number of real parameters used to describe the DNN grows at most polynomially both in the PDE dimension and the reciprocal of the prescribed approximation accuracy. One key argument in most of these results is, first, to use a Monte Carlo approximation scheme which can approximate the solution of the PDE under consideration at a fixed space-time point without the curse of dimensionality and, thereafter, to prove that DNNs are flexible enough to mimic the behaviour of the used approximation scheme. Having this in mind, one could aim for a general abstract result which shows under suitable assumptions that if a certain function can be approximated by any kind of (Monte Carlo) approximation scheme without the curse of dimensionality, then this function can also be approximated with DNNs without the curse of dimensionality. It is a key contribution of this article to make a first step towards this direction. In particular, the main result of this paper, essentially, shows that if a function can be approximated by means of some suitable discrete approximation scheme without the curse of dimensionality and if there exist DNNs which satisfy certain regularity properties and which approximate this discrete approximation scheme without the curse of dimensionality, then the function itself can also be approximated with DNNs without the curse of dimensionality. As an application of this result we establish that solutions of suitable Kolmogorov PDEs can be approximated with DNNs without the curse of dimensionality.

preprint, 2016, arXiv

Discrete Deep Feature Extraction: A Theory and New Architectures

First steps towards a mathematical theory of deep convolutional neural networks for feature extraction were made---for the continuous-time case---in Mallat, 2012, and Wiatowski and Bölcskei, 2015. This paper considers the discrete case, introduces new convolutional neural network architectures, and proposes a mathematical framework for their analysis. Specifically, we establish deformation and translation sensitivity results of local and global nature, and we investigate how certain structural properties of the input signal are reflected in the corresponding feature vectors. Our theory applies to general filters and general Lipschitz-continuous non-linearities and pooling operators. Experiments on handwritten digit classification and facial landmark detection---including feature importance evaluation---complement the theoretical findings.

preprint, 2016, arXiv

On the Approximation of Functions with Line Singularities by Ridgelets

In [GO15], the authors discussed the existence of numerically feasible solvers for advection equations that run in optimal computational complexity. In this paper, we complete the last remaining requirement to achieve this goal - by showing that ridgelets, on which the solver is based, approximate functions with line singularities (which may appear as solutions to the advection equation) with the best possible approximation rate. Structurally, the proof resembles [Can01], where a similar result was proved for a different ridgelet construction, which is however not well-suited for use in a PDE solver (and in particular, not suitable for the CDD-schemes [CDD01] we are interested in). Due to the differences between the two ridgelet constructions, we have to deal with quite a different set of issues, but are also able to relax the (support) conditions on the function being approximated. Finally, the proof employs a new convolution-type estimate that could be of independent interest due to its sharpness.

preprint, 2016, arXiv

Phase retrieval in the general setting of continuous frames for Banach spaces

We develop a novel and unifying setting for phase retrieval problems that works in Banach spaces and for continuous frames and consider the questions of uniqueness and stability of the reconstruction from phaseless measurements. Our main result states that also in this framework, the problem of phase retrieval is never uniformly stable in infinite dimensions. On the other hand, we show weak stability of the problem. This complements recent work [9], where it has been shown that phase retrieval is always unstable for the setting of discrete frames in Hilbert spaces. In particular, our result implies that the stability properties cannot be improved by oversampling the underlying discrete frame. We generalize the notion of complement property (CP) to the setting of continuous frames for Banach spaces (over $\mathbb{K}=\mathbb{R}$ or $\mathbb{K}=\mathbb{C}$) and verify that it is a necessary condition for uniqueness of the phase retrieval problem; when $\mathbb{K}=\mathbb{R}$ the CP is also sufficient for uniqueness. In our general setting, we also prove a conjecture posed by Bandeira et al. [5], which was originally formulated for finite-dimensional spaces: for the case $\mathbb{K}=\mathbb{C}$ the strong complement property (SCP) is a necessary condition for stability. To prove our main result, we show that the SCP can never hold for frames of infinite-dimensional Banach spaces.

preprint, 2016, arXiv

Reconstructing real-valued functions from unsigned coefficients with respect to wavelet and other frames

In this paper we consider the following problem of phase retrieval: Given a collection of real-valued band-limited functions $\{ψ_λ\}_{λ\in Λ}\subset L^2(\mathbb{R}^d)$ that constitutes a semi-discrete frame, we ask whether any real-valued function $f \in L^2(\mathbb{R}^d)$ can be uniquely recovered from its unsigned convolutions ${\{|f \ast ψ_λ|\}_{λ\in Λ}}$. We find that under some mild assumptions on the semi-discrete frame and if $f$ has exponential decay at $\infty$, it suffices to know $|f \ast ψ_λ|$ on suitably fine lattices to uniquely determine $f$ (up to a global sign factor). We further establish a local stability property of our reconstruction problem. Finally, for two concrete examples of a (discrete) frame of $L^2(\mathbb{R}^d)$, $d=1,2$, we show that through sufficient oversampling one obtains a frame such that any real-valued function with exponential decay can be uniquely recovered from its unsigned frame coefficients.

preprint, 2014, arXiv

$α$-Molecules

Within the area of applied harmonic analysis, various multiscale systems such as wavelets, ridgelets, curvelets, and shearlets have been introduced and successfully applied. Key to each of those systems are their (optimal) approximation properties in terms of the decay of the $L^2$-error of the best $N$-term approximation for a certain class of functions. In this paper, we introduce the general framework of $α$-molecules, which encompasses most multiscale systems from applied harmonic analysis, in particular, wavelets, ridgelets, curvelets, and shearlets as well as extensions of such with $α$ being a parameter measuring the degree of anisotropy, as a means to allow a unified treatment of approximation results within this area. Based on an $α$-scaled index distance, we first prove that two systems of $α$-molecules are almost orthogonal. This leads to a general methodology to transfer approximation results within this framework, provided that certain consistency and time-frequency localization conditions of the involved systems of $α$-molecules are satisfied. We finally utilize these results to enable the derivation of optimal sparse approximation results for a specific class of cartoon-like functions by sufficient conditions on the 'control' parameters of a system of $α$-molecules.

preprint, 2014, arXiv

Cartoon Approximation with $α$-Curvelets

It is well-known that curvelets provide optimal approximations for so-called cartoon images which are defined as piecewise $C^2$-functions, separated by a $C^2$ singularity curve. In this paper, we consider the more general case of piecewise $C^β$-functions, separated by a $C^β$ singularity curve for $β\in (1,2]$. We first prove a benchmark result for the possibly achievable best $N$-term approximation rate for this more general signal model. Then we introduce what we call $α$-curvelets, which are systems that interpolate between wavelet systems on the one hand ($α= 1$) and curvelet systems on the other hand ($α= \frac12$). Our main result states that those frames achieve this optimal rate for $α= \frac{1}β$, up to $\log$-factors.

preprint, 2014, arXiv

Intrinsic Localization of Anisotropic Frames II: $α$-Molecules

This article is a continuation of the recent paper [Grohs, Intrinsic localization of anisotropic frames, ACHA, 2013], where off-diagonal-decay properties (often referred to as 'localization' in the literature) of Moore-Penrose pseudoinverses of (bi-infinite) matrices are established, whenever the latter possess similar off-diagonal-decay properties. This problem is especially interesting if the matrix arises as a discretization of an operator with respect to a frame or basis. Previous work on this problem has been restricted to wavelet- or Gabor frames. In the previous work we extended these results to frames of parabolic molecules, including curvelets or shearlets as special cases. The present paper extends and unifies these results by establishing analogous properties for frames of $α$-molecules as introduced in recent work [Grohs, Keiper, Kutyniok, Schäfer, Alpha molecules: curvelets, shearlets, ridgelets, and beyond, Proc. SPIE. 8858, 2013]. Since wavelets, curvelets, shearlets, ridgelets and hybrid shearlets all constitute instances of $α$-molecules, our results establish localization properties for all these systems simultaneously.

preprint, 2014, arXiv

Optimal Adaptive Ridgelet Schemes for Linear Transport Equations

In this paper we present a novel method for the numerical solution of linear transport equations, which is based on ridgelets. Such equations arise for instance in radiative transfer or in phase contrast imaging. Due to the fact that ridgelet systems are well adapted to the structure of linear transport operators, it can be shown that our scheme operates in optimal complexity, even if line singularities are present in the solution. The key to this is showing that the system matrix (with diagonal preconditioning) is uniformly well-conditioned and compressible -- the proof for the latter represents the main part of the paper. We conclude with some numerical experiments about $N$-term approximations and how they are recovered by the solver, as well as localisation of singularities in the ridgelet frame.

preprint, 2010, arXiv

Continuous Shearlet Tight Frames

Based on the shearlet transform we present a general construction of continuous tight frames for $L^2(\mathbb{R}^2)$ from any sufficiently smooth function with anisotropic moments. This includes for example compactly supported systems, piecewise polynomial systems, or both. From our earlier results it follows that these systems enjoy the same desirable approximation properties for directional data as the previous bandlimited and very specific constructions due to Kutyniok and Labate. We also show that the representation formulas we derive are in a sense optimal for the shearlet transform.

preprint, 2010, arXiv

Definability and stability of multiscale decompositions for manifold-valued data

We discuss multiscale representations of discrete manifold-valued data. As it turns out that we cannot expect general manifold-analogues of biorthogonal wavelets to possess perfect reconstruction, we focus our attention on those constructions which are based on upscaling operators which are either interpolating or midpoint-interpolating. For definable multiscale decompositions we obtain a stability result.
