Source author record

Felix Voigtlaender

Felix Voigtlaender appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.FA, Machine Learning, math.CA, math-ph, math.GN, math.MP, math.PR, Neural and Evolutionary Computing

Catalog footprint

What is connected

18 works

8 topics

4 close collaborators


preprint 2023 arXiv

Anisotropic Triebel-Lizorkin spaces and wavelet coefficient decay over one-parameter dilation groups, I

This paper provides maximal function characterizations of anisotropic Triebel-Lizorkin spaces associated to general expansive matrices for the full range of parameters $p \in (0,\infty)$, $q \in (0,\infty]$ and $α\in \mathbb{R}$. The equivalent norm is defined in terms of the decay of wavelet coefficients, quantified by a Peetre-type space over a one-parameter dilation group. As an application, the existence of dual molecular frames and Riesz sequences is obtained; the wavelet systems are generated by translations and anisotropic dilations of a single function, where neither the translation nor dilation parameters are required to belong to a discrete subgroup. Explicit criteria for molecules are given in terms of mild decay, moment, and smoothness conditions.

preprint 2023 arXiv

Anisotropic Triebel-Lizorkin spaces and wavelet coefficient decay over one-parameter dilation groups, II

Continuing previous work, this paper provides maximal characterizations of anisotropic Triebel-Lizorkin spaces $\dot{\mathbf{F}}^α_{p,q}$ for the endpoint case of $p = \infty$ and the full scale of parameters $α\in \mathbb{R}$ and $q \in (0,\infty]$. In particular, a Peetre-type characterization of the anisotropic Besov space $\dot{\mathbf{B}}^α_{\infty,\infty} = \dot{\mathbf{F}}^α_{\infty,\infty}$ is obtained. As a consequence, it is shown that there exist dual molecular frames and Riesz sequences in $\dot{\mathbf{F}}^α_{\infty,q}$.

preprint 2022 arXiv

$L^p$ sampling numbers for the Fourier-analytic Barron space

In this paper, we consider Barron functions $f : [0,1]^d \to \mathbb{R}$ of smoothness $σ > 0$, which are functions that can be written as \[ f(x) = \int_{\mathbb{R}^d} F(ξ) \, e^{2 π i \langle x, ξ\rangle} \, d ξ \quad \text{with} \quad \int_{\mathbb{R}^d} |F(ξ)| \cdot (1 + |ξ|)^σ \, d ξ < \infty. \] For $σ = 1$, these functions play a prominent role in machine learning, since they can be efficiently approximated by (shallow) neural networks without suffering from the curse of dimensionality. For these functions, we study the following question: Given $m$ point samples $f(x_1),\dots,f(x_m)$ of an unknown Barron function $f : [0,1]^d \to \mathbb{R}$ of smoothness $σ$, how well can $f$ be recovered from these samples, for an optimal choice of the sampling points and the reconstruction procedure? Denoting the optimal reconstruction error measured in $L^p$ by $s_m (σ; L^p)$, we show that \[ m^{- \frac{1}{\max \{ p,2 \}} - \frac{σ}{d}} \lesssim s_m(σ;L^p) \lesssim (\ln (e + m))^{α(σ,d) / p} \cdot m^{- \frac{1}{\max \{ p,2 \}} - \frac{σ}{d}} , \] where the implied constants only depend on $σ$ and $d$ and where $α(σ,d)$ stays bounded as $d \to \infty$.
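To make the stated rate concrete, the following minimal sketch evaluates the polynomial exponent $\frac{1}{\max\{p,2\}} + \frac{σ}{d}$ that appears on both sides of the bound. The function names are hypothetical and chosen only for illustration. The implied constants and the logarithmic factor $(\ln(e+m))^{α(σ,d)/p}$ are deliberately left out, since the abstract does not give their values.

```python
import math


def barron_rate_exponent(p, sigma, d):
    """Exponent r such that s_m(sigma; L^p) decays like m^(-r) up to log factors.

    Hypothetical helper: r = 1 / max(p, 2) + sigma / d, read off from the abstract.
    Passing p = math.inf is allowed and gives r = sigma / d.
    """
    return 1.0 / max(p, 2.0) + sigma / d


def rate_without_constants(m, p, sigma, d):
    """m^(-r), i.e. the polynomial part of the bound with all constants set to 1."""
    return m ** (-barron_rate_exponent(p, sigma, d))


if __name__ == "__main__":
    # As d grows, the exponent approaches 1 / max(p, 2) instead of collapsing to 0.
    for d in (1, 10, 100, 1000):
        print(d, barron_rate_exponent(p=2, sigma=1.0, d=d))
```

Under these assumptions the exponent never drops below $1/\max\{p,2\}$, which is one way to read the claim that the recovery rate does not degenerate as the dimension grows.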

preprint 2022 arXiv

A fractal uncertainty principle for Bergman spaces and analytic wavelets

Motivated by results of Dyatlov on Fourier uncertainty principles for Cantor sets and by similar results of Knutsen for joint time-frequency representations (i.e., the short-time Fourier transform (STFT) with a Gaussian window, equivalent to Fock spaces), we suggest a general setting relating localization and uncertainty and prove, within this context, an uncertainty principle for Cantor sets in Bergman spaces on the unit disk, where the Cantor set is defined as a union of annuli that are equidistributed in the hyperbolic measure. The result can be written in terms of analytic Cauchy wavelets. As in the case of the STFT considered by Knutsen, our result consists of a two-sided bound for the norm of a localization operator involving the fractal dimension log 2 / log 3 in the exponent. As in the STFT case and in Dyatlov's fractal uncertainty principle, the (hyperbolic) measure of the dilated iterates of the Cantor set in the disk tends to infinity, while the corresponding norm of the localization operator tends to zero.

preprint 2022 arXiv

Invertibility of frame operators on Besov-type decomposition spaces

We derive an extension of the Walnut-Daubechies criterion for the invertibility of frame operators. The criterion concerns general reproducing systems and Besov-type spaces. As an application, we conclude that $L^2$ frame expansions associated with smooth and fast-decaying reproducing systems on sufficiently fine lattices extend to Besov-type spaces. This simplifies and improves recent results on the existence of atomic decompositions, which only provide a particular dual reproducing system with suitable properties. In contrast, we conclude that the $L^2$ canonical frame expansions extend to many other function spaces, and, therefore, operations such as analyzing using the frame, thresholding the resulting coefficients, and then synthesizing using the canonical dual frame are bounded on these spaces.

preprint 2022 arXiv

Neural network approximation and estimation of classifiers with classification boundary in a Barron class

We prove bounds for the approximation and estimation of certain binary classification functions using ReLU neural networks. Our estimation bounds provide a priori performance guarantees for empirical risk minimization using networks of a suitable size, depending on the number of training samples available. The obtained approximation and estimation rates are independent of the dimension of the input, showing that the curse of dimensionality can be overcome in this setting; in fact, the input dimension only enters in the form of a polynomial factor. Regarding the regularity of the target classification function, we assume the interfaces between the different classes to be locally of Barron-type. We complement our results by studying the relations between various Barron-type spaces that have been proposed in the literature. These spaces differ substantially more from each other than the current literature suggests.

preprint 2022 arXiv

Time-Frequency Shift Invariance of Gabor Spaces with an $S_0$-Generator

We consider Gabor Riesz sequences generated by a lattice $Λ\subset \mathbb{R}^2$ and a window function $g \in L^2(\mathbb{R})$ which is well localized in both time and frequency. When $g$ belongs to the Feichtinger algebra, we prove that only those time-frequency shifts with parameters from the lattice $Λ$ leave the corresponding Gabor space invariant. This improves on earlier results where only lattices of rational density were considered. A slightly weaker result is proved - again for lattices of general density - under the regularity assumptions of the classical Balian-Low theorem, where both $g$ and its Fourier transform belong to the Sobolev space $H^1(\mathbb{R})$. The proof relies on a combination of methods from time-frequency analysis and the theory of $C^\ast$-algebras, specifically the so-called irrational rotation algebra.

preprint 2021 arXiv

Equivalence of approximation by convolutional neural networks and fully-connected networks

Convolutional neural networks are the most widely used type of neural networks in applications. In mathematical analysis, however, mostly fully-connected networks are studied. In this paper, we establish a connection between both network architectures. Using this connection, we show that all upper and lower bounds concerning approximation rates of fully-connected neural networks for functions $f \in \mathcal{C}$ -- for an arbitrary function class $\mathcal{C}$ -- translate to essentially the same bounds concerning approximation rates of convolutional neural networks for functions $f \in \mathcal{C}^{equi}$, with the class $\mathcal{C}^{equi}$ consisting of all translation equivariant functions whose first coordinate belongs to $\mathcal{C}$. All presented results consider exclusively the case of convolutional neural networks without any pooling operation and with circular convolutions, i.e., not based on zero-padding.
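The translation equivariance that defines $\mathcal{C}^{equi}$ can be checked on a toy example. The sketch below is not the construction from the paper. It only illustrates, with hypothetical helper names and made-up data, that a layer built from a circular convolution followed by a ReLU commutes with cyclic shifts of its input, which is the property that zero-padded convolutions would break at the boundary.

```python
def circular_convolve(signal, kernel):
    """Circular (periodic) convolution: indices wrap around modulo len(signal)."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i - k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


def cyclic_shift(signal, shift):
    """Translate the signal by `shift` positions with wrap-around."""
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]


def relu(values):
    """Componentwise ReLU activation."""
    return [max(0.0, v) for v in values]


def conv_layer(signal, kernel):
    """One pooling-free layer: circular convolution followed by ReLU."""
    return relu(circular_convolve(signal, kernel))


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0, 0.5, 4.0]   # illustrative input, not from the paper
    w = [0.5, -1.0, 2.0]             # illustrative filter, not from the paper
    for s in range(len(x)):
        # Shifting the input and then applying the layer equals
        # applying the layer and then shifting the output.
        assert conv_layer(cyclic_shift(x, s), w) == cyclic_shift(conv_layer(x, w), s)
    print("layer commutes with all cyclic shifts")
```

Because the layer is equivariant, its full output is determined by a single output coordinate evaluated on shifted inputs, which is consistent with the abstract describing $\mathcal{C}^{equi}$ through the first coordinate.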

preprint 2021 arXiv

On dual molecules and convolution-dominated operators

We show that sampling or interpolation formulas in reproducing kernel Hilbert spaces can be obtained by reproducing kernels whose dual systems form molecules, ensuring that the size profile of a function is fully reflected by the size profile of its sampled values. The main tool is a local holomorphic calculus for convolution-dominated operators, valid for groups with possibly non-polynomial growth. Applied to the matrix coefficients of a group representation, our methods improve on classical results on atomic decompositions and bridge a gap between abstract and concrete methods.

preprint 2020 arXiv

A general version of Price's theorem

Assume that $X_Σ \in \mathbb{R}^{n}$ is a centered random vector following a multivariate normal distribution with positive definite covariance matrix $Σ$. Let $g : \mathbb{R}^{n} \to \mathbb{C}$ be measurable and of moderate growth, say $|g(x)| \lesssim (1 + |x|)^{N}$. We show that the map $Σ\mapsto \mathbb{E}[g(X_Σ)]$ is smooth, and we derive convenient expressions for its partial derivatives, in terms of certain expectations $\mathbb{E}[(\partial^αg)(X_Σ)]$ of partial (distributional) derivatives of $g$. As we discuss, this result can be used to derive bounds for the expectation $\mathbb{E}[g(X_Σ)]$ of a nonlinear function $g(X_Σ)$ of a Gaussian random vector $X_Σ$ with possibly correlated entries. For the case when $g\left(x\right) = g_{1}(x_{1}) \cdots g_{n}(x_{n})$ has tensor-product structure, the above result is known in the engineering literature as Price's theorem, originally published in 1958. For dimension $n = 2$, it was generalized in 1964 by McMahon to the general case $g : \mathbb{R}^{2} \to \mathbb{C}$. Our contribution is to unify these results, and to give a mathematically fully rigorous proof. Precisely, we consider a normally distributed random vector $X_Σ \in \mathbb{R}^{n}$ of arbitrary dimension $n \in \mathbb{N}$, and we allow the nonlinearity $g$ to be a general tempered distribution. To this end, we replace the expectation $\mathbb{E}\left[g(X_Σ)\right]$ by the dual pairing $\left\langle g,\,ϕ_Σ\right\rangle_{\mathcal{S}',\mathcal{S}}$, where $ϕ_Σ$ denotes the probability density function of $X_Σ$.
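As a numerical sanity check of the classical tensor-product case, the sketch below takes $n = 2$, unit variances and correlation $ρ$, and compares a finite-difference derivative of $ρ \mapsto \mathbb{E}[g_1(X_1) g_2(X_2)]$ with $\mathbb{E}[g_1'(X_1) g_2'(X_2)]$, which is the form Price's theorem takes in that setting. The choice $g_1(x) = g_2(x) = x^2$, the grid parameters and all function names are assumptions made for illustration. The general statement for tempered distributions in the abstract is not implemented here.

```python
import math


def gaussian_expectation(g, rho, half_width=10.0, step=0.25):
    """Approximate E[g(X1, X2)] for a centered bivariate normal with unit
    variances and correlation rho, using a plain grid sum over independent
    standard normals (Z1, Z2) with X1 = Z1 and X2 = rho*Z1 + sqrt(1-rho^2)*Z2."""
    n = int(round(half_width / step))
    s = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for i in range(-n, n + 1):
        z1 = i * step
        for j in range(-n, n + 1):
            z2 = j * step
            density = math.exp(-0.5 * (z1 * z1 + z2 * z2)) / (2.0 * math.pi)
            total += density * g(z1, rho * z1 + s * z2)
    return total * step * step


def g(x1, x2):
    """Tensor-product test function g1(x1) * g2(x2) with g1 = g2 = square."""
    return x1 * x1 * x2 * x2


def d12_g(x1, x2):
    """Mixed derivative g1'(x1) * g2'(x2) = (2*x1) * (2*x2)."""
    return 4.0 * x1 * x2


if __name__ == "__main__":
    rho, h = 0.3, 1e-4
    lhs = (gaussian_expectation(g, rho + h) - gaussian_expectation(g, rho - h)) / (2 * h)
    rhs = gaussian_expectation(d12_g, rho)
    print(lhs, rhs)  # both should be close to 4 * rho
```

For this particular $g$, Isserlis' formula gives $\mathbb{E}[X_1^2 X_2^2] = 1 + 2ρ^2$, so both sides equal $4ρ$, and the check can also be done by hand.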

preprint 2020 arXiv

Approximation spaces of deep neural networks

We study the expressivity of deep neural networks. Measuring a network's complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We establish that allowing the networks to have certain types of "skip connections" does not change the resulting approximation spaces. We also discuss the role of the network's nonlinearity (also known as activation function) on the resulting spaces, as well as the role of depth. For the popular ReLU nonlinearity and its powers, we relate the newly constructed spaces to classical Besov spaces. The established embeddings highlight that some functions of very low Besov smoothness can nevertheless be well approximated by neural networks, if these networks are sufficiently deep.

preprint 2020 arXiv

Phase Transitions in Rate Distortion Theory and Deep Learning

Rate distortion theory is concerned with optimally encoding a given signal class $\mathcal{S}$ using a budget of $R$ bits, as $R\to\infty$. We say that $\mathcal{S}$ can be compressed at rate $s$ if we can achieve an error of $\mathcal{O}(R^{-s})$ for encoding $\mathcal{S}$; the supremal compression rate is denoted $s^\ast(\mathcal{S})$. Given a fixed coding scheme, there usually are elements of $\mathcal{S}$ that are compressed at a higher rate than $s^\ast(\mathcal{S})$ by the given coding scheme; we study the size of this set of signals. We show that for certain "nice" signal classes $\mathcal{S}$, a phase transition occurs: We construct a probability measure $\mathbb{P}$ on $\mathcal{S}$ such that for every coding scheme $\mathcal{C}$ and any $s >s^\ast(\mathcal{S})$, the set of signals encoded with error $\mathcal{O}(R^{-s})$ by $\mathcal{C}$ forms a $\mathbb{P}$-null-set. In particular our results apply to balls in Besov and Sobolev spaces that embed compactly into $L^2(Ω)$ for a bounded Lipschitz domain $Ω$. As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are generically sharp. We also provide quantitative and non-asymptotic bounds on the probability that a random $f\in\mathcal{S}$ can be encoded to within accuracy $\varepsilon$ using $R$ bits. This result is applied to the problem of approximately representing $f\in\mathcal{S}$ to within accuracy $\varepsilon$ by a (quantized) neural network that is constrained to have at most $W$ nonzero weights and is generated by an arbitrary "learning" procedure. We show that for any $s >s^\ast(\mathcal{S})$ there are constants $c,C$ such that, no matter how we choose the "learning" procedure, the probability of success is bounded from above by $\min\big\{1,2^{C\cdot W\lceil\log_2(1+W)\rceil^2 -c\cdot\varepsilon^{-1/s}}\big\}$.
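The final bound is easier to read once evaluated. The sketch below computes $\min\{1, 2^{C W \lceil\log_2(1+W)\rceil^2 - c\,ε^{-1/s}}\}$ for given inputs. The abstract only asserts that constants $c, C$ exist, so the values used here are placeholders with no connection to the paper, and the function name is hypothetical.

```python
import math


def success_probability_bound(num_weights, epsilon, s, c, C):
    """Evaluate min{1, 2^(C*W*ceil(log2(1+W))^2 - c*eps^(-1/s))}.

    num_weights is W, the number of nonzero weights; c and C are placeholder
    constants (assumptions), since their true values are not stated in the text.
    """
    log_term = math.ceil(math.log2(1 + num_weights)) ** 2
    exponent = C * num_weights * log_term - c * epsilon ** (-1.0 / s)
    if exponent >= 0:
        return 1.0          # the bound is trivial in this regime
    return 2.0 ** exponent  # may underflow to 0.0 for very negative exponents


if __name__ == "__main__":
    # Placeholder constants c = C = 1 and rate s = 1, purely for illustration.
    for w in (10, 50, 100):
        print(w, success_probability_bound(w, epsilon=1e-3, s=1.0, c=1.0, C=1.0))
```

With these placeholder values the bound is informative only while $C \cdot W \lceil\log_2(1+W)\rceil^2 < c\,ε^{-1/s}$. Once the weight budget $W$ passes that threshold the bound equals 1 and says nothing.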

preprint 2020 arXiv

Topological properties of the set of functions generated by neural networks of fixed size

We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to $L^p$-norms, $0 < p < \infty$, for all practically-used activation functions, and also not closed with respect to the $L^\infty$-norm for all practically-used activation functions except for the ReLU and the parametric ReLU. Finally, the function that maps a family of weights to the function computed by the associated network is not inverse stable for every practically used activation function. In other words, if $f_1, f_2$ are two functions realized by neural networks and if $f_1, f_2$ are close in the sense that $\|f_1 - f_2\|_{L^\infty} \leq \varepsilon$ for $\varepsilon > 0$, it is, regardless of the size of $\varepsilon$, usually not possible to find weights $w_1, w_2$ close together such that each $f_i$ is realized by a neural network with weights $w_i$. Overall, our findings identify potential causes for issues in the training procedure of deep learning such as no guaranteed convergence, explosion of parameters, and slow convergence.

preprint 2016 arXiv

Embeddings of Decomposition Spaces into Sobolev and BV Spaces

In the present paper, we investigate whether an embedding of a decomposition space $\mathcal{D}\left(\mathcal{Q},L^{p},Y\right)$ into a given Sobolev space $W^{k,q}(\mathbb{R}^{d})$ exists. As special cases, this includes embeddings into Sobolev spaces of (homogeneous and inhomogeneous) Besov spaces, ($α$)-modulation spaces, shearlet smoothness spaces and also of a large class of wavelet coorbit spaces, in particular of shearlet-type coorbit spaces. Precisely, we will show that under extremely mild assumptions on the covering $\mathcal{Q}=\left(Q_{i}\right)_{i\in I}$, we have $\mathcal{D}\left(\mathcal{Q},L^{p},Y\right)\hookrightarrow W^{k,q}(\mathbb{R}^{d})$ as soon as $p\leq q$ and $Y\hookrightarrow\ell_{u^{\left(k,p,q\right)}}^{q^{\triangledown}}\left(I\right)$ hold. Here, $q^{\triangledown}=\min\left\{ q,q'\right\} $ and the weight $u^{\left(k,p,q\right)}$ can be easily computed, only based on the covering $\mathcal{Q}$ and on the parameters $k,p,q$. Conversely, a necessary condition for existence of the embedding is that $p\leq q$ and $Y\cap\ell_{0}\left(I\right)\hookrightarrow\ell_{u^{\left(k,p,q\right)}}^{q}\left(I\right)$ hold, where $\ell_{0}\left(I\right)$ denotes the space of finitely supported sequences on $I$. All in all, for the range $q \in (0,2]\cup\{\infty\}$, we obtain a complete characterization of existence of the embedding in terms of readily verifiable criteria. We can also completely characterize existence of an embedding of a decomposition space into a BV space.

preprint 2016 arXiv

From Frazier-Jawerth characterizations of Besov spaces to Wavelets and Decomposition spaces

This article describes how the ideas promoted by the fundamental papers published by M. Frazier and B. Jawerth in the eighties have influenced subsequent developments related to the theory of atomic decompositions and Banach frames for function spaces such as the modulation spaces and Besov-Triebel-Lizorkin spaces. Both of these classes of spaces arise as special cases of two different, general constructions of function spaces: coorbit spaces and decomposition spaces. Coorbit spaces are defined by imposing certain decay conditions on the so-called voice transform of the function/distribution under consideration. As a concrete example, one might think of the wavelet transform, leading to the theory of Besov-Triebel-Lizorkin spaces. Decomposition spaces, on the other hand, are defined using certain decompositions in the Fourier domain. For Besov-Triebel-Lizorkin spaces, one uses a dyadic decomposition, while a uniform decomposition yields modulation spaces. Only recently, the second author has established a fruitful connection between modern variants of wavelet theory with respect to general dilation groups (which can be treated in the context of coorbit theory) and a particular family of decomposition spaces. In this way, optimal inclusion results and invariance properties for a variety of smoothness spaces can be established. We will present an outline of these connections and comment on the basic results arising in this context.

preprint 2016 arXiv

Structured, compactly supported Banach frame decompositions of decomposition spaces

We present a framework for the construction of structured, possibly compactly supported Banach frames and atomic decompositions for decomposition spaces. Such a space $\mathcal{D}(\mathcal{Q},L^p,\ell_w^q)$ is defined using a frequency covering $\mathcal{Q}=(Q_i)_{i\in I}$: If $(φ_i)_{i}$ is a suitable partition of unity subordinate to $\mathcal{Q}$, then $\Vert g\Vert_{\mathcal{D}(\mathcal{Q},L^p,\ell_w^q)}:=\left\Vert\left(\Vert\mathcal{F}^{-1}(φ_i\hat{g})\Vert_{L^p}\right)_{i}\right\Vert_{\ell_w^q}$. We assume $\mathcal{Q}=(T_iQ+b_i)_{i}$, with $T_i\in{\rm GL}(\mathbb{R}^d),b_i\in\mathbb{R}^d$. Given a prototype $γ$, we consider the system \[Ψ_{c}=(L_{c\cdot T_i^{-T}k}γ^{[i]})_{i\in I,k\in\mathbb{Z}^d}\text{ with }γ^{[i]}=|\det T_i|^{1/2}\, M_{b_i}(γ\circ T_i^T),\] with translation $L_x$ and modulation $M_ξ$. We provide verifiable conditions on $γ$ under which $Ψ_c$ forms a Banach frame or an atomic decomposition for $\mathcal{D}(\mathcal{Q},L^p,\ell_w^q)$, for small enough sampling density $c>0$. Our theory allows compactly supported prototypes and applies for arbitrary $p,q\in(0,\infty]$. Often, $Ψ_c$ is both a Banach frame and an atomic decomposition, so that analysis sparsity is equivalent to synthesis sparsity, i.e. the analysis coefficients $(\langle f,L_{c\cdot T_i^{-T}k}γ^{[i]}\rangle)_{i,k}$ lie in $\ell^p$ iff $f$ belongs to a certain decomposition space, iff $f=\sum_{i,k}c_k^{(i)}\cdot L_{c\cdot T_i^{-T}k}γ^{[i]}$ with $(c_k^{(i)})_{i,k}\in\ell^p$. This is convenient if only analysis sparsity is known to hold: Generally, this only yields synthesis sparsity with respect to the dual frame, about which often only little is known. But our theory yields synthesis sparsity with respect to the well-understood primal frame. In particular, our theory applies to $α$-modulation spaces and inhomogeneous Besov spaces. It also applies to shearlet frames, as we show in a companion paper.

preprint 2014 arXiv

Resolution of the wavefront set using general continuous wavelet transforms

We consider the problem of characterizing the wavefront set of a tempered distribution $u\in\mathcal{S}'(\mathbb{R}^{d})$ in terms of its continuous wavelet transform, where the latter is defined with respect to a suitably chosen dilation group $H\subset{\rm GL}(\mathbb{R}^{d})$. In this paper we develop a comprehensive and unified approach that allows us to establish characterizations of the wavefront set in terms of rapid coefficient decay, for a large variety of dilation groups. For this purpose, we introduce two technical conditions on the dual action of the group $H$, called microlocal admissibility and (weak) cone approximation property. Essentially, microlocal admissibility sets up a systematic relationship between the scales in a wavelet dilated by $h\in H$ on one side, and the matrix norm of $h$ on the other side. The (weak) cone approximation property describes the ability of the wavelet system to adapt its frequency-side localization to arbitrary frequency cones. Together, microlocal admissibility and the weak cone approximation property allow the characterization of points in the wavefront set using multiple wavelets. Replacing the weak cone approximation by its stronger counterpart gives access to single wavelet characterizations. We illustrate the scope of our results by discussing -- in any dimension $d\ge2$ -- the similitude, diagonal and shearlet dilation groups, for which we verify the pertinent conditions. As a result, similitude and diagonal groups can be employed for multiple wavelet characterizations, whereas for the shearlet groups a single wavelet suffices. In particular, the shearlet characterization (previously only established for $d=2$) holds in arbitrary dimensions.

preprint 2014 arXiv

Wavelet Coorbit Spaces viewed as Decomposition Spaces

In this paper we show that the Fourier transform induces an isomorphism between the coorbit spaces defined by Feichtinger and Gröchenig of the mixed, weighted Lebesgue spaces $L_{v}^{p,q}$ with respect to the quasi-regular representation of a semi-direct product $\mathbb{R}^{d}\rtimes H$ with suitably chosen dilation group $H$, and certain decomposition spaces $\mathcal{D}\left(\mathcal{Q},L^{p},\ell_{u}^{q}\right)$ (essentially as introduced by Feichtinger and Gröbner), where the localized "parts" of a function are measured in the $\mathcal{F}L^{p}$-norm. This equivalence is useful in several ways: It provides access to a Fourier-analytic understanding of wavelet coorbit spaces, and it allows one to discuss coorbit spaces associated to different dilation groups in a common framework. As an illustration of these points, we include a short discussion of dilation invariance properties of coorbit spaces associated to different types of dilation groups.
