Researcher profile

Felix Voigtlaender

Felix Voigtlaender contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
14works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

14 published item(s)

preprint2023arXiv

Anisotropic Triebel-Lizorkin spaces and wavelet coefficient decay over one-parameter dilation groups, I

This paper provides maximal function characterizations of anisotropic Triebel-Lizorkin spaces associated to general expansive matrices for the full range of parameters $p \in (0,\infty)$, $q \in (0,\infty]$ and $α\in \mathbb{R}$. The equivalent norm is defined in terms of the decay of wavelet coefficients, quantified by a Peetre-type space over a one-parameter dilation group. As an application, the existence of dual molecular frames and Riesz sequences is obtained; the wavelet systems are generated by translations and anisotropic dilations of a single function, where neither the translation nor dilation parameters are required to belong to a discrete subgroup. Explicit criteria for molecules are given in terms of mild decay, moment, and smoothness conditions.

preprint2023arXiv

Anisotropic Triebel-Lizorkin spaces and wavelet coefficient decay over one-parameter dilation groups, II

Continuing previous work, this paper provides maximal characterizations of anisotropic Triebel-Lizorkin spaces $\dot{\mathbf{F}}^α_{p,q}$ for the endpoint case of $p = \infty$ and the full scale of parameters $α\in \mathbb{R}$ and $q \in (0,\infty]$. In particular, a Peetre-type characterization of the anisotropic Besov space $\dot{\mathbf{B}}^α_{\infty,\infty} = \dot{\mathbf{F}}^α_{\infty,\infty}$ is obtained. As a consequence, it is shown that there exist dual molecular frames and Riesz sequences in $\dot{\mathbf{F}}^α_{\infty,q}$.

preprint2022arXiv

$L^p$ sampling numbers for the Fourier-analytic Barron space

In this paper, we consider Barron functions $f : [0,1]^d \to \mathbb{R}$ of smoothness $σ> 0$, which are functions that can be written as \[ f(x) = \int_{\mathbb{R}^d} F(ξ) \, e^{2 πi \langle x, ξ\rangle} \, d ξ \quad \text{with} \quad \int_{\mathbb{R}^d} |F(ξ)| \cdot (1 + |ξ|)^σ \, d ξ< \infty. \] For $σ= 1$, these functions play a prominent role in machine learning, since they can be efficiently approximated by (shallow) neural networks without suffering from the curse of dimensionality. For these functions, we study the following question: Given $m$ point samples $f(x_1),\dots,f(x_m)$ of an unknown Barron function $f : [0,1]^d \to \mathbb{R}$ of smoothness $σ$, how well can $f$ be recovered from these samples, for an optimal choice of the sampling points and the reconstruction procedure? Denoting the optimal reconstruction error measured in $L^p$ by $s_m (σ; L^p)$, we show that \[ m^{- \frac{1}{\max \{ p,2 \}} - \fracσ{d}} \lesssim s_m(σ;L^p) \lesssim (\ln (e + m))^{α(σ,d) / p} \cdot m^{- \frac{1}{\max \{ p,2 \}} - \fracσ{d}} , \] where the implied constants only depend on $σ$ and $d$ and where $α(σ,d)$ stays bounded as $d \to \infty$.

preprint2022arXiv

A fractal uncertainty principle for Bergman spaces and analytic wavelets

Motivated by results of Dyatlov on Fourier uncertainty principles for Cantor sets and by similar results of Knutsen for joint time-frequency representations (i.e., the short-time Fourier transform (STFT) with a Gaussian window, equivalent to Fock spaces), we suggest a general setting relating localization and uncertainty and prove, within this context, an uncertainty principle for Cantor sets in Bergman spaces on the unit disk, where the Cantor set is defined as a union of annuli that are equidistributed in the hyperbolic measure.The result can be written in terms of analytic Cauchy wavelets. As in the case of the STFT considered by Knutsen, our result consists of a two-sided bound for the norm of a localization operator involving the fractal dimension log 2 / log 3 in the exponent. As in the STFT case and in Dyatlov fractal uncertainty principle, the (hyperbolic) measure of the dilated iterates of the Cantor set in the disk tends to infinity, while the corresponding norm of the localization operator tends to zero.

preprint2022arXiv

Invertibility of frame operators on Besov-type decomposition spaces

We derive an extension of the Walnut-Daubechies criterion for the invertibility of frame operators. The criterion concerns general reproducing systems and Besov-type spaces. As an application, we conclude that $L^2$ frame expansions associated with smooth and fast-decaying reproducing systems on sufficiently fine lattices extend to Besov-type spaces. This simplifies and improves recent results on the existence of atomic decompositions, which only provide a particular dual reproducing system with suitable properties. In contrast, we conclude that the $L^2$ canonical frame expansions extend to many other function spaces, and, therefore, operations such as analyzing using the frame, thresholding the resulting coefficients, and then synthesizing using the canonical dual frame are bounded on these spaces.

preprint2022arXiv

Neural network approximation and estimation of classifiers with classification boundary in a Barron class

We prove bounds for the approximation and estimation of certain binary classification functions using ReLU neural networks. Our estimation bounds provide a priori performance guarantees for empirical risk minimization using networks of a suitable size, depending on the number of training samples available. The obtained approximation and estimation rates are independent of the dimension of the input, showing that the curse of dimensionality can be overcome in this setting; in fact, the input dimension only enters in the form of a polynomial factor. Regarding the regularity of the target classification function, we assume the interfaces between the different classes to be locally of Barron-type. We complement our results by studying the relations between various Barron-type spaces that have been proposed in the literature. These spaces differ substantially more from each other than the current literature suggests.

preprint2022arXiv

Time-Frequency Shift Invariance of Gabor Spaces with an $S_0$-Generator

We consider Gabor Riesz sequences generated by a lattice $Λ\subset \mathbb{R}^2$ and a window function $g \in L^2(\mathbb{R})$ which is well localized in both time and frequency. When $g$ belongs to the Feichtinger algebra, we prove that only those time-frequency shifts with parameters from the lattice $Λ$ leave the corresponding Gabor space invariant. This improves on earlier results where only lattices of rational density were considered. A slightly weaker result is proved - again for lattices of general density - under the regularity assumptions of the classical Balian-Low theorem, where both $g$ and its Fourier transform belong to the Sobolev space $H^1(\mathbb{R})$. The proof relies on a combination of methods from time-frequency analysis and the theory of $C^\ast$-algebras, specifically the so-called irrational rotation algebra.

preprint2021arXiv

Equivalence of approximation by convolutional neural networks and fully-connected networks

Convolutional neural networks are the most widely used type of neural networks in applications. In mathematical analysis, however, mostly fully-connected networks are studied. In this paper, we establish a connection between both network architectures. Using this connection, we show that all upper and lower bounds concerning approximation rates of {fully-connected} neural networks for functions $f \in \mathcal{C}$ -- for an arbitrary function class $\mathcal{C}$ -- translate to essentially the same bounds concerning approximation rates of convolutional neural networks for functions $f \in {\mathcal{C}^{equi}}$, with the class ${\mathcal{C}^{equi}}$ consisting of all translation equivariant functions whose first coordinate belongs to $\mathcal{C}$. All presented results consider exclusively the case of convolutional neural networks without any pooling operation and with circular convolutions, i.e., not based on zero-padding.

preprint2021arXiv

On dual molecules and convolution-dominated operators

We show that sampling or interpolation formulas in reproducing kernel Hilbert spaces can be obtained by reproducing kernels whose dual systems form molecules, ensuring that the size profile of a function is fully reflected by the size profile of its sampled values. The main tool is a local holomorphic calculus for convolution-dominated operators, valid for groups with possibly non-polynomial growth. Applied to the matrix coefficients of a group representation, our methods improve on classical results on atomic decompositions and bridge a gap between abstract and concrete methods.

preprint2020arXiv

A general version of Price&#39;s theorem

Assume that $X_Σ \in \mathbb{R}^{n}$ is a centered random vector following a multivariate normal distribution with positive definite covariance matrix $Σ$. Let $g : \mathbb{R}^{n} \to \mathbb{C}$ be measurable and of moderate growth, say $|g(x)| \lesssim (1 + |x|)^{N}$. We show that the map $Σ\mapsto \mathbb{E}[g(X_Σ)]$ is smooth, and we derive convenient expressions for its partial derivatives, in terms of certain expectations $\mathbb{E}[(\partial^αg)(X_Σ)]$ of partial (distributional) derivatives of $g$. As we discuss, this result can be used to derive bounds for the expectation $\mathbb{E}[g(X_Σ)]$ of a nonlinear function $g(X_Σ)$ of a Gaussian random vector $X_Σ$ with possibly correlated entries. For the case when $g\left(x\right) = g_{1}(x_{1}) \cdots g_{n}(x_{n})$ has tensor-product structure, the above result is known in the engineering literature as Price&#39;s theorem, originally published in 1958. For dimension $n = 2$, it was generalized in 1964 by McMahon to the general case $g : \mathbb{R}^{2} \to \mathbb{C}$. Our contribution is to unify these results, and to give a mathematically fully rigorous proof. Precisely, we consider a normally distributed random vector $X_Σ \in \mathbb{R}^{n}$ of arbitrary dimension $n \in \mathbb{N}$, and we allow the nonlinearity $g$ to be a general tempered distribution. To this end, we replace the expectation $\mathbb{E}\left[g(X_Σ)\right]$ by the dual pairing $\left\langle g,\,ϕ_Σ\right\rangle_{\mathcal{S}&#39;,\mathcal{S}}$, where $ϕ_Σ$ denotes the probability density function of $X_Σ$.

preprint2020arXiv

Approximation spaces of deep neural networks

We study the expressivity of deep neural networks. Measuring a network&#39;s complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We establish that allowing the networks to have certain types of &#34;skip connections&#34; does not change the resulting approximation spaces. We also discuss the role of the network&#39;s nonlinearity (also known as activation function) on the resulting spaces, as well as the role of depth. For the popular ReLU nonlinearity and its powers, we relate the newly constructed spaces to classical Besov spaces. The established embeddings highlight that some functions of very low Besov smoothness can nevertheless be well approximated by neural networks, if these networks are sufficiently deep.

preprint2020arXiv

Phase Transitions in Rate Distortion Theory and Deep Learning

Rate distortion theory is concerned with optimally encoding a given signal class $\mathcal{S}$ using a budget of $R$ bits, as $R\to\infty$. We say that $\mathcal{S}$ can be compressed at rate $s$ if we can achieve an error of $\mathcal{O}(R^{-s})$ for encoding $\mathcal{S}$; the supremal compression rate is denoted $s^\ast(\mathcal{S})$. Given a fixed coding scheme, there usually are elements of $\mathcal{S}$ that are compressed at a higher rate than $s^\ast(\mathcal{S})$ by the given coding scheme; we study the size of this set of signals. We show that for certain &#34;nice&#34; signal classes $\mathcal{S}$, a phase transition occurs: We construct a probability measure $\mathbb{P}$ on $\mathcal{S}$ such that for every coding scheme $\mathcal{C}$ and any $s >s^\ast(\mathcal{S})$, the set of signals encoded with error $\mathcal{O}(R^{-s})$ by $\mathcal{C}$ forms a $\mathbb{P}$-null-set. In particular our results apply to balls in Besov and Sobolev spaces that embed compactly into $L^2(Ω)$ for a bounded Lipschitz domain $Ω$. As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are generically sharp. We also provide quantitative and non-asymptotic bounds on the probability that a random $f\in\mathcal{S}$ can be encoded to within accuracy $\varepsilon$ using $R$ bits. This result is applied to the problem of approximately representing $f\in\mathcal{S}$ to within accuracy $\varepsilon$ by a (quantized) neural network that is constrained to have at most $W$ nonzero weights and is generated by an arbitrary &#34;learning&#34; procedure. We show that for any $s >s^\ast(\mathcal{S})$ there are constants $c,C$ such that, no matter how we choose the &#34;learning&#34; procedure, the probability of success is bounded from above by $\min\big\{1,2^{C\cdot W\lceil\log_2(1+W)\rceil^2 -c\cdot\varepsilon^{-1/s}}\big\}$.

preprint2020arXiv

Topological properties of the set of functions generated by neural networks of fixed size

We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to $L^p$-norms, $0 < p < \infty$, for all practically-used activation functions, and also not closed with respect to the $L^\infty$-norm for all practically-used activation functions except for the ReLU and the parametric ReLU. Finally, the function that maps a family of weights to the function computed by the associated network is not inverse stable for every practically used activation function. In other words, if $f_1, f_2$ are two functions realized by neural networks and if $f_1, f_2$ are close in the sense that $\|f_1 - f_2\|_{L^\infty} \leq \varepsilon$ for $\varepsilon > 0$, it is, regardless of the size of $\varepsilon$, usually not possible to find weights $w_1, w_2$ close together such that each $f_i$ is realized by a neural network with weights $w_i$. Overall, our findings identify potential causes for issues in the training procedure of deep learning such as no guaranteed convergence, explosion of parameters, and slow convergence.

preprint2014arXiv

Wavelet Coorbit Spaces viewed as Decomposition Spaces

In this paper we show that the Fourier transform induces an isomorphism between the coorbit spaces defined by Feichtinger and Gröchenig of the mixed, weighted Lebesgue spaces $L_{v}^{p,q}$ with respect to the quasi-regular representation of a semi-direct product $\mathbb{R}^{d}\rtimes H$ with suitably chosen dilation group $H$, and certain decomposition spaces $\mathcal{D}\left(\mathcal{Q},L^{p},\ell_{u}^{q}\right)$ (essentially as introduced by Feichtinger and Gröbner), where the localized ,,parts`` of a function are measured in the $\mathcal{F}L^{p}$-norm. This equivalence is useful in several ways: It provides access to a Fourier-analytic understanding of wavelet coorbit spaces, and it allows to discuss coorbit spaces associated to different dilation groups in a common framework. As an illustration of these points, we include a short discussion of dilation invariance properties of coorbit spaces associated to different types of dilation groups.