Researcher profile

Gitta Kutyniok

Gitta Kutyniok contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
17works
0followers
15topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

17 published item(s)

preprint2023arXiv

Computability of Optimizers

Optimization problems are a staple of today's scientific and technical landscape. However, at present, solvers of such problems are almost exclusively run on digital hardware. Using Turing machines as a mathematical model for any type of digital hardware, in this paper, we analyze fundamental limitations of this conceptual approach of solving optimization problems. Since in most applications, the optimizer itself is of significantly more interest than the optimal value of the corresponding function, we will focus on computability of the optimizer. In fact, we will show that in various situations the optimizer is unattainable on Turing machines and consequently on digital computers. Moreover, even worse, there does not exist a Turing machine, which approximates the optimizer itself up to a certain constant error. We prove such results for a variety of well-known problems from very different areas, including artificial intelligence, financial mathematics, and information theory, often deriving the even stronger result that such problems are not Banach-Mazur computable, also not even in an approximate sense.

preprint2022arXiv

Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory?

Neural Tangent Kernel (NTK) theory is widely used to study the dynamics of infinitely-wide deep neural networks (DNNs) under gradient descent. But do the results for infinitely-wide networks give us hints about the behavior of real finite-width ones? In this paper, we study empirically when NTK theory is valid in practice for fully-connected ReLU and sigmoid DNNs. We find out that whether a network is in the NTK regime depends on the hyperparameters of random initialization and the network's depth. In particular, NTK theory does not explain the behavior of sufficiently deep networks initialized so that their gradients explode as they propagate through the network's layers: the kernel is random at initialization and changes significantly during training in this case, contrary to NTK theory. On the other hand, in the case of vanishing gradients, DNNs are in the the NTK regime but become untrainable rapidly with depth. We also describe a framework to study generalization properties of DNNs, in particular the variance of network's output function, by means of NTK theory and discuss its limits.

preprint2022arXiv

Generalization Analysis of Message Passing Neural Networks on Large Random Graphs

Message passing neural networks (MPNN) have seen a steep rise in popularity since their introduction as generalizations of convolutional neural networks to graph-structured data, and are now considered state-of-the-art tools for solving a large variety of graph-focused problems. We study the generalization error of MPNNs in graph classification and regression. We assume that graphs of different classes are sampled from different random graph models. We show that, when training a MPNN on a dataset sampled from such a distribution, the generalization gap increases in the complexity of the MPNN, and decreases, not only with respect to the number of training samples, but also with the average number of nodes in the graphs. This shows how a MPNN with high complexity can generalize from a small dataset of graphs, as long as the graphs are large. The generalization bound is derived from a uniform convergence result, that shows that any MPNN, applied on a graph, approximates the MPNN applied on the geometric model that the graph discretizes.

preprint2022arXiv

LocUNet: Fast Urban Positioning Using Radio Maps and Deep Learning

This paper deals with the problem of localization in a cellular network in a dense urban scenario. Global Navigation Satellite Systems (GNSS) typically perform poorly in urban environments, where the likelihood of line-of-sight conditions is low, and thus alternative localization methods are required for good accuracy. We present LocUNet: A deep learning method for localization, based merely on Received Signal Strength (RSS) from Base Stations (BSs), which does not require any increase in computation complexity at the user devices with respect to the device standard operations, unlike methods that rely on time of arrival or angle of arrival information. In the proposed method, the user to be localized reports the RSS from BSs to a Central Processing Unit (CPU), which may be located in the cloud. Alternatively, the localization can be performed locally at the user. Using estimated pathloss radio maps of the BSs, LocUNet can localize users with state-of-the-art accuracy and enjoys high robustness to inaccuracies in the radio maps. The proposed method does not require pre-sampling of the environment; and is suitable for real-time applications, thanks to the RadioUNet, a neural network-based radio map estimator. We also introduce two datasets that allow numerical comparisons of RSS and Time of Arrival (ToA) methods in realistic urban environments.

preprint2022arXiv

Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization

Neural Tangent Kernel (NTK) is widely used to analyze overparametrized neural networks due to the famous result by Jacot et al. (2018): in the infinite-width limit, the NTK is deterministic and constant during training. However, this result cannot explain the behavior of deep networks, since it generally does not hold if depth and width tend to infinity simultaneously. In this paper, we study the NTK of fully-connected ReLU networks with depth comparable to width. We prove that the NTK properties depend significantly on the depth-to-width ratio and the distribution of parameters at initialization. In fact, our results indicate the importance of the three phases in the hyperparameter space identified in Poole et al. (2016): ordered, chaotic and the edge of chaos (EOC). We derive exact expressions for the NTK dispersion in the infinite-depth-and-width limit in all three phases and conclude that the NTK variability grows exponentially with depth at the EOC and in the chaotic phase but not in the ordered phase. We also show that the NTK of deep networks may stay constant during training only in the ordered phase and discuss how the structure of the NTK matrix changes during training.

preprint2022arXiv

The Mathematics of Artificial Intelligence

We currently witness the spectacular success of artificial intelligence in both science and public life. However, the development of a rigorous mathematical foundation is still at an early stage. In this survey article, which is based on an invited lecture at the International Congress of Mathematicians 2022, we will in particular focus on the current "workhorse" of artificial intelligence, namely deep neural networks. We will present the main theoretical directions along with several exemplary results and discuss key open problems.

preprint2022arXiv

Transferability of Graph Neural Networks: an Extended Graphon Approach

We study spectral graph convolutional neural networks (GCNNs), where filters are defined as continuous functions of the graph shift operator (GSO) through functional calculus. A spectral GCNN is not tailored to one specific graph and can be transferred between different graphs. It is hence important to study the GCNN transferability: the capacity of the network to have approximately the same repercussion on different graphs that represent the same phenomenon. Transferability ensures that GCNNs trained on certain graphs generalize if the graphs in the test set represent the same phenomena as the graphs in the training set. In this paper, we consider a model of transferability based on graphon analysis. Graphons are limit objects of graphs, and, in the graph paradigm, two graphs represent the same phenomenon if both approximate the same graphon. Our main contributions can be summarized as follows: 1) we prove that any fixed GCNN with continuous filters is transferable under graphs that approximate the same graphon, 2) we prove transferability for graphs that approximate unbounded graphon shift operators, which are defined in this paper, and, 3) we obtain non-asymptotic approximation results, proving linear stability of GCNNs. This extends current state-of-the-art results which show asymptotic transferability for polynomial filters under graphs that approximate bounded graphons.

preprint2021arXiv

$\ell^1$-Analysis Minimization and Generalized (Co-)Sparsity: When Does Recovery Succeed?

This paper investigates the problem of signal estimation from undersampled noisy sub-Gaussian measurements under the assumption of a cosparse model. Based on generalized notions of sparsity, we derive novel recovery guarantees for the $\ell^1$-analysis basis pursuit, enabling accurate predictions of its sample complexity. The corresponding bounds on the number of required measurements do explicitly depend on the Gram matrix of the analysis operator and therefore particularly account for its mutual coherence structure. Our findings defy conventional wisdom which promotes the sparsity of analysis coefficients as the crucial quantity to study. In fact, this common paradigm breaks down completely in many situations of practical interest, for instance, when applying a redundant (multilevel) frame as analysis prior. By extensive numerical experiments, we demonstrate that, in contrast, our theoretical sampling-rate bounds reliably capture the recovery capability of various examples, such as redundant wavelets systems, total variation, or random frames. The proofs of our main results build upon recent achievements in the convex geometry of data mining problems. More precisely, we establish a sophisticated upper bound on the conic Gaussian mean width that is associated with the underlying $\ell^1$-analysis polytope. Due to a novel localization argument, it turns out that the presented framework naturally extends to stable recovery, allowing us to incorporate compressible coefficient sequences as well.

preprint2020arXiv

A Theoretical Analysis of Deep Neural Networks and Parametric PDEs

We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of its concrete shape, we use the inherent low-dimensionality of the solution manifold to obtain approximation rates which are significantly superior to those provided by classical neural network approximation results. Concretely, we use the existence of a small reduced basis to construct, for a large variety of parametric partial differential equations, neural networks that yield approximations of the parametric solution maps in such a way that the sizes of these networks essentially only depend on the size of the reduced basis.

preprint2020arXiv

Approximation spaces of deep neural networks

We study the expressivity of deep neural networks. Measuring a network's complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We establish that allowing the networks to have certain types of "skip connections" does not change the resulting approximation spaces. We also discuss the role of the network's nonlinearity (also known as activation function) on the resulting spaces, as well as the role of depth. For the popular ReLU nonlinearity and its powers, we relate the newly constructed spaces to classical Besov spaces. The established embeddings highlight that some functions of very low Besov smoothness can nevertheless be well approximated by neural networks, if these networks are sufficiently deep.

preprint2020arXiv

Expressivity of Deep Neural Networks

In this review paper, we give a comprehensive overview of the large variety of approximation results for neural networks. Approximation rates for classical function spaces as well as benefits of deep neural networks over shallow ones for specifically structured function classes are discussed. While the mainbody of existing results is for general feedforward architectures, we also depict approximation results for convolutional, residual and recurrent neural networks.

preprint2020arXiv

In-Distribution Interpretability for Challenging Modalities

It is widely recognized that the predictions of deep neural networks are difficult to parse relative to simpler approaches. However, the development of methods to investigate the mode of operation of such models has advanced rapidly in the past few years. Recent work introduced an intuitive framework which utilizes generative models to improve on the meaningfulness of such explanations. In this work, we display the flexibility of this method to interpret diverse and challenging modalities: music and physical simulations of urban environments.

preprint2020arXiv

Interval Neural Networks: Uncertainty Scores

We propose a fast, non-Bayesian method for producing uncertainty scores in the output of pre-trained deep neural networks (DNNs) using a data-driven interval propagating network. This interval neural network (INN) has interval valued parameters and propagates its input using interval arithmetic. The INN produces sensible lower and upper bounds encompassing the ground truth. We provide theoretical justification for the validity of these bounds. Furthermore, its asymmetric uncertainty scores offer additional, directional information beyond what Gaussian-based, symmetric variance estimation can provide. We find that noise in the data is adequately captured by the intervals produced with our method. In numerical experiments on an image reconstruction task, we demonstrate the practical utility of INNs as a proxy for the prediction error in comparison to two state-of-the-art uncertainty quantification methods. In summary, INNs produce fast, theoretically justified uncertainty scores for DNNs that are easy to interpret, come with added information and pose as improved error proxies - features that may prove useful in advancing the usability of DNNs especially in sensitive applications such as health care.

preprint2020arXiv

Numerical Solution of the Parametric Diffusion Equation by Deep Neural Networks

We perform a comprehensive numerical study of the effect of approximation-theoretical results for neural networks on practical learning problems in the context of numerical analysis. As the underlying model, we study the machine-learning-based solution of parametric partial differential equations. Here, approximation theory predicts that the performance of the model should depend only very mildly on the dimension of the parameter space and is determined by the intrinsic dimension of the solution manifold of the parametric partial differential equation. We use various methods to establish comparability between test-cases by minimizing the effect of the choice of test-cases on the optimization and sampling aspects of the learning problem. We find strong support for the hypothesis that approximation-theoretical effects heavily influence the practical behavior of learning problems in numerical analysis.

preprint2020arXiv

Real-time Localization Using Radio Maps

This paper deals with the problem of localization in a cellular network in a dense urban scenario. Global Navigation Satellite System typically performs poorly in urban environments when there is no line-of-sight between the devices and the satellites, and thus alternative localization methods are often required. We present a simple yet effective method for localization based on pathloss. In our approach, the user to be localized reports the received signal strength from a set of base stations with known locations. For each base station we have a good approximation of the pathloss at each location in the map, provided by RadioUNet, an efficient deep learning-based simulator of pathloss functions in urban environment, akin to ray-tracing. Using the approximations of the pathloss functions of all base stations and the reported signal strengths, we are able to extract a very accurate approximation of the location of the user.

preprint2020arXiv

tfShearlab: The TensorFlow Digital Shearlet Transform for Deep Learning

The shearlet transform from applied harmonic analysis is currently the state of the art when analyzing multidimensional signals with anisotropic singularities. Its optimal sparse approximation properties and its faithful digitalization allow shearlets to be applied to different problems from imaging science, such as image denoising, image inpainting, and singularities detection. The shearlet transform has also be successfully utilized, for instance, as a feature extractor. As such it has been shown to be well suited for image preprocessing in combination with data-driven methods such as deep neural networks. This requires in particular an implementation of the shearlet transform in the current deep learning frameworks, such as TensorFlow. With this motivation we developed a tensor shearlet transform aiming to provide a faithful TensorFlow implementation. In addition to its usability in predictive models, we also observed an significant improvement in the performance of the transform, with a running time of almost 40 times the previous state-of-the-art implementation. In this paper, we will also present several numerical experiments such as image denoising and inpainting, where the TensorFlow version can be shown to outperform previous libraries as well as the learned primal-dual reconstruction method for low dose computed tomography in running time.

preprint2020arXiv

The Restricted Isometry of ReLU Networks: Generalization through Norm Concentration

While regression tasks aim at interpolating a relation on the entire input space, they often have to be solved with a limited amount of training data. Still, if the hypothesis functions can be sketched well with the data, one can hope for identifying a generalizing model. In this work, we introduce with the Neural Restricted Isometry Property (NeuRIP) a uniform concentration event, in which all shallow $\mathrm{ReLU}$ networks are sketched with the same quality. To derive the sample complexity for achieving NeuRIP, we bound the covering numbers of the networks in the Sub-Gaussian metric and apply chaining techniques. In case of the NeuRIP event, we then provide bounds on the expected risk, which hold for networks in any sublevel set of the empirical risk. We conclude that all networks with sufficiently small empirical risk generalize uniformly.