Researcher profile

Sara van de Geer

Sara van de Geer contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
7 works
0 followers
4 topics
2 close collaborators

Published work

7 published items

Preprint, 2021, arXiv

Adaptive Rates for Total Variation Image Denoising

We study the theoretical properties of image denoising via total variation penalized least-squares. We define the total variation in terms of the two-dimensional total discrete derivative of the image and show that it gives rise to denoised images that are piecewise constant on rectangular sets. We prove that, if the true image is piecewise constant on just a few rectangular sets, the denoised image converges to the true image at a parametric rate, up to a log factor. More generally, we show that the denoised image enjoys oracle properties, that is, it is almost as good as if some aspects of the true image were known. In other words, image denoising with total variation regularization leads to an adaptive reconstruction of the true image.
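
As a rough illustration of the penalty described in this abstract, the sketch below treats the "two-dimensional total discrete derivative" as the mixed first difference along both axes. That reading, the function names, and the toy image are assumptions made for illustration, not the paper's code.

```python
import numpy as np

def mixed_differences(image):
    """Mixed discrete derivative: first differences along rows, then along columns.

    Entry (i, j) equals x[i+1, j+1] - x[i, j+1] - x[i+1, j] + x[i, j].
    """
    return np.diff(np.diff(image, axis=0), axis=1)

def total_variation_2d(image):
    """Sum of absolute mixed differences (assumed form of the penalty, unnormalized)."""
    return float(np.abs(mixed_differences(image)).sum())

# Hypothetical toy image: constant background with one rectangular block.
image = np.zeros((6, 6))
image[2:4, 1:5] = 1.0

tv = total_variation_2d(image)
n_nonzero = int(np.count_nonzero(mixed_differences(image)))
```

For an image that is constant on a single interior rectangle, only the four corners of the rectangle produce nonzero mixed differences, which is the kind of sparsity such a penalty rewards.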

Preprint, 2021, arXiv

Deep ReLU Programming

Feed-forward ReLU neural networks partition their input domain into finitely many "affine regions" of constant neuron activation pattern and affine behaviour. We analyze their mathematical structure and provide algorithmic primitives for an efficient application of linear-programming-related techniques for iterative minimization of such non-convex functions. In particular, we propose an extension of the Simplex algorithm which iterates on induced vertices but, in addition, is able to change its feasible region computationally efficiently to adjacent "affine regions". This way, we obtain the Barrodale-Roberts algorithm for LAD regression as a special case, and are also able to train the first layer of neural networks with L1 training loss decreasing in every step.
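
The notion of an "affine region" in this abstract can be made concrete with a small sketch: at a given input, the activation pattern fixes which neurons are on, and the network then coincides with an affine map on that region. The code below is only an illustration under assumed names and a randomly drawn toy network; it is not the paper's Simplex-type algorithm.

```python
import numpy as np

def relu_net(x, weights, biases):
    """Evaluate a feed-forward ReLU network with a linear output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    return weights[-1] @ h + biases[-1]

def local_affine_map(x, weights, biases):
    """Return the activation pattern at x and (A, c) with net(z) = A z + c on that region."""
    A = np.eye(len(x))
    c = np.zeros(len(x))
    h = x
    pattern = []
    for W, b in zip(weights[:-1], biases[:-1]):
        pre = W @ h + b
        active = (pre > 0).astype(float)      # 1 where the neuron is on, 0 otherwise
        pattern.append(tuple(int(a) for a in active))
        A = active[:, None] * (W @ A)         # compose with the masked linear layer
        c = active * (W @ c + b)
        h = np.maximum(pre, 0.0)
    return pattern, weights[-1] @ A, weights[-1] @ c + biases[-1]

# Hypothetical toy network: 2 inputs, hidden widths 4 and 3, 1 output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 2)), rng.normal(size=(3, 4)), rng.normal(size=(1, 3))]
biases = [rng.normal(size=4), rng.normal(size=3), rng.normal(size=1)]

x = np.array([0.3, -0.7])
pattern, A, c = local_affine_map(x, weights, biases)
```

By construction, `A @ x + c` reproduces `relu_net(x, weights, biases)`, and the same pair `(A, c)` describes the network at every input that shares the activation pattern of `x`.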

Preprint, 2021, arXiv

Tensor denoising with trend filtering

We extend the notion of trend filtering to tensors by considering the $k^{\rm th}$-order Vitali variation, a discretized version of the integral of the absolute value of the $k^{\rm th}$-order total derivative. We prove adaptive $\ell^0$-rates and not-so-slow $\ell^1$-rates for tensor denoising with trend filtering. For $k \in \{1,2,3,4\}$ we prove that the $d$-dimensional margin of a $d$-dimensional tensor can be estimated at the $\ell^0$-rate $n^{-1}$, up to logarithmic terms, if the underlying tensor is a product of $(k-1)^{\rm th}$-order polynomials on a constant number of hyperrectangles. For general $k$ we prove the $\ell^1$-rate of estimation $n^{- \frac{H(d)+2k-1}{2H(d)+2k-1}}$, up to logarithmic terms, where $H(d)$ is the $d^{\rm th}$ harmonic number. Thanks to an ANOVA-type decomposition, we can apply these results to the lower-dimensional margins of the tensor to prove bounds for denoising the whole tensor. Our tools are interpolating tensors to bound the effective sparsity for $\ell^0$-rates, mesh grids for $\ell^1$-rates and, in the background, the projection arguments by Dalalyan et al.
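
To get a feel for the $\ell^1$-rate exponent stated in this abstract, it can be evaluated exactly for small $d$ and $k$. The helper names below are hypothetical; the formula itself is copied from the abstract.

```python
from fractions import Fraction

def harmonic_number(d):
    """H(d) = 1 + 1/2 + ... + 1/d as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, d + 1))

def l1_rate_exponent(d, k):
    """Exponent r in the rate n^{-r}, with r = (H(d) + 2k - 1) / (2 H(d) + 2k - 1)."""
    h = harmonic_number(d)
    return (h + 2 * k - 1) / (2 * h + 2 * k - 1)

# Tabulate the exponent for a few dimensions d and orders k.
for d in (1, 2, 3):
    for k in (1, 2):
        print(f"d={d}, k={k}: exponent = {l1_rate_exponent(d, k)}")
```

For $d = 1$ and $k = 1$ the exponent evaluates to $2/3$, and for $d = 2$ and $k = 1$ it evaluates to $5/8$.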

Preprint, 2020, arXiv

A Framework for the construction of upper bounds on the number of affine linear regions of ReLU feed-forward neural networks

We present a framework to derive upper bounds on the number of regions that feed-forward neural networks with ReLU activation functions are affine linear on. It is based on an inductive analysis that keeps track of the number of such regions per dimensionality of their images within the layers. More precisely, the information about the number of regions per dimensionality is pushed through the layers starting with one region of the input dimension of the neural network and using a recursion based on an analysis of how many regions per output dimensionality a subsequent layer with a certain width can induce on an input region with a given dimensionality. The final bound on the number of regions depends on the number and widths of the layers of the neural network and on some additional parameters that were used for the recursion. It is stated in terms of the $L1$-norm of the last column of a product of matrices and provides a unifying treatment of several previously known bounds: depending on the choice of the recursion parameters that determine these matrices, it is possible to obtain the bounds from Montúfar (2014), (2017) and Serra et al. (2017) as special cases. For the latter, which is the strongest of these bounds, the formulation in terms of matrices provides new insight. In particular, by using explicit formulas for a Jordan-like decomposition of the involved matrices, we achieve new tighter results for the asymptotic setting, where the number of layers of the same fixed width tends to infinity.

Preprint, 2020, arXiv

Logistic regression with total variation regularization

We study logistic regression with total variation penalty on the canonical parameter and show that the resulting estimator satisfies a sharp oracle inequality: the excess risk of the estimator is adaptive to the number of jumps of the underlying signal or an approximation thereof. In particular, when there are finitely many jumps and jumps up are sufficiently separated from jumps down, the estimator converges at a parametric rate up to a logarithmic term, $\log n / n$, provided the tuning parameter is chosen appropriately, of order $1/ \sqrt n$. Our results extend earlier results for quadratic loss to logistic loss. We do not assume any a priori known bounds on the canonical parameter but instead only make use of the local curvature of the theoretical risk.

Preprint, 2020, arXiv

Prediction bounds for higher order total variation regularized least squares

We establish adaptive results for trend filtering: least squares estimation with a penalty on the total variation of $(k-1)^{\rm th}$ order differences. Our approach is based on combining a general oracle inequality for the $\ell_1$-penalized least squares estimator with "interpolating vectors" to upper-bound the "effective sparsity". This allows one to show that the $\ell_1$-penalty on the $k^{\text{th}}$ order differences leads to an estimator that can adapt to the number of jumps in the $(k-1)^{\text{th}}$ order differences of the underlying signal or an approximation thereof. We show the result for $k \in \{1,2,3,4\}$ and indicate how it could be derived for general $k\in \mathbb{N}$.
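
The penalty in this abstract, an $\ell_1$-norm of $k^{\text{th}}$-order differences, can be sketched in a few lines. The function name and the toy signal are assumptions, and any normalization the paper applies to the differences is omitted here.

```python
import numpy as np

def trend_filtering_penalty(signal, k):
    """l1-norm of the k-th order differences of a 1-D signal (unnormalized sketch)."""
    return float(np.abs(np.diff(signal, n=k)).sum())

# Hypothetical piecewise linear signal with a single kink in the middle.
signal = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])

second_diff = np.diff(signal, n=2)           # nonzero only where the slope changes
kinks = int(np.count_nonzero(second_diff))
penalty_k2 = trend_filtering_penalty(signal, 2)
```

With $k = 2$, the second differences of a piecewise linear signal vanish everywhere except at the kinks, so the penalty counts slope changes weighted by their size. This is the sense in which the estimator can adapt to the number of jumps in the $(k-1)^{\text{th}}$-order differences.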

Preprint, 2019, arXiv

Oracle inequalities for square root analysis estimators with application to total variation penalties

Through the direct study of the analysis estimator we derive oracle inequalities with fast and slow rates by adapting the arguments involving projections by Dalalyan, Hebiri and Lederer (2017). We then extend the theory to the square root analysis estimator. Finally, we focus on (square root) total variation regularized estimators on graphs and obtain constant-friendly rates, which, up to log-terms, match previous results obtained by entropy calculations. We also obtain an oracle inequality for the (square root) total variation regularized estimator over the cycle graph.