Researcher profile

Reinhard Heckel

Reinhard Heckel contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2022arXiv

Image-to-Image MLP-mixer for Image Reconstruction

Neural networks are highly effective tools for image reconstruction problems such as denoising and compressive sensing. To date, neural networks for image reconstruction are almost exclusively convolutional. The most popular architecture is the U-Net, a convolutional network with a multi-resolution architecture. In this work, we show that a simple network based on the multi-layer perceptron (MLP)-mixer enables state-of-the art image reconstruction performance without convolutions and without a multi-resolution architecture, provided that the training set and the size of the network are moderately large. Similar to the original MLP-mixer, the image-to-image MLP-mixer is based exclusively on MLPs operating on linearly-transformed image patches. Contrary to the original MLP-mixer, we incorporate structure by retaining the relative positions of the image patches. This imposes an inductive bias towards natural images which enables the image-to-image MLP-mixer to learn to denoise images based on fewer examples than the original MLP-mixer. Moreover, the image-to-image MLP-mixer requires fewer parameters to achieve the same denoising performance than the U-Net and its parameters scale linearly in the image resolution instead of quadratically as for the original MLP-mixer. If trained on a moderate amount of examples for denoising, the image-to-image MLP-mixer outperforms the U-Net by a slight margin. It also outperforms the vision transformer tailored for image reconstruction and classical un-trained methods such as BM3D, making it a very effective tool for image reconstruction problems.

preprint2022arXiv

Regularization-wise double descent: Why it occurs and how to eliminate it

The risk of overparameterized models, in particular deep neural networks, is often double-descent shaped as a function of the model size. Recently, it was shown that the risk as a function of the early-stopping time can also be double-descent shaped, and this behavior can be explained as a super-position of bias-variance tradeoffs. In this paper, we show that the risk of explicit L2-regularized models can exhibit double descent behavior as a function of the regularization strength, both in theory and practice. We find that for linear regression, a double descent shaped risk is caused by a superposition of bias-variance tradeoffs corresponding to different parts of the model and can be mitigated by scaling the regularization strength of each part appropriately. Motivated by this result, we study a two-layer neural network and show that double descent can be eliminated by adjusting the regularization strengths for the first and second layer. Lastly, we study a 5-layer CNN and ResNet-18 trained on CIFAR-10 with label noise, and CIFAR-100 without label noise, and demonstrate that all exhibit double descent behavior as a function of the regularization strength.

preprint2022arXiv

Test-Time Training Can Close the Natural Distribution Shift Performance Gap in Deep Learning Based Compressed Sensing

Deep learning based image reconstruction methods outperform traditional methods. However, neural networks suffer from a performance drop when applied to images from a different distribution than the training images. For example, a model trained for reconstructing knees in accelerated magnetic resonance imaging (MRI) does not reconstruct brains well, even though the same network trained on brains reconstructs brains perfectly well. Thus there is a distribution shift performance gap for a given neural network, defined as the difference in performance when training on a distribution $P$ and training on another distribution $Q$, and evaluating both models on $Q$. In this work, we propose a domain adaptation method for deep learning based compressive sensing that relies on self-supervision during training paired with test-time training at inference. We show that for four natural distribution shifts, this method essentially closes the distribution shift performance gap for state-of-the-art architectures for accelerated MRI.

preprint2021arXiv

Super-Resolution Radar

In this paper we study the identification of a time-varying linear system from its response to a known input signal. More specifically, we consider systems whose response to the input signal is given by a weighted superposition of delayed and Doppler shifted versions of the input. This problem arises in a multitude of applications such as wireless communications and radar imaging. Due to practical constraints, the input signal has finite bandwidth B, and the received signal is observed over a finite time interval of length T only. This gives rise to a delay and Doppler resolution of 1/B and 1/T. We show that this resolution limit can be overcome, i.e., we can exactly recover the continuous delay-Doppler pairs and the corresponding attenuation factors, by solving a convex optimization problem. This result holds provided that the distance between the delay-Doppler pairs is at least 2.37/B in time or 2.37/T in frequency. Furthermore, this result allows the total number of delay-Doppler pairs to be linear up to a log-factor in BT, the dimensionality of the response of the system, and thereby the limit for identifiability. Stated differently, we show that we can estimate the time-frequency components of a signal that is S-sparse in the continuous dictionary of time-frequency shifts of a random window function, from a number of measurements, that is linear up to a log-factor in S.

preprint2020arXiv

Compressive sensing with un-trained neural networks: Gradient descent finds the smoothest approximation

Un-trained convolutional neural networks have emerged as highly successful tools for image recovery and restoration. They are capable of solving standard inverse problems such as denoising and compressive sensing with excellent results by simply fitting a neural network model to measurements from a single image or signal without the need for any additional training data. For some applications, this critically requires additional regularization in the form of early stopping the optimization. For signal recovery from a few measurements, however, un-trained convolutional networks have an intriguing self-regularizing property: Even though the network can perfectly fit any image, the network recovers a natural image from few measurements when trained with gradient descent until convergence. In this paper, we provide numerical evidence for this property and study it theoretically. We show that---without any further regularization---an un-trained convolutional neural network can approximately reconstruct signals and images that are sufficiently structured, from a near minimal number of random measurements.

preprint2020arXiv

Deep Phase Decoder: Self-calibrating phase microscopy with an untrained deep neural network

Deep neural networks have emerged as effective tools for computational imaging including quantitative phase microscopy of transparent samples. To reconstruct phase from intensity, current approaches rely on supervised learning with training examples; consequently, their performance is sensitive to a match of training and imaging settings. Here we propose a new approach to phase microscopy by using an untrained deep neural network for measurement formation, encapsulating the image prior and imaging physics. Our approach does not require any training data and simultaneously reconstructs the sought phase and pupil-plane aberrations by fitting the weights of the network to the captured images. To demonstrate experimentally, we reconstruct quantitative phase from through-focus images blindly (i.e. no explicit knowledge of the aberrations).

preprint2020arXiv

Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators

Convolutional Neural Networks (CNNs) have emerged as highly successful tools for image generation, recovery, and restoration. A major contributing factor to this success is that convolutional networks impose strong prior assumptions about natural images. A surprising experiment that highlights this architectural bias towards natural images is that one can remove noise and corruptions from a natural image without using any training data, by simply fitting (via gradient descent) a randomly initialized, over-parameterized convolutional generator to the corrupted image. While this over-parameterized network can fit the corrupted image perfectly, surprisingly after a few iterations of gradient descent it generates an almost uncorrupted image. This intriguing phenomenon enables state-of-the-art CNN-based denoising and regularization of other inverse problems. In this paper, we attribute this effect to a particular architectural choice of convolutional networks, namely convolutions with fixed interpolating filters. We then formally characterize the dynamics of fitting a two-layer convolutional generator to a noisy signal and prove that early-stopped gradient descent denoises/regularizes. Our proof relies on showing that convolutional generators fit the structured part of an image significantly faster than the corrupted portion.

preprint2020arXiv

DNA-Based Storage: Models and Fundamental Limits

Due to its longevity and enormous information density, DNA is an attractive medium for archival storage. In this work, we study the fundamental limits and trade-offs of DNA-based storage systems by introducing a new channel model, which we call the noisy shuffling-sampling channel. Motivated by current technological constraints on DNA synthesis and sequencing, this model captures three key distinctive aspects of DNA storage systems: (1) the data is written onto many short DNA molecules; (2) the molecules are corrupted by noise during synthesis and sequencing and (3) the data is read by randomly sampling from the DNA pool. We provide capacity results for this channel under specific noise and sampling assumptions and show that, in many scenarios, a simple index-based coding scheme is optimal.

preprint2020arXiv

Early Stopping in Deep Networks: Double Descent and How to Eliminate it

Over-parameterized models, such as large deep networks, often exhibit a double descent phenomenon, whereas a function of model size, error first decreases, increases, and decreases at last. This intriguing double descent behavior also occurs as a function of training epochs and has been conjectured to arise because training epochs control the model complexity. In this paper, we show that such epoch-wise double descent arises for a different reason: It is caused by a superposition of two or more bias-variance tradeoffs that arise because different parts of the network are learned at different epochs, and eliminating this by proper scaling of stepsizes can significantly improve the early stopping performance. We show this analytically for i) linear regression, where differently scaled features give rise to a superposition of bias-variance tradeoffs, and for ii) a two-layer neural network, where the first and second layer each govern a bias-variance tradeoff. Inspired by this theory, we study two standard convolutional networks empirically and show that eliminating epoch-wise double descent through adjusting stepsizes of different layers improves the early stopping performance significantly.

preprint2020arXiv

Image recognition from raw labels collected without annotators

Image classification problems are typically addressed by first collecting examples with candidate labels, second cleaning the candidate labels manually, and third training a deep neural network on the clean examples. The manual labeling step is often the most expensive one as it requires workers to label millions of images. In this paper we propose to work without any explicitly labeled data by i) directly training the deep neural network on the noisy candidate labels, and ii) early stopping the training to avoid overfitting. With this procedure we exploit an intriguing property of standard overparameterized convolutional neural networks trained with (stochastic) gradient descent: Clean labels are fitted faster than noisy ones. We consider two classification problems, a subset of ImageNet and CIFAR-10. For both, we construct large candidate datasets without any explicit human annotations, that only contain 10%-50% correctly labeled examples per class. We show that training on the candidate examples and regularizing through early stopping gives higher test performance for both problems than when training on the original, clean data. This is possible because the candidate datasets contain a huge number of clean examples, and, as we show in this paper, the noise generated through the label collection process is not nearly as adversarial for learning as the noise generated by randomly flipping labels.

preprint2020arXiv

Unsupervised Learning with Stein's Unbiased Risk Estimator

Learning from unlabeled and noisy data is one of the grand challenges of machine learning. As such, it has seen a flurry of research with new ideas proposed continuously. In this work, we revisit a classical idea: Stein's Unbiased Risk Estimator (SURE). We show that, in the context of image recovery, SURE and its generalizations can be used to train convolutional neural networks (CNNs) for a range of image denoising and recovery problems without any ground truth data. Specifically, our goal is to reconstruct an image $x$ from a noisy linear transformation (measurement) of the image. We consider two scenarios: one where no additional data is available and one where we have measurements of other images that are drawn from the same noisy distribution as $x$, but have no access to the clean images. Such is the case, for instance, in the context of medical imaging, microscopy, and astronomy, where noise-less ground truth data is rarely available. We show that in this situation, SURE can be used to estimate the mean-squared-error loss associated with an estimate of $x$. Using this estimate of the loss, we train networks to perform denoising and compressed sensing recovery. In addition, we also use the SURE framework to partially explain and improve upon an intriguing results presented by Ulyanov et al. in "Deep Image Prior": that a network initialized with random weights and fit to a single noisy image can effectively denoise that image. Public implementations of the networks and methods described in this paper can be found at https://github.com/ricedsp/D-AMP_Toolbox.

preprint2019arXiv

Adaptive Estimation for Approximate k-Nearest-Neighbor Computations

Algorithms often carry out equally many computations for "easy" and "hard" problem instances. In particular, algorithms for finding nearest neighbors typically have the same running time regardless of the particular problem instance. In this paper, we consider the approximate k-nearest-neighbor problem, which is the problem of finding a subset of O(k) points in a given set of points that contains the set of k nearest neighbors of a given query point. We propose an algorithm based on adaptively estimating the distances, and show that it is essentially optimal out of algorithms that are only allowed to adaptively estimate distances. We then demonstrate both theoretically and experimentally that the algorithm can achieve significant speedups relative to the naive method.