Source author record

Reinhard Heckel

Reinhard Heckel appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory math.IT Machine Learning Computer Vision eess.IV math.ST Molecular Networks Statistics Theory Artificial Intelligence cond-mat.dis-nn Data Structures and Algorithms physics.optics

Catalog footprint

What is connected

27 works

12 topics

4 close collaborators


Preprint · 2022 · arXiv

Image-to-Image MLP-mixer for Image Reconstruction

Neural networks are highly effective tools for image reconstruction problems such as denoising and compressive sensing. To date, neural networks for image reconstruction are almost exclusively convolutional. The most popular architecture is the U-Net, a convolutional network with a multi-resolution architecture. In this work, we show that a simple network based on the multi-layer perceptron (MLP)-mixer enables state-of-the-art image reconstruction performance without convolutions and without a multi-resolution architecture, provided that the training set and the size of the network are moderately large. Similar to the original MLP-mixer, the image-to-image MLP-mixer is based exclusively on MLPs operating on linearly transformed image patches. Contrary to the original MLP-mixer, we incorporate structure by retaining the relative positions of the image patches. This imposes an inductive bias towards natural images, which enables the image-to-image MLP-mixer to learn to denoise images from fewer examples than the original MLP-mixer. Moreover, the image-to-image MLP-mixer requires fewer parameters than the U-Net to achieve the same denoising performance, and its parameters scale linearly in the image resolution instead of quadratically as for the original MLP-mixer. If trained on a moderate number of examples for denoising, the image-to-image MLP-mixer outperforms the U-Net by a slight margin. It also outperforms the vision transformer tailored for image reconstruction and classical un-trained methods such as BM3D, making it a very effective tool for image reconstruction problems.

Preprint · 2022 · arXiv

Regularization-wise double descent: Why it occurs and how to eliminate it

The risk of overparameterized models, in particular deep neural networks, is often double-descent shaped as a function of the model size. Recently, it was shown that the risk as a function of the early-stopping time can also be double-descent shaped, and this behavior can be explained as a superposition of bias-variance tradeoffs. In this paper, we show that the risk of explicit L2-regularized models can exhibit double descent behavior as a function of the regularization strength, both in theory and practice. We find that for linear regression, a double descent shaped risk is caused by a superposition of bias-variance tradeoffs corresponding to different parts of the model and can be mitigated by scaling the regularization strength of each part appropriately. Motivated by this result, we study a two-layer neural network and show that double descent can be eliminated by adjusting the regularization strengths for the first and second layer. Lastly, we study a 5-layer CNN and a ResNet-18 trained on CIFAR-10 with label noise and on CIFAR-100 without label noise, and demonstrate that all of these settings exhibit double descent behavior as a function of the regularization strength.
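A minimal sketch of L2 regularization with a separate strength per feature, the per-part knob the abstract describes rescaling. It assumes a plain linear model whose features split into two hypothetical groups with very different scales, and it uses the closed form w = (X^T X + diag(lambda))^{-1} X^T y. The function name `group_ridge` and the data are illustrative only. The snippet shows the estimator, not the double descent curves.

```python
import numpy as np

def group_ridge(X, y, lambdas):
    """Ridge regression with one L2 penalty per feature.

    Solves argmin_w ||y - X w||^2 + sum_j lambdas[j] * w[j]**2.
    Giving groups of features different lambdas mimics separate
    regularization strengths for different parts of a model.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = X.T @ X + np.diag(lambdas)
    return np.linalg.solve(A, X.T @ y)

# Hypothetical data: group 1 has unit scale, group 2 has scale 0.1.
rng = np.random.default_rng(0)
n, d1, d2 = 200, 5, 5
X = np.hstack([rng.normal(size=(n, d1)), 0.1 * rng.normal(size=(n, d2))])
w_true = rng.normal(size=d1 + d2)
y = X @ w_true + 0.1 * rng.normal(size=n)

# One shared strength versus strengths rescaled per group.
w_shared = group_ridge(X, y, np.full(d1 + d2, 1.0))
w_scaled = group_ridge(X, y, np.r_[np.full(d1, 1.0), np.full(d2, 0.01)])
```

Sweeping a common multiplier on `lambdas` and tracking the test error would trace a risk curve. Rescaling the per-group strengths is the analogue of the mitigation described above.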

Preprint · 2022 · arXiv

Test-Time Training Can Close the Natural Distribution Shift Performance Gap in Deep Learning Based Compressed Sensing

Deep learning based image reconstruction methods outperform traditional methods. However, neural networks suffer from a performance drop when applied to images from a different distribution than the training images. For example, a model trained for reconstructing knees in accelerated magnetic resonance imaging (MRI) does not reconstruct brains well, even though the same network trained on brains reconstructs brains perfectly well. Thus, there is a distribution shift performance gap for a given neural network, defined as the difference in performance between a model trained on a distribution $P$ and a model trained on another distribution $Q$, when both models are evaluated on $Q$. In this work, we propose a domain adaptation method for deep learning based compressive sensing that relies on self-supervision during training paired with test-time training at inference. We show that for four natural distribution shifts, this method essentially closes the distribution shift performance gap for state-of-the-art architectures for accelerated MRI.

Preprint · 2021 · arXiv

Super-Resolution Radar

In this paper, we study the identification of a time-varying linear system from its response to a known input signal. More specifically, we consider systems whose response to the input signal is given by a weighted superposition of delayed and Doppler-shifted versions of the input. This problem arises in a multitude of applications such as wireless communications and radar imaging. Due to practical constraints, the input signal has finite bandwidth B, and the received signal is observed over a finite time interval of length T only. This gives rise to a delay resolution of 1/B and a Doppler resolution of 1/T. We show that this resolution limit can be overcome, i.e., we can exactly recover the continuous delay-Doppler pairs and the corresponding attenuation factors, by solving a convex optimization problem. This result holds provided that the distance between the delay-Doppler pairs is at least 2.37/B in time or 2.37/T in frequency. Furthermore, this result allows the total number of delay-Doppler pairs to be linear up to a log-factor in BT, which is the dimensionality of the response of the system and hence the limit for identifiability. Stated differently, we show that we can estimate the time-frequency components of a signal that is S-sparse in the continuous dictionary of time-frequency shifts of a random window function from a number of measurements that is linear up to a log-factor in S.
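The separation condition quoted above can be checked directly for a candidate set of targets. This is a sketch under two assumptions: the condition applies pairwise, and it means "at least 2.37/B apart in delay OR at least 2.37/T apart in Doppler". It ignores any wrap-around convention the paper may use. The helper name and the numbers are hypothetical.

```python
from itertools import combinations

def satisfies_separation(pairs, B, T, c=2.37):
    """Return True if every two (delay, Doppler) pairs are at least c/B apart
    in delay or at least c/T apart in Doppler.

    pairs: list of (delay_seconds, doppler_hz); B: bandwidth in Hz;
    T: observation time in seconds. No wrap-around is modeled.
    """
    for (t1, v1), (t2, v2) in combinations(pairs, 2):
        if abs(t1 - t2) < c / B and abs(v1 - v2) < c / T:
            return False
    return True

# Hypothetical numbers: B = 1 MHz and T = 1 ms give native resolutions of
# 1/B = 1 us in delay and 1/T = 1 kHz in Doppler, and BT = 1000.
B, T = 1e6, 1e-3
print(satisfies_separation([(0.0, 0.0), (3e-6, 0.0)], B, T))     # True: 3 us apart
print(satisfies_separation([(0.0, 0.0), (1e-6, 1000.0)], B, T))  # False: close in both
```

Both pairs in the second call lie closer than the native grid would separate in neither dimension by 2.37 cells. This is exactly the regime the condition excludes.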

Preprint · 2020 · arXiv

Compressive sensing with un-trained neural networks: Gradient descent finds the smoothest approximation

Un-trained convolutional neural networks have emerged as highly successful tools for image recovery and restoration. They are capable of solving standard inverse problems such as denoising and compressive sensing with excellent results by simply fitting a neural network model to measurements from a single image or signal, without the need for any additional training data. For some applications, this critically requires additional regularization in the form of early stopping of the optimization. For signal recovery from a few measurements, however, un-trained convolutional networks have an intriguing self-regularizing property: even though the network can perfectly fit any image, it recovers a natural image from few measurements when trained with gradient descent until convergence. In this paper, we provide numerical evidence for this property and study it theoretically. We show that, without any further regularization, an un-trained convolutional neural network can approximately reconstruct signals and images that are sufficiently structured from a near-minimal number of random measurements.

Preprint · 2020 · arXiv

Deep Phase Decoder: Self-calibrating phase microscopy with an untrained deep neural network

Deep neural networks have emerged as effective tools for computational imaging, including quantitative phase microscopy of transparent samples. To reconstruct phase from intensity, current approaches rely on supervised learning with training examples; consequently, their performance is sensitive to a match between training and imaging settings. Here we propose a new approach to phase microscopy that uses an untrained deep neural network for measurement formation, encapsulating the image prior and the imaging physics. Our approach does not require any training data and simultaneously reconstructs the sought phase and pupil-plane aberrations by fitting the weights of the network to the captured images. As an experimental demonstration, we reconstruct quantitative phase from through-focus images blindly (i.e., without explicit knowledge of the aberrations).

Preprint · 2020 · arXiv

Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators

Convolutional Neural Networks (CNNs) have emerged as highly successful tools for image generation, recovery, and restoration. A major contributing factor to this success is that convolutional networks impose strong prior assumptions about natural images. A surprising experiment that highlights this architectural bias towards natural images is that one can remove noise and corruptions from a natural image without using any training data, by simply fitting (via gradient descent) a randomly initialized, over-parameterized convolutional generator to the corrupted image. While this over-parameterized network can fit the corrupted image perfectly, surprisingly, after a few iterations of gradient descent it generates an almost uncorrupted image. This intriguing phenomenon enables state-of-the-art CNN-based denoising and regularization of other inverse problems. In this paper, we attribute this effect to a particular architectural choice of convolutional networks, namely convolutions with fixed interpolating filters. We then formally characterize the dynamics of fitting a two-layer convolutional generator to a noisy signal and prove that early-stopped gradient descent denoises/regularizes. Our proof relies on showing that convolutional generators fit the structured part of an image significantly faster than the corrupted portion.

Preprint · 2020 · arXiv

DNA-Based Storage: Models and Fundamental Limits

Due to its longevity and enormous information density, DNA is an attractive medium for archival storage. In this work, we study the fundamental limits and trade-offs of DNA-based storage systems by introducing a new channel model, which we call the noisy shuffling-sampling channel. Motivated by current technological constraints on DNA synthesis and sequencing, this model captures three key distinctive aspects of DNA storage systems: (1) the data is written onto many short DNA molecules; (2) the molecules are corrupted by noise during synthesis and sequencing; and (3) the data is read by randomly sampling from the DNA pool. We provide capacity results for this channel under specific noise and sampling assumptions and show that, in many scenarios, a simple index-based coding scheme is optimal.
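The mechanics of an index-based scheme can be sketched in a few lines. Each short "molecule" carries its position index, so the decoder can undo the shuffling. Molecules that are never sampled show up as erasures.

This toy version rests on several simplifications:
- It is noiseless.
- It keeps the index as a Python integer rather than as nucleotides.
- It omits the outer code that a real system would need to correct the erasures.

All helper names are hypothetical.

```python
import random

def encode(data: bytes, chunk_size: int):
    """Split data into chunks and tag each chunk ("molecule") with its index."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return list(enumerate(chunks))

def shuffle_sample(molecules, num_reads, rng):
    """Noiseless stand-in for the channel: reads are drawn uniformly at random
    with replacement, so order is lost and some molecules may be missed."""
    return [rng.choice(molecules) for _ in range(num_reads)]

def decode(reads, num_chunks):
    """Reassemble chunks by their index; report indices that were never read."""
    recovered = {}
    for idx, chunk in reads:
        recovered.setdefault(idx, chunk)
    missing = [i for i in range(num_chunks) if i not in recovered]
    data = b"".join(recovered.get(i, b"") for i in range(num_chunks))
    return data, missing

rng = random.Random(1)
data = b"archival storage in DNA"
molecules = encode(data, chunk_size=4)
reads = shuffle_sample(molecules, num_reads=60, rng=rng)
restored, missing = decode(reads, len(molecules))
```

The index costs a few symbols per molecule. Results on when this overhead is optimal are what the paper quantifies; the sketch does not.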

Preprint · 2020 · arXiv

Early Stopping in Deep Networks: Double Descent and How to Eliminate it

Over-parameterized models, such as large deep networks, often exhibit a double descent phenomenon, where, as a function of model size, the error first decreases, then increases, and finally decreases again. This intriguing double descent behavior also occurs as a function of training epochs and has been conjectured to arise because training epochs control the model complexity. In this paper, we show that such epoch-wise double descent arises for a different reason: it is caused by a superposition of two or more bias-variance tradeoffs that arise because different parts of the network are learned at different epochs, and eliminating it by proper scaling of stepsizes can significantly improve the early stopping performance. We show this analytically for (i) linear regression, where differently scaled features give rise to a superposition of bias-variance tradeoffs, and (ii) a two-layer neural network, where the first and second layers each govern a bias-variance tradeoff. Inspired by this theory, we study two standard convolutional networks empirically and show that eliminating epoch-wise double descent by adjusting the stepsizes of different layers improves the early stopping performance significantly.
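The mechanism in the linear-regression case can be seen with gradient descent on two features of very different scale. With a shared stepsize, the small-scale feature is learned far later than the large-scale one. Scaling each coordinate's stepsize by roughly 1/scale^2 makes both parts converge on the same timescale.

This is a minimal sketch on a hypothetical two-feature problem. It illustrates "different parts are learned at different epochs", not the risk curve itself. Per-coordinate stepsizes stand in for the per-layer stepsizes of the paper.

```python
import numpy as np

def gd_linear_regression(X, y, steps, lr):
    """Full-batch gradient descent on 0.5 * ||y - X w||^2 / n.

    lr may be a scalar or an array of per-coordinate stepsizes.
    Returns all iterates so that error or risk can be tracked over time.
    """
    n, d = X.shape
    w = np.zeros(d)
    path = [w.copy()]
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = w - lr * grad
        path.append(w.copy())
    return path

# Hypothetical features with scales 1 and 0.1; the least-squares solution is w = [1, 1].
X = np.array([[1.0, 0.0], [0.0, 0.1]])
y = np.array([1.0, 0.1])
shared = gd_linear_regression(X, y, steps=20, lr=1.0)
rescaled = gd_linear_regression(X, y, steps=20, lr=np.array([1.0, 100.0]))
```

After 20 steps, `shared` has essentially fit the first coordinate but the second is still far from 1. In `rescaled`, both coordinates have converged.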

Preprint · 2020 · arXiv

Image recognition from raw labels collected without annotators

Image classification problems are typically addressed by first collecting examples with candidate labels, second cleaning the candidate labels manually, and third training a deep neural network on the clean examples. The manual labeling step is often the most expensive one as it requires workers to label millions of images. In this paper we propose to work without any explicitly labeled data by i) directly training the deep neural network on the noisy candidate labels, and ii) early stopping the training to avoid overfitting. With this procedure we exploit an intriguing property of standard overparameterized convolutional neural networks trained with (stochastic) gradient descent: Clean labels are fitted faster than noisy ones. We consider two classification problems, a subset of ImageNet and CIFAR-10. For both, we construct large candidate datasets without any explicit human annotations, that only contain 10%-50% correctly labeled examples per class. We show that training on the candidate examples and regularizing through early stopping gives higher test performance for both problems than when training on the original, clean data. This is possible because the candidate datasets contain a huge number of clean examples, and, as we show in this paper, the noise generated through the label collection process is not nearly as adversarial for learning as the noise generated by randomly flipping labels.

Preprint · 2020 · arXiv

Unsupervised Learning with Stein's Unbiased Risk Estimator

Learning from unlabeled and noisy data is one of the grand challenges of machine learning. As such, it has seen a flurry of research with new ideas proposed continuously. In this work, we revisit a classical idea: Stein's Unbiased Risk Estimator (SURE). We show that, in the context of image recovery, SURE and its generalizations can be used to train convolutional neural networks (CNNs) for a range of image denoising and recovery problems without any ground truth data. Specifically, our goal is to reconstruct an image $x$ from a noisy linear transformation (measurement) of the image. We consider two scenarios: one where no additional data is available and one where we have measurements of other images that are drawn from the same noisy distribution as $x$, but have no access to the clean images. Such is the case, for instance, in medical imaging, microscopy, and astronomy, where noiseless ground truth data is rarely available. We show that in this situation, SURE can be used to estimate the mean-squared-error loss associated with an estimate of $x$. Using this estimate of the loss, we train networks to perform denoising and compressed sensing recovery. We also use the SURE framework to partially explain and improve upon an intriguing result presented by Ulyanov et al. in "Deep Image Prior": that a network initialized with random weights and fit to a single noisy image can effectively denoise that image. Public implementations of the networks and methods described in this paper can be found at https://github.com/ricedsp/D-AMP_Toolbox.
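For the denoising case y = x + w with w ~ N(0, sigma^2 I) in n dimensions, SURE for a denoiser f is

||y - f(y)||^2 / n - sigma^2 + (2 sigma^2 / n) div f(y),

and computing it never needs the clean x.

The sketch below estimates the divergence with a single random probe, a standard Monte Carlo trick that we assume here rather than attribute to the paper. It uses a hypothetical shrinkage denoiser f(y) = 0.5 y in place of a CNN. In training, this quantity would serve as the loss.

```python
import numpy as np

def mc_sure(denoiser, y, sigma, eps=1e-3, rng=None):
    """Monte Carlo SURE estimate of the per-entry MSE of denoiser(y),
    assuming y = x + N(0, sigma^2 I).

    div f(y) is approximated with one Gaussian probe b:
        div f(y) ~= b . (f(y + eps * b) - f(y)) / eps
    Only y and sigma are used; the clean signal x is never needed.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    n = y.size
    fy = denoiser(y)
    b = rng.standard_normal(y.shape)
    div = np.sum(b * (denoiser(y + eps * b) - fy)) / eps
    return np.sum((y - fy) ** 2) / n - sigma ** 2 + 2 * sigma ** 2 * div / n

# Hypothetical check against the true MSE, which needs x.
rng = np.random.default_rng(0)
n, sigma = 100_000, 0.5
x = rng.standard_normal(n)
y = x + sigma * rng.standard_normal(n)
shrink = lambda v: 0.5 * v
estimate = mc_sure(shrink, y, sigma, rng=rng)
true_mse = np.mean((shrink(y) - x) ** 2)
```

For large n, `estimate` and `true_mse` should agree closely, although only the latter used x.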

Preprint · 2019 · arXiv

Adaptive Estimation for Approximate k-Nearest-Neighbor Computations

Algorithms often carry out equally many computations for "easy" and "hard" problem instances. In particular, algorithms for finding nearest neighbors typically have the same running time regardless of the particular problem instance. In this paper, we consider the approximate k-nearest-neighbor problem, which is the problem of finding a subset of O(k) points in a given set of points that contains the set of k nearest neighbors of a given query point. We propose an algorithm based on adaptively estimating the distances, and show that it is essentially optimal among algorithms that are only allowed to adaptively estimate distances. We then demonstrate both theoretically and experimentally that the algorithm can achieve significant speedups relative to the naive method.

Preprint · 2016 · arXiv

Active Ranking from Pairwise Comparisons and when Parametric Assumptions Don't Help

We consider sequential or active ranking of a set of n items based on noisy pairwise comparisons. Items are ranked according to the probability that a given item beats a randomly chosen item, and ranking refers to partitioning the items into sets of pre-specified sizes according to their scores. This notion of ranking includes as special cases the identification of the top-k items and the total ordering of the items. We first analyze a sequential ranking algorithm that counts the number of comparisons won and uses these counts to decide whether to stop or to compare another pair of items, chosen based on confidence intervals specified by the data collected up to that point. We prove that this algorithm succeeds in recovering the ranking using a number of comparisons that is optimal up to logarithmic factors. This guarantee does not require any structural properties of the underlying pairwise probability matrix, unlike a significant body of past work on pairwise ranking based on parametric models such as the Thurstone or Bradley-Terry-Luce models. It has been a long-standing open question whether imposing these parametric assumptions allows for improved ranking algorithms. For stochastic comparison models, in which the pairwise probabilities are bounded away from zero, our second contribution is to resolve this issue by proving a lower bound for parametric models. This shows, perhaps surprisingly, that these popular parametric modeling choices offer at most logarithmic gains for stochastic comparisons.
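The counting idea can be sketched as a successive-elimination procedure for the top-k special case:
- In each round, every undecided item plays one uniformly chosen opponent.
- Its win fraction estimates its score.
- An item is settled once its Hoeffding interval separates from enough other items.

This is a simplified illustration in the spirit of the algorithm described. The confidence radius and its constants, the function name `active_top_k`, and the stopping rules are assumptions, not the paper's.

```python
import math
import random

def active_top_k(n, k, compare, delta=0.01, max_rounds=100_000, rng=None):
    """Find the k items with the highest scores from (noisy) pairwise comparisons.

    compare(i, j) returns True if item i wins against item j.
    Score of i = probability that i beats a uniformly chosen other item.
    """
    rng = rng if rng is not None else random.Random()
    active, top = set(range(n)), set()
    wins = [0] * n
    t = 0
    while active and t < max_rounds:
        t += 1
        for i in active:  # every active item has played exactly t games
            j = rng.choice([m for m in range(n) if m != i])
            wins[i] += bool(compare(i, j))
        # Hoeffding radius with a union bound over items and rounds (assumed constants).
        r = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))
        est = {i: wins[i] / t for i in active}
        slots = k - len(top)  # top positions still unfilled
        if slots == 0:
            break
        if slots == len(active):
            top |= active
            break
        sure_top, sure_bottom = set(), set()
        for i in active:
            below = sum(est[i] - r > est[j] + r for j in active if j != i)
            above = sum(est[i] + r < est[j] - r for j in active if j != i)
            if below >= len(active) - slots:
                sure_top.add(i)  # confidently above enough items
            elif above >= slots:
                sure_bottom.add(i)  # confidently below enough items
        top |= sure_top
        active -= sure_top | sure_bottom
    # Budget exhausted: fill remaining slots with the best current estimates.
    remaining = sorted(active, key=lambda i: wins[i], reverse=True)
    return top | set(remaining[: k - len(top)])

# Hypothetical example: item i beats item j exactly when i < j.
print(active_top_k(5, 2, lambda i, j: i < j, rng=random.Random(0)))
```

No parametric model of the comparison probabilities is used. Only win counts enter the decisions.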

Preprint · 2016 · arXiv

Generalized Line Spectral Estimation via Convex Optimization

Line spectral estimation is the problem of recovering the frequencies and amplitudes of a mixture of a few sinusoids from equispaced samples. However, in a variety of signal processing problems arising in imaging, radar, and localization, we do not have direct access to such equispaced samples. Rather, we only observe a severely undersampled version of these observations through linear measurements. This paper is about such generalized line spectral estimation problems. We reformulate these problems as sparse signal recovery problems over a continuously indexed dictionary, which can be solved via a convex program. We prove that the frequencies and amplitudes of the components of the mixture can be recovered perfectly from a near-minimal number of observations via this convex program. This result holds provided the frequencies are sufficiently separated and the linear measurements obey natural conditions that are satisfied in a variety of applications.

Preprint · 2016 · arXiv

Super-Resolution MIMO Radar

A multiple-input multiple-output (MIMO) radar emits probing signals with multiple transmit antennas and records the reflections from targets with multiple receive antennas. Estimating the relative angles, delays, and Doppler shifts from the received signals makes it possible to determine the locations and velocities of the targets. Standard approaches to MIMO radar based on digital matched filtering or compressed sensing only resolve the angle-delay-Doppler triplets on a $(1/(N_T N_R), 1/B, 1/T)$ grid, where $N_T$ and $N_R$ are the numbers of transmit and receive antennas, $B$ is the bandwidth of the probing signals, and $T$ is the length of the time interval over which the reflections are observed. In this work, we show that the continuous angle-delay-Doppler triplets and the corresponding attenuation factors can be recovered perfectly by solving a convex optimization problem. This result holds provided that the angle-delay-Doppler triplets are separated either by $10/(N_T N_R - 1)$ in angle, $10.01/B$ in delay, or $10.01/T$ in Doppler direction. Furthermore, this result is optimal (up to log factors) in the number of angle-delay-Doppler triplets that can be recovered.

Preprint · 2015 · arXiv

Dimensionality-reduced subspace clustering

Subspace clustering refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces, whose number, orientations, and dimensions are all unknown. In practice, one may have access to dimensionality-reduced observations of the data only, resulting, e.g., from undersampling due to complexity and speed constraints on the acquisition device or mechanism. More pertinently, even if the high-dimensional data set is available, it is often desirable to first project the data points into a lower-dimensional space and to perform clustering there; this reduces storage requirements and computational cost. The purpose of this paper is to quantify the impact of dimensionality reduction through random projection on the performance of three subspace clustering algorithms, all of which are based on principles from sparse signal recovery. Specifically, we analyze the thresholding-based subspace clustering (TSC) algorithm, the sparse subspace clustering (SSC) algorithm, and an orthogonal matching pursuit variant thereof (SSC-OMP). We find, for all three algorithms, that dimensionality reduction down to the order of the subspace dimensions is possible without incurring significant performance degradation. Moreover, these results are order-wise optimal in the sense that reducing the dimensionality further leads to a fundamentally ill-posed clustering problem. Our findings carry over to the noisy case, as illustrated through analytical results for TSC and simulations for SSC and SSC-OMP. Extensive experiments on synthetic and real data complement our theoretical findings.

Preprint · 2015 · arXiv

Robust Subspace Clustering via Thresholding

The problem of clustering noisy and incompletely observed high-dimensional data points into a union of low-dimensional subspaces and a set of outliers is considered. The number of subspaces, their dimensions, and their orientations are assumed unknown. We propose a simple low-complexity subspace clustering algorithm, which applies spectral clustering to an adjacency matrix obtained by thresholding the correlations between data points. In other words, the adjacency matrix is constructed from the nearest neighbors of each data point in spherical distance. A statistical performance analysis shows that the algorithm exhibits robustness to additive noise and succeeds even when the subspaces intersect. Specifically, our results reveal an explicit tradeoff between the affinity of the subspaces and the tolerable noise level. We furthermore prove that the algorithm succeeds even when the data points are incompletely observed with the number of missing entries allowed to be (up to a log-factor) linear in the ambient dimension. We also propose a simple scheme that provably detects outliers, and we present numerical results on real and synthetic data.
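The algorithm described above can be sketched in a few steps:
1. Normalize the points.
2. Keep, for each point, the q points with the largest absolute inner product, i.e., its nearest neighbors in spherical distance.
3. Build a symmetric weighted adjacency matrix.
4. Run normalized spectral clustering.

Several details are assumptions:
- The weight exp(-2 arccos|<x_i, x_j>|).
- The known number of subspaces L; the papers can estimate it, for example via an eigengap.
- The deterministic k-means helper.

Outlier detection and missing entries are not handled.

```python
import numpy as np

def kmeans_farthest(V, L, iters=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [V[0]]
    for _ in range(1, L):
        dist = np.min([np.sum((V - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        centers = np.array([V[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(L)])
    return labels

def tsc(X, q, L):
    """Thresholding-based subspace clustering, sketched.

    X: (m, N) array with one data point per column; q: neighbors kept per
    point; L: number of subspaces (assumed known here).
    """
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm points
    C = np.abs(Xn.T @ Xn)                               # |<x_i, x_j>|
    np.fill_diagonal(C, -1.0)                           # exclude self-matches
    N = C.shape[1]
    A = np.zeros((N, N))
    for j in range(N):
        nbrs = np.argsort(C[:, j])[-q:]                 # q largest correlations
        # Edge weight decays with spherical distance arccos(|<x_i, x_j>|).
        A[nbrs, j] = np.exp(-2 * np.arccos(np.clip(C[nbrs, j], 0.0, 1.0)))
    A = A + A.T                                         # symmetric adjacency
    inv_sqrt_deg = 1.0 / np.sqrt(A.sum(axis=1))
    L_sym = np.eye(N) - inv_sqrt_deg[:, None] * A * inv_sqrt_deg[None, :]
    _, vecs = np.linalg.eigh(L_sym)                     # ascending eigenvalues
    V = vecs[:, :L]                                     # L smallest eigenvectors
    V = V / np.linalg.norm(V, axis=1, keepdims=True)    # row-normalize
    return kmeans_farthest(V, L)

# Hypothetical data: 20 points on each of two orthogonal 2-D subspaces of R^6.
theta = np.arange(20) * np.pi / 20
X = np.zeros((6, 40))
X[0, :20], X[1, :20] = np.cos(theta), np.sin(theta)
X[2, 20:], X[3, 20:] = np.cos(theta), np.sin(theta)
labels = tsc(X, q=4, L=2)
```

Because the subspaces here are orthogonal, no edge crosses between them. The first 20 and last 20 points receive different labels.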

Preprint · 2014 · arXiv

Compressive Nonparametric Graphical Model Selection For Time Series

We propose a method for inferring the conditional independence graph (CIG) of a high-dimensional discrete-time Gaussian vector random process from finite-length observations. Our approach does not rely on a parametric model (e.g., an autoregressive model) for the vector random process; rather, it only assumes certain spectral smoothness properties. The proposed inference scheme is compressive in that it works for sample sizes that are (much) smaller than the number of scalar process components. We provide analytical conditions for our method to correctly identify the CIG with high probability.

Preprint · 2014 · arXiv

Neighborhood Selection for Thresholding-based Subspace Clustering

Subspace clustering refers to the problem of clustering high-dimensional data points into a union of low-dimensional linear subspaces, where the number of subspaces, their dimensions and orientations are all unknown. In this paper, we propose a variation of the recently introduced thresholding-based subspace clustering (TSC) algorithm, which applies spectral clustering to an adjacency matrix constructed from the nearest neighbors of each data point with respect to the spherical distance measure. The new element resides in an individual and data-driven choice of the number of nearest neighbors. Previous performance results for TSC, as well as for other subspace clustering algorithms based on spectral clustering, come in terms of an intermediate performance measure, which does not address the clustering error directly. Our main analytical contribution is a performance analysis of the modified TSC algorithm (as well as the original TSC algorithm) in terms of the clustering error directly.

Preprint · 2014 · arXiv

Subspace clustering of dimensionality-reduced data

Subspace clustering refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces, assumed unknown. In practice, one may have access to dimensionality-reduced observations of the data only, resulting, e.g., from "undersampling" due to complexity and speed constraints on the acquisition device. More pertinently, even if one has access to the high-dimensional data set, it is often desirable to first project the data points into a lower-dimensional space and to perform the clustering task there; this reduces storage requirements and computational cost. The purpose of this paper is to quantify the impact of dimensionality reduction through random projection on the performance of the sparse subspace clustering (SSC) and the thresholding-based subspace clustering (TSC) algorithms. We find that for both algorithms, dimensionality reduction down to the order of the subspace dimensions is possible without incurring significant performance degradation. The mathematical engine behind our theorems is a result quantifying how the affinities between subspaces change under random dimensionality-reducing projections.
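To see what "affinities between subspaces" means concretely, the sketch computes a common definition, ||Q1^T Q2||_F / sqrt(min(d1, d2)), where Q1 and Q2 are orthonormal bases. The sketch assumes this definition matches the paper's. It compares the affinity before and after a Gaussian random projection. It makes no attempt to reproduce the paper's bound on how much the affinity can change. Dimensions and data are hypothetical.

```python
import numpy as np

def orthonormal_basis(M):
    """Orthonormal basis for the column span of M (assumed full column rank)."""
    Q, _ = np.linalg.qr(M)
    return Q

def affinity(U1, U2):
    """Affinity of span(U1) and span(U2): ||Q1^T Q2||_F / sqrt(min(d1, d2)), in [0, 1]."""
    Q1, Q2 = orthonormal_basis(U1), orthonormal_basis(U2)
    d = min(Q1.shape[1], Q2.shape[1])
    return np.linalg.norm(Q1.T @ Q2, "fro") / np.sqrt(d)

rng = np.random.default_rng(0)
m, p, d = 100, 20, 3                               # ambient dim, projected dim, subspace dim
U1 = rng.standard_normal((m, d))
U2 = rng.standard_normal((m, d))
Phi = rng.standard_normal((p, m)) / np.sqrt(p)     # Gaussian random projection
before = affinity(U1, U2)
after = affinity(Phi @ U1, Phi @ U2)
```

An affinity of 1 means identical subspaces and 0 means orthogonal ones. Projecting to lower dimension generally moves subspaces closer together, which is what makes clustering harder.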

Preprint · 2013 · arXiv

Bounds on the Average Sensitivity of Nested Canalizing Functions

Nested canalizing functions (NCFs) are Boolean functions that play an important role in biologically motivated regulatory networks and in signal processing, in particular in describing stack filters. It has been conjectured that NCFs have a stabilizing effect on the network dynamics. It is well known that the average sensitivity plays a central role in the stability of (random) Boolean networks. Here we provide a tight upper bound on the average sensitivity of NCFs as a function of the number of relevant input variables. As conjectured in the literature, this bound is smaller than 4/3. This shows that a large number of functions appearing in biological networks belong to a class that has very low average sensitivity, which is even close to a tight lower bound.
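Average sensitivity is the expected number of input bits whose flip changes the output, under uniformly random inputs. For small n, it can be computed exactly by enumeration.

The sketch does this for one hypothetical NCF that alternates AND and OR: x1 AND (x2 OR (x3 AND ...)). It then checks that the value stays below 4/3 for n up to 10. This is a numerical illustration for one function, not a proof of the bound.

```python
from fractions import Fraction
from itertools import product

def average_sensitivity(f, n):
    """Exact average sensitivity of f: {0,1}^n -> {0,1} under uniform inputs."""
    total = 0
    for x in product((0, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = list(x)
            y[i] ^= 1                      # flip input bit i
            total += fx != f(tuple(y))
    return Fraction(total, 2 ** n)

def alternating_ncf(x):
    """x1 AND (x2 OR (x3 AND (x4 OR ...))), a nested canalizing function."""
    value = x[-1]
    for k in range(len(x) - 2, -1, -1):
        value = (x[k] and value) if k % 2 == 0 else (x[k] or value)
    return int(value)

for n in range(1, 7):
    print(n, average_sensitivity(alternating_ncf, n))
```

Exact fractions avoid rounding when comparing against 4/3.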

Preprint · 2013 · arXiv

Harmonic Analysis of Boolean Networks: Determinative Power and Perturbations

Consider a large Boolean network with a feed-forward structure. Given a probability distribution on the inputs, can one find (possibly small) collections of input nodes that determine the states of most other nodes in the network? To answer this question, a notion that quantifies the determinative power of an input over the states of the nodes in the network is needed. We argue that the mutual information (MI) between a given subset of the inputs $X = \{X_1, \ldots, X_n\}$ of some node $i$ and its associated function $f_i(X)$ quantifies the determinative power of this set of inputs over node $i$. We compare the determinative power of a set of inputs to the sensitivity to perturbations of these inputs, and find that, perhaps surprisingly, an input that has large sensitivity to perturbations does not necessarily have large determinative power. However, for unate functions, which play an important role in genetic regulatory networks, we find a direct relation between MI and sensitivity to perturbations. As an application of our results, we analyze the large-scale regulatory network of Escherichia coli. We identify the most determinative nodes and show that a small subset of those reduces the overall uncertainty of the network state significantly. Furthermore, the network is found to be tolerant to perturbations of its inputs.
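The determinative power described above, I(X_S; f(X)), can be computed exactly for small Boolean functions by enumerating all inputs. The sketch assumes independent inputs with P(X_i = 1) = p_i, defaulting to 1/2.

The XOR example illustrates the abstract's point. Flipping either input of XOR always changes the output (maximal sensitivity), yet a single input carries zero information about the output. Majority of three is a unate function, where the two notions line up. The helper names are hypothetical.

```python
from itertools import product
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(f, n, subset, p=None):
    """I(X_S; f(X)) in bits for independent binary inputs with P(X_i = 1) = p[i]."""
    p = [0.5] * n if p is None else p
    joint = {}  # (x_S, f(x)) -> probability
    for x in product((0, 1), repeat=n):
        px = 1.0
        for xi, pi in zip(x, p):
            px *= pi if xi else 1 - pi
        key = (tuple(x[i] for i in subset), f(x))
        joint[key] = joint.get(key, 0.0) + px
    p_s, p_f = {}, {}
    for (xs, out), pr in joint.items():
        p_s[xs] = p_s.get(xs, 0.0) + pr
        p_f[out] = p_f.get(out, 0.0) + pr
    # I(X_S; f) = H(X_S) + H(f) - H(X_S, f)
    return entropy(p_s.values()) + entropy(p_f.values()) - entropy(joint.values())

majority = lambda x: int(sum(x) >= 2)
xor = lambda x: x[0] ^ x[1]
print(mutual_information(majority, 3, [0]))  # one input of majority-of-three
print(mutual_information(xor, 2, [0]))       # one input of XOR
```

With all inputs in the subset, the MI equals H(f), i.e., the inputs fully determine the node.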

Preprint · 2013 · arXiv

Identification of Sparse Linear Operators

We consider the problem of identifying a linear deterministic operator from its response to a given probing signal. For a large class of linear operators, we show that stable identifiability is possible if the total support area of the operator's spreading function satisfies $D \le 1/2$. This result holds for an arbitrary (possibly fragmented) support region of the spreading function, does not impose limitations on the total extent of the support region, and, most importantly, does not require the support region to be known prior to identification. Furthermore, we prove that stable identifiability of almost all operators is possible if $D < 1$. This result is surprising as it says that there is no penalty for not knowing the support region of the spreading function prior to identification. Algorithms that provably recover all operators with $D \le 1/2$, and almost all operators with $D < 1$, are presented.

Preprint · 2013 · arXiv

Noisy Subspace Clustering via Thresholding

We consider the problem of clustering noisy high-dimensional data points into a union of low-dimensional subspaces and a set of outliers. The number of subspaces, their dimensions, and their orientations are unknown. A probabilistic performance analysis of the thresholding-based subspace clustering (TSC) algorithm introduced recently in [1] shows that TSC succeeds in the noisy case, even when the subspaces intersect. Our results reveal an explicit tradeoff between the allowed noise level and the affinity of the subspaces. We furthermore find that the simple outlier detection scheme introduced in [1] provably succeeds in the noisy case.

Preprint · 2013 · arXiv

Subspace Clustering via Thresholding and Spectral Clustering

We consider the problem of clustering a set of high-dimensional data points into sets of low-dimensional linear subspaces. The number of subspaces, their dimensions, and their orientations are unknown. We propose a simple and low-complexity clustering algorithm based on thresholding the correlations between the data points followed by spectral clustering. A probabilistic performance analysis shows that this algorithm succeeds even when the subspaces intersect, and when the dimensions of the subspaces scale (up to a log-factor) linearly in the ambient dimension. Moreover, we prove that the algorithm also succeeds for data points that are subject to erasures with the number of erasures scaling (up to a log-factor) linearly in the ambient dimension. Finally, we propose a simple scheme that provably detects outliers.

Preprint · 2012 · arXiv

Joint Sparsity with Different Measurement Matrices

We consider a generalization of the multiple measurement vector (MMV) problem, where the measurement matrices are allowed to differ across measurements. This problem arises naturally, e.g., when multiple measurements are taken over time and the measurement modality (matrix) is time-varying. We derive probabilistic recovery guarantees showing that, under certain (mild) conditions on the measurement matrices, $\ell_2/\ell_1$-norm minimization and a variant of orthogonal matching pursuit fail with a probability that decays exponentially in the number of measurements. This allows us to conclude that, perhaps surprisingly, recovery performance does not suffer from the individual measurements being taken through different measurement matrices. What is more, recovery performance typically benefits (significantly) from diversity in the measurement matrices; we specify conditions under which such improvements are obtained. These results continue to hold when the measurements are subject to (bounded) noise.
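A greedy joint-support recovery can be sketched for the setting above, where each measurement y_j = A_j x_j has its own matrix A_j and all x_j share one support. It works as follows:
- Every column is correlated with its own measurement's residual.
- These correlations are aggregated across measurements with an l2 norm.
- Each x_j is refit by least squares on the chosen support.

This is a simultaneous-OMP-style sketch. We do not claim it is exactly the OMP variant analyzed in the paper. The name and test sizes are hypothetical.

```python
import numpy as np

def somp_different_matrices(As, ys, s):
    """Greedy recovery of a shared support of size s from y_j = A_j x_j.

    As: list of (m, n) matrices, one per measurement; ys: list of (m,) vectors.
    Returns the sorted support and an (n, J) matrix of estimated x_j columns.
    """
    n = As[0].shape[1]
    support = []
    residuals = [y.copy() for y in ys]
    coefs = []
    for _ in range(s):
        # Correlate each column with its own residual; aggregate with an l2 norm.
        scores = np.sqrt(sum((A.T @ r) ** 2 for A, r in zip(As, residuals)))
        scores[support] = -np.inf               # never re-pick a chosen column
        support.append(int(np.argmax(scores)))
        # Refit every x_j on the current support and update its residual.
        coefs = []
        for j, (A, y) in enumerate(zip(As, ys)):
            c, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residuals[j] = y - A[:, support] @ c
            coefs.append(c)
    X_hat = np.zeros((n, len(As)))
    X_hat[support, :] = np.array(coefs).T
    return sorted(support), X_hat

# Hypothetical noiseless example with J different Gaussian matrices.
rng = np.random.default_rng(0)
n, m, s, J = 60, 50, 3, 8
true_support = [4, 17, 42]
As = [rng.standard_normal((m, n)) for _ in range(J)]
X_true = np.zeros((n, J))
X_true[true_support, :] = rng.standard_normal((s, J))
ys = [As[j] @ X_true[:, j] for j in range(J)]
support, X_hat = somp_different_matrices(As, ys, s)
```

Aggregating over measurements taken through different matrices is where the diversity benefit described above would enter.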

Preprint · 2011 · arXiv

Compressive Identification of Linear Operators

We consider the problem of identifying a linear deterministic operator from an input-output measurement. For the large class of continuous (and hence bounded) operators, under additional mild restrictions, we show that stable identifiability is possible if the total support area of the operator's spreading function satisfies $D \le 1/2$. This result holds for arbitrary (possibly fragmented) support regions of the spreading function, does not impose limitations on the total extent of the support region, and, most importantly, does not require the support region of the spreading function to be known prior to identification. Furthermore, we prove that if we ask only for identifiability of almost all operators, stable identifiability is possible if $D \le 1$. This result is surprising as it says that there is no penalty for not knowing the support region of the spreading function prior to identification.
