Source author record

Michael Unser

Michael Unser appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record

Catalog footprint

What is connected

46 works

22 topics

4 close collaborators

preprint 2026 arXiv

Revisiting Deep Information Propagation: Fractal Frontier and Finite-size Effects

Information propagation characterizes how input correlations evolve across layers in deep neural networks. This framework has been well studied using mean-field theory, which assumes infinitely wide networks. However, these assumptions break down for practical, finite-size networks. In this work, we study information propagation in randomly initialized neural networks with finite width and reveal that the boundary between ordered and chaotic regimes exhibits a fractal structure. This shows the fundamental complexity of neural network dynamics, in a setting that is independent of input data and optimization. To extend this analysis beyond multilayer perceptrons, we leverage recently introduced Fourier-based structured transforms, and show that information propagation in convolutional neural networks also follows the same behavior. In practice, our investigation highlights the importance of finite network depth with respect to the tradeoff between separation and robustness.

preprint 2026 arXiv

Universal Architectures for the Learning of Polyhedral Norms and Convex Regularizers

This paper addresses the task of learning convex regularizers to guide the reconstruction of images from limited data. By imposing that the reconstruction be amplitude-equivariant, we narrow down the class of admissible functionals to those that can be expressed as a power of a seminorm. We then show that such functionals can be approximated to arbitrary precision with the help of polyhedral norms. In particular, we identify two dual parameterizations of such systems: (i) a synthesis form with an $\ell_1$-penalty that involves some learnable dictionary; and (ii) an analysis form with an $\ell_\infty$-penalty that involves a trainable regularization operator. After having provided geometric insights and proved that the two forms are universal, we propose an implementation that relies on a specific architecture (tight frame with a weighted $\ell_1$ penalty) that is easy to train. We illustrate its use for denoising and the reconstruction of biomedical images. We find that the proposed framework outperforms the sparsity-based methods of compressed sensing, while it offers essentially the same convergence and robustness guarantees.

preprint 2022 arXiv

Approximation of Lipschitz Functions using Deep Spline Neural Networks

Lipschitz-constrained neural networks have many applications in machine learning. Since designing and training expressive Lipschitz-constrained networks is very challenging, there is a need for improved methods and a better theoretical understanding. Unfortunately, it turns out that ReLU networks have provable disadvantages in this setting. Hence, we propose to use learnable spline activation functions with at least 3 linear regions instead. We prove that this choice is optimal among all component-wise $1$-Lipschitz activation functions in the sense that no other weight constrained architecture can approximate a larger class of functions. Additionally, this choice is at least as expressive as the recently introduced non component-wise Groupsort activation function for spectral-norm-constrained weights. Previously published numerical results support our theoretical findings.

preprint 2022 arXiv

Asymptotic Stability in Reservoir Computing

Reservoir Computing is a class of Recurrent Neural Networks with internal weights fixed at random. Stability relates to the sensitivity of the network state to perturbations. It is an important property in Reservoir Computing as it directly impacts performance. In practice, it is desirable to stay in a stable regime, where the effect of perturbations does not explode exponentially, but also close to the chaotic frontier where reservoir dynamics are rich. Open questions remain today regarding input regularization and discontinuous activation functions. In this work, we use the recurrent kernel limit to draw new insights on stability in reservoir computing. This limit corresponds to large reservoir sizes, and it already becomes relevant for reservoirs with a few hundred neurons. We obtain a quantitative characterization of the frontier between stability and chaos, which can greatly benefit hyperparameter tuning. In a broader sense, our results contribute to understanding the complex dynamics of Recurrent Neural Networks.

preprint 2022 arXiv

Bona fide Riesz projections for density estimation

The projection of sample measurements onto a reconstruction space represented by a basis on a regular grid is a powerful and simple approach to estimate a probability density function. In this paper, we focus on Riesz bases and propose a projection operator that, in contrast to previous works, guarantees the bona fide properties for the estimate, namely, non-negativity and total probability mass $1$. Our bona fide projection is defined as a convex problem. We propose solution techniques and evaluate them. Results suggest an improved performance, specifically in circumstances prone to rippling effects.

preprint 2022 arXiv

Complex-Order Scale-Invariant Operators and Self-Similar Processes

Derivatives and integration operators are well-studied examples of linear operators that commute with scaling up to a fixed multiplicative factor; i.e., they are scale-invariant. Fractional order derivatives (integration operators) also belong to this family. In this paper, we extend the fractional operators to complex-order operators by constructing them in the Fourier domain. We analyze these operators in detail with a special emphasis on the decay properties of the outputs. We further use these operators to introduce a family of complex-valued stable processes that are self-similar with complex-valued Hurst indices. These processes are expressed via the characteristic functionals over the Schwartz space of functions. Besides the self-similarity and stationarity, we study the regularity (in terms of Sobolev spaces) of the proposed processes.

preprint 2022 arXiv

Coupled Splines for Sparse Curve Fitting

We formulate as an inverse problem the construction of sparse parametric continuous curve models that fit a sequence of contour points. Our prior is incorporated as a regularization term that encourages rotation invariance and sparsity. We prove that an optimal solution to the inverse problem is a closed curve with spline components. We then show how to efficiently solve the task using B-splines as basis functions. We extend our problem formulation to curves made of two distinct components with complementary smoothness properties and solve it using hybrid splines. We illustrate the performance of our model on contours of different smoothness. Our experimental results show that we can faithfully reconstruct any general contour using few parameters, even in the presence of imprecisions in the measurements.

preprint 2022 arXiv

Delaunay-Triangulation-Based Learning with Hessian Total-Variation Regularization

Regression is one of the core problems tackled in supervised learning. Rectified linear unit (ReLU) neural networks generate continuous and piecewise-linear (CPWL) mappings and are the state-of-the-art approach for solving regression problems. In this paper, we propose an alternative method that leverages the expressivity of CPWL functions. In contrast to deep neural networks, our CPWL parameterization guarantees stability and is interpretable. Our approach relies on the partitioning of the domain of the CPWL function by a Delaunay triangulation. The function values at the vertices of the triangulation are our learnable parameters and identify the CPWL function uniquely. Formulating the learning scheme as a variational problem, we use the Hessian total variation (HTV) as regularizer to favor CPWL functions with few affine pieces. In this way, we control the complexity of our model through a single hyperparameter. By developing a computational framework to compute the HTV of any CPWL function parameterized by a triangulation, we discretize the learning problem as the generalized least absolute shrinkage and selection operator (LASSO). Our experiments validate the usage of our method in low-dimensional scenarios.

preprint 2022 arXiv

From Kernel Methods to Neural Networks: A Unifying Variational Formulation

The minimization of a data-fidelity term and an additive regularization functional gives rise to a powerful framework for supervised learning. In this paper, we present a unifying regularization functional that depends on an operator and on a generic Radon-domain norm. We establish the existence of a minimizer and give the parametric form of the solution(s) under very mild assumptions. When the norm is Hilbertian, the proposed formulation yields a solution that involves radial-basis functions and is compatible with the classical methods of machine learning. By contrast, for the total-variation norm, the solution takes the form of a two-layer neural network with an activation function that is determined by the regularization operator. In particular, we retrieve the popular ReLU networks by letting the operator be the Laplacian. We also characterize the solution for the intermediate regularization norms $\|\cdot\|=\|\cdot\|_{L_p}$ with $p\in(1,2]$. Our framework offers guarantees of universal approximation for a broad family of regularization operators or, equivalently, for a wide variety of shallow neural networks, including the cases (such as ReLU) where the activation function is increasing polynomially. It also explains the favorable role of bias and skip connections in neural architectures.

preprint 2022 arXiv

Measuring Complexity of Learning Schemes Using Hessian-Schatten Total Variation

In this paper, we introduce the Hessian-Schatten total variation (HTV) -- a novel seminorm that quantifies the total "rugosity" of multivariate functions. Our motivation for defining HTV is to assess the complexity of supervised-learning schemes. We start by specifying the adequate matrix-valued Banach spaces that are equipped with suitable classes of mixed norms. We then show that the HTV is invariant to rotations, scalings, and translations. Additionally, its minimum value is achieved for linear mappings, which supports the common intuition that linear regression is the least complex learning model. Next, we present closed-form expressions of the HTV for two general classes of functions. The first one is the class of Sobolev functions with a certain degree of regularity, for which we show that the HTV coincides with the Hessian-Schatten seminorm that is sometimes used as a regularizer for image reconstruction. The second one is the class of continuous and piecewise-linear (CPWL) functions. In this case, we show that the HTV reflects the total change in slopes between linear regions that have a common facet. Hence, it can be viewed as a convex relaxation (l1-type) of the number of linear regions (l0-type) of CPWL mappings. Finally, we illustrate the use of our proposed seminorm.

preprint 2022 arXiv

Phase Retrieval: From Computational Imaging to Machine Learning

Phase retrieval consists in the recovery of a complex-valued signal from intensity-only measurements. As it pervades a broad variety of applications, many researchers have striven to develop phase-retrieval algorithms. Classical approaches involve techniques as varied as generic gradient-descent routines or specialized spectral methods, to name a few. Yet, the phase-recovery problem remains a challenge to this day. Recently, however, advances in machine learning have revitalized the study of phase retrieval in two ways: significant theoretical advances have emerged from the analogy between phase retrieval and single-layer neural networks; practical breakthroughs have been obtained thanks to deep-learning regularization. In this tutorial, we review phase retrieval under a unifying framework that encompasses classical and machine-learning methods. We focus on three key elements: applications, overview of recent reconstruction algorithms, and the latest theoretical results.

preprint 2022 arXiv

Ridges, Neural Networks, and the Radon Transform

A ridge is a function that is characterized by a one-dimensional profile (activation) and a multidimensional direction vector. Ridges appear in the theory of neural networks as functional descriptors of the effect of a neuron, with the direction vector being encoded in the linear weights. In this paper, we investigate properties of the Radon transform in relation to ridges and to the characterization of neural networks. We introduce a broad category of hyper-spherical Banach subspaces (including the relevant subspace of measures) over which the back-projection operator is invertible. We also give conditions under which the back-projection operator is extendable to the full parent space with its null space being identifiable as a Banach complement. Starting from first principles, we then characterize the sampling functionals that are in the range of the filtered Radon transform. Next, we extend the definition of ridges for any distributional profile and determine their (filtered) Radon transform in full generality. Finally, we apply our formalism to clarify and simplify some of the results and proofs on the optimality of ReLU networks that have appeared in the literature.

preprint 2022 arXiv

Stable Parametrization of Continuous and Piecewise-Linear Functions

Rectified-linear-unit (ReLU) neural networks, which play a prominent role in deep learning, generate continuous and piecewise-linear (CPWL) functions. While they provide a powerful parametric representation, the mapping between the parameter and function spaces lacks stability. In this paper, we investigate an alternative representation of CPWL functions that relies on local hat basis functions. It is predicated on the fact that any CPWL function can be specified by a triangulation and its values at the grid points. We give the necessary and sufficient condition on the triangulation (in any number of dimensions) for the hat functions to form a Riesz basis, which ensures that the link between the parameters and the corresponding CPWL function is stable and unique. In addition, we provide an estimate of the $\ell_2\rightarrow L_2$ condition number of this local representation. Finally, as a special case of our framework, we focus on a systematic parametrization of $\mathbb{R}^d$ with control points placed on a uniform grid. In particular, we choose hat basis functions that are shifted replicas of a single linear box spline. In this setting, we prove that our general estimate of the condition number is optimal. We also relate our local representation to a nonlocal one based on shifts of a causal ReLU-like function.
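
The hat-function representation described above can be illustrated in one dimension. The following is only a minimal sketch under stated assumptions: it uses a uniform integer grid in 1D (the abstract covers triangulations in any number of dimensions), and the names `hat` and `cpwl` are hypothetical helpers introduced for this example, not identifiers from the paper.

```python
import numpy as np

def hat(x):
    """Linear B-spline (hat function) supported on [-1, 1], with hat(0) = 1."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def cpwl(x, values):
    """Evaluate a 1D CPWL function given its values at the grid points 0, 1, 2, ...

    Each coefficient is the function value at one grid point, so the
    parameters are directly interpretable (assumption: unit grid spacing).
    """
    x = np.asarray(x, dtype=float)
    knots = np.arange(len(values))
    return sum(c * hat(x - k) for k, c in zip(knots, values))

# Parameters: values of the function at the grid points 0, 1, 2, 3.
values = np.array([0.0, 2.0, 1.0, 3.0])
points = np.array([0.0, 0.5, 1.0, 2.25])
samples = cpwl(points, values)
```

In this 1D setting the result coincides with ordinary linear interpolation of the grid values, which is why the parameters identify the function uniquely.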

preprint 2021 arXiv

Optimal-transport-based metric for SMLM

We propose the use of Flat Metric to assess the performance of reconstruction methods for single-molecule localization microscopy (SMLM) in scenarios where the ground-truth is available. Flat Metric is intimately related to the concept of optimal transport between measures of different mass, providing solid mathematical foundations for SMLM evaluation and integrating both localization and detection performance. In this paper, we provide the foundations of Flat Metric and validate this measure by applying it to controlled synthetic examples and to data from the SMLM 2016 Challenge.

preprint 2021 arXiv

Time-Dependent Deep Image Prior for Dynamic MRI

We propose a novel unsupervised deep-learning-based algorithm for dynamic magnetic resonance imaging (MRI) reconstruction. Dynamic MRI requires rapid data acquisition for the study of moving organs such as the heart. Existing reconstruction methods suffer from restrictions either in the model design or in the absence of ground-truth data, resulting in low image quality. We introduce a generalized version of the deep-image-prior approach, which optimizes the network weights to fit a sequence of sparsely acquired dynamic MRI measurements. Our method needs neither prior training nor additional data. In particular, for cardiac images, it does not require the marking of heartbeats or the reordering of spokes. The key ingredients of our method are threefold: 1) a fixed low-dimensional manifold that encodes the temporal variations of images; 2) a network that maps the manifold into a more expressive latent space; and 3) a convolutional neural network that generates a dynamic series of MRI images from the latent variables and that favors their consistency with the measurements in k-space. Our method outperforms the state-of-the-art methods quantitatively and qualitatively in both retrospective and real fetal cardiac datasets. To the best of our knowledge, this is the first unsupervised deep-learning-based method that can reconstruct the continuous variation of dynamic MRI sequences with high spatial resolution.

preprint 2020 arXiv

A unifying representer theorem for inverse problems and machine learning

The standard approach for dealing with the ill-posedness of the training problem in machine learning and/or the reconstruction of a signal from a limited number of measurements is regularization. The method is applicable whenever the problem is formulated as an optimization task. The standard strategy consists in augmenting the original cost functional by an energy that penalizes solutions with undesirable behavior. The effect of regularization is very well understood when the penalty involves a Hilbertian norm. Another popular configuration is the use of an $\ell_1$-norm (or some variant thereof) that favors sparse solutions. In this paper, we propose a higher-level formulation of regularization within the context of Banach spaces. We present a general representer theorem that characterizes the solutions of a remarkably broad class of optimization problems. We then use our theorem to retrieve a number of known results in the literature---e.g., the celebrated representer theorem of machine learning for RKHS, Tikhonov regularization, representer theorems for sparsity-promoting functionals, the recovery of spikes---as well as a few new ones.

preprint 2020 arXiv

Dictionary Learning for Two-Dimensional Kendall Shapes

We propose a novel sparse dictionary learning method for planar shapes in the sense of Kendall, namely configurations of landmarks in the plane considered up to similitudes. Our shape dictionary method provides a good trade-off between algorithmic simplicity and faithfulness with respect to the nonlinear geometric structure of Kendall's shape space. Remarkably, it boils down to a classical dictionary learning formulation modified using complex weights. Existing dictionary learning methods extended to nonlinear spaces either map the manifold to a reproducing kernel Hilbert space or to a tangent space. The first approach is unnecessarily heavy in the case of Kendall's shape space and causes the geometrical understanding of shapes to be lost, while the second one induces distortions and theoretical complexity. Our approach does not suffer from these drawbacks. Instead of embedding the shape space into a linear space, we rely on the hyperplane of centered configurations, including pre-shapes from which shapes are defined as rotation orbits. In this linear space, the dictionary atoms are scaled and rotated using complex weights before summation. Furthermore, our formulation is more general than Kendall's original one: it applies to discretely-defined configurations of landmarks as well as continuously-defined interpolating curves. We implemented our algorithm by adapting the method of optimal directions combined with a Cholesky-optimized order recursive matching pursuit. An interesting feature of our shape dictionary is that it produces visually realistic atoms, while guaranteeing reconstruction accuracy. Its efficiency can mostly be attributed to a clear formulation of the framework with complex numbers. We illustrate the strong potential of our approach for the characterization of datasets of shapes up to similitudes and the analysis of patterns in deforming 2D shapes.

preprint 2020 arXiv

Duality Mapping for Schatten Matrix Norms

In this paper, we fully characterize the duality mapping over the space of matrices that are equipped with Schatten norms. Our approach is based on the analysis of the saturation of the Hölder inequality for Schatten norms. We prove in our main result that, for $p\in (1,\infty)$, the duality mapping over the space of real-valued matrices with Schatten-$p$ norm is a continuous and single-valued function and provide an explicit form for its computation. For the special case $p = 1$, the mapping is set-valued; by adding a rank constraint, we show that it can be reduced to a Borel-measurable single-valued function for which we also provide a closed-form expression.
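
To give a feel for what a duality mapping looks like for $p\in(1,\infty)$, the sketch below lifts the classical $\ell_p$ duality mapping to the singular values of a matrix and checks the two defining properties numerically. This is an assumed, textbook-style construction and not necessarily the explicit expression given in the paper; the function names `schatten_norm` and `duality_map`, and the normalization (dual norm equal to primal norm), are choices made for this example.

```python
import numpy as np

def schatten_norm(X, p):
    """Schatten-p norm: the l_p norm of the singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def duality_map(X, p):
    """Assumed form: apply the l_p duality mapping to the singular values.

    The output Y is meant to satisfy <Y, X> = ||X||_p^2 and
    ||Y||_q = ||X||_p, where 1/p + 1/q = 1.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    norm = np.sum(s ** p) ** (1.0 / p)
    if norm == 0.0:
        return np.zeros_like(X)
    s_dual = s ** (p - 1) / norm ** (p - 2)
    return (U * s_dual) @ Vt  # U @ diag(s_dual) @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
p = 3.0
q = p / (p - 1.0)          # Hoelder conjugate exponent
Y = duality_map(X, p)
inner = float(np.sum(Y * X))  # trace inner product <Y, X>
```

Here `inner` matches `schatten_norm(X, p) ** 2` and `schatten_norm(Y, q)` matches `schatten_norm(X, p)` up to floating-point error, which is the saturation of the Hölder inequality that the abstract refers to.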

preprint 2020 arXiv

Fast Rotational Sparse Coding

We propose an algorithm for rotational sparse coding along with an efficient implementation using steerability. Sparse coding (also called dictionary learning) is an important technique in image processing, useful in inverse problems, compression, and analysis; however, the usual formulation fails to capture an important aspect of the structure of images: images are formed from building blocks, e.g., edges, lines, or points, that appear at different locations, orientations, and scales. The sparse coding problem can be reformulated to explicitly account for these transforms, at the cost of increased computation. In this work, we propose an algorithm for a rotational version of sparse coding that is based on K-SVD with additional rotation operations. We then propose a method to accelerate these rotations by learning the dictionary in a steerable basis. Our experiments on patch coding and texture classification demonstrate that the proposed algorithm is fast enough for practical use and compares favorably to standard sparse coding.

preprint 2020 arXiv

Generating Sparse Stochastic Processes Using Matched Splines

We provide an algorithm to generate trajectories of sparse stochastic processes that are solutions of linear ordinary differential equations driven by Lévy white noises. A recent paper showed that these processes are limits in law of generalized compound-Poisson processes. Based on this result, we derive an off-the-grid algorithm that generates arbitrarily close approximations of the target process. Our method relies on a B-spline representation of generalized compound-Poisson processes. We illustrate numerically the validity of our approach.

preprint 2020 arXiv

Joint Angular Refinement and Reconstruction for Single-Particle Cryo-EM

Single-particle cryo-electron microscopy (cryo-EM) reconstructs the three-dimensional (3D) structure of bio-molecules from a large set of 2D projection images with random and unknown orientations. A crucial step in the single-particle cryo-EM pipeline is 3D refinement, which resolves a high-resolution 3D structure from an initial approximate volume by refining the estimation of the orientation of each projection. In this work, we propose a new approach that refines the projection angles on the continuum. We formulate the optimization problem over the density map and the orientations jointly. The density map is updated using the efficient alternating-direction method of multipliers, while the orientations are updated through a semi-coordinate-wise gradient descent for which we provide an explicit derivation of the gradient. Our method eliminates the requirement for a fine discretization of the orientation space and does away with the classical but computationally expensive template-matching step. Numerical results demonstrate the feasibility and performance of our approach compared to several baselines.

preprint 2019 arXiv

Three-Dimensional Optical Diffraction Tomography with Lippmann-Schwinger Model

A broad class of imaging modalities involve the resolution of an inverse-scattering problem. Among them, three-dimensional optical diffraction tomography (ODT) comes with its own challenges. These include a limited range of views, a large size of the sample with respect to the illumination wavelength, and optical aberrations that are inherent to the system itself. In this work, we present an accurate and efficient implementation of the forward model. It relies on the exact (nonlinear) Lippmann-Schwinger equation. We address several crucial issues such as the discretization of the Green function, the computation of the far field, and the estimation of the incident field. We then deploy this model in a regularized variational-reconstruction framework and show on both simulated and real data that it leads to substantially better reconstructions than the approximate models that are traditionally used in ODT.

preprint 2016 arXiv

Back-propagating the light of field stars to probe telescope mirrors aberrations

We propose a wavefront-based method to estimate the PSF over the whole field of view. This method estimates the aberrations of all the mirrors of the telescope using only field stars. In this proof-of-concept paper, we describe the method and present some qualitative results.

preprint 2016 arXiv

Multidimensional Lévy White Noises in Weighted Besov Spaces

In this paper, we study the Besov regularity of d-dimensional Lévy white noises. More precisely, we describe new sample-path properties of a given white noise in terms of weighted Besov spaces. In particular, the smoothness and integrability properties of Lévy white noises are characterized using the Blumenthal-Getoor indices. Our techniques rely on wavelet methods and generalized moment estimates for Lévy white noises.

preprint 2015 arXiv

A Learning Approach to Optical Tomography

We describe a method for imaging 3D objects in a tomographic configuration implemented by training an artificial neural network to reproduce the complex amplitude of the experimentally measured scattered light. The network is designed such that the voxel values of the refractive index of the 3D object are the variables that are adapted during the training process. We demonstrate the method experimentally by forming images of the 3D refractive index distribution of cells.

preprint 2015 arXiv

Joint Image Reconstruction and Segmentation Using the Potts Model

We propose a new algorithmic approach to the non-smooth and non-convex Potts problem (also called piecewise-constant Mumford-Shah problem) for inverse imaging problems. We derive a suitable splitting into specific subproblems that can all be solved efficiently. Our method does not require a priori knowledge on the gray levels nor on the number of segments of the reconstruction. Further, it avoids anisotropic artifacts such as geometric staircasing. We demonstrate the suitability of our method for joint image reconstruction and segmentation. We focus on Radon data, where we in particular consider limited data situations. For instance, our method is able to recover all segments of the Shepp-Logan phantom from $7$ angular views only. We illustrate the practical applicability on a real PET dataset. As further applications, we consider spherical Radon data as well as blurred data.

preprint 2014 arXiv

Ellipse-preserving Hermite interpolation and subdivision

We introduce a family of piecewise-exponential functions that have the Hermite interpolation property. Our design is motivated by the search for an effective scheme for the joint interpolation of points and associated tangents on a curve with the ability to perfectly reproduce ellipses. We prove that the proposed Hermite functions form a Riesz basis and that they reproduce prescribed exponential polynomials. We present a method based on Green's functions to unravel their multi-resolution and approximation-theoretic properties. Finally, we derive the corresponding vector and scalar subdivision schemes, which lend themselves to a fast implementation. The proposed vector scheme is interpolatory and level-dependent, but its asymptotic behaviour is the same as the classical cubic Hermite spline algorithm. The same convergence properties---i.e., fourth order of approximation---are hence ensured.

preprint 2014 arXiv

On the Continuity of Characteristic Functionals and Sparse Stochastic Modeling

The characteristic functional is the infinite-dimensional generalization of the Fourier transform for measures on function spaces. It characterizes the statistical law of the associated stochastic process in the same way as a characteristic function specifies the probability distribution of its corresponding random variable. Our goal in this work is to lay the foundations of the innovation model, a (possibly) non-Gaussian probabilistic model for sparse signals. This is achieved by using the characteristic functional to specify sparse stochastic processes that are defined as linear transformations of general continuous-domain white noises (also called innovation processes). We prove the existence of a broad class of sparse processes by using the Minlos-Bochner theorem. This requires a careful study of the regularity properties, especially the boundedness in Lp-spaces, of the characteristic functional of the innovations. We are especially interested in the functionals that are only defined for p<1 since they appear to be associated with the sparser kind of processes. Finally, we apply our main theorem of existence to two specific subclasses of processes with specific invariance properties.

preprint 2013 arXiv

Decay properties of Riesz transforms and steerable wavelets

The Riesz transform is a natural multi-dimensional extension of the Hilbert transform, and it has been the object of study for many years due to its nice mathematical properties. More recently, the Riesz transform and its variants have been used to construct complex wavelets and steerable wavelet frames in higher dimensions. The flip side of this approach, however, is that the Riesz transform of a wavelet often has slow decay. One can nevertheless overcome this problem by requiring the original wavelet to have sufficient smoothness, decay, and vanishing moments. In this paper, we derive necessary conditions in terms of these three properties that guarantee the decay of the Riesz transform and its variants, and as an application, we show how the decay of the popular Simoncelli wavelets can be improved by appropriately modifying their Fourier transforms. By applying the Riesz transform to these new wavelets, we obtain steerable frames with rapid decay.

preprint 2013 arXiv

Harmonic Singular Integrals and Steerable Wavelets in $L_2(\mathbb{R}^d)$

Here we present a method of constructing steerable wavelet frames in $L_2(\mathbb{R}^d)$ that generalizes and unifies previous approaches, including Simoncelli's pyramid and Riesz wavelets. The motivation for steerable wavelets is the need to more accurately account for the orientation of data. Such wavelets can be constructed by decomposing an isotropic mother wavelet into a finite collection of oriented mother wavelets. The key to this construction is that the angular decomposition is an isometry, whereby the new collection of wavelets maintains the frame bounds of the original one. The general method that we propose here is based on partitions of unity involving spherical harmonics. A fundamental aspect of this construction is that Fourier multipliers composed of spherical harmonics correspond to singular integrals in the spatial domain. Such transforms have been studied extensively in the field of harmonic analysis, and we take advantage of this wealth of knowledge to make the proposed construction practically feasible and computationally efficient.

preprint, 2013, arXiv

Hessian Schatten-Norm Regularization for Linear Inverse Problems

We introduce a novel family of invariant, convex, and non-quadratic functionals that we employ to derive regularized solutions of ill-posed linear inverse imaging problems. The proposed regularizers involve the Schatten norms of the Hessian matrix, computed at every pixel of the image. They can be viewed as second-order extensions of the popular total-variation (TV) semi-norm since they satisfy the same invariance properties. Meanwhile, by taking advantage of second-order derivatives, they avoid the staircase effect, a common artifact of TV-based reconstructions, and perform well for a wide range of applications. To solve the corresponding optimization problems, we propose an algorithm that is based on a primal-dual formulation. A fundamental ingredient of this algorithm is the projection of matrices onto Schatten norm balls of arbitrary radius. This operation is performed efficiently based on a direct link we provide between vector projections onto $\ell_q$ norm balls and matrix projections onto Schatten norm balls. Finally, we demonstrate the effectiveness of the proposed methods through experimental results on several inverse imaging problems with real and simulated data.
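The link between vector and matrix projections can be illustrated on a single 2x2 symmetric matrix, such as a Hessian at one pixel. The sketch below assumes the standard construction: decompose the matrix, project its singular values onto the $\ell_q$ ball, and reassemble. The function names are hypothetical, only $q \in \{1, 2, \infty\}$ is handled, and the paper's actual algorithm may differ in its details.

```python
import math


def project_l1_ball(values, radius):
    """Euclidean projection of a non-negative vector onto the l1 ball (sort-based)."""
    if sum(values) <= radius:
        return list(values)
    ordered = sorted(values, reverse=True)
    running, tau = 0.0, 0.0
    for k, u in enumerate(ordered, start=1):
        running += u
        candidate = (running - radius) / k
        if u - candidate > 0:
            tau = candidate  # last candidate that keeps the k largest entries positive
    return [max(v - tau, 0.0) for v in values]


def project_lq_ball(values, radius, q):
    """Projection of a non-negative vector onto the l_q ball for q in {1, 2, inf}."""
    if q == 1:
        return project_l1_ball(values, radius)
    if q == 2:
        norm = math.sqrt(sum(v * v for v in values))
        return list(values) if norm <= radius else [v * radius / norm for v in values]
    if q == math.inf:
        return [min(v, radius) for v in values]
    raise ValueError("this sketch only handles q in {1, 2, inf}")


def project_schatten_ball(a, b, c, radius, q):
    """Project the symmetric matrix [[a, b], [b, c]] onto the Schatten-q ball.

    Returns the entries (a', b', c') of the projected symmetric matrix.
    """
    # Closed-form eigendecomposition of a 2x2 symmetric matrix.
    mean = 0.5 * (a + c)
    dev = math.hypot(0.5 * (a - c), b)
    eigenvalues = [mean + dev, mean - dev]
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    # Singular values are |eigenvalues|; project them, then restore the signs.
    singular = project_lq_ball([abs(e) for e in eigenvalues], radius, q)
    l1, l2 = (math.copysign(s, e) for s, e in zip(singular, eigenvalues))
    return (
        l1 * v1[0] * v1[0] + l2 * v2[0] * v2[0],
        l1 * v1[0] * v1[1] + l2 * v2[0] * v2[1],
        l1 * v1[1] * v1[1] + l2 * v2[1] * v2[1],
    )
```

For example, `project_schatten_ball(3.0, 1.0, -2.0, 1.0, math.inf)` clips both eigenvalues to magnitude 1 while keeping the eigenvectors of the input.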

preprint, 2013, arXiv

Operator-Like Wavelet Bases of $L_2(\mathbb{R}^d)$

The connection between derivative operators and wavelets is well known. Here we generalize the concept by constructing multiresolution approximations and wavelet basis functions that act like Fourier multiplier operators. This construction follows from a stochastic model: signals are tempered distributions such that the application of a whitening (differential) operator results in a realization of a sparse white noise. Using wavelets constructed from these operators, the sparsity of the white noise can be inherited by the wavelet coefficients. In this paper, we specify such wavelets in full generality and determine their properties in terms of the underlying operator.

preprint, 2013, arXiv

Optimality of Operator-Like Wavelets for Representing Sparse AR(1) Processes

It is known that the Karhunen-Loève transform (KLT) of Gaussian first-order auto-regressive (AR(1)) processes results in sinusoidal basis functions. The same sinusoidal bases come out of the independent-component analysis (ICA) and actually correspond to processes with completely independent samples. In this paper, we relax the Gaussian hypothesis and study how orthogonal transforms decouple symmetric-alpha-stable (S$α$S) AR(1) processes. The Gaussian case is not sparse and corresponds to $α=2$, while $0<α<2$ yields processes with sparse linear-prediction error. In the presence of sparsity, we show that operator-like wavelet bases do outperform the sinusoidal ones. Also, we observe that, for processes with very sparse increments ($0<α\leq 1$), the operator-like wavelet basis is indistinguishable from the ICA solution obtained through numerical optimization. We consider two criteria for independence. The first is the Kullback-Leibler divergence between the joint probability density function (pdf) of the original signal and the product of the marginals in the transformed domain. The second is a divergence between the joint pdf of the original signal and the product of the marginals in the transformed domain, which is based on Stein's formula for the mean-square estimation error in additive Gaussian noise. Our framework then offers a unified view that encompasses the discrete cosine transform (known to be asymptotically optimal for $α=2$) and Haar-like wavelets (for which we achieve optimality for $0<α\leq1$).

preprint, 2012, arXiv

A unified formulation of Gaussian vs. sparse stochastic processes - Part I: Continuous-domain theory

We introduce a general distributional framework that results in a unifying description and characterization of a rich variety of continuous-time stochastic processes. The cornerstone of our approach is an innovation model that is driven by some generalized white noise process, which may be Gaussian or not (e.g., Laplace, impulsive Poisson or alpha-stable). This allows for a conceptual decoupling between the correlation properties of the process, which are imposed by the whitening operator L, and its sparsity pattern, which is determined by the type of noise excitation. The latter is fully specified by a Lévy measure. We show that the range of admissible innovation behavior varies between the purely Gaussian and super-sparse extremes. We prove that the corresponding generalized stochastic processes are well-defined mathematically provided that the (adjoint) inverse of the whitening operator satisfies some $L_p$ bound for $p\geq 1$. We present a novel operator-based method that yields an explicit characterization of all Lévy-driven processes that are solutions of constant-coefficient stochastic differential equations. When the underlying system is stable, we recover the family of stationary CARMA processes, including the Gaussian ones. The approach remains valid when the system is unstable and leads to the identification of potentially useful generalizations of the Lévy processes, which are sparse and non-stationary. Finally, we show how we can apply finite difference operators to obtain a stationary characterization of these processes that is maximally decoupled and stable, irrespective of the location of the poles in the complex plane.

preprint, 2012, arXiv

A unified formulation of Gaussian vs. sparse stochastic processes - Part II: Discrete-domain theory

This paper is devoted to the characterization of an extended family of CARMA (continuous-time autoregressive moving average) processes that are solutions of stochastic differential equations driven by white Lévy innovations. These are completely specified by: (1) a set of poles and zeros that fixes their correlation structure, and (2) a canonical infinitely-divisible probability distribution that controls their degree of sparsity (with the Gaussian model corresponding to the least sparse scenario). The generalized CARMA processes are either stationary or non-stationary, depending on the location of the poles in the complex plane. The most basic non-stationary representatives (with a single pole at the origin) are the Lévy processes, which are the non-Gaussian counterparts of Brownian motion. We focus on the general analog-to-discrete conversion problem and introduce a novel spline-based formalism that greatly simplifies the derivation of the correlation properties and joint probability distributions of the discrete versions of these processes. We also rely on the concept of generalized increment process, which suppresses all long-range dependencies, to specify an equivalent discrete-domain innovation model. A crucial ingredient is the existence of a minimally-supported function associated with the whitening operator L; this B-spline, which is fundamental to our formulation, appears in most of our formulas, both at the level of the correlation and the characteristic function. We make use of these discrete-domain results to numerically generate illustrative examples of sparse signals that are consistent with the continuous-domain model.
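As a rough illustration of the discrete-domain innovation model, the sketch below samples a first-order process with a single real pole, driven by compound-Poisson (impulsive) noise, and then applies the matching generalized increment. Everything specific is an assumption made for the example: the function names, the Gaussian jump amplitudes, the zero initial condition, and the unit sampling step. It is not the paper's generation procedure.

```python
import math
import random


def sample_poisson(rate, rng):
    """Poisson draw via Knuth's multiplication method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_sparse_ar1(n, pole, rate, seed=0):
    """Integer samples of s, where (D - pole*I) s = w and w is compound-Poisson noise.

    pole <= 0; pole == 0 gives a compound-Poisson Levy process.
    Returns (samples, innovations), with s[k] = exp(pole) * s[k-1] + u[k].
    """
    rng = random.Random(seed)
    decay = math.exp(pole)
    samples, innovations, previous = [], [], 0.0
    for _step in range(n):
        u = 0.0
        for _jump in range(sample_poisson(rate, rng)):
            elapsed = rng.random()  # time from the jump to the next sampling instant
            u += rng.gauss(0.0, 1.0) * math.exp(pole * elapsed)
        previous = decay * previous + u
        samples.append(previous)
        innovations.append(u)
    return samples, innovations


def generalized_increments(samples, pole):
    """Discrete whitening: u[k] = s[k] - exp(pole) * s[k-1], with s[-1] taken as 0."""
    decay = math.exp(pole)
    out = [samples[0]]
    for k in range(1, len(samples)):
        out.append(samples[k] - decay * samples[k - 1])
    return out
```

With a low jump rate, most unit intervals contain no jump, so most generalized increments are exactly zero, which is the sense in which the discrete signal is sparse.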

preprint, 2012, arXiv

Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

We consider the estimation of an i.i.d. (possibly non-Gaussian) vector $\mathbf{x} \in \mathbb{R}^n$ from measurements $\mathbf{y} \in \mathbb{R}^m$ obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. A novel method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector $\mathbf{x}$ is presented. The proposed algorithm is a generalization of a recently-developed EM-GAMP that uses expectation-maximization (EM) iterations where the posteriors in the E-steps are computed via approximate message passing. The methodology can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations. In addition, we show that when a certain maximum-likelihood estimation can be performed in each step, the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. Remarkably, this result applies to essentially arbitrary parametrizations of the unknown distributions, including ones that are nonlinear and non-Gaussian. The adaptive GAMP methodology thus provides a systematic, general and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees.

preprint, 2012, arXiv

Bayesian Estimation for Continuous-Time Sparse Stochastic Processes

We consider continuous-time sparse stochastic processes from which we have only a finite number of noisy/noiseless samples. Our goal is to estimate the noiseless samples (denoising) and the signal in-between (interpolation problem). By relying on tools from the theory of splines, we derive the joint a priori distribution of the samples and show how this probability density function can be factorized. The factorization enables us to tractably implement the maximum a posteriori and minimum mean-square error (MMSE) criteria as two statistical approaches for estimating the unknowns. We compare the derived statistical methods with well-known techniques for the recovery of sparse signals, such as the $\ell_1$ norm and Log ($\ell_1$-$\ell_0$ relaxation) regularization methods. The simulation results show that, under certain conditions, the performance of the regularization techniques can be very close to that of the MMSE estimator.

preprint, 2012, arXiv

Sparse Stochastic Processes and Discretization of Linear Inverse Problems

We present a novel statistically-based discretization paradigm and derive a class of maximum a posteriori (MAP) estimators for solving ill-conditioned linear inverse problems. We are guided by the theory of sparse stochastic processes, which specifies continuous-domain signals as solutions of linear stochastic differential equations. Accordingly, we show that the class of admissible priors for the discretized version of the signal is confined to the family of infinitely divisible distributions. Our estimators not only cover the well-studied methods of Tikhonov and $\ell_1$-type regularizations as particular cases, but also open the door to a broader class of sparsity-promoting regularization schemes that are typically nonconvex. We provide an algorithm that handles the corresponding nonconvex problems and illustrate the use of our formalism by applying it to deconvolution, MRI, and X-ray tomographic reconstruction problems. Finally, we compare the performance of estimators associated with models of increasing sparsity.
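To see how different priors lead to different regularizers, it can help to look at the simplest setting: denoising a single scalar $y$ by minimizing $\tfrac12 (y - x)^2 + \phi(x)$. The sketch below is an illustration under that simplification only. The function names, the weight `lam`, the scale `eps` and the particular log-type penalty are hypothetical choices, not the paper's estimators or algorithm.

```python
import math


def shrink_gaussian(y, lam):
    """Minimizer of 0.5*(y - x)**2 + 0.5*lam*x**2 (quadratic, Tikhonov-type)."""
    return y / (1.0 + lam)


def shrink_laplace(y, lam):
    """Minimizer of 0.5*(y - x)**2 + lam*abs(x) (soft-threshold, l1-type)."""
    return math.copysign(max(abs(y) - lam, 0.0), y)


def log_potential(x, lam=1.0, eps=0.1):
    """A nonconvex log-type penalty; lam and eps are illustrative values."""
    return lam * math.log(1.0 + (x / eps) ** 2)


def shrink_numeric(y, potential, lo=-10.0, hi=10.0, steps=200001):
    """Brute-force grid minimizer of 0.5*(y - x)**2 + potential(x).

    Slow but safe for nonconvex potentials, where local descent could stall.
    """
    best_x, best_cost = lo, float("inf")
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        cost = 0.5 * (y - x) ** 2 + potential(x)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x
```

The quadratic penalty shrinks every input by the same factor, the absolute-value penalty sets small inputs exactly to zero, and `shrink_numeric(y, log_potential)` can be used to inspect the behavior of the nonconvex penalty in the same way.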

preprint, 2011, arXiv

Fast O(1) bilateral filtering using trigonometric range kernels

It is well known that spatial averaging can be realized (in the space or frequency domain) using algorithms whose complexity does not depend on the size or shape of the filter. These fast algorithms are generally referred to as constant-time or O(1) algorithms in the image processing literature. Along with the spatial filter, the edge-preserving bilateral filter [Tomasi1998] involves an additional range kernel. This is used to restrict the averaging to those neighborhood pixels whose intensities are similar or close to that of the pixel of interest. The range kernel operates by acting on the pixel intensities. This makes the averaging process non-linear and computationally intensive, especially when the spatial filter is large. In this paper, we show how the O(1) averaging algorithms can be leveraged for realizing the bilateral filter in constant-time, by using trigonometric range kernels. This is done by generalizing the idea in [Porikli2008] of using polynomial range kernels. The class of trigonometric kernels turns out to be sufficiently rich, allowing for the approximation of the standard Gaussian bilateral filter. The attractive feature of our approach is that, for a fixed number of terms, the quality of approximation achieved using trigonometric kernels is much superior to that obtained in [Porikli2008] using polynomials.
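The idea can be sketched in one dimension. The example assumes a raised-cosine range kernel $[\cos(\gamma s)]^N$ and a box spatial window. Both are illustrative choices, and the function names and the parameter `dynamic_range` are hypothetical. Expanding the cosine power with the binomial theorem turns the nonlinear filter into $N+1$ ordinary box filterings of pointwise-transformed signals, each computed with running sums.

```python
import cmath
import math


def box_sums(values, radius):
    """Sliding-window sums from one prefix sum; cost per sample is independent of radius."""
    prefix = [0j]
    for v in values:
        prefix.append(prefix[-1] + v)
    n = len(values)
    return [prefix[min(n, i + radius + 1)] - prefix[max(0, i - radius)] for i in range(n)]


def bilateral_raised_cosine(signal, radius, degree, dynamic_range):
    """Bilateral filter with range kernel cos(gamma*s)**degree via box filterings.

    gamma = pi / (2*dynamic_range) keeps the kernel non-negative as long as
    intensity differences do not exceed dynamic_range.
    """
    gamma = math.pi / (2.0 * dynamic_range)
    n = len(signal)
    num = [0j] * n
    den = [0j] * n
    for k in range(degree + 1):
        freq = gamma * (2 * k - degree)
        coeff = math.comb(degree, k) / 2.0 ** degree
        phase = [cmath.exp(1j * freq * f) for f in signal]
        filt_num = box_sums([p * f for p, f in zip(phase, signal)], radius)
        filt_den = box_sums(phase, radius)
        for i in range(n):
            w = coeff * phase[i].conjugate()
            num[i] += w * filt_num[i]
            den[i] += w * filt_den[i]
    return [(a / b).real for a, b in zip(num, den)]


def bilateral_direct(signal, radius, degree, dynamic_range):
    """Reference implementation: explicit loop over every neighborhood."""
    gamma = math.pi / (2.0 * dynamic_range)
    n = len(signal)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = math.cos(gamma * (signal[j] - signal[i])) ** degree
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out
```

The two functions agree up to rounding error, but only the direct version has a cost that grows with `radius`. The price of the fast version is a number of box filterings that grows with `degree`.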

preprint, 2011, arXiv

Fast space-variant elliptical filtering using box splines

The efficient realization of linear space-variant (non-convolution) filters is a challenging computational problem in image processing. In this paper, we demonstrate that it is possible to filter an image with a Gaussian-like elliptic window of varying size, elongation and orientation using a fixed number of computations per pixel. The associated algorithm, which is based on a family of smooth compactly supported piecewise polynomials, the radially-uniform box splines, is realized using pre-integration and local finite-differences. The radially-uniform box splines are constructed through the repeated convolution of a fixed number of box distributions, which have been suitably scaled and distributed radially in a uniform fashion. The attractive features of these box splines are their asymptotic behavior, their simple covariance structure, and their quasi-separability. They converge to Gaussians with the increase of their order, and are used to approximate anisotropic Gaussians of varying covariance simply by controlling the scales of the constituent box distributions. Based on the second feature, we develop a technique for continuously controlling the size, elongation and orientation of these Gaussian-like functions. Finally, the quasi-separable structure, along with a certain scaling property of box distributions, is used to efficiently realize the associated space-variant elliptical filtering, which requires O(1) computations per pixel irrespective of the shape and size of the filter.

preprint, 2011, arXiv

On the Hilbert transform of wavelets

A wavelet is a localized function having a prescribed number of vanishing moments. In this correspondence, we provide precise arguments as to why the Hilbert transform of a wavelet is again a wavelet. In particular, we provide sharp estimates of the localization, vanishing moments, and smoothness of the transformed wavelet. We work in the general setting of non-compactly supported wavelets. Our main result is that, in the presence of some minimal smoothness and decay, the Hilbert transform of a wavelet is again as smooth and oscillating as the original wavelet, whereas its localization is controlled by the number of vanishing moments of the original wavelet. We motivate our results using concrete examples.

preprint, 2010, arXiv

Left-Inverses of Fractional Laplacian and Sparse Stochastic Processes

The fractional Laplacian $(-\triangle)^{γ/2}$ commutes with the primary coordinate transformations in the Euclidean space $\mathbb{R}^d$: dilation, translation and rotation, and has tight links to splines, fractals and stable Lévy processes. For $0<γ<d$, its inverse is the classical Riesz potential $I_γ$ which is dilation-invariant and translation-invariant. In this work, we investigate the functional properties (continuity, decay and invertibility) of an extended class of differential operators that share those invariance properties. In particular, we extend the definition of the classical Riesz potential $I_γ$ to any non-integer number $γ$ larger than $d$ and show that it is the unique left-inverse of the fractional Laplacian $(-\triangle)^{γ/2}$ which is dilation-invariant and translation-invariant. We observe that, for any $1\le p\le \infty$ and $γ\ge d(1-1/p)$, there exists a Schwartz function $f$ such that $I_γf$ is not $p$-integrable. We then introduce the new unique left-inverse $I_{γ, p}$ of the fractional Laplacian $(-\triangle)^{γ/2}$ with the property that $I_{γ, p}$ is dilation-invariant (but not translation-invariant) and that $I_{γ, p}f$ is $p$-integrable for any Schwartz function $f$. We finally apply that linear operator $I_{γ, p}$ with $p=1$ to solve the stochastic partial differential equation $(-\triangle)^{γ/2} Φ=w$ with white Poisson noise as its driving term $w$.

preprint, 2009, arXiv

Construction of Hilbert Transform Pairs of Wavelet Bases and Gabor-like Transforms

We propose a novel method for constructing Hilbert transform (HT) pairs of wavelet bases based on a fundamental approximation-theoretic characterization of scaling functions: the B-spline factorization theorem. In particular, starting from well-localized scaling functions, we construct HT pairs of biorthogonal wavelet bases of $L^2(\mathbb{R})$ by relating the corresponding wavelet filters via a discrete form of the continuous HT filter. As a concrete application of this methodology, we identify HT pairs of spline wavelets of a specific flavor, which are then combined to realize a family of complex wavelets that resemble the optimally-localized Gabor function for sufficiently large orders. Analytic wavelets, derived from the complexification of HT wavelet pairs, exhibit a one-sided spectrum. Based on the tensor-product of such analytic wavelets, and, in effect, by appropriately combining four separable biorthogonal wavelet bases of $L^2(\mathbb{R}^2)$, we then discuss a methodology for constructing 2D direction-selective complex wavelets. In particular, analogous to the HT correspondence between the components of the 1D counterpart, we relate the real and imaginary components of these complex wavelets using a multi-dimensional extension of the HT, the directional HT. Next, we construct a family of complex spline wavelets that resemble the directional Gabor functions proposed by Daugman. Finally, we present an efficient FFT-based filterbank algorithm for implementing the associated complex wavelet transform.

preprint, 2009, arXiv

Fast adaptive elliptical filtering using box splines

We demonstrate that it is possible to filter an image with an elliptic window of varying size, elongation and orientation with a fixed computational cost per pixel. Our method involves the application of a suitable global pre-integrator followed by a pointwise-adaptive localization mesh. We present the basic theory for the 1D case using a B-spline formalism and then appropriately extend it to 2D using radially-uniform box splines. The size and ellipticity of these radially-uniform box splines are adaptively controlled. Moreover, they converge to Gaussians as the order increases. Finally, we present a fast and practical directional filtering algorithm that has the capability of adapting to the local image features.
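The 1D principle of pre-integration followed by local differencing can be sketched in its simplest form, a box window (a B-spline of degree zero) whose radius changes from sample to sample. The function name and the clamped boundary handling are assumptions made for this example. The higher-order and 2D radially-uniform box-spline constructions of the paper are not reproduced here.

```python
def adaptive_box_average(signal, radii):
    """Average signal over [i - r, i + r] with a per-sample radius r = radii[i].

    One running sum (pre-integration) is followed by a single difference per
    sample, so the cost per sample does not depend on the window size.
    Windows are clamped at the signal boundaries.
    """
    prefix = [0.0]
    for value in signal:
        prefix.append(prefix[-1] + value)
    n = len(signal)
    out = []
    for i, r in enumerate(radii):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```

For instance, `adaptive_box_average([1, 2, 3, 4, 5, 6], [0, 1, 2, 1, 0, 3])` uses a different window at every position while reading only two prefix-sum entries per output sample.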

preprint, 2009, arXiv

Gabor wavelet analysis and the fractional Hilbert transform

We propose an amplitude-phase representation of the dual-tree complex wavelet transform (DT-CWT) which provides an intuitive interpretation of the associated complex wavelet coefficients. The representation, in particular, is based on the shifting action of the group of fractional Hilbert transforms (fHT) which allow us to extend the notion of arbitrary phase-shifts beyond pure sinusoids. We explicitly characterize this shifting action for a particular family of Gabor-like wavelets which, in effect, links the corresponding dual-tree transform with the framework of windowed-Fourier analysis. We then extend these ideas to the bivariate DT-CWT based on certain directional extensions of the fHT. In particular, we derive a signal representation involving the superposition of direction-selective wavelets with appropriate phase-shifts.

preprint, 2009, arXiv

On the Shiftability of Dual-Tree Complex Wavelet Transforms

The dual-tree complex wavelet transform (DT-CWT) is known to exhibit better shift-invariance than the conventional discrete wavelet transform. We propose an amplitude-phase representation of the DT-CWT which, among other things, offers a direct explanation for the improvement in the shift-invariance. The representation is based on the shifting action of the group of fractional Hilbert transform (fHT) operators, which extends the notion of arbitrary phase-shifts from sinusoids to finite-energy signals (wavelets in particular). In particular, we characterize the shiftability of the DT-CWT in terms of the shifting property of the fHTs. At the heart of the representation are certain fundamental invariances of the fHT group, namely that of translation, dilation, and norm, which play a decisive role in establishing the key properties of the transform. It turns out that these fundamental invariances are exclusive to this group. Next, by introducing a generalization of the Bedrosian theorem for the fHT operator, we derive an explicit understanding of the shifting action of the fHT for the particular family of wavelets obtained through the modulation of lowpass functions (e.g., the Shannon and Gabor wavelets). This, in effect, links the corresponding dual-tree transform with the framework of windowed-Fourier analysis. Finally, we extend these ideas to the multi-dimensional setting by introducing a directional extension of the fHT, the fractional directional Hilbert transform. In particular, we derive a signal representation involving the superposition of direction-selective wavelets with appropriate phase-shifts, which helps explain the improved shift-invariance of the transform along certain preferential directions.
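The phase-shifting action on sinusoids is easy to check numerically. The sketch below assumes the convention $H_\tau = \cos(\pi\tau)\,I - \sin(\pi\tau)\,H$, with $H$ the Hilbert transform of Fourier multiplier $-j\,\mathrm{sign}(\omega)$. That convention, the periodic discrete setting, and the function names are assumptions made for this example, and a naive DFT is used to keep it dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def hilbert(x):
    """Discrete periodic Hilbert transform with multiplier -j * sign(frequency)."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if k == 0 or 2 * k == n:
            spectrum[k] = 0.0      # DC and Nyquist bins have no sign
        elif 2 * k < n:
            spectrum[k] *= -1j     # positive frequencies
        else:
            spectrum[k] *= 1j      # negative frequencies
    return [v.real for v in idft(spectrum)]


def fractional_hilbert(x, tau):
    """Assumed convention: H_tau = cos(pi*tau) * Identity - sin(pi*tau) * Hilbert."""
    h = hilbert(x)
    c, s = math.cos(math.pi * tau), math.sin(math.pi * tau)
    return [c * a - s * b for a, b in zip(x, h)]
```

Under this convention, applying `fractional_hilbert` with shift `tau` to a sampled $\cos(\omega_0 t)$ returns $\cos(\omega_0 t + \pi\tau)$, which is the pure-sinusoid behavior that the abstract describes as being extended to wavelets.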

46 published item(s)

Revisiting Deep Information Propagation: Fractal Frontier and Finite-size Effects

Universal Architectures for the Learning of Polyhedral Norms and Convex Regularizers

Approximation of Lipschitz Functions using Deep Spline Neural Networks

Asymptotic Stability in Reservoir Computing

Bona fide Riesz projections for density estimation

Complex-Order Scale-Invariant Operators and Self-Similar Processes

Coupled Splines for Sparse Curve Fitting

Delaunay-Triangulation-Based Learning with Hessian Total-Variation Regularization

From Kernel Methods to Neural Networks: A Unifying Variational Formulation

Measuring Complexity of Learning Schemes Using Hessian-Schatten Total Variation

Phase Retrieval: From Computational Imaging to Machine Learning

Ridges, Neural Networks, and the Radon Transform

Stable Parametrization of Continuous and Piecewise-Linear Functions

Optimal-transport-based metric for SMLM

Time-Dependent Deep Image Prior for Dynamic MRI

A unifying representer theorem for inverse problems and machine learning

Dictionary Learning for Two-Dimensional Kendall Shapes

Duality Mapping for Schatten Matrix Norms

Fast Rotational Sparse Coding

Generating Sparse Stochastic Processes Using Matched Splines

Joint Angular Refinement and Reconstruction for Single-Particle Cryo-EM

Three-Dimensional Optical Diffraction Tomography with Lippmann-Schwinger Model

Back-propagating the light of field stars to probe telescope mirrors aberrations

Multidimensional Lévy White Noises in Weighted Besov Spaces

A Learning Approach to Optical Tomography

Joint Image Reconstruction and Segmentation Using the Potts Model

Ellipse-preserving Hermite interpolation and subdivision

On the Continuity of Characteristic Functionals and Sparse Stochastic Modeling

Decay properties of Riesz transforms and steerable wavelets

Harmonic Singular Integrals and Steerable Wavelets in $L_2(\mathbb{R}^d)$

Hessian Schatten-Norm Regularization for Linear Inverse Problems

Operator-Like Wavelet Bases of $L_2(\mathbb{R}^d)$

Optimality of Operator-Like Wavelets for Representing Sparse AR(1) Processes

A unified formulation of Gaussian vs. sparse stochastic processes - Part I: Continuous-domain theory

A unified formulation of Gaussian vs. sparse stochastic processes - Part II: Discrete-domain theory

Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

Bayesian Estimation for Continuous-Time Sparse Stochastic Processes

Sparse Stochastic Processes and Discretization of Linear Inverse Problems

Fast O(1) bilateral filtering using trigonometric range kernels

Fast space-variant elliptical filtering using box splines

On the Hilbert transform of wavelets

Left-Inverses of Fractional Laplacian and Sparse Stochastic Processes

Construction of Hilbert Transform Pairs of Wavelet Bases and Gabor-like Transforms

Fast adaptive elliptical filtering using box splines

Gabor wavelet analysis and the fractional Hilbert transform

On the Shiftability of Dual-Tree Complex Wavelet Transforms