Researcher profile

Michael Unser

Michael Unser contributes to research discovery and scholarly infrastructure.

Published work

22 published items

Preprint, 2026, arXiv

Revisiting Deep Information Propagation: Fractal Frontier and Finite-size Effects

Information propagation characterizes how input correlations evolve across layers in deep neural networks. This framework has been well studied using mean-field theory, which assumes infinitely wide networks. However, these assumptions break down for practical, finite-size networks. In this work, we study information propagation in randomly initialized neural networks with finite width and reveal that the boundary between ordered and chaotic regimes exhibits a fractal structure. This shows the fundamental complexity of neural network dynamics, in a setting that is independent of input data and optimization. To extend this analysis beyond multilayer perceptrons, we leverage recently introduced Fourier-based structured transforms, and show that information propagation in convolutional neural networks also follows the same behavior. In practice, our investigation highlights the importance of finite network depth with respect to the tradeoff between separation and robustness.

Preprint, 2026, arXiv

Universal Architectures for the Learning of Polyhedral Norms and Convex Regularizers

This paper addresses the task of learning convex regularizers to guide the reconstruction of images from limited data. By imposing that the reconstruction be amplitude-equivariant, we narrow down the class of admissible functionals to those that can be expressed as a power of a seminorm. We then show that such functionals can be approximated to arbitrary precision with the help of polyhedral norms. In particular, we identify two dual parameterizations of such systems: (i) a synthesis form with an $\ell_1$-penalty that involves some learnable dictionary; and (ii) an analysis form with an $\ell_\infty$-penalty that involves a trainable regularization operator. After having provided geometric insights and proved that the two forms are universal, we propose an implementation that relies on a specific architecture (tight frame with a weighted $\ell_1$ penalty) that is easy to train. We illustrate its use for denoising and the reconstruction of biomedical images. We find that the proposed framework outperforms the sparsity-based methods of compressed sensing, while it offers essentially the same convergence and robustness guarantees.
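
As a small illustration of the analysis form mentioned above, the sketch below evaluates a polyhedral norm as the $\ell_\infty$ norm of the coefficients $\langle w_k, x\rangle$. The function name `analysis_polyhedral_norm` and the hand-picked rows are assumptions made for illustration and are not taken from the paper; the rows are chosen so that the result coincides with the $\ell_1$ norm in the plane.

```python
from typing import Sequence


def analysis_polyhedral_norm(rows: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Return max_k |<w_k, x>|, i.e. the l_inf norm of the analysis coefficients."""
    return max(abs(sum(w_i * x_i for w_i, x_i in zip(w, x))) for w in rows)


# Illustrative rows (assumption): the sign patterns of the plane, up to a global sign.
# With these rows the polyhedral norm equals |x_1| + |x_2|.
ROWS = [(1.0, 1.0), (1.0, -1.0)]

x = (3.0, -4.0)
value = analysis_polyhedral_norm(ROWS, x)                       # max(|3 - 4|, |3 + 4|) = 7
scaled = analysis_polyhedral_norm(ROWS, [2.5 * v for v in x])   # homogeneity: 2.5 * 7
print(value, scaled)
```

The second call checks positive homogeneity, the property that makes a regularizer of this kind compatible with amplitude-equivariant reconstruction. In a learned setting the rows would be trainable rather than fixed.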

Preprint, 2022, arXiv

Approximation of Lipschitz Functions using Deep Spline Neural Networks

Lipschitz-constrained neural networks have many applications in machine learning. Since designing and training expressive Lipschitz-constrained networks is very challenging, there is a need for improved methods and a better theoretical understanding. Unfortunately, it turns out that ReLU networks have provable disadvantages in this setting. Hence, we propose to use learnable spline activation functions with at least 3 linear regions instead. We prove that this choice is optimal among all component-wise $1$-Lipschitz activation functions in the sense that no other weight-constrained architecture can approximate a larger class of functions. Additionally, this choice is at least as expressive as the recently introduced non-component-wise Groupsort activation function for spectral-norm-constrained weights. Previously published numerical results support our theoretical findings.

Preprint, 2022, arXiv

Asymptotic Stability in Reservoir Computing

Reservoir Computing is a class of Recurrent Neural Networks with internal weights fixed at random. Stability relates to the sensitivity of the network state to perturbations. It is an important property in Reservoir Computing as it directly impacts performance. In practice, it is desirable to stay in a stable regime, where the effect of perturbations does not explode exponentially, but also close to the chaotic frontier where reservoir dynamics are rich. Open questions remain today regarding input regularization and discontinuous activation functions. In this work, we use the recurrent kernel limit to draw new insights on stability in reservoir computing. This limit corresponds to large reservoir sizes, and it already becomes relevant for reservoirs with a few hundred neurons. We obtain a quantitative characterization of the frontier between stability and chaos, which can greatly benefit hyperparameter tuning. In a broader sense, our results contribute to understanding the complex dynamics of Recurrent Neural Networks.
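
The notion of stability described above can be probed numerically by driving two copies of the same reservoir with identical inputs from slightly different initial states. The sketch below is a toy experiment, not the recurrent-kernel analysis of the paper. It assumes a tanh reservoir whose weight rows are rescaled to a fixed absolute sum `gain`, which is a simple sufficient condition for contraction in the max norm when `gain < 1`. All function names and parameter values are hypothetical.

```python
import math
import random


def make_reservoir(n, gain, seed=0):
    """Random weights with every row rescaled so that sum_j |W_ij| equals gain."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(n)]
        scale = gain / sum(abs(v) for v in row)
        rows.append([scale * v for v in row])
    return rows


def step(weights, state, u):
    """One update x <- tanh(W x + u), with the scalar input u fed to every neuron."""
    return [math.tanh(sum(w * s for w, s in zip(row, state)) + u) for row in weights]


def perturbation_gap(gain, n=50, steps=30, eps=1e-3):
    """Max-norm distance between two trajectories started eps apart, same inputs."""
    weights = make_reservoir(n, gain)
    rng = random.Random(1)
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    b = [v + eps for v in a]
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)
        a, b = step(weights, a, u), step(weights, b, u)
    return max(abs(p - q) for p, q in zip(a, b))


stable_gap = perturbation_gap(gain=0.8)
print(stable_gap)  # bounded by roughly 0.8**30 * 1e-3, since tanh is 1-Lipschitz
```

For `gain >= 1` the bound no longer applies and the gap may or may not shrink. Locating the frontier between the two regimes is precisely what the paper characterizes.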

Preprint, 2022, arXiv

Bona fide Riesz projections for density estimation

The projection of sample measurements onto a reconstruction space represented by a basis on a regular grid is a powerful and simple approach to estimate a probability density function. In this paper, we focus on Riesz bases and propose a projection operator that, in contrast to previous works, guarantees the bona fide properties for the estimate, namely, non-negativity and total probability mass $1$. Our bona fide projection is defined as a convex problem. We propose solution techniques and evaluate them. Results suggest an improved performance, specifically in circumstances prone to rippling effects.

Preprint, 2022, arXiv

Complex-Order Scale-Invariant Operators and Self-Similar Processes

Derivatives and integration operators are well-studied examples of linear operators that commute with scaling up to a fixed multiplicative factor; i.e., they are scale-invariant. Fractional order derivatives (integration operators) also belong to this family. In this paper, we extend the fractional operators to complex-order operators by constructing them in the Fourier domain. We analyze these operators in detail with a special emphasis on the decay properties of the outputs. We further use these operators to introduce a family of complex-valued stable processes that are self-similar with complex-valued Hurst indices. These processes are expressed via the characteristic functionals over the Schwartz space of functions. Besides the self-similarity and stationarity, we study the regularity (in terms of Sobolev spaces) of the proposed processes.

Preprint, 2022, arXiv

Coupled Splines for Sparse Curve Fitting

We formulate as an inverse problem the construction of sparse parametric continuous curve models that fit a sequence of contour points. Our prior is incorporated as a regularization term that encourages rotation invariance and sparsity. We prove that an optimal solution to the inverse problem is a closed curve with spline components. We then show how to efficiently solve the task using B-splines as basis functions. We extend our problem formulation to curves made of two distinct components with complementary smoothness properties and solve it using hybrid splines. We illustrate the performance of our model on contours of different smoothness. Our experimental results show that we can faithfully reconstruct any general contour using few parameters, even in the presence of imprecisions in the measurements.

Preprint, 2022, arXiv

Delaunay-Triangulation-Based Learning with Hessian Total-Variation Regularization

Regression is one of the core problems tackled in supervised learning. Rectified linear unit (ReLU) neural networks generate continuous and piecewise-linear (CPWL) mappings and are the state-of-the-art approach for solving regression problems. In this paper, we propose an alternative method that leverages the expressivity of CPWL functions. In contrast to deep neural networks, our CPWL parameterization guarantees stability and is interpretable. Our approach relies on the partitioning of the domain of the CPWL function by a Delaunay triangulation. The function values at the vertices of the triangulation are our learnable parameters and identify the CPWL function uniquely. Formulating the learning scheme as a variational problem, we use the Hessian total variation (HTV) as regularizer to favor CPWL functions with few affine pieces. In this way, we control the complexity of our model through a single hyperparameter. By developing a computational framework to compute the HTV of any CPWL function parameterized by a triangulation, we discretize the learning problem as the generalized least absolute shrinkage and selection operator (LASSO). Our experiments validate the usage of our method in low-dimensional scenarios.

Preprint, 2022, arXiv

From Kernel Methods to Neural Networks: A Unifying Variational Formulation

The minimization of a data-fidelity term and an additive regularization functional gives rise to a powerful framework for supervised learning. In this paper, we present a unifying regularization functional that depends on an operator and on a generic Radon-domain norm. We establish the existence of a minimizer and give the parametric form of the solution(s) under very mild assumptions. When the norm is Hilbertian, the proposed formulation yields a solution that involves radial-basis functions and is compatible with the classical methods of machine learning. By contrast, for the total-variation norm, the solution takes the form of a two-layer neural network with an activation function that is determined by the regularization operator. In particular, we retrieve the popular ReLU networks by letting the operator be the Laplacian. We also characterize the solution for the intermediate regularization norms $\|\cdot\|=\|\cdot\|_{L_p}$ with $p\in(1,2]$. Our framework offers guarantees of universal approximation for a broad family of regularization operators or, equivalently, for a wide variety of shallow neural networks, including the cases (such as ReLU) where the activation function is increasing polynomially. It also explains the favorable role of bias and skip connections in neural architectures.

Preprint, 2022, arXiv

Measuring Complexity of Learning Schemes Using Hessian-Schatten Total Variation

In this paper, we introduce the Hessian-Schatten total variation (HTV) -- a novel seminorm that quantifies the total "rugosity" of multivariate functions. Our motivation for defining HTV is to assess the complexity of supervised-learning schemes. We start by specifying the adequate matrix-valued Banach spaces that are equipped with suitable classes of mixed norms. We then show that the HTV is invariant to rotations, scalings, and translations. Additionally, its minimum value is achieved for linear mappings, which supports the common intuition that linear regression is the least complex learning model. Next, we present closed-form expressions of the HTV for two general classes of functions. The first one is the class of Sobolev functions with a certain degree of regularity, for which we show that the HTV coincides with the Hessian-Schatten seminorm that is sometimes used as a regularizer for image reconstruction. The second one is the class of continuous and piecewise-linear (CPWL) functions. In this case, we show that the HTV reflects the total change in slopes between linear regions that have a common facet. Hence, it can be viewed as a convex relaxation (l1-type) of the number of linear regions (l0-type) of CPWL mappings. Finally, we illustrate the use of our proposed seminorm.
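
In one dimension, the slope-change interpretation above becomes very concrete: the linear regions are intervals, their common facets are the knots, and the seminorm reduces to the sum of the absolute slope jumps. The sketch below computes that quantity for a CPWL function given by its knots and values. It covers only this one-dimensional special case, and the helper names `slopes` and `htv_1d` are hypothetical.

```python
def slopes(knots, values):
    """Slope of each linear piece between consecutive knots."""
    return [
        (values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
        for i in range(len(knots) - 1)
    ]


def htv_1d(knots, values):
    """Sum of absolute slope changes at the interior knots (1D special case)."""
    s = slopes(knots, values)
    return sum(abs(s[i + 1] - s[i]) for i in range(len(s) - 1))


knots = [0.0, 1.0, 2.0, 3.0]
line = [0.0, 2.0, 4.0, 6.0]   # slopes 2, 2, 2: no slope change
bent = [0.0, 1.0, 0.0, 2.0]   # slopes 1, -1, 2: |-2| + |3| = 5

print(htv_1d(knots, line), htv_1d(knots, bent))
```

The affine example returns zero, in line with the statement that the minimum is attained by linear mappings, while every additional kink adds the magnitude of its slope jump.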

Preprint, 2022, arXiv

Phase Retrieval: From Computational Imaging to Machine Learning

Phase retrieval consists in the recovery of a complex-valued signal from intensity-only measurements. As it pervades a broad variety of applications, many researchers have striven to develop phase-retrieval algorithms. Classical approaches involve techniques as varied as generic gradient-descent routines or specialized spectral methods, to name a few. Yet, the phase-recovery problem remains a challenge to this day. Recently, however, advances in machine learning have revitalized the study of phase retrieval in two ways: significant theoretical advances have emerged from the analogy between phase retrieval and single-layer neural networks; practical breakthroughs have been obtained thanks to deep-learning regularization. In this tutorial, we review phase retrieval under a unifying framework that encompasses classical and machine-learning methods. We focus on three key elements: applications, overview of recent reconstruction algorithms, and the latest theoretical results.

Preprint, 2022, arXiv

Ridges, Neural Networks, and the Radon Transform

A ridge is a function that is characterized by a one-dimensional profile (activation) and a multidimensional direction vector. Ridges appear in the theory of neural networks as functional descriptors of the effect of a neuron, with the direction vector being encoded in the linear weights. In this paper, we investigate properties of the Radon transform in relation to ridges and to the characterization of neural networks. We introduce a broad category of hyper-spherical Banach subspaces (including the relevant subspace of measures) over which the back-projection operator is invertible. We also give conditions under which the back-projection operator is extendable to the full parent space with its null space being identifiable as a Banach complement. Starting from first principles, we then characterize the sampling functionals that are in the range of the filtered Radon transform. Next, we extend the definition of ridges for any distributional profile and determine their (filtered) Radon transform in full generality. Finally, we apply our formalism to clarify and simplify some of the results and proofs on the optimality of ReLU networks that have appeared in the literature.

Preprint, 2022, arXiv

Stable Parametrization of Continuous and Piecewise-Linear Functions

Rectified-linear-unit (ReLU) neural networks, which play a prominent role in deep learning, generate continuous and piecewise-linear (CPWL) functions. While they provide a powerful parametric representation, the mapping between the parameter and function spaces lacks stability. In this paper, we investigate an alternative representation of CPWL functions that relies on local hat basis functions. It is predicated on the fact that any CPWL function can be specified by a triangulation and its values at the grid points. We give the necessary and sufficient condition on the triangulation (in any number of dimensions) for the hat functions to form a Riesz basis, which ensures that the link between the parameters and the corresponding CPWL function is stable and unique. In addition, we provide an estimate of the $\ell_2\rightarrow L_2$ condition number of this local representation. Finally, as a special case of our framework, we focus on a systematic parametrization of $\mathbb{R}^d$ with control points placed on a uniform grid. In particular, we choose hat basis functions that are shifted replicas of a single linear box spline. In this setting, we prove that our general estimate of the condition number is optimal. We also relate our local representation to a nonlocal one based on shifts of a causal ReLU-like function.
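
The local representation described above is easiest to see in one dimension with unit grid spacing, where the linear box spline is the familiar hat function. The sketch below is a minimal illustration under that assumption, with hypothetical names `hat` and `cpwl`. Each coefficient is exactly the value of the function at its grid point, and between grid points the function interpolates linearly.

```python
def hat(t):
    """Hat function (degree-1 B-spline) supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(t))


def cpwl(coeffs, x):
    """Evaluate f(x) = sum_k c_k * hat(x - k) on the integer grid k = 0, 1, ..."""
    return sum(c * hat(x - k) for k, c in enumerate(coeffs))


coeffs = [0.0, 2.0, 1.0, 3.0]

print(cpwl(coeffs, 2.0))   # grid point: returns the coefficient c_2 = 1.0
print(cpwl(coeffs, 1.5))   # midpoint: average of c_1 and c_2 = 1.5
```

Because each hat function overlaps only its immediate neighbours, changing one coefficient alters the function only locally. This is the intuition behind the stability of the parameterization, in contrast to the nonlocal ReLU-based representation.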

Preprint, 2021, arXiv

Optimal-transport-based metric for SMLM

We propose the use of Flat Metric to assess the performance of reconstruction methods for single-molecule localization microscopy (SMLM) in scenarios where the ground-truth is available. Flat Metric is intimately related to the concept of optimal transport between measures of different mass, providing solid mathematical foundations for SMLM evaluation and integrating both localization and detection performance. In this paper, we provide the foundations of Flat Metric and validate this measure by applying it to controlled synthetic examples and to data from the SMLM 2016 Challenge.

Preprint, 2021, arXiv

Time-Dependent Deep Image Prior for Dynamic MRI

We propose a novel unsupervised deep-learning-based algorithm for dynamic magnetic resonance imaging (MRI) reconstruction. Dynamic MRI requires rapid data acquisition for the study of moving organs such as the heart. Existing reconstruction methods suffer from restrictions either in the model design or in the absence of ground-truth data, resulting in low image quality. We introduce a generalized version of the deep-image-prior approach, which optimizes the network weights to fit a sequence of sparsely acquired dynamic MRI measurements. Our method needs neither prior training nor additional data. In particular, for cardiac images, it does not require the marking of heartbeats or the reordering of spokes. The key ingredients of our method are threefold: 1) a fixed low-dimensional manifold that encodes the temporal variations of images; 2) a network that maps the manifold into a more expressive latent space; and 3) a convolutional neural network that generates a dynamic series of MRI images from the latent variables and that favors their consistency with the measurements in k-space. Our method outperforms the state-of-the-art methods quantitatively and qualitatively in both retrospective and real fetal cardiac datasets. To the best of our knowledge, this is the first unsupervised deep-learning-based method that can reconstruct the continuous variation of dynamic MRI sequences with high spatial resolution.

Preprint, 2020, arXiv

A unifying representer theorem for inverse problems and machine learning

The standard approach for dealing with the ill-posedness of the training problem in machine learning and/or the reconstruction of a signal from a limited number of measurements is regularization. The method is applicable whenever the problem is formulated as an optimization task. The standard strategy consists in augmenting the original cost functional by an energy that penalizes solutions with undesirable behavior. The effect of regularization is very well understood when the penalty involves a Hilbertian norm. Another popular configuration is the use of an $\ell_1$-norm (or some variant thereof) that favors sparse solutions. In this paper, we propose a higher-level formulation of regularization within the context of Banach spaces. We present a general representer theorem that characterizes the solutions of a remarkably broad class of optimization problems. We then use our theorem to retrieve a number of known results in the literature---e.g., the celebrated representer theorem of machine learning for RKHS, Tikhonov regularization, representer theorems for sparsity-promoting functionals, the recovery of spikes---as well as a few new ones.

Preprint, 2020, arXiv

Dictionary Learning for Two-Dimensional Kendall Shapes

We propose a novel sparse dictionary learning method for planar shapes in the sense of Kendall, namely configurations of landmarks in the plane considered up to similitudes. Our shape dictionary method provides a good trade-off between algorithmic simplicity and faithfulness with respect to the nonlinear geometric structure of Kendall's shape space. Remarkably, it boils down to a classical dictionary learning formulation modified using complex weights. Existing dictionary learning methods extended to nonlinear spaces either map the manifold to a reproducing kernel Hilbert space or to a tangent space. The first approach is unnecessarily heavy in the case of Kendall's shape space and causes the geometrical understanding of shapes to be lost, while the second one induces distortions and theoretical complexity. Our approach does not suffer from these drawbacks. Instead of embedding the shape space into a linear space, we rely on the hyperplane of centered configurations, including pre-shapes from which shapes are defined as rotation orbits. In this linear space, the dictionary atoms are scaled and rotated using complex weights before summation. Furthermore, our formulation is more general than Kendall's original one: it applies to discretely-defined configurations of landmarks as well as continuously-defined interpolating curves. We implemented our algorithm by adapting the method of optimal directions combined with a Cholesky-optimized order recursive matching pursuit. An interesting feature of our shape dictionary is that it produces visually realistic atoms, while guaranteeing reconstruction accuracy. Its efficiency can mostly be attributed to a clear formulation of the framework with complex numbers. We illustrate the strong potential of our approach for the characterization of datasets of shapes up to similitudes and the analysis of patterns in deforming 2D shapes.

Preprint, 2020, arXiv

Duality Mapping for Schatten Matrix Norms

In this paper, we fully characterize the duality mapping over the space of matrices that are equipped with Schatten norms. Our approach is based on the analysis of the saturation of the Hölder inequality for Schatten norms. We prove in our main result that, for $p\in (1,\infty)$, the duality mapping over the space of real-valued matrices with Schatten-$p$ norm is a continuous and single-valued function and provide an explicit form for its computation. For the special case $p = 1$, the mapping is set-valued; by adding a rank constraint, we show that it can be reduced to a Borel-measurable single-valued function for which we also provide a closed-form expression.
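
To make the Hölder-saturation argument concrete, the following worked equations sketch one expression that satisfies the defining conditions of the duality mapping for $p\in(1,\infty)$. The notation (reduced SVD $X = U\,\mathrm{diag}(\sigma)\,V^{\top}$, conjugate exponent $q$ with $1/p + 1/q = 1$, and the symbol $J$) is an assumption of this sketch, and the statement in the paper may be phrased differently.

```latex
% Candidate duality map for a nonzero matrix X with reduced SVD X = U diag(sigma) V^T
J(X) = \frac{U \,\mathrm{diag}\!\left(\sigma_1^{\,p-1}, \dots, \sigma_r^{\,p-1}\right) V^{\top}}{\|X\|_{S_p}^{\,p-2}}

% Check 1: the pairing saturates Hoelder's inequality
\langle J(X), X \rangle
  = \frac{\sum_{i} \sigma_i^{\,p}}{\|X\|_{S_p}^{\,p-2}}
  = \frac{\|X\|_{S_p}^{\,p}}{\|X\|_{S_p}^{\,p-2}}
  = \|X\|_{S_p}^{2}

% Check 2: the dual norm matches the primal norm, using (p-1) q = p
\|J(X)\|_{S_q}
  = \frac{\left(\sum_{i} \sigma_i^{\,(p-1)q}\right)^{1/q}}{\|X\|_{S_p}^{\,p-2}}
  = \frac{\|X\|_{S_p}^{\,p/q}}{\|X\|_{S_p}^{\,p-2}}
  = \|X\|_{S_p}
```

Together, the two checks give $\langle J(X), X\rangle = \|X\|_{S_p}\,\|J(X)\|_{S_q}$ with $\|J(X)\|_{S_q} = \|X\|_{S_p}$, which is the usual definition of a duality map. For $p=1$ the singular-value exponent $p-1$ vanishes and this expression no longer pins down a unique element, consistent with the set-valued case mentioned above.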

Preprint, 2020, arXiv

Fast Rotational Sparse Coding

We propose an algorithm for rotational sparse coding along with an efficient implementation using steerability. Sparse coding (also called dictionary learning) is an important technique in image processing, useful in inverse problems, compression, and analysis; however, the usual formulation fails to capture an important aspect of the structure of images: images are formed from building blocks, e.g., edges, lines, or points, that appear at different locations, orientations, and scales. The sparse coding problem can be reformulated to explicitly account for these transforms, at the cost of increased computation. In this work, we propose an algorithm for a rotational version of sparse coding that is based on K-SVD with additional rotation operations. We then propose a method to accelerate these rotations by learning the dictionary in a steerable basis. Our experiments on patch coding and texture classification demonstrate that the proposed algorithm is fast enough for practical use and compares favorably to standard sparse coding.

Preprint, 2020, arXiv

Generating Sparse Stochastic Processes Using Matched Splines

We provide an algorithm to generate trajectories of sparse stochastic processes that are solutions of linear ordinary differential equations driven by Lévy white noises. A recent paper showed that these processes are limits in law of generalized compound-Poisson processes. Based on this result, we derive an off-the-grid algorithm that generates arbitrarily close approximations of the target process. Our method relies on a B-spline representation of generalized compound-Poisson processes. We illustrate numerically the validity of our approach.
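
For intuition about the compound-Poisson building block mentioned above, the sketch below draws an off-the-grid trajectory in the simplest setting: a first-order derivative operator, so that the process is piecewise constant, with jumps at Poisson arrival times. The Gaussian amplitude law, the rate, and the function names are assumptions chosen for illustration. This is not the algorithm of the paper, which handles general differential operators through B-splines.

```python
import random


def compound_poisson_jumps(rate, horizon, seed=0):
    """Jump times on [0, horizon] from exponential gaps, each with a Gaussian amplitude."""
    rng = random.Random(seed)
    jumps = []
    t = rng.expovariate(rate)
    while t <= horizon:
        jumps.append((t, rng.gauss(0.0, 1.0)))
        t += rng.expovariate(rate)
    return jumps


def evaluate(jumps, t):
    """Piecewise-constant trajectory: sum of the amplitudes a_k whose time t_k <= t."""
    return sum(a for t_k, a in jumps if t_k <= t)


jumps = compound_poisson_jumps(rate=5.0, horizon=10.0)
print(len(jumps), evaluate(jumps, 10.0))
```

Because the jump locations are stored as real numbers rather than grid indices, the trajectory can be evaluated at any time without discretization error. That is the sense in which such a generator is off the grid.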

Preprint, 2020, arXiv

Joint Angular Refinement and Reconstruction for Single-Particle Cryo-EM

Single-particle cryo-electron microscopy (cryo-EM) reconstructs the three-dimensional (3D) structure of bio-molecules from a large set of 2D projection images with random and unknown orientations. A crucial step in the single-particle cryo-EM pipeline is 3D refinement, which resolves a high-resolution 3D structure from an initial approximate volume by refining the estimation of the orientation of each projection. In this work, we propose a new approach that refines the projection angles on the continuum. We formulate the optimization problem over the density map and the orientations jointly. The density map is updated using the efficient alternating-direction method of multipliers, while the orientations are updated through a semi-coordinate-wise gradient descent for which we provide an explicit derivation of the gradient. Our method eliminates the requirement for a fine discretization of the orientation space and does away with the classical but computationally expensive template-matching step. Numerical results demonstrate the feasibility and performance of our approach compared to several baselines.

Preprint, 2019, arXiv

Three-Dimensional Optical Diffraction Tomography with Lippmann-Schwinger Model

A broad class of imaging modalities involves the resolution of an inverse-scattering problem. Among them, three-dimensional optical diffraction tomography (ODT) comes with its own challenges. These include a limited range of views, a large size of the sample with respect to the illumination wavelength, and optical aberrations that are inherent to the system itself. In this work, we present an accurate and efficient implementation of the forward model. It relies on the exact (nonlinear) Lippmann-Schwinger equation. We address several crucial issues such as the discretization of the Green function, the computation of the far field, and the estimation of the incident field. We then deploy this model in a regularized variational-reconstruction framework and show on both simulated and real data that it leads to substantially better reconstructions than the approximate models that are traditionally used in ODT.