Source author record

Gabriel Peyré

Gabriel Peyré appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning Information Theory math.IT math.ST Statistics Theory math.NA Computer Vision math.AP Applications Artificial Intelligence Computation Discrete Mathematics math.PR quant-ph

Catalog footprint

What is connected

56works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Geometry-Aware Discretization Error of Diffusion Models

Practical diffusion sampling is a numerical approximation problem: under a fixed inference budget, one must simulate a reverse-time ODE or SDE using only a limited number of denoising steps, so discretization error is often the dominant source of error. Existing non-asymptotic analyses provide convergence guarantees, but are typically too loose and too insensitive to diffusion parameters to guide practical design: broad families of schedules receive the same rates, which depend on coarse worst-case quantities such as the dimension or the drift Lipschitz constant. We take a less ambitious but more informative route. In the exact-score setting, we derive first-order asymptotic expansions of the Euler-Maruyama weak and Fréchet discretization errors. These formulas hold for general smooth reverse diffusions and become fully explicit under Gaussian data. They show how discretization error adapts to the geometry of the data through the covariance spectrum, and how this geometry interacts with key diffusion parameters, including the diffusion schedules and the diffusion-term coefficient. This yields tractable objectives for geometry-aware parameter optimization. Finally, we show that the qualitative predictions of the Gaussian formulas remain robust across diffusion sampling problems with different geometries, including image generation on different datasets and image posterior sampling.

preprint2026arXiv

On the global convergence of gradient descent for wide shallow models with bounded nonlinearities

A surprising phenomenon in the training of neural networks is the ability of gradient descent to find global minimizers of the training loss despite its non-convexity. Following earlier works, we investigate this behavior for wide shallow networks. Existing results essentially cover the case of ReLU activations and the case of sigmoid activations with scalar output weights. We study a large class of models that includes multi-head attention layers and two-layer sigmoid networks with vector output weights. Building upon [Chizat and Bach, 2018], we prove that all non-global minimizers of the training loss are unstable under gradient descent dynamics. Thus, when the initial distribution of the parameters has full support (which includes the popular Gaussian case), and in the many hidden neurons or attention heads limit, continuous-time gradient descent can only converge to global minimizers. Establishing the instability of non-global minimizers corresponds to the construction of an ``escaping active set'' -- we complete the proof of [Chizat and Bach, 2018] to construct this set for models with bounded nonlinearities and scalar output weights. We also extend this construction to new cases for models with vector output weights. Finally, we show the well-posedness and the stability with respect to discretization of the mean field training dynamic for sub-Gaussian initializations.

preprint2026arXiv

Training Infinitely Deep and Wide Transformers

Transformers have become the dominant architecture in modern machine learning, yet the theoretical understanding of their training dynamics remains limited. This paper develops a rigorous mathematical framework for analyzing gradient-based training of transformers in the mean-field regime, where both the depth (number of layers) and width (number of attention heads) tend to infinity. While ResNet training can be understood as controlling a neural ODE, transformer training corresponds to controlling a neural PDE, due to the coupling of multiple token distributions through the attention mechanism. Our mean-field model features two types of measure representations: token distributions evolving through layers and attention parameters at each layer. We establish well-posedness of the forward pass through infinitely deep transformers, characterizing token evolution via flow maps that satisfy ODEs in function spaces. Using adjoint sensitivity analysis, we derive an explicit formula for the conditional Wasserstein gradient of the training risk, involving adjoint variables governed by backward ODEs. We prove the existence and uniqueness of gradient flow curves in the conditional Wasserstein metric space, establishing a rigorous foundation for gradient-based transformer training. A key technical contribution is providing necessary and sufficient conditions for injectivity of the Neural Tangent Kernel (NTK) for attention mechanisms: we show that NTK injectivity is equivalent to linear independence of log-sum-exp functions modulo affine functions, a condition satisfied by diverse token distributions, including discrete distributions, uniform distributions, and Gaussian mixtures. Under this NTK injectivity assumption, we prove that gradient flow converges to global minima when the initial loss is sufficiently small, eliminating spurious local minima from the optimization landscape.

preprint2023arXiv

Sinkhorn Divergences for Unbalanced Optimal Transport

Optimal transport induces the Earth Mover's (Wasserstein) distance between probability distributions, a geometric divergence that is relevant to a wide range of problems. Over the last decade, two relaxations of optimal transport have been studied in depth: unbalanced transport, which is robust to the presence of outliers and can be used when distributions don't have the same total mass; entropy-regularized transport, which is robust to sampling noise and lends itself to fast computations using the Sinkhorn algorithm. This paper combines both lines of work to put robust optimal transport on solid ground. Our main contribution is a generalization of the Sinkhorn algorithm to unbalanced transport: our method alternates between the standard Sinkhorn updates and the pointwise application of a contractive function. This implies that entropic transport solvers on grid images, point clouds and sampled distributions can all be modified easily to support unbalanced transport, with a proof of linear convergence that holds in all settings. We then show how to use this method to define pseudo-distances on the full space of positive measures that satisfy key geometric axioms: (unbalanced) Sinkhorn divergences are differentiable, positive, definite, convex, statistically robust and avoid any "entropic bias" towards a shrinkage of the measures' supports.

preprint2023arXiv

The Unbalanced Gromov Wasserstein Distance: Conic Formulation and Relaxation

Comparing metric measure spaces (i.e. a metric space endowed with aprobability distribution) is at the heart of many machine learning problems. The most popular distance between such metric measure spaces is theGromov-Wasserstein (GW) distance, which is the solution of a quadratic assignment problem. The GW distance is however limited to the comparison of metric measure spaces endowed with a probability distribution. To alleviate this issue, we introduce two Unbalanced Gromov-Wasserstein formulations: a distance and a more tractable upper-bounding relaxation.They both allow the comparison of metric spaces equipped with arbitrary positive measures up to isometries. The first formulation is a positive and definite divergence based on a relaxation of the mass conservation constraint using a novel type of quadratically-homogeneous divergence. This divergence works hand in hand with the entropic regularization approach which is popular to solve large scale optimal transport problems. We show that the underlying non-convex optimization problem can be efficiently tackled using a highly parallelizable and GPU-friendly iterative scheme. The second formulation is a distance between mm-spaces up to isometries based on a conic lifting. Lastly, we provide numerical experiments onsynthetic examples and domain adaptation data with a Positive-Unlabeled learning task to highlight the salient features of the unbalanced divergence and its potential applications in ML.

preprint2023arXiv

Unbalanced Optimal Transport, from Theory to Numerics

Optimal Transport (OT) has recently emerged as a central tool in data sciences to compare in a geometrically faithful way point clouds and more generally probability distributions. The wide adoption of OT into existing data analysis and machine learning pipelines is however plagued by several shortcomings. This includes its lack of robustness to outliers, its high computational costs, the need for a large number of samples in high dimension and the difficulty to handle data in distinct spaces. In this review, we detail several recently proposed approaches to mitigate these issues. We insist in particular on unbalanced OT, which compares arbitrary positive measures, not restricted to probability distributions (i.e. their total mass can vary). This generalization of OT makes it robust to outliers and missing data. The second workhorse of modern computational OT is entropic regularization, which leads to scalable algorithms while lowering the sample complexity in high dimension. The last point presented in this review is the Gromov-Wasserstein (GW) distance, which extends OT to cope with distributions belonging to different metric spaces. The main motivation for this review is to explain how unbalanced OT, entropic regularization and GW can work hand-in-hand to turn OT into efficient geometric loss functions for data sciences.

preprint2022arXiv

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

Neural Ordinary Differential Equations (Neural ODEs) are the continuous analog of Residual Neural Networks (ResNets). We investigate whether the discrete dynamics defined by a ResNet are close to the continuous one of a Neural ODE. We first quantify the distance between the ResNet's hidden state trajectory and the solution of its corresponding Neural ODE. Our bound is tight and, on the negative side, does not go to 0 with depth N if the residual functions are not smooth with depth. On the positive side, we show that this smoothness is preserved by gradient descent for a ResNet with linear residual functions and small enough initial loss. It ensures an implicit regularization towards a limit Neural ODE at rate 1 over N, uniformly with depth and optimization time. As a byproduct of our analysis, we consider the use of a memory-free discrete adjoint method to train a ResNet by recovering the activations on the fly through a backward pass of the network, and show that this method theoretically succeeds at large depth if the residual functions are Lipschitz with the input. We then show that Heun's method, a second order ODE integration scheme, allows for better gradient estimation with the adjoint method when the residual functions are smooth with depth. We experimentally validate that our adjoint method succeeds at large depth, and that Heun method needs fewer layers to succeed. We finally use the adjoint method successfully for fine-tuning very deep ResNets without memory consumption in the residual layers.

preprint2022arXiv

Fast and accurate optimization on the orthogonal manifold without retraction

We consider the problem of minimizing a function over the manifold of orthogonal matrices. The majority of algorithms for this problem compute a direction in the tangent space, and then use a retraction to move in that direction while staying on the manifold. Unfortunately, the numerical computation of retractions on the orthogonal manifold always involves some expensive linear algebra operation, such as matrix inversion, exponential or square-root. These operations quickly become expensive as the dimension of the matrices grows. To bypass this limitation, we propose the landing algorithm which does not use retractions. The algorithm is not constrained to stay on the manifold but its evolution is driven by a potential energy which progressively attracts it towards the manifold. One iteration of the landing algorithm only involves matrix multiplications, which makes it cheap compared to its retraction counterparts. We provide an analysis of the convergence of the algorithm, and demonstrate its promises on large-scale and deep learning problems, where it is faster and less prone to numerical errors than retraction-based methods.

preprint2022arXiv

Faster Unbalanced Optimal Transport: Translation invariant Sinkhorn and 1-D Frank-Wolfe

Unbalanced optimal transport (UOT) extends optimal transport (OT) to take into account mass variations to compare distributions. This is crucial to make OT successful in ML applications, making it robust to data normalization and outliers. The baseline algorithm is Sinkhorn, but its convergence speed might be significantly slower for UOT than for OT. In this work, we identify the cause for this deficiency, namely the lack of a global normalization of the iterates, which equivalently corresponds to a translation of the dual OT potentials. Our first contribution leverages this idea to develop a provably accelerated Sinkhorn algorithm (coined 'translation invariant Sinkhorn') for UOT, bridging the computational gap with OT. Our second contribution focusses on 1-D UOT and proposes a Frank-Wolfe solver applied to this translation invariant formulation. The linear oracle of each steps amounts to solving a 1-D OT problems, resulting in a linear time complexity per iteration. Our last contribution extends this method to the computation of UOT barycenter of 1-D measures. Numerical simulations showcase the convergence speed improvement brought by these three approaches.

preprint2022arXiv

Sinkformers: Transformers with Doubly Stochastic Attention

Attention based models such as Transformers involve pairwise interactions between data points, modeled with a learnable attention matrix. Importantly, this attention matrix is normalized with the SoftMax operator, which makes it row-wise stochastic. In this paper, we propose instead to use Sinkhorn's algorithm to make attention matrices doubly stochastic. We call the resulting model a Sinkformer. We show that the row-wise stochastic attention matrices in classical Transformers get close to doubly stochastic matrices as the number of epochs increases, justifying the use of Sinkhorn normalization as an informative prior. On the theoretical side, we show that, unlike the SoftMax operation, this normalization makes it possible to understand the iterations of self-attention modules as a discretized gradient-flow for the Wasserstein metric. We also show in the infinite number of samples limit that, when rescaling both attention matrices and depth, Sinkformers operate a heat diffusion. On the experimental side, we show that Sinkformers enhance model accuracy in vision and natural language processing tasks. In particular, on 3D shapes classification, Sinkformers lead to a significant improvement.

preprint2022arXiv

Unsupervised Ground Metric Learning using Wasserstein Singular Vectors

Defining meaningful distances between samples in a dataset is a fundamental problem in machine learning. Optimal Transport (OT) lifts a distance between features (the "ground metric") to a geometrically meaningful distance between samples. However, there is usually no straightforward choice of ground metric. Supervised ground metric learning approaches exist but require labeled data. In absence of labels, only ad-hoc ground metrics remain. Unsupervised ground metric learning is thus a fundamental problem to enable data-driven applications of OT. In this paper, we propose for the first time a canonical answer by simultaneously computing an OT distance between samples and between features of a dataset. These distance matrices emerge naturally as positive singular vectors of the function mapping ground metrics to OT distances. We provide criteria to ensure the existence and uniqueness of these singular vectors. We then introduce scalable computational methods to approximate them in high-dimensional settings, using stochastic approximation and entropic regularization. Finally, we showcase Wasserstein Singular Vectors on a single-cell RNA-sequencing dataset.

preprint2021arXiv

Low-Rank Sinkhorn Factorization

Several recent applications of optimal transport (OT) theory to machine learning have relied on regularization, notably entropy and the Sinkhorn algorithm. Because matrix-vector products are pervasive in the Sinkhorn algorithm, several works have proposed to \textit{approximate} kernel matrices appearing in its iterations using low-rank factors. Another route lies instead in imposing low-rank constraints on the feasible set of couplings considered in OT problems, with no approximations on cost nor kernel matrices. This route was first explored by Forrow et al., 2018, who proposed an algorithm tailored for the squared Euclidean ground cost, using a proxy objective that can be solved through the machinery of regularized 2-Wasserstein barycenters. Building on this, we introduce in this work a generic approach that aims at solving, in full generality, the OT problem under low-rank constraints with arbitrary costs. Our algorithm relies on an explicit factorization of low rank couplings as a product of \textit{sub-coupling} factors linked by a common marginal; similar to an NMF approach, we alternatively updates these factors. We prove the non-asymptotic stationary convergence of this algorithm and illustrate its efficiency on benchmark experiments.

preprint2020arXiv

Computational Optimal Transport

Optimal transport (OT) theory can be informally described using the words of the French mathematician Gaspard Monge (1746-1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site. The goal of the worker is to erect with all that sand a target pile with a prescribed shape (for example, that of a giant sand castle). Naturally, the worker wishes to minimize her total effort, quantified for instance as the total distance or time spent carrying shovelfuls of sand. Mathematicians interested in OT cast that problem as that of comparing two probability distributions, two different piles of sand of the same volume. They consider all of the many possible ways to morph, transport or reshape the first pile into the second, and associate a "global" cost to every such transport, using the "local" consideration of how much it costs to move a grain of sand from one place to another. Recent years have witnessed the spread of OT in several fields, thanks to the emergence of approximate solvers that can scale to sizes and dimensions that are relevant to data sciences. Thanks to this newfound scalability, OT is being increasingly used to unlock various problems in imaging sciences (such as color or texture processing), computer vision and graphics (for shape manipulation) or machine learning (for regression, classification and density fitting). This short book reviews OT with a bias toward numerical methods and their applications in data sciences, and sheds lights on the theoretical properties of OT that make it particularly useful for some of these applications.

preprint2020arXiv

Online Sinkhorn: Optimal Transport distances from sample streams

Optimal Transport (OT) distances are now routinely used as loss functions in ML tasks. Yet, computing OT distances between arbitrary (i.e. not necessarily discrete) probability distributions remains an open problem. This paper introduces a new online estimator of entropy-regularized OT distances between two such arbitrary distributions. It uses streams of samples from both distributions to iteratively enrich a non-parametric representation of the transportation plan. Compared to the classic Sinkhorn algorithm, our method leverages new samples at each iteration, which enables a consistent estimation of the true regularized OT distance. We provide a theoretical analysis of the convergence of the online Sinkhorn algorithm, showing a nearly-O(1/n) asymptotic sample complexity for the iterate sequence. We validate our method on synthetic 1D to 10D data and on real 3D shape data.

preprint2020arXiv

Super-efficiency of automatic differentiation for functions defined as a minimum

In min-min optimization or max-min optimization, one has to compute the gradient of a function defined as a minimum. In most cases, the minimum has no closed-form, and an approximation is obtained via an iterative algorithm. There are two usual ways of estimating the gradient of the function: using either an analytic formula obtained by assuming exactness of the approximation, or automatic differentiation through the algorithm. In this paper, we study the asymptotic error made by these estimators as a function of the optimization error. We find that the error of the automatic estimator is close to the square of the error of the analytic estimator, reflecting a super-efficiency phenomenon. The convergence of the automatic estimator greatly depends on the convergence of the Jacobian of the algorithm. We analyze it for gradient descent and stochastic gradient descent and derive convergence rates for the estimators in these cases. Our analysis is backed by numerical experiments on toy problems and on Wasserstein barycenter computation. Finally, we discuss the computational complexity of these estimators and give practical guidelines to chose between them.

preprint2020arXiv

The geometry of off-the-grid compressed sensing

This paper presents a sharp geometric analysis of the recovery performance of sparse regularization. More specifically, we analyze the BLASSO method which estimates a sparse measure (sum of Dirac masses) from randomized sub-sampled measurements. This is a "continuous", often called off-the-grid, extension of the compressed sensing problem, where the $\ell^1$ norm is replaced by the total variation of measures. This extension is appealing from a numerical perspective because it avoids to discretize the the space by some grid. But more importantly, it makes explicit the geometry of the problem since the positions of the Diracs can now freely move over the parameter space. On a methodological level, our contribution is to propose the Fisher geodesic distance on this parameter space as the canonical metric to analyze super-resolution in a way which is invariant to reparameterization of this space. Switching to the Fisher metric allows us to take into account measurement operators which are not translation invariant, which is crucial for applications such as Laplace inversion in imaging, Gaussian mixtures estimation and training of multilayer perceptrons with one hidden layer. On a theoretical level, our main contribution shows that if the Fisher distance between spikes is larger than a Rayleigh separation constant, then the BLASSO recovers in a stable way a stream of Diracs, provided that the number of measurements is proportional (up to log factors) to the number of Diracs. We measure the stability using an optimal transport distance constructed on top of the Fisher geodesic distance. Our result is (up to log factor) sharp and does not require any randomness assumption on the amplitudes of the underlying measure. Our proof technique relies on an infinite-dimensional extension of the so-called "golfing scheme" which operates over the space of measures and is of general interest.

preprint2020arXiv

Wasserstein Control of Mirror Langevin Monte Carlo

Discretized Langevin diffusions are efficient Monte Carlo methods for sampling from high dimensional target densities that are log-Lipschitz-smooth and (strongly) log-concave. In particular, the Euclidean Langevin Monte Carlo sampling algorithm has received much attention lately, leading to a detailed understanding of its non-asymptotic convergence properties and of the role that smoothness and log-concavity play in the convergence rate. Distributions that do not possess these regularity properties can be addressed by considering a Riemannian Langevin diffusion with a metric capturing the local geometry of the log-density. However, the Monte Carlo algorithms derived from discretizations of such Riemannian Langevin diffusions are notoriously difficult to analyze. In this paper, we consider Langevin diffusions on a Hessian-type manifold and study a discretization that is closely related to the mirror-descent scheme. We establish for the first time a non-asymptotic upper-bound on the sampling error of the resulting Hessian Riemannian Langevin Monte Carlo algorithm. This bound is measured according to a Wasserstein distance induced by a Riemannian metric ground cost capturing the Hessian structure and closely related to a self-concordance-like condition. The upper-bound implies, for instance, that the iterates contract toward a Wasserstein ball around the target density whose radius is made explicit. Our theory recovers existing Euclidean results and can cope with a wide variety of Hessian metrics related to highly non-flat geometries.

preprint2016arXiv

A Multi-step Inertial Forward--Backward Splitting Method for Non-convex Optimization

In this paper, we propose a multi-step inertial Forward--Backward splitting algorithm for minimizing the sum of two non-necessarily convex functions, one of which is proper lower semi-continuous while the other is differentiable with a Lipschitz continuous gradient. We first prove global convergence of the scheme with the help of the Kurdyka-Łojasiewicz property. Then, when the non-smooth part is also partly smooth relative to a smooth submanifold, we establish finite identification of the latter and provide sharp local linear convergence analysis. The proposed method is illustrated on a few problems arising from statistics and machine learning.

preprint2016arXiv

Geometric properties of solutions to the total variation denoising problem

This article studies the denoising performance of total variation (TV) image regularization. More precisely, we study geometrical properties of the solution to the so-called Rudin-Osher-Fatemi total variation denoising method. The first contribution of this paper is a precise mathematical definition of the "extended support" (associated to the noise-free image) of TV denoising. It is intuitively the region which is unstable and will suffer from the staircasing effect. We highlight in several practical cases, such as the indicator of convex sets, that this region can be determined explicitly. Our second and main contribution is a proof that the TV denoising method indeed restores an image which is exactly constant outside a small tube surrounding the extended support. The radius of this tube shrinks toward zero as the noise level vanishes, and are able to determine, in some cases, an upper bound on the convergence rate. For indicators of so-called "calibrable" sets (such as disks or properly eroded squares), this extended support matches the edges, so that discontinuities produced by TV denoising cluster tightly around the edges. In contrast, for indicators of more general shapes or for complicated images, this extended support can be larger. Beside these main results, our paper also proves several intermediate results about fine properties of TV regularization, in particular for indicators of calibrable and convex sets, which are of independent interest.

preprint2016arXiv

Sparse Support Recovery with Non-smooth Loss Functions

In this paper, we study the support recovery guarantees of underdetermined sparse regression using the $\ell_1$-norm as a regularizer and a non-smooth loss function for data fidelity. More precisely, we focus in detail on the cases of $\ell_1$ and $\ell_\infty$ losses, and contrast them with the usual $\ell_2$ loss. While these losses are routinely used to account for either sparse ($\ell_1$ loss) or uniform ($\ell_\infty$ loss) noise models, a theoretical analysis of their performance is still lacking. In this article, we extend the existing theory from the smooth $\ell_2$ case to these non-smooth cases. We derive a sharp condition which ensures that the support of the vector to recover is stable to small additive noise in the observations, as long as the loss constraint size is tuned proportionally to the noise level. A distinctive feature of our theory is that it also explains what happens when the support is unstable. While the support is not stable anymore, we identify an "extended support" and show that this extended support is stable to small additive noise. To exemplify the usefulness of our theory, we give a detailed numerical analysis of the support stability/instability of compressed sensing recovery with these different losses. This highlights different parameter regimes, ranging from total support stability to progressively increasing support instability.

preprint2016arXiv

Stochastic Optimization for Large-scale Optimal Transport

Optimal transport (OT) defines a powerful framework to compare probability distributions in a geometrically faithful way. However, the practical impact of OT is still limited because of its computational burden. We propose a new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications. These methods are able to manipulate arbitrary distributions (either discrete or continuous) by simply requiring to be able to draw samples from them, which is the typical setup in high-dimensional learning problems. This alleviates the need to discretize these densities, while giving access to provably convergent methods that output the correct distance without discretization error. These algorithms rely on two main ideas: (a) the dual OT problem can be re-cast as the maximization of an expectation ; (b) entropic regularization of the primal OT problem results in a smooth dual optimization optimization which can be addressed with algorithms that have a provably faster convergence. We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization ; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS). This is currently the only known method to solve this problem, apart from computing OT on finite samples. We backup these claims on a set of discrete, semi-discrete and continuous benchmark problems.

preprint2016arXiv

Support Recovery for Sparse Deconvolution of Positive Measures

We study sparse spikes deconvolution over the space of Radon measures on $\mathbb{R}$ or $\mathbb{T}$ when the input measure is a finite sum of positive Dirac masses using the BLASSO convex program. We focus on the recovery properties of the support and the amplitudes of the initial measure in the presence of noise as a function of the minimum separation $t$ of the input measure (the minimum distance between two spikes). We show that when ${w}/λ$, ${w}/t^{2N-1}$ and $λ/t^{2N-1}$ are small enough (where $λ$ is the regularization parameter, $w$ the noise and $N$ the number of spikes), which corresponds roughly to a sufficient signal-to-noise ratio and a noise level small enough with respect to the minimum separation, there exists a unique solution to the BLASSO program with exactly the same number of spikes as the original measure. We show that the amplitudes and positions of the spikes of the solution both converge toward those of the input measure when the noise and the regularization parameter drops to zero faster than $t^{2N-1}$.

preprint2016arXiv

The Degrees of Freedom of Partly Smooth Regularizers

In this paper, we are concerned with regularized regression problems where the prior regularizer is a proper lower semicontinuous and convex function which is also partly smooth relative to a Riemannian submanifold. This encompasses as special cases several known penalties such as the Lasso ($\ell^1$-norm), the group Lasso ($\ell^1-\ell^2$-norm), the $\ell^\infty$-norm, and the nuclear norm. This also includes so-called analysis-type priors, i.e. composition of the previously mentioned penalties with linear operators, typical examples being the total variation or fused Lasso penalties.We study the sensitivity of any regularized minimizer to perturbations of the observations and provide its precise local parameterization.Our main sensitivity analysis result shows that the predictor moves locally stably along the same active submanifold as the observations undergo small perturbations. This local stability is a consequence of the smoothness of the regularizer when restricted to the active submanifold, which in turn plays a pivotal role to get a closed form expression for the variations of the predictor w.r.t. observations. We also show that, for a variety of regularizers, including polyhedral ones or the group Lasso and its analysis counterpart, this divergence formula holds Lebesgue almost everywhere.When the perturbation is random (with an appropriate continuous distribution), this allows us to derive an unbiased estimator of the degrees of freedom and of the risk of the estimator prediction.Our results hold true without requiring the design matrix to be full column rank.They generalize those already known in the literature such as the Lasso problem, the general Lasso problem (analysis $\ell^1$-penalty), or the group Lasso where existing results for the latter assume that the design is full column rank.

preprint2015arXiv

A Smoothed Dual Approach for Variational Wasserstein Problems

Variational problems that involve Wasserstein distances have been recently proposed to summarize and learn from probability measures. Despite being conceptually simple, such problems are computationally challenging because they involve minimizing over quantities (Wasserstein distances) that are themselves hard to compute. We show that the dual formulation of Wasserstein variational problems introduced recently by Carlier et al. (2014) can be regularized using an entropic smoothing, which leads to smooth, differentiable, convex optimization problems that are simpler to implement and numerically more stable. We illustrate the versatility of this approach by applying it to the computation of Wasserstein barycenters and gradient flows of spacial regularization functionals.

preprint2015arXiv

Activity Identification and Local Linear Convergence of Douglas--Rachford/ADMM under Partial Smoothness

Convex optimization has become ubiquitous in most quantitative disciplines of science, including variational image processing. Proximal splitting algorithms are becoming popular to solve such structured convex optimization problems. Within this class of algorithms, Douglas--Rachford (DR) and alternating direction method of multipliers (ADMM) are designed to minimize the sum of two proper lower semi-continuous convex functions whose proximity operators are easy to compute. The goal of this work is to understand the local convergence behaviour of DR (resp. ADMM) when the involved functions (resp. their Legendre-Fenchel conjugates) are moreover partly smooth. More precisely, when both of the two functions (resp. their conjugates) are partly smooth relative to their respective manifolds, we show that DR (resp. ADMM) identifies these manifolds in finite time. Moreover, when these manifolds are affine or linear, we prove that DR/ADMM is locally linearly convergent. When $J$ and $G$ are locally polyhedral, we show that the optimal convergence radius is given in terms of the cosine of the Friedrichs angle between the tangent spaces of the identified manifolds. This is illustrated by several concrete examples and supported by numerical experiments.

preprint2015arXiv

An Interpolating Distance between Optimal Transport and Fisher-Rao

This paper defines a new transport metric over the space of non-negative measures. This metric interpolates between the quadratic Wasserstein and the Fisher-Rao metrics and generalizes optimal transport to measures with different masses. It is defined as a generalization of the dynamical formulation of optimal transport of Benamou and Brenier, by introducing a source term in the continuity equation. The influence of this source term is measured using the Fisher-Rao metric, and is averaged with the transportation term. This gives rise to a convex variational problem defining our metric. Our first contribution is a proof of the existence of geodesics (i.e. solutions to this variational problem). We then show that (generalized) optimal transport and Fisher-Rao metrics are obtained as limiting cases of our metric. Our last theoretical contribution is a proof that geodesics between mixtures of sufficiently close Diracs are made of translating mixtures of Diracs. Lastly, we propose a numerical scheme making use of first order proximal splitting methods and we show an application of this new distance to image interpolation.

preprint2015arXiv

Biologically Inspired Dynamic Textures for Probing Motion Perception

Perception is often described as a predictive process based on an optimal inference with respect to a generative model. We study here the principled construction of a generative model specifically crafted to probe motion perception. In that context, we first provide an axiomatic, biologically-driven derivation of the model. This model synthesizes random dynamic textures which are defined by stationary Gaussian distributions obtained by the random aggregation of warped patterns. Importantly, we show that this model can equivalently be described as a stochastic partial differential equation. Using this characterization of motion in images, it allows us to recast motion-energy models into a principled Bayesian inference framework. Finally, we apply these textures in order to psychophysically probe speed perception in humans. In this framework, while the likelihood is derived from the generative model, the prior is estimated from the observed results and accounts for the perceptual bias in a principled fashion.

preprint2015arXiv

Convergence Rates with Inexact Non-expansive Operators

In this paper, we present a convergence rate analysis for the inexact Krasnosel'skii-Mann iteration built from nonexpansive operators. Our results include two main parts: we first establish global pointwise and ergodic iteration-complexity bounds, and then, under a metric subregularity assumption, we establish local linear convergence for the distance of the iterates to the set of fixed points. The obtained iteration-complexity result can be applied to analyze the convergence rate of various monotone operator splitting methods in the literature, including the Forward-Backward, the Generalized Forward-Backward, Douglas-Rachford, alternating direction method of multipliers (ADMM) and Primal-Dual splitting methods. For these methods, we also develop easily verifiable termination criteria for finding an approximate solution, which can be seen as a generalization of the termination criterion for the classical gradient descent method. We finally develop a parallel analysis for the non-stationary Krasnosel'skii-Mann iteration. The usefulness of our results is illustrated by applying them to a large class of structured monotone inclusion and convex optimization problems. Experiments on some large scale inverse problems in signal and image processing problems are shown.

preprint2015arXiv

Entropic Wasserstein Gradient Flows

This article details a novel numerical scheme to approximate gradient flows for optimal transport (i.e. Wasserstein) metrics. These flows have proved useful to tackle theoretically and numerically non-linear diffusion equations that model for instance porous media or crowd evolutions. These gradient flows define a suitable notion of weak solutions for these evolutions and they can be approximated in a stable way using discrete flows. These discrete flows are implicit Euler time stepping according to the Wasserstein metric. A bottleneck of these approaches is the high computational load induced by the resolution of each step. Indeed, this corresponds to the resolution of a convex optimization problem involving a Wasserstein distance to the previous iterate. Following several recent works on the approximation of Wasserstein distances, we consider a discrete flow induced by an entropic regularization of the transportation coupling. This entropic regularization allows one to trade the initial Wasserstein fidelity term for a Kulback-Leibler divergence, which is easier to deal with numerically. We show how KL proximal schemes, and in particular Dykstra's algorithm, can be used to compute each step of the regularized flow. The resulting algorithm is both fast, parallelizable and versatile, because it only requires multiplications by a Gibbs kernel. On Euclidean domains discretized on an uniform grid, this corresponds to a linear filtering (for instance a Gaussian filtering when $c$ is the squared Euclidean distance) which can be computed in nearly linear time. On more general domains, such as (possibly non-convex) shapes or on manifolds discretized by a triangular mesh, following a recently proposed numerical scheme for optimal transport, this Gibbs kernel multiplication is approximated by a short-time heat diffusion.

preprint2015arXiv

Fast Optimal Transport Averaging of Neuroimaging Data

Knowing how the Human brain is anatomically and functionally organized at the level of a group of healthy individuals or patients is the primary goal of neuroimaging research. Yet computing an average of brain imaging data defined over a voxel grid or a triangulation remains a challenge. Data are large, the geometry of the brain is complex and the between subjects variability leads to spatially or temporally non-overlapping effects of interest. To address the problem of variability, data are commonly smoothed before group linear averaging. In this work we build on ideas originally introduced by Kantorovich to propose a new algorithm that can average efficiently non-normalized data defined over arbitrary discrete domains using transportation metrics. We show how Kantorovich means can be linked to Wasserstein barycenters in order to take advantage of an entropic smoothing approach. It leads to a smooth convex optimization problem and an algorithm with strong convergence guarantees. We illustrate the versatility of this tool and its empirical behavior on functional neuroimaging data, functional MRI and magnetoencephalography (MEG) source estimates, defined on voxel grids and triangulations of the folded cortical surface.

preprint2015arXiv

Local Linear Convergence of Forward-Backward under Partial Smoothness

In this paper, we consider the Forward--Backward proximal splitting algorithm to minimize the sum of two proper convex functions, one of which having a Lipschitz continuous gradient and the other being partly smooth relative to an active manifold $\mathcal{M}$. We propose a generic framework under which we show that the Forward--Backward (i) correctly identifies the active manifold $\mathcal{M}$ in a finite number of iterations, and then (ii) enters a local linear convergence regime that we characterize precisely. This gives a grounded and unified explanation to the typical behaviour that has been observed numerically for many problems encompassed in our framework, including the Lasso, the group Lasso, the fused Lasso and the nuclear norm regularization to name a few. These results may have numerous applications including in signal/image processing processing, sparse recovery and machine learning.

preprint2015arXiv

Minimax and adaptive estimation of the Wigner function in quantum homodyne tomography with noisy data

In quantum optics, the quantum state of a light beam is represented through the Wigner function, a density on $\mathbb R^2$ which may take negative values but must respect intrinsic positivity constraints imposed by quantum physics. In the framework of noisy quantum homodyne tomography with efficiency parameter $1/2 < η\leq 1$, we study the theoretical performance of a kernel estimator of the Wigner function. We prove that it is minimax efficient, up to a logarithmic factor in the sample size, for the $\mathbb L_\infty$-risk over a class of infinitely differentiable. We compute also the lower bound for the $\mathbb L_2$-risk. We construct adaptive estimator, i.e. which does not depend on the smoothness parameters, and prove that it attains the minimax rates for the corresponding smoothness class functions. Finite sample behaviour of our adaptive procedure are explored through numerical experiments.

preprint2015arXiv

Piecewise rigid curve deformation via a Finsler steepest descent

This paper introduces a novel steepest descent flow in Banach spaces. This extends previous works on generalized gradient descent, notably the work of Charpiat et al., to the setting of Finsler metrics. Such a generalized gradient allows one to take into account a prior on deformations (e.g., piecewise rigid) in order to favor some specific evolutions. We define a Finsler gradient descent method to minimize a functional defined on a Banach space and we prove a convergence theorem for such a method. In particular, we show that the use of non-Hilbertian norms on Banach spaces is useful to study non-convex optimization problems where the geometry of the space might play a crucial role to avoid poor local minima. We show some applications to the curve matching problem. In particular, we characterize piecewise rigid deformations on the space of curves and we study several models to perform piecewise rigid evolution of curves.

preprint2015arXiv

Sparse Spikes Deconvolution on Thin Grids

This article analyzes the recovery performance of two popular finite dimensional approximations of the sparse spikes deconvolution problem over Radon measures. We examine in a unified framework both the L1 regularization (often referred to as Lasso or Basis-Pursuit) and the Continuous Basis-Pursuit (C-BP) methods. The Lasso is the de-facto standard for the sparse regularization of inverse problems in imaging. It performs a nearest neighbor interpolation of the spikes locations on the sampling grid. The C-BP method, introduced by Ekanadham, Tranchina and Simoncelli, uses a linear interpolation of the locations to perform a better approximation of the infinite-dimensional optimization problem, for positive measures. We show that, in the small noise regime, both methods estimate twice the number of spikes as the number of original spikes. Indeed, we show that they both detect two neighboring spikes around the locations of an original spikes. These results for deconvolution problems are based on an abstract analysis of the so-called extended support of the solutions of L1-type problems (including as special cases the Lasso and C-BP for deconvolution), which are of an independent interest. They precisely characterize the support of the solutions when the noise is small and the regularization parameter is selected accordingly. We illustrate these findings to analyze for the first time the support instability of compressed sensing recovery when the number of measurements is below the critical limit (well documented in the literature) where the support is provably stable.

preprint2014arXiv

A $Γ$-Convergence Result for the Upper Bound Limit Analysis of Plates

Upper bound limit analysis allows one to evaluate directly the ultimate load of structures without performing a cumbersome incremental analysis. In order to numerically apply this method to thin plates in bending, several authors have proposed to use various finite elements discretizations. We provide in this paper a mathematical analysis which ensures the convergence of the finite element method, even with finite elements with discontinuous derivatives such as the quadratic 6 node Lagrange triangles and the cubic Hermite triangles. More precisely, we prove the $Γ$-convergence of the discretized problems towards the continuous limit analysis problem. Numerical results illustrate the relevance of this analysis for the yield design of both homogeneous and non-homogeneous materials.

preprint2014arXiv

Exact Support Recovery for Sparse Spikes Deconvolution

This paper studies sparse spikes deconvolution over the space of measures. We focus our attention to the recovery properties of the support of the measure, i.e. the location of the Dirac masses. For non-degenerate sums of Diracs, we show that, when the signal-to-noise ratio is large enough, total variation regularization (which is the natural extension of the L1 norm of vectors to the setting of measures) recovers the exact same number of Diracs. We also show that both the locations and the heights of these Diracs converge toward those of the input measure when the noise drops to zero. The exact speed of convergence is governed by a specific dual certificate, which can be computed by solving a linear system. We draw connections between the support of the recovered measure on a continuous domain and on a discretized grid. We show that when the signal-to-noise level is large enough, the solution of the discretized problem is supported on pairs of Diracs which are neighbors of the Diracs of the input measure. This gives a precise description of the convergence of the solution of the discretized problem toward the solution of the continuous grid-free problem, as the grid size tends to zero.

preprint2014arXiv

Iteration-Complexity of a Generalized Forward Backward Splitting Algorithm

In this paper, we analyze the iteration-complexity of Generalized Forward--Backward (GFB) splitting algorithm, as proposed in \cite{gfb2011}, for minimizing a large class of composite objectives $f + \sum_{i=1}^n h_i$ on a Hilbert space, where $f$ has a Lipschitz-continuous gradient and the $h_i$'s are simple (\ie their proximity operators are easy to compute). We derive iteration-complexity bounds (pointwise and ergodic) for the inexact version of GFB to obtain an approximate solution based on an easily verifiable termination criterion. Along the way, we prove complexity bounds for relaxed and inexact fixed point iterations built from composition of nonexpansive averaged operators. These results apply more generally to GFB when used to find a zero of a sum of $n > 0$ maximal monotone operators and a co-coercive operator on a Hilbert space. The theoretical findings are exemplified with experiments on video processing.

preprint2014arXiv

Iterative Bregman Projections for Regularized Transportation Problems

This article details a general numerical framework to approximate so-lutions to linear programs related to optimal transport. The general idea is to introduce an entropic regularization of the initial linear program. This regularized problem corresponds to a Kullback-Leibler Bregman di-vergence projection of a vector (representing some initial joint distribu-tion) on the polytope of constraints. We show that for many problems related to optimal transport, the set of linear constraints can be split in an intersection of a few simple constraints, for which the projections can be computed in closed form. This allows us to make use of iterative Bregman projections (when there are only equality constraints) or more generally Bregman-Dykstra iterations (when inequality constraints are in-volved). We illustrate the usefulness of this approach to several variational problems related to optimal transport: barycenters for the optimal trans-port metric, tomographic reconstruction, multi-marginal optimal trans-port and in particular its application to Brenier's relaxed solutions of in-compressible Euler equations, partial un-balanced optimal transport and optimal transport with capacity constraints.

preprint2014arXiv

Low Complexity Regularization of Linear Inverse Problems

Inverse problems and regularization theory is a central theme in contemporary signal processing, where the goal is to reconstruct an unknown signal from partial indirect, and possibly noisy, measurements of it. A now standard method for recovering the unknown signal is to solve a convex optimization problem that enforces some prior knowledge about its structure. This has proved efficient in many problems routinely encountered in imaging sciences, statistics and machine learning. This chapter delivers a review of recent advances in the field where the regularization prior promotes solutions conforming to some notion of simplicity/low-complexity. These priors encompass as popular examples sparsity and group sparsity (to capture the compressibility of natural signals and images), total variation and analysis sparsity (to promote piecewise regularity), and low-rank (as natural extension of sparsity to matrix-valued data). Our aim is to provide a unified treatment of all these regularizations under a single umbrella, namely the theory of partial smoothness. This framework is very general and accommodates all low-complexity regularizers just mentioned, as well as many others. Partial smoothness turns out to be the canonical way to encode low-dimensional models that can be linear spaces or more general smooth manifolds. This review is intended to serve as a one stop shop toward the understanding of the theoretical properties of the so-regularized solutions. It covers a large spectrum including: (i) recovery guarantees and stability to noise, both in terms of $\ell^2$-stability and model (manifold) identification; (ii) sensitivity analysis to perturbations of the parameters involved (in particular the observations), with applications to unbiased risk estimation ; (iii) convergence properties of the forward-backward proximal splitting scheme, that is particularly well suited to solve the corresponding large-scale regularized optimization problem.

preprint2014arXiv

Model Consistency of Partly Smooth Regularizers

This paper studies least-square regression penalized with partly smooth convex regularizers. This class of functions is very large and versatile allowing to promote solutions conforming to some notion of low-complexity. Indeed, they force solutions of variational problems to belong to a low-dimensional manifold (the so-called model) which is stable under small perturbations of the function. This property is crucial to make the underlying low-complexity model robust to small noise. We show that a generalized "irrepresentable condition" implies stable model selection under small noise perturbations in the observations and the design matrix, when the regularization parameter is tuned proportionally to the noise level. This condition is shown to be almost a necessary condition. We then show that this condition implies model consistency of the regularized estimator. That is, with a probability tending to one as the number of measurements increases, the regularized estimator belongs to the correct low-dimensional model manifold. This work unifies and generalizes several previous ones, where model consistency is known to hold for sparse, group sparse, total variation and low-rank regularizations.

preprint2014arXiv

Model Selection with Low Complexity Priors

Regularization plays a pivotal role when facing the challenge of solving ill-posed inverse problems, where the number of observations is smaller than the ambient dimension of the object to be estimated. A line of recent work has studied regularization models with various types of low-dimensional structures. In such settings, the general approach is to solve a regularized optimization problem, which combines a data fidelity term and some regularization penalty that promotes the assumed low-dimensional/simple structure. This paper provides a general framework to capture this low-dimensional structure through what we coin partly smooth functions relative to a linear manifold. These are convex, non-negative, closed and finite-valued functions that will promote objects living on low-dimensional subspaces. This class of regularizers encompasses many popular examples such as the L1 norm, L1-L2 norm (group sparsity), as well as several others including the Linfty norm. We also show that the set of partly smooth functions relative to a linear manifold is closed under addition and pre-composition by a linear operator, which allows to cover mixed regularization, and the so-called analysis-type priors (e.g. total variation, fused Lasso, finite-valued polyhedral gauges). Our main result presents a unified sharp analysis of exact and robust recovery of the low-dimensional subspace model associated to the object to recover from partial measurements. This analysis is illustrated on a number of special and previously studied cases, and on an analysis of the performance of Linfty regularization in a compressed sensing scenario.

preprint2014arXiv

Stein Unbiased GrAdient estimator of the Risk (SUGAR) for multiple parameter selection

Algorithms to solve variational regularization of ill-posed inverse problems usually involve operators that depend on a collection of continuous parameters. When these operators enjoy some (local) regularity, these parameters can be selected using the so-called Stein Unbiased Risk Estimate (SURE). While this selection is usually performed by exhaustive search, we address in this work the problem of using the SURE to efficiently optimize for a collection of continuous parameters of the model. When considering non-smooth regularizers, such as the popular l1-norm corresponding to soft-thresholding mapping, the SURE is a discontinuous function of the parameters preventing the use of gradient descent optimization techniques. Instead, we focus on an approximation of the SURE based on finite differences as proposed in (Ramani et al., 2008). Under mild assumptions on the estimation mapping, we show that this approximation is a weakly differentiable function of the parameters and its weak gradient, coined the Stein Unbiased GrAdient estimator of the Risk (SUGAR), provides an asymptotically (with respect to the data dimension) unbiased estimate of the gradient of the risk. Moreover, in the particular case of soft-thresholding, it is proved to be also a consistent estimator. This gradient estimate can then be used as a basis to perform a quasi-Newton optimization. The computation of the SUGAR relies on the closed-form (weak) differentiation of the non-smooth function. We provide its expression for a large class of iterative methods including proximal splitting ones and apply our strategy to regularizations involving non-smooth convex structured penalties. Illustrations on various image restoration and matrix completion problems are given.

preprint2013arXiv

Optimal Transport with Proximal Splitting

This article reviews the use of first order convex optimization schemes to solve the discretized dynamic optimal transport problem, initially proposed by Benamou and Brenier. We develop a staggered grid discretization that is well adapted to the computation of the $L^2$ optimal transport geodesic between distributions defined on a uniform spatial grid. We show how proximal splitting schemes can be used to solve the resulting large scale convex optimization problem. A specific instantiation of this method on a centered grid corresponds to the initial algorithm developed by Benamou and Brenier. We also show how more general cost functions can be taken into account and how to extend the method to perform optimal transport on a Riemannian manifold.

preprint2013arXiv

Regularized Discrete Optimal Transport

This article introduces a generalization of the discrete optimal transport, with applications to color image manipulations. This new formulation includes a relaxation of the mass conservation constraint and a regularization term. These two features are crucial for image processing tasks, which necessitate to take into account families of multimodal histograms, with large mass variation across modes. The corresponding relaxed and regularized transportation problem is the solution of a convex optimization problem. Depending on the regularization used, this minimization can be solved using standard linear programming methods or first order proximal splitting schemes. The resulting transportation plan can be used as a color transfer map, which is robust to mass variation across images color palettes. Furthermore, the regularization of the transport plan helps to remove colorization artifacts due to noise amplification. We also extend this framework to the computation of barycenters of distributions. The barycenter is the solution of an optimization problem, which is separately convex with respect to the barycenter and the transportation plans, but not jointly convex. A block coordinate descent scheme converges to a stationary point of the energy. We show that the resulting algorithm can be used for color normalization across several images. The relaxed and regularized barycenter defines a common color palette for those images. Applying color transfer toward this average palette performs a color normalization of the input images.

preprint2013arXiv

Robust Polyhedral Regularization

In this paper, we establish robustness to noise perturbations of polyhedral regularization of linear inverse problems. We provide a sufficient condition that ensures that the polyhedral face associated to the true vector is equal to that of the recovered one. This criterion also implies that the $\ell^2$ recovery error is proportional to the noise level for a range of parameter. Our criterion is expressed in terms of the hyperplanes supporting the faces of the unit polyhedral ball of the regularization. This generalizes to an arbitrary polyhedral regularization results that are known to hold for sparse synthesis and analysis $\ell^1$ regularization which are encompassed in this framework. As a byproduct, we obtain recovery guarantees for $\ell^\infty$ and $\ell^1-\ell^\infty$ regularization.

preprint2013arXiv

Stein COnsistent Risk Estimator (SCORE) for hard thresholding

In this work, we construct a risk estimator for hard thresholding which can be used as a basis to solve the difficult task of automatically selecting the threshold. As hard thresholding is not even continuous, Stein's lemma cannot be used to get an unbiased estimator of degrees of freedom, hence of the risk. We prove that under a mild condition, our estimator of the degrees of freedom, although biased, is consistent. Numerical evidence shows that our estimator outperforms another biased risk estimator.

preprint2012arXiv

Generalized Forward-Backward Splitting

This paper introduces the generalized forward-backward splitting algorithm for minimizing convex functions of the form $F + \sum_{i=1}^n G_i$, where $F$ has a Lipschitz-continuous gradient and the $G_i$'s are simple in the sense that their Moreau proximity operators are easy to compute. While the forward-backward algorithm cannot deal with more than $n = 1$ non-smooth function, our method generalizes it to the case of arbitrary $n$. Our method makes an explicit use of the regularity of $F$ in the forward step, and the proximity operators of the $G_i$'s are applied in parallel in the backward step. This allows the generalized forward backward to efficiently address an important class of convex problems. We prove its convergence in infinite dimension, and its robustness to errors on the computation of the proximity operators and of the gradient of $F$. Examples on inverse problems in imaging demonstrate the advantage of the proposed methods in comparison to other splitting algorithms.

preprint2012arXiv

Local Behavior of Sparse Analysis Regularization: Applications to Risk Estimation

In this paper, we aim at recovering an unknown signal x0 from noisy L1measurements y=Phi*x0+w, where Phi is an ill-conditioned or singular linear operator and w accounts for some noise. To regularize such an ill-posed inverse problem, we impose an analysis sparsity prior. More precisely, the recovery is cast as a convex optimization program where the objective is the sum of a quadratic data fidelity term and a regularization term formed of the L1-norm of the correlations between the sought after signal and atoms in a given (generally overcomplete) dictionary. The L1-sparsity analysis prior is weighted by a regularization parameter lambda>0. In this paper, we prove that any minimizers of this problem is a piecewise-affine function of the observations y and the regularization parameter lambda. As a byproduct, we exploit these properties to get an objectively guided choice of lambda. In particular, we develop an extension of the Generalized Stein Unbiased Risk Estimator (GSURE) and show that it is an unbiased and reliable estimator of an appropriately defined risk. The latter encompasses special cases such as the prediction risk, the projection risk and the estimation risk. We apply these risk estimators to the special case of L1-sparsity analysis regularization. We also discuss implementation issues and propose fast algorithms to solve the L1 analysis minimization problem and to compute the associated GSURE. We finally illustrate the applicability of our framework to parameter(s) selection on several imaging problems.

preprint2012arXiv

Risk estimation for matrix recovery with spectral regularization

In this paper, we develop an approach to recursively estimate the quadratic risk for matrix recovery problems regularized with spectral functions. Toward this end, in the spirit of the SURE theory, a key step is to compute the (weak) derivative and divergence of a solution with respect to the observations. As such a solution is not available in closed form, but rather through a proximal splitting algorithm, we propose to recursively compute the divergence from the sequence of iterates. A second challenge that we unlocked is the computation of the (weak) derivative of the proximity operator of a spectral function. To show the potential applicability of our approach, we exemplify it on a matrix completion problem to objectively and automatically select the regularization parameter.

preprint2012arXiv

Robust Sparse Analysis Regularization

This paper investigates the theoretical guarantees of L1-analysis regularization when solving linear inverse problems. Most of previous works in the literature have mainly focused on the sparse synthesis prior where the sparsity is measured as the L1 norm of the coefficients that synthesize the signal from a given dictionary. In contrast, the more general analysis regularization minimizes the L1 norm of the correlations between the signal and the atoms in the dictionary, where these correlations define the analysis support. The corresponding variational problem encompasses several well-known regularizations such as the discrete total variation and the Fused Lasso. Our main contributions consist in deriving sufficient conditions that guarantee exact or partial analysis support recovery of the true signal in presence of noise. More precisely, we give a sufficient condition to ensure that a signal is the unique solution of the L1-analysis regularization in the noiseless case. The same condition also guarantees exact analysis support recovery and L2-robustness of the L1-analysis minimizer vis-a-vis an enough small noise in the measurements. This condition turns to be sharp for the robustness of the analysis support. To show partial support recovery and L2-robustness to an arbitrary bounded noise, we introduce a stronger sufficient condition. When specialized to the L1-synthesis regularization, our results recover some corresponding recovery and robustness guarantees previously known in the literature. From this perspective, our work is a generalization of these results. We finally illustrate these theoretical findings on several examples to study the robustness of the 1-D total variation and Fused Lasso regularizations.

preprint2012arXiv

The Degrees of Freedom of the Group Lasso

This paper studies the sensitivity to the observations of the block/group Lasso solution to an overdetermined linear regression model. Such a regularization is known to promote sparsity patterns structured as nonoverlapping groups of coefficients. Our main contribution provides a local parameterization of the solution with respect to the observations. As a byproduct, we give an unbiased estimate of the degrees of freedom of the group Lasso. Among other applications of such results, one can choose in a principled and objective way the regularization parameter of the Lasso through model selection criteria.

preprint2012arXiv

The degrees of freedom of the Group Lasso for a General Design

In this paper, we are concerned with regression problems where covariates can be grouped in nonoverlapping blocks, and where only a few of them are assumed to be active. In such a situation, the group Lasso is an at- tractive method for variable selection since it promotes sparsity of the groups. We study the sensitivity of any group Lasso solution to the observations and provide its precise local parameterization. When the noise is Gaussian, this allows us to derive an unbiased estimator of the degrees of freedom of the group Lasso. This result holds true for any fixed design, no matter whether it is under- or overdetermined. With these results at hand, various model selec- tion criteria, such as the Stein Unbiased Risk Estimator (SURE), are readily available which can provide an objectively guided choice of the optimal group Lasso fit.

preprint2012arXiv

The degrees of freedom of the Lasso for general design matrix

In this paper, we investigate the degrees of freedom ($\dof$) of penalized $\ell_1$ minimization (also known as the Lasso) for linear regression models. We give a closed-form expression of the $\dof$ of the Lasso response. Namely, we show that for any given Lasso regularization parameter $λ$ and any observed data $y$ belonging to a set of full (Lebesgue) measure, the cardinality of the support of a particular solution of the Lasso problem is an unbiased estimator of the degrees of freedom. This is achieved without the need of uniqueness of the Lasso solution. Thus, our result holds true for both the underdetermined and the overdetermined case, where the latter was originally studied in \cite{zou}. We also show, by providing a simple counterexample, that although the $\dof$ theorem of \cite{zou} is correct, their proof contains a flaw since their divergence formula holds on a different set of a full measure than the one that they claim. An effective estimator of the number of degrees of freedom may have several applications including an objectively guided choice of the regularization parameter in the Lasso through the $\sure$ framework. Our theoretical findings are illustrated through several numerical simulations.

preprint2011arXiv

A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

The richness of natural images makes the quest for optimal representations in image processing and computer vision challenging. The latter observation has not prevented the design of image representations, which trade off between efficiency and complexity, while achieving accurate rendering of smooth regions as well as reproducing faithful contours and textures. The most recent ones, proposed in the past decade, share an hybrid heritage highlighting the multiscale and oriented nature of edges and patterns in images. This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. They typically exhibit redundancy to improve sparsity in the transformed domain and sometimes its invariance with respect to simple geometric deformations (translation, rotation). Oriented multiscale dictionaries extend traditional wavelet processing and may offer rotation invariance. Highly redundant dictionaries require specific algorithms to simplify the search for an efficient (sparse) representation. We also discuss the extension of multiscale geometric decompositions to non-Euclidean domains such as the sphere or arbitrary meshed surfaces. The etymology of panorama suggests an overview, based on a choice of partially overlapping "pictures". We hope that this paper will contribute to the appreciation and apprehension of a stream of current research directions in image understanding.

preprint2011arXiv

Sharp Support Recovery from Noisy Random Measurements by L1 minimization

In this paper, we investigate the theoretical guarantees of penalized $\lun$ minimization (also called Basis Pursuit Denoising or Lasso) in terms of sparsity pattern recovery (support and sign consistency) from noisy measurements with non-necessarily random noise, when the sensing operator belongs to the Gaussian ensemble (i.e. random design matrix with i.i.d. Gaussian entries). More precisely, we derive sharp non-asymptotic bounds on the sparsity level and (minimal) signal-to-noise ratio that ensure support identification for most signals and most Gaussian sensing matrices by solving the Lasso problem with an appropriately chosen regularization parameter. Our first purpose is to establish conditions allowing exact sparsity pattern recovery when the signal is strictly sparse. Then, these conditions are extended to cover the compressible or nearly sparse case. In these two results, the role of the minimal signal-to-noise ratio is crucial. Our third main result gets rid of this assumption in the strictly sparse case, but this time, the Lasso allows only partial recovery of the support. We also provide in this case a sharp $\ell_2$-consistency result on the coefficient vector. The results of the present work have several distinctive features compared to previous ones. One of them is that the leading constants involved in all the bounds are sharp and explicit. This is illustrated by some numerical experiments where it is indeed shown that the sharp sparsity level threshold identified by our theoretical results below which sparsistency of the Lasso is guaranteed meets that empirically observed.

preprint2008arXiv

Compressive Wave Computation

This paper considers large-scale simulations of wave propagation phenomena. We argue that it is possible to accurately compute a wavefield by decomposing it onto a largely incomplete set of eigenfunctions of the Helmholtz operator, chosen at random, and that this provides a natural way of parallelizing wave simulations for memory-intensive applications. This paper shows that L1-Helmholtz recovery makes sense for wave computation, and identifies a regime in which it is provably effective: the one-dimensional wave equation with coefficients of small bounded variation. Under suitable assumptions we show that the number of eigenfunctions needed to evolve a sparse wavefield defined on N points, accurately with very high probability, is bounded by C log(N) log(log(N)), where C is related to the desired accuracy and can be made to grow at a much slower rate than N when the solution is sparse. The PDE estimates that underlie this result are new to the authors' knowledge and may be of independent mathematical interest; they include an L1 estimate for the wave equation, an estimate of extension of eigenfunctions, and a bound for eigenvalue gaps in Sturm-Liouville problems. Numerical examples are presented in one spatial dimension and show that as few as 10 percents of all eigenfunctions can suffice for accurate results. Finally, we argue that the compressive viewpoint suggests a competitive parallel algorithm for an adjoint-state inversion method in reflection seismology.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

math.OC Machine Learning Information Theory math.IT math.ST Statistics Theory math.NA Computer Vision math.AP Applications Artificial Intelligence Computation Discrete Mathematics math.PR quant-ph

Source provenance

Where this author record came from

arxivconfidence 95%

external id: arxiv:2605.08392:author:3:gabriel-peyre

Imported May 20, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2605.10775:author:3:gabriel-peyre

Imported May 20, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2605.17660:author:4:gabriel-peyre

Imported May 20, 2026Synced May 21, 2026

14 works

Jalal Fadili

Researcher

Jalal Fadili contributes to research discovery and scholarly infrastructure.

Open to collaborate

11 works

Samuel Vaiter

Researcher

Samuel Vaiter contributes to research discovery and scholarly infrastructure.

Open to collaborate

8 works

Charles Dossal

Researcher

Charles Dossal contributes to research discovery and scholarly infrastructure.

Open to collaborate

8 works

Jalal M. Fadili

Researcher

Jalal M. Fadili contributes to research discovery and scholarly infrastructure.

Open to collaborate

Gabriel Peyré

What is connected

Connect this record

See the researcher in context

Building this map preview

56 published item(s)

Geometry-Aware Discretization Error of Diffusion Models

On the global convergence of gradient descent for wide shallow models with bounded nonlinearities

Training Infinitely Deep and Wide Transformers

Sinkhorn Divergences for Unbalanced Optimal Transport

The Unbalanced Gromov Wasserstein Distance: Conic Formulation and Relaxation

Unbalanced Optimal Transport, from Theory to Numerics

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

Fast and accurate optimization on the orthogonal manifold without retraction

Faster Unbalanced Optimal Transport: Translation invariant Sinkhorn and 1-D Frank-Wolfe

Sinkformers: Transformers with Doubly Stochastic Attention

Unsupervised Ground Metric Learning using Wasserstein Singular Vectors

Low-Rank Sinkhorn Factorization

Computational Optimal Transport

Online Sinkhorn: Optimal Transport distances from sample streams

Super-efficiency of automatic differentiation for functions defined as a minimum

The geometry of off-the-grid compressed sensing

Wasserstein Control of Mirror Langevin Monte Carlo

A Multi-step Inertial Forward--Backward Splitting Method for Non-convex Optimization

Geometric properties of solutions to the total variation denoising problem

Sparse Support Recovery with Non-smooth Loss Functions

Stochastic Optimization for Large-scale Optimal Transport

Support Recovery for Sparse Deconvolution of Positive Measures

The Degrees of Freedom of Partly Smooth Regularizers

A Smoothed Dual Approach for Variational Wasserstein Problems

Activity Identification and Local Linear Convergence of Douglas--Rachford/ADMM under Partial Smoothness

An Interpolating Distance between Optimal Transport and Fisher-Rao

Biologically Inspired Dynamic Textures for Probing Motion Perception

Convergence Rates with Inexact Non-expansive Operators

Entropic Wasserstein Gradient Flows

Fast Optimal Transport Averaging of Neuroimaging Data

Local Linear Convergence of Forward-Backward under Partial Smoothness

Minimax and adaptive estimation of the Wigner function in quantum homodyne tomography with noisy data

Piecewise rigid curve deformation via a Finsler steepest descent

Sparse Spikes Deconvolution on Thin Grids

A $Γ$-Convergence Result for the Upper Bound Limit Analysis of Plates

Exact Support Recovery for Sparse Spikes Deconvolution

Iteration-Complexity of a Generalized Forward Backward Splitting Algorithm

Iterative Bregman Projections for Regularized Transportation Problems

Low Complexity Regularization of Linear Inverse Problems

Model Consistency of Partly Smooth Regularizers

Model Selection with Low Complexity Priors

Stein Unbiased GrAdient estimator of the Risk (SUGAR) for multiple parameter selection

Optimal Transport with Proximal Splitting

Regularized Discrete Optimal Transport

Robust Polyhedral Regularization

Stein COnsistent Risk Estimator (SCORE) for hard thresholding

Generalized Forward-Backward Splitting

Local Behavior of Sparse Analysis Regularization: Applications to Risk Estimation

Risk estimation for matrix recovery with spectral regularization

Robust Sparse Analysis Regularization

The Degrees of Freedom of the Group Lasso

The degrees of freedom of the Group Lasso for a General Design

The degrees of freedom of the Lasso for general design matrix

A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

Sharp Support Recovery from Noisy Random Measurements by L1 minimization

Compressive Wave Computation