Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
20works
0followers
20topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

20 published item(s)

preprint2022arXiv

An Embedding of ReLU Networks and an Analysis of their Identifiability

Neural networks with the Rectified Linear Unit (ReLU) nonlinearity are described by a vector of parameters $θ$, and realized as a piecewise linear continuous function $R_θ: x \in \mathbb R^{d} \mapsto R_θ(x) \in \mathbb R^{k}$. Natural scalings and permutations operations on the parameters $θ$ leave the realization unchanged, leading to equivalence classes of parameters that yield the same realization. These considerations in turn lead to the notion of identifiability -- the ability to recover (the equivalence class of) $θ$ from the sole knowledge of its realization $R_θ$. The overall objective of this paper is to introduce an embedding for ReLU neural networks of any depth, $Φ(θ)$, that is invariant to scalings and that provides a locally linear parameterization of the realization of the network. Leveraging these two key properties, we derive some conditions under which a deep ReLU network is indeed locally identifiable from the knowledge of the realization on a finite set of samples $x_{i} \in \mathbb R^{d}$. We study the shallow case in more depth, establishing necessary and sufficient conditions for the network to be identifiable from a bounded subset $\mathcal X \subseteq \mathbb R^{d}$.

preprint2021arXiv

Minibatch optimal transport distances; analysis and applications

Optimal transport distances have become a classic tool to compare probability distributions and have found many applications in machine learning. Yet, despite recent algorithmic developments, their complexity prevents their direct use on large scale datasets. To overcome this challenge, a common workaround is to compute these distances on minibatches i.e. to average the outcome of several smaller optimal transport problems. We propose in this paper an extended analysis of this practice, which effects were previously studied in restricted cases. We first consider a large variety of Optimal Transport kernels. We notably argue that the minibatch strategy comes with appealing properties such as unbiased estimators, gradients and a concentration bound around the expectation, but also with limits: the minibatch OT is not a distance. To recover some of the lost distance axioms, we introduce a debiased minibatch OT function and study its statistical and optimisation properties. Along with this theoretical analysis, we also conduct empirical experiments on gradient flows, generative adversarial networks (GANs) or color transfer that highlight the practical interest of this strategy.

preprint2021arXiv

Nonsmooth convex optimization to estimate the Covid-19 reproduction number space-time evolution with robustness against low quality data

Daily pandemic surveillance, often achieved through the estimation of the reproduction number, constitutes a critical challenge for national health authorities to design countermeasures. In an earlier work, we proposed to formulate the estimation of the reproduction number as an optimization problem, combining data-model fidelity and space-time regularity constraints, solved by nonsmooth convex proximal minimizations. Though promising, that first formulation significantly lacks robustness against the Covid-19 data low quality (irrelevant or missing counts, pseudo-seasonalities,.. .) stemming from the emergency and crisis context, which significantly impairs accurate pandemic evolution assessments. The present work aims to overcome these limitations by carefully crafting a functional permitting to estimate jointly, in a single step, the reproduction number and outliers defined to model low quality data. This functional also enforces epidemiology-driven regularity properties for the reproduction number estimates, while preserving convexity, thus permitting the design of efficient minimization algorithms, based on proximity operators that are derived analytically. The explicit convergence of the proposed algorithm is proven theoretically. Its relevance is quantified on real Covid-19 data, consisting of daily new infection counts for 200+ countries and for the 96 metropolitan France counties, publicly available at Johns Hopkins University and Sant{é}-Publique-France. The procedure permits automated daily updates of these estimates, reported via animated and interactive maps. Open-source estimation procedures will be made publicly available.

preprint2020arXiv

A characterization of proximity operators

We characterize proximity operators, that is to say functions that map a vector to a solution of a penalized least squares optimization problem. Proximity operators of convex penalties have been widely studied and fully characterized by Moreau. They are also widely used in practice with nonconvex penalties such as the {\ell} 0 pseudo-norm, yet the extension of Moreau's characterization to this setting seemed to be a missing element of the literature. We characterize proximity operators of (convex or nonconvex) penalties as functions that are the subdifferential of some convex potential. This is proved as a consequence of a more general characterization of so-called Bregman proximity operators of possibly nonconvex penalties in terms of certain convex potentials. As a side effect of our analysis, we obtain a test to verify whether a given function is the proximity operator of some penalty, or not. Many well-known shrinkage operators are indeed confirmed to be proximity operators. However, we prove that windowed Group-LASSO and persistent empirical Wiener shrinkage -- two forms of so-called social sparsity shrinkage-- are generally not the proximity operator of any penalty; the exception is when they are simply weighted versions of group-sparse shrinkage with non-overlapping groups.

preprint2020arXiv

Approximation spaces of deep neural networks

We study the expressivity of deep neural networks. Measuring a network's complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We establish that allowing the networks to have certain types of "skip connections" does not change the resulting approximation spaces. We also discuss the role of the network's nonlinearity (also known as activation function) on the resulting spaces, as well as the role of depth. For the popular ReLU nonlinearity and its powers, we relate the newly constructed spaces to classical Besov spaces. The established embeddings highlight that some functions of very low Besov smoothness can nevertheless be well approximated by neural networks, if these networks are sufficiently deep.

preprint2020arXiv

Don't take it lightly: Phasing optical random projections with unknown operators

In this paper we tackle the problem of recovering the phase of complex linear measurements when only magnitude information is available and we control the input. We are motivated by the recent development of dedicated optics-based hardware for rapid random projections which leverages the propagation of light in random media. A signal of interest $\mathbfξ \in \mathbb{R}^N$ is mixed by a random scattering medium to compute the projection $\mathbf{y} = \mathbf{A} \mathbfξ$, with $\mathbf{A} \in \mathbb{C}^{M \times N}$ being a realization of a standard complex Gaussian iid random matrix. Such optics-based matrix multiplications can be much faster and energy-efficient than their CPU or GPU counterparts, yet two difficulties must be resolved: only the intensity ${|\mathbf{y}|}^2$ can be recorded by the camera, and the transmission matrix $\mathbf{A}$ is unknown. We show that even without knowing $\mathbf{A}$, we can recover the unknown phase of $\mathbf{y}$ for some equivalent transmission matrix with the same distribution as $\mathbf{A}$. Our method is based on two observations: first, conjugating or changing the phase of any row of $\mathbf{A}$ does not change its distribution; and second, since we control the input we can interfere $\mathbfξ$ with arbitrary reference signals. We show how to leverage these observations to cast the measurement phase retrieval problem as a Euclidean distance geometry problem. We demonstrate appealing properties of the proposed algorithm in both numerical simulations and real hardware experiments. Not only does our algorithm accurately recover the missing phase, but it mitigates the effects of quantization and the sensitivity threshold, thus improving the measured magnitudes.

preprint2020arXiv

Fast Optical System Identification by Numerical Interferometry

We propose a numerical interferometry method for identification of optical multiply-scattering systems when only intensity can be measured. Our method simplifies the calibration of optical transmission matrices from a quadratic to a linear inverse problem by first recovering the phase of the measurements. We show that by carefully designing the probing signals, measurement phase retrieval amounts to a distance geometry problem---a multilateration---in the complex plane. Since multilateration can be formulated as a small linear system which is the same for entire rows of the transmission matrix, the phases can be retrieved very efficiently. To speed up the subsequent estimation of transmission matrices, we design calibration signals so as to take advantage of the fast Fourier transform, achieving a numerical complexity almost linear in the number of transmission matrix entries. We run experiments on real optical hardware and use the numerically computed transmission matrix to recover an unseen image behind a scattering medium. Where the previous state-of-the-art method reports hours to compute the transmission matrix on a GPU, our method takes only a few minutes on a CPU.

preprint2014arXiv

Balancing Sparsity and Rank Constraints in Quadratic Basis Pursuit

We investigate the methods that simultaneously enforce sparsity and low-rank structure in a matrix as often employed for sparse phase retrieval problems or phase calibration problems in compressive sensing. We propose a new approach for analyzing the trade off between the sparsity and low rank constraints in these approaches which not only helps to provide guidelines to adjust the weights between the aforementioned constraints, but also enables new simulation strategies for evaluating performance. We then provide simulation results for phase retrieval and phase calibration cases both to demonstrate the consistency of the proposed method with other approaches and to evaluate the change of performance with different weights for the sparsity and low rank structure constraints.

preprint2014arXiv

Projection onto the Cosparse Set is NP-Hard

The computational complexity of a problem arising in the context of sparse optimization is considered, namely, the projection onto the set of $k$-cosparse vectors w.r.t. some given matrix $\Omeg$. It is shown that this projection problem is (strongly) \NP-hard, even in the special cases in which the matrix $\Omeg$ contains only ternary or bipolar coefficients. Interestingly, this is in contrast to the projection onto the set of $k$-sparse vectors, which is trivially solved by keeping only the $k$ largest coefficients.

preprint2014arXiv

Separable Cosparse Analysis Operator Learning

The ability of having a sparse representation for a certain class of signals has many applications in data analysis, image processing, and other research fields. Among sparse representations, the cosparse analysis model has recently gained increasing interest. Many signals exhibit a multidimensional structure, e.g. images or three-dimensional MRI scans. Most data analysis and learning algorithms use vectorized signals and thereby do not account for this underlying structure. The drawback of not taking the inherent structure into account is a dramatic increase in computational cost. We propose an algorithm for learning a cosparse Analysis Operator that adheres to the preexisting structure of the data, and thus allows for a very efficient implementation. This is achieved by enforcing a separable structure on the learned operator. Our learning algorithm is able to deal with multidimensional data of arbitrary order. We evaluate our method on volumetric data at the example of three-dimensional MRI scans.

preprint2013arXiv

Greedy-Like Algorithms for the Cosparse Analysis Model

The cosparse analysis model has been introduced recently as an interesting alternative to the standard sparse synthesis approach. A prominent question brought up by this new construction is the analysis pursuit problem -- the need to find a signal belonging to this model, given a set of corrupted measurements of it. Several pursuit methods have already been proposed based on $\ell_1$ relaxation and a greedy approach. In this work we pursue this question further, and propose a new family of pursuit algorithms for the cosparse analysis model, mimicking the greedy-like methods -- compressive sampling matching pursuit (CoSaMP), subspace pursuit (SP), iterative hard thresholding (IHT) and hard thresholding pursuit (HTP). Assuming the availability of a near optimal projection scheme that finds the nearest cosparse subspace to any vector, we provide performance guarantees for these algorithms. Our theoretical study relies on a restricted isometry property adapted to the context of the cosparse analysis model. We explore empirically the performance of these algorithms by adopting a plain thresholding projection, demonstrating their good performance.

preprint2012arXiv

Compressible Distributions for High-dimensional Statistics

We develop a principled way of identifying probability distributions whose independent and identically distributed (iid) realizations are compressible, i.e., can be well-approximated as sparse. We focus on Gaussian random underdetermined linear regression (GULR) problems, where compressibility is known to ensure the success of estimators exploiting sparse regularization. We prove that many distributions revolving around maximum a posteriori (MAP) interpretation of sparse regularized estimators are in fact incompressible, in the limit of large problem sizes. A highlight is the Laplace distribution and $\ell^{1}$ regularized estimators such as the Lasso and Basis Pursuit denoising. To establish this result, we identify non-trivial undersampling regions in GULR where the simple least squares solution almost surely outperforms an oracle sparse solution, when the data is generated from the Laplace distribution. We provide simple rules of thumb to characterize classes of compressible (respectively incompressible) distributions based on their second and fourth moments. Generalized Gaussians and generalized Pareto distributions serve as running examples for concreteness.

preprint2012arXiv

Joint k-step analysis of Orthogonal Matching Pursuit and Orthogonal Least Squares

Tropp's analysis of Orthogonal Matching Pursuit (OMP) using the Exact Recovery Condition (ERC) is extended to a first exact recovery analysis of Orthogonal Least Squares (OLS). We show that when the ERC is met, OLS is guaranteed to exactly recover the unknown support in at most k iterations. Moreover, we provide a closer look at the analysis of both OMP and OLS when the ERC is not fulfilled. The existence of dictionaries for which some subsets are never recovered by OMP is proved. This phenomenon also appears with basis pursuit where support recovery depends on the sign patterns, but it does not occur for OLS. Finally, numerical experiments show that none of the considered algorithms is uniformly better than the other but for correlated dictionaries, guaranteed exact recovery may be obtained after fewer iterations for OLS than for OMP.

preprint2012arXiv

Local stability and robustness of sparse dictionary learning in the presence of noise

A popular approach within the signal processing and machine learning communities consists in modelling signals as sparse linear combinations of atoms selected from a learned dictionary. While this paradigm has led to numerous empirical successes in various fields ranging from image to audio processing, there have only been a few theoretical arguments supporting these evidences. In particular, sparse coding, or sparse dictionary learning, relies on a non-convex procedure whose local minima have not been fully analyzed yet. In this paper, we consider a probabilistic model of sparse signals, and show that, with high probability, sparse coding admits a local minimum around the reference dictionary generating the signals. Our study takes into account the case of over-complete dictionaries and noisy signals, thus extending previous work limited to noiseless settings and/or under-complete dictionaries. The analysis we conduct is non-asymptotic and makes it possible to understand how the key quantities of the problem, such as the coherence or the level of noise, can scale with respect to the dimension of the signals, the number of atoms, the sparsity and the number of observations.

preprint2011arXiv

Blind calibration for compressed sensing by convex optimization

We consider the problem of calibrating a compressed sensing measurement system under the assumption that the decalibration consists in unknown gains on each measure. We focus on {\em blind} calibration, using measures performed on a few unknown (but sparse) signals. A naive formulation of this blind calibration problem, using $\ell_{1}$ minimization, is reminiscent of blind source separation and dictionary learning, which are known to be highly non-convex and riddled with local minima. In the considered context, we show that in fact this formulation can be exactly expressed as a convex optimization problem, and can be solved using off-the-shelf algorithms. Numerical simulations demonstrate the effectiveness of the approach even for highly uncalibrated measures, when a sufficient number of (unknown, but sparse) calibrating signals is provided. We observe that the success/failure of the approach seems to obey sharp phase transitions.

preprint2011arXiv

The Cosparse Analysis Model and Algorithms

After a decade of extensive study of the sparse representation synthesis model, we can safely say that this is a mature and stable field, with clear theoretical foundations, and appealing applications. Alongside this approach, there is an analysis counterpart model, which, despite its similarity to the synthesis alternative, is markedly different. Surprisingly, the analysis model did not get a similar attention, and its understanding today is shallow and partial. In this paper we take a closer look at the analysis approach, better define it as a generative model for signals, and contrast it with the synthesis one. This work proposes effective pursuit methods that aim to solve inverse problems regularized with the analysis-model prior, accompanied by a preliminary theoretical study of their performance. We demonstrate the effectiveness of the analysis model in several experiments.

preprint2011arXiv

The restricted isometry property meets nonlinear approximation with redundant frames

It is now well known that sparse or compressible vectors can be stably recovered from their low-dimensional projection, provided the projection matrix satisfies a Restricted Isometry Property (RIP). We establish new implications of the RIP with respect to nonlinear approximation in a Hilbert space with a redundant frame. The main ingredients of our approach are: a) Jackson and Bernstein inequalities, associated to the characterization of certain approximation spaces with interpolation spaces; b) a new proof that for overcomplete frames which satisfy a Bernstein inequality, these interpolation spaces are nothing but the collection of vectors admitting a representation in the dictionary with compressible coefficients; c) the proof that the RIP implies Bernstein inequalities. As a result, we obtain that in most overcomplete random Gaussian dictionaries with fixed aspect ratio, just as in any orthonormal basis, the error of best $m$-term approximation of a vector decays at a certain rate if, and only if, the vector admits a compressible expansion in the dictionary. Yet, for mildly overcomplete dictionaries with a one-dimensional kernel, we give examples where the Bernstein inequality holds, but the same inequality fails for even the smallest perturbation of the dictionary.

preprint2011arXiv

Universal and efficient compressed sensing by spread spectrum and application to realistic Fourier imaging techniques

We advocate a compressed sensing strategy that consists of multiplying the signal of interest by a wide bandwidth modulation before projection onto randomly selected vectors of an orthonormal basis. Firstly, in a digital setting with random modulation, considering a whole class of sensing bases including the Fourier basis, we prove that the technique is universal in the sense that the required number of measurements for accurate recovery is optimal and independent of the sparsity basis. This universality stems from a drastic decrease of coherence between the sparsity and the sensing bases, which for a Fourier sensing basis relates to a spread of the original signal spectrum by the modulation (hence the name "spread spectrum"). The approach is also efficient as sensing matrices with fast matrix multiplication algorithms can be used, in particular in the case of Fourier measurements. Secondly, these results are confirmed by a numerical analysis of the phase transition of the l1- minimization problem. Finally, we show that the spread spectrum technique remains effective in an analog setting with chirp modulation for application to realistic Fourier imaging. We illustrate these findings in the context of radio interferometry and magnetic resonance imaging.

preprint2011arXiv

Well-posedness of the permutation problem in sparse filter estimation with lp minimization

Convolutive source separation is often done in two stages: 1) estimation of the mixing filters and 2) estimation of the sources. Traditional approaches suffer from the ambiguities of arbitrary permutations and scaling in each frequency bin of the estimated filters and/or the sources, and they are usually corrected by taking into account some special properties of the filters/sources. This paper focusses on the filter permutation problem in the absence of scaling, investigating the possible use of the temporal sparsity of the filters as a property enabling permutation correction. Theoretical and experimental results highlight the potential as well as the limits of sparsity as an hypothesis to obtain a well-posed permutation problem.