Source author record

Simon Foucart

Simon Foucart appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory · math.IT · math.FA · math.OC · math.ST · Statistics Theory · Genomics · Machine Learning · math.NA · Numerical Analysis

Catalog footprint: 10 works, 10 topics, 4 close collaborators

preprint · 2022 · arXiv

On the sparsity of LASSO minimizers in sparse data recovery

We present a detailed analysis of the unconstrained $\ell_1$-weighted LASSO method for recovery of sparse data from its observation by randomly generated matrices, satisfying the Restricted Isometry Property (RIP) with constant $δ<1$, and subject to negligible measurement and compressibility errors. We prove that if the data is $k$-sparse, then the size of the support of the LASSO minimizer, $s$, maintains a comparable sparsity, $s\leq C_δk$. For example, if $δ=0.7$ then $s< 11k$, and a smaller $δ=0.4$ yields $s< 4k$. We also derive new $\ell_2/\ell_1$ error bounds which highlight the precise dependence on $k$ and on the LASSO parameter $λ$, before the error is driven below the scale of negligible measurement and compressibility errors.
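As a rough illustration of the quantity the abstract studies (the support size of a LASSO minimizer), the sketch below runs plain iterative soft-thresholding (ISTA) on a small random instance. It assumes the common objective $\frac{1}{2}\|Ax-y\|_2^2 + λ\|x\|_1$, which may differ from the paper's exact weighting, and ISTA is a generic solver chosen here for brevity, not the paper's method. The names `ista`, `demo`, and all problem sizes are hypothetical choices for this example.

```python
import random

def soft_threshold(v, t):
    """Proximal map of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    n = len(A[0])
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]

def lasso_objective(A, y, x, lam):
    """Assumed objective: 0.5 * ||Ax - y||_2^2 + lam * ||x||_1."""
    r = [ri - yi for ri, yi in zip(matvec(A, x), y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)

def ista(A, y, lam, iters=500):
    """Iterative soft-thresholding started from the zero vector."""
    n = len(A[0])
    # The squared Frobenius norm upper-bounds ||A||_2^2, so step 1/L is safe.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [ri - yi for ri, yi in zip(matvec(A, x), y)]
        g = rmatvec(A, r)
        x = [soft_threshold(xj - gj / L, lam / L) for xj, gj in zip(x, g)]
    return x

def demo(seed=0, m=20, n=40, k=3, lam=0.05):
    """Measure a k-sparse vector exactly and return the ISTA output."""
    rng = random.Random(seed)
    A = [[rng.gauss(0, 1) / m ** 0.5 for _ in range(n)] for _ in range(m)]
    x_true = [0.0] * n
    for j in rng.sample(range(n), k):
        x_true[j] = rng.choice([-1.0, 1.0])
    y = matvec(A, x_true)
    x_hat = ista(A, y, lam)
    support = [j for j, v in enumerate(x_hat) if v != 0.0]
    return A, y, x_hat, support
```

Calling `demo()` and comparing `len(support)` with `k` gives an informal feel for the "comparable sparsity" statement. A finite number of ISTA iterations only approximates the minimizer, so this is not a check of the theorem.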

preprint · 2022 · arXiv

The Sparsity of LASSO-type Minimizers

This note extends an attribute of the LASSO procedure to a whole class of related procedures, including square-root LASSO, square LASSO, LAD-LASSO, and an instance of generalized LASSO. Namely, under the assumption that the input matrix satisfies an $\ell_p$-restricted isometry property (which in some sense is weaker than the standard $\ell_2$-restricted isometry property assumption), it is shown that if the input vector comes from the exact measurement of a sparse vector, then the minimizer of any such LASSO-type procedure has sparsity comparable to the sparsity of the measured vector. The result remains valid in the presence of moderate measurement error when the regularization parameter is not too small.

preprint · 2020 · arXiv

Finer Metagenomic Reconstruction via Biodiversity Optimization

When analyzing communities of microorganisms from their sequenced DNA, an important task is taxonomic profiling: enumerating the presence and relative abundance of all organisms, or merely of all taxa, contained in the sample. This task can be tackled via compressive-sensing-based approaches, which favor communities featuring the fewest organisms among those consistent with the observed DNA data. Despite their successes, these parsimonious approaches sometimes conflict with biological realism by overlooking organism similarities. Here, we leverage a recently developed notion of biological diversity that simultaneously accounts for organism similarities and retains the optimization strategy underlying compressive-sensing-based approaches. We demonstrate that minimizing biological diversity still produces sparse taxonomic profiles and we experimentally validate superiority to existing compressive-sensing-based approaches. Despite showing that the objective function is almost never convex and often concave, generally yielding NP-hard problems, we exhibit ways of representing organism similarities for which minimizing diversity can be performed via a sequence of linear programs guaranteed to decrease diversity. Better yet, when biological similarity is quantified by $k$-mer co-occurrence (a popular notion in bioinformatics), minimizing diversity actually reduces to one linear program that can utilize multiple $k$-mer sizes to enhance performance. In proof-of-concept experiments, we verify that the latter procedure can lead to significant gains when taxonomically profiling a metagenomic sample, both in terms of reconstruction accuracy and computational performance. Reproducible code is available at https://github.com/dkoslicki/MinimizeBiologicalDiversity.

preprint · 2020 · arXiv

Instances of Computational Optimal Recovery: Dealing with Observation Errors

When attempting to recover functions from observational data, one naturally seeks to do so in an optimal manner with respect to some modeling assumption. With a focus put on the worst-case setting, this is the standard goal of Optimal Recovery. The distinctive twists here are the consideration of inaccurate data through some boundedness models and the emphasis on computational realizability. Several scenarios are unraveled through the efficient constructions of optimal recovery maps: local optimality under linearly or semidefinitely describable models, global optimality for the estimation of linear functionals under approximability models, and global near-optimality under approximability models in the space of continuous functions.

preprint · 2020 · arXiv

Instances of Computational Optimal Recovery: Refined Approximability Models

Models based on approximation capabilities have recently been studied in the context of Optimal Recovery. These models, however, are not compatible with overparametrization, since model- and data-consistent functions could then be unbounded. This drawback motivates the introduction of refined approximability models featuring an added boundedness condition. Thus, two new models are proposed in this article: one where the boundedness applies to the target functions (first type) and one where the boundedness applies to the approximants (second type). For both types of model, optimal maps for the recovery of linear functionals are first described on an abstract level before their efficient constructions are addressed. By exploiting techniques from semidefinite programming, these constructions are explicitly carried out on a common example involving polynomial subspaces of $\mathcal{C}[-1,1]$.

preprint · 2020 · arXiv

Learning from Non-Random Data in Hilbert Spaces: An Optimal Recovery Perspective

The notion of generalization in classical Statistical Learning is often attached to the postulate that data points are independent and identically distributed (IID) random variables. While relevant in many applications, this postulate may not hold in general, encouraging the development of learning frameworks that are robust to non-IID data. In this work, we consider the regression problem from an Optimal Recovery perspective. Relying on a model assumption comparable to choosing a hypothesis class, a learner aims at minimizing the worst-case error, without recourse to any probabilistic assumption on the data. We first develop a semidefinite program for calculating the worst-case error of any recovery map in finite-dimensional Hilbert spaces. Then, for any Hilbert space, we show that Optimal Recovery provides a formula which is user-friendly from an algorithmic point-of-view, as long as the hypothesis class is linear. Interestingly, this formula coincides with kernel ridgeless regression in some cases, proving that minimizing the average error and worst-case error can yield the same solution. We provide numerical experiments in support of our theoretical findings.
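The abstract mentions kernel ridgeless regression, which is the minimum-norm interpolant in a reproducing kernel Hilbert space. The sketch below illustrates only that object, not the paper's Optimal Recovery formula. The Gaussian kernel, its width, and the helper names `solve` and `ridgeless_fit` are assumptions made for this example.

```python
import math

def gaussian_kernel(u, v, width=1.0):
    """Gaussian (RBF) kernel on the real line; an illustrative choice."""
    return math.exp(-((u - v) ** 2) / (2.0 * width ** 2))

def solve(M, b):
    """Solve M c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (aug[i][n] - tail) / aug[i][i]
    return coef

def ridgeless_fit(xs, ys, kernel=gaussian_kernel):
    """Return f(t) = sum_i c_i k(t, x_i) with K c = y (no ridge term)."""
    K = [[kernel(a, b) for b in xs] for a in xs]
    coef = solve(K, ys)
    return lambda t: sum(c * kernel(t, a) for c, a in zip(coef, xs))

# Usage: the fitted function interpolates the data exactly (up to rounding).
f = ridgeless_fit([0.0, 1.0, 2.5], [1.0, -1.0, 0.5])
```

No probabilistic assumption on the data points enters this construction, which is the feature the abstract emphasizes.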

preprint · 2016 · arXiv

One-Bit Compressive Sensing of Dictionary-Sparse Signals

One-bit compressive sensing has extended the scope of sparse recovery by showing that sparse signals can be accurately reconstructed even when their linear measurements are subject to the extreme quantization scenario of binary samples---only the sign of each linear measurement is maintained. Existing results in one-bit compressive sensing rely on the assumption that the signals of interest are sparse in some fixed orthonormal basis. However, in most practical applications, signals are sparse with respect to an overcomplete dictionary, rather than a basis. There has already been a surge of activity to obtain recovery guarantees under such a generalized sparsity model in the classical compressive sensing setting. Here, we extend the one-bit framework to this important model, providing a unified theory of one-bit compressive sensing under dictionary sparsity. Specifically, we analyze several different algorithms---based on convex programming and on hard thresholding---and show that, under natural assumptions on the sensing matrix (satisfied by Gaussian matrices), these algorithms can efficiently recover analysis-dictionary-sparse signals in the one-bit model.
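To make the one-bit measurement model concrete, here is a minimal sketch for the simplest case, where the dictionary is the identity (signals sparse in the canonical basis). It keeps only the sign of each Gaussian measurement and estimates the signal direction by back-projection followed by hard thresholding. This is a simplified stand-in for the thresholding-based algorithms the abstract mentions, not the paper's procedure for analysis-dictionary-sparse signals. All function names and sizes are hypothetical.

```python
import math
import random

def one_bit_measure(A, x):
    """Keep only the sign of each linear measurement <a_i, x>."""
    return [1.0 if sum(a * xj for a, xj in zip(row, x)) >= 0 else -1.0
            for row in A]

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    keep = set(order[:s])
    return [vj if j in keep else 0.0 for j, vj in enumerate(v)]

def estimate_direction(A, y, s):
    """Back-project the signs (A^T y), hard-threshold, then normalize."""
    n = len(A[0])
    back = [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n)]
    z = hard_threshold(back, s)
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z] if norm > 0 else z

def demo(seed=1, m=400, n=50, s=3):
    """Measure a unit-norm s-sparse vector with m Gaussian one-bit samples."""
    rng = random.Random(seed)
    A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    x = [0.0] * n
    for j in rng.sample(range(n), s):
        x[j] = 1.0 / math.sqrt(s)
    y = one_bit_measure(A, x)
    return x, y, estimate_direction(A, y, s)
```

Because signs discard all magnitude information, only the direction of the signal can be estimated, which is why the output is normalized.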

preprint · 2014 · arXiv

Exponential decay of reconstruction error from binary measurements of sparse signals

Binary measurements arise naturally in a variety of statistical and engineering applications. They may be inherent to the problem---e.g., in determining the relationship between genetics and the presence or absence of a disease---or they may be a result of extreme quantization. In one-bit compressed sensing it has recently been shown that the number of one-bit measurements required for signal estimation mirrors that of unquantized compressed sensing. Indeed, $s$-sparse signals in $\mathbb{R}^n$ can be estimated (up to normalization) from $Ω(s \log (n/s))$ one-bit measurements. Nevertheless, controlling the precise accuracy of the error estimate remains an open challenge. In this paper, we focus on optimizing the decay of the error as a function of the oversampling factor $λ:= m/(s \log(n/s))$, where $m$ is the number of measurements. It is known that the error in reconstructing sparse signals from standard one-bit measurements is bounded below by $Ω(λ^{-1})$. Without adjusting the measurement procedure, reducing this polynomial error decay rate is impossible. However, we show that an adaptive choice of the thresholds used for quantization may lower the error rate to $e^{-Ω(λ)}$. This improves upon guarantees for other methods of adaptive thresholding as proposed in Sigma-Delta quantization. We develop a general recursive strategy to achieve this exponential decay and two specific polynomial-time algorithms which fall into this framework, one based on convex programming and one on hard thresholding. This work is inspired by the one-bit compressed sensing model, in which the engineer controls the measurement procedure. Nevertheless, the principle is extendable to signal reconstruction problems in a variety of binary statistical models as well as statistical estimation problems like logistic regression.
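The gap between the two decay rates in the abstract can be seen with a few lines of arithmetic. The sketch below computes the oversampling factor $λ = m/(s \log(n/s))$ as defined above and compares $λ^{-1}$ with $e^{-λ}$. The hidden constants in the $Ω(\cdot)$ notation are unknown and are set to 1 here purely for illustration, and the sample values of `m`, `s`, and `n` are arbitrary.

```python
import math

def oversampling_factor(m, s, n):
    """lambda := m / (s * log(n / s)), as defined in the abstract."""
    return m / (s * math.log(n / s))

def polynomial_rate(lam):
    """Error scale lambda^{-1} (constant assumed to be 1)."""
    return 1.0 / lam

def exponential_rate(lam):
    """Error scale e^{-lambda} (constant in the exponent assumed to be 1)."""
    return math.exp(-lam)

# Compare the two scales as the number of measurements m grows.
for m in (100, 200, 400, 800):
    lam = oversampling_factor(m, s=10, n=1000)
    print(f"m={m:4d}  lambda={lam:6.2f}  "
          f"1/lambda={polynomial_rate(lam):.3e}  "
          f"exp(-lambda)={exponential_rate(lam):.3e}")
```

Since $e^{-λ} < λ^{-1}$ for every $λ > 0$, the exponential scale is always the smaller of the two under these unit-constant assumptions, and the gap widens quickly as $m$ increases.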

preprint · 2013 · arXiv

On the Hermite spline conjecture and its connection to k-monotone densities

The $k$-monotone classes of densities defined on $(0, \infty)$ have been known in the mathematical literature but were for the first time considered from a statistical point of view by Balabdaoui and Wellner (2007, 2010). In these works, the authors generalized the results established for monotone ($k=1$) and convex ($k=2$) densities by giving a characterization of the Maximum Likelihood and Least Squares estimators (MLE and LSE) and deriving minimax bounds for rates of convergence. For $k$ strictly larger than 2, the pointwise asymptotic behavior of the MLE and LSE studied by Balabdaoui and Wellner (2007) would show that the MLE and LSE attain the minimax lower bounds in a local pointwise sense. However, the theory assumes that a certain conjecture about the approximation error of a Hermite spline holds true. The main goal of the present note is to show why such a conjecture cannot be true. We also suggest how to bypass the conjecture and rebuild the key proofs in the limit theory of the estimators.

preprint · 2010 · arXiv

The Gelfand widths of $\ell_p$-balls for $0<p\leq 1$

We provide sharp lower and upper bounds for the Gelfand widths of $\ell_p$-balls in the $N$-dimensional $\ell_q^N$-space for $0<p\leq 1$ and $p<q \leq 2$. Such estimates are highly relevant to the novel theory of compressive sensing, and our proofs rely on methods from this area.
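For readers unfamiliar with the term, the standard definition of the Gelfand width of order $m$ of a set $K$ in a (quasi-)normed space $X$ is recalled below. Here it would apply with $K$ the unit ball of $\ell_p^N$ and $X = \ell_q^N$. The notation $d^m$, $K$, $X$, $L$ is the usual textbook one and is an assumption about the paper's own notation.

```latex
d^m(K, X) \;=\; \inf\Big\{ \sup_{x \in K \cap L} \|x\|_X \;:\;
    L \text{ a subspace of } X,\ \operatorname{codim}(L) \le m \Big\}
```

The link to compressive sensing comes from taking $L = \ker A$ for a measurement matrix $A \in \mathbb{R}^{m \times N}$, whose null space has codimension at most $m$.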
