Researcher profile

Simon Foucart

Simon Foucart contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 - Emerging · Verification L1 · Unclaimed author
6 works
0 followers
8 topics
4 close collaborators

Published work

6 published items

Preprint · 2022 · arXiv

On the sparsity of LASSO minimizers in sparse data recovery

We present a detailed analysis of the unconstrained $\ell_1$-weighted LASSO method for recovery of sparse data from its observation by randomly generated matrices, satisfying the Restricted Isometry Property (RIP) with constant $\delta<1$, and subject to negligible measurement and compressibility errors. We prove that if the data is $k$-sparse, then the size of the support of the LASSO minimizer, $s$, maintains a comparable sparsity, $s\leq C_\delta k$. For example, if $\delta=0.7$ then $s< 11k$, and a smaller $\delta=0.4$ yields $s< 4k$. We also derive new $\ell_2/\ell_1$ error bounds which highlight the precise dependence on $k$ and on the LASSO parameter $\lambda$, before the error is driven below the scale of negligible measurement and compressibility errors.
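The abstract concerns the sparsity of the LASSO minimizer, not a particular solver. To make the quantity $s$ concrete, here is a minimal Python sketch. It assumes the common normalization $\min_x \tfrac12\|Ax-y\|_2^2+\lambda\|x\|_1$, which may differ from the paper's scaling. It also uses a Gaussian matrix as a stand-in for an RIP matrix and ISTA (proximal gradient descent) as a swapped-in solver. The helper names (`lasso_ista`, `soft_threshold`, `spectral_norm_sq`), the parameter values and the 1e-8 cutoff for counting nonzeros are hypothetical illustrative choices, not taken from the paper.

```python
import math
import random


def soft_threshold(v, t):
    """Componentwise soft thresholding: the proximal map of t * ||.||_1."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r for a matrix stored as a list of rows."""
    n = len(A[0])
    out = [0.0] * n
    for row, ri in zip(A, r):
        for j in range(n):
            out[j] += row[j] * ri
    return out


def spectral_norm_sq(A, iters=200):
    """Estimate the largest eigenvalue of A^T A (= ||A||_2^2) by power iteration."""
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    est = 0.0
    for _ in range(iters):
        w = rmatvec(A, matvec(A, v))
        est = math.sqrt(sum(wi * wi for wi in w))
        if est == 0.0:
            return 0.0
        v = [wi / est for wi in w]
    return est


def lasso_ista(A, y, lam, iters=2000):
    """Minimize 0.5*||Ax - y||_2^2 + lam*||x||_1 with ISTA (proximal gradient)."""
    # Power iteration approaches ||A||_2^2 from below, so add a safety margin
    # before using it as the Lipschitz constant of the gradient.
    L = 1.05 * spectral_norm_sq(A)
    t = 1.0 / L
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ai - yi for ai, yi in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        x = soft_threshold([xi - t * gi for xi, gi in zip(x, grad)], t * lam)
    return x


if __name__ == "__main__":
    # Illustrative setup (assumed, not from the paper): Gaussian A with
    # N(0, 1/m) entries, a k-sparse +/-1 signal, lam = 0.01, cutoff 1e-8.
    random.seed(0)
    m, n, k = 30, 60, 3
    A = [[random.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]
    x_true = [0.0] * n
    for j in random.sample(range(n), k):
        x_true[j] = random.choice([-1.0, 1.0])
    y = matvec(A, x_true)
    x_hat = lasso_ista(A, y, lam=0.01, iters=500)
    s = sum(1 for xi in x_hat if abs(xi) > 1e-8)
    print(f"k = {k}, support size of the computed LASSO solution s = {s}")
```

ISTA only approaches the minimizer. With few iterations, the counted support can include tiny spurious entries, so the cutoff and the iteration count trade accuracy for speed.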

Preprint · 2022 · arXiv

The Sparsity of LASSO-type Minimizers

This note extends an attribute of the LASSO procedure to a whole class of related procedures, including square-root LASSO, square LASSO, LAD-LASSO, and an instance of generalized LASSO. Namely, under the assumption that the input matrix satisfies an $\ell_p$-restricted isometry property (which in some sense is weaker than the standard $\ell_2$-restricted isometry property assumption), it is shown that if the input vector comes from the exact measurement of a sparse vector, then the minimizer of any such LASSO-type procedure has sparsity comparable to the sparsity of the measured vector. The result remains valid in the presence of moderate measurement error when the regularization parameter is not too small.
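For orientation, the block below gives common textbook forms of three of the procedures named above, with $A$ the input matrix, $y$ the measurement vector and $\lambda>0$ the regularization parameter. Normalizations (for instance the factor $\tfrac12$ or how $\lambda$ is scaled) vary across the literature and may differ from the note's. The note's square LASSO and its generalized-LASSO instance are not reproduced here.

```latex
\begin{aligned}
\text{LASSO:} \quad & \min_{x} \ \tfrac12 \|Ax - y\|_2^2 + \lambda \|x\|_1, \\
\text{square-root LASSO:} \quad & \min_{x} \ \|Ax - y\|_2 + \lambda \|x\|_1, \\
\text{LAD-LASSO:} \quad & \min_{x} \ \|Ax - y\|_1 + \lambda \|x\|_1.
\end{aligned}
```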

Preprint · 2020 · arXiv

Finer Metagenomic Reconstruction via Biodiversity Optimization

When analyzing communities of microorganisms from their sequenced DNA, an important task is taxonomic profiling: enumerating the presence and relative abundance of all organisms, or merely of all taxa, contained in the sample. This task can be tackled via compressive-sensing-based approaches, which favor communities featuring the fewest organisms among those consistent with the observed DNA data. Despite their successes, these parsimonious approaches sometimes conflict with biological realism by overlooking organism similarities. Here, we leverage a recently developed notion of biological diversity that simultaneously accounts for organism similarities and retains the optimization strategy underlying compressive-sensing-based approaches. We demonstrate that minimizing biological diversity still produces sparse taxonomic profiles and we experimentally validate its superiority over existing compressive-sensing-based approaches. Despite showing that the objective function is almost never convex and often concave, generally yielding NP-hard problems, we exhibit ways of representing organism similarities for which minimizing diversity can be performed via a sequence of linear programs guaranteed to decrease diversity. Better yet, when biological similarity is quantified by $k$-mer co-occurrence (a popular notion in bioinformatics), minimizing diversity actually reduces to one linear program that can utilize multiple $k$-mer sizes to enhance performance. In proof-of-concept experiments, we verify that the latter procedure can lead to significant gains when taxonomically profiling a metagenomic sample, both in terms of reconstruction accuracy and computational performance. Reproducible code is available at https://github.com/dkoslicki/MinimizeBiologicalDiversity.

Preprint · 2020 · arXiv

Instances of Computational Optimal Recovery: Dealing with Observation Errors

When attempting to recover functions from observational data, one naturally seeks to do so in an optimal manner with respect to some modeling assumption. With a focus put on the worst-case setting, this is the standard goal of Optimal Recovery. The distinctive twists here are the consideration of inaccurate data through some boundedness models and the emphasis on computational realizability. Several scenarios are unraveled through the efficient constructions of optimal recovery maps: local optimality under linearly or semidefinitely describable models, global optimality for the estimation of linear functionals under approximability models, and global near-optimality under approximability models in the space of continuous functions.

Preprint · 2020 · arXiv

Instances of Computational Optimal Recovery: Refined Approximability Models

Models based on approximation capabilities have recently been studied in the context of Optimal Recovery. These models, however, are not compatible with overparametrization, since model- and data-consistent functions could then be unbounded. This drawback motivates the introduction of refined approximability models featuring an added boundedness condition. Thus, two new models are proposed in this article: one where the boundedness applies to the target functions (first type) and one where the boundedness applies to the approximants (second type). For both types of model, optimal maps for the recovery of linear functionals are first described on an abstract level before their efficient constructions are addressed. By exploiting techniques from semidefinite programming, these constructions are explicitly carried out on a common example involving polynomial subspaces of $\mathcal{C}[-1,1]$.

Preprint · 2020 · arXiv

Learning from Non-Random Data in Hilbert Spaces: An Optimal Recovery Perspective

The notion of generalization in classical Statistical Learning is often attached to the postulate that data points are independent and identically distributed (IID) random variables. While relevant in many applications, this postulate may not hold in general, encouraging the development of learning frameworks that are robust to non-IID data. In this work, we consider the regression problem from an Optimal Recovery perspective. Relying on a model assumption comparable to choosing a hypothesis class, a learner aims at minimizing the worst-case error, without recourse to any probabilistic assumption on the data. We first develop a semidefinite program for calculating the worst-case error of any recovery map in finite-dimensional Hilbert spaces. Then, for any Hilbert space, we show that Optimal Recovery provides a formula which is user-friendly from an algorithmic point of view, as long as the hypothesis class is linear. Interestingly, this formula coincides with kernel ridgeless regression in some cases, proving that minimizing the average error and the worst-case error can yield the same solution. We provide numerical experiments in support of our theoretical findings.
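The abstract mentions kernel ridgeless regression. For reference, this is the minimum-norm kernel interpolant $f(x)=\sum_i c_i K(x,x_i)$ with coefficients solving $\mathbf{K}c=y$, where $\mathbf{K}_{ij}=K(x_i,x_j)$. It is the limit of kernel ridge regression as the regularization parameter tends to zero. The following Python sketch makes several assumptions: one-dimensional inputs, a Gaussian kernel with an arbitrary width of 0.5, and an invertible kernel matrix. The function names are hypothetical. It illustrates the ridgeless estimator only, not the paper's Optimal Recovery formula or its semidefinite program.

```python
import math


def gaussian_kernel(x, z, width=0.5):
    """Gaussian (RBF) kernel on the real line; the width is an assumed choice."""
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))


def solve(M, b):
    """Solve M c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][j] * coeffs[j] for j in range(r + 1, n))
        coeffs[r] = (aug[r][n] - s) / aug[r][r]
    return coeffs


def ridgeless_fit(xs, ys, kernel=gaussian_kernel):
    """Minimum-norm interpolant f(x) = sum_i c_i K(x, x_i) with K c = y."""
    K = [[kernel(xi, xj) for xj in xs] for xi in xs]
    coeffs = solve(K, ys)

    def predict(x):
        return sum(ci * kernel(x, xi) for ci, xi in zip(coeffs, xs))

    return predict


if __name__ == "__main__":
    # Toy data (made up for illustration): the fit interpolates every point.
    xs = [0.0, 0.5, 1.0, 1.5]
    ys = [1.0, 2.0, 0.0, -1.0]
    f = ridgeless_fit(xs, ys)
    print([round(f(x), 6) for x in xs], f(0.75))
```

Solving the linear system directly keeps the sketch dependency-free. For larger or ill-conditioned kernel matrices, a Cholesky factorization from a numerical library would be the more robust choice.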