Source author record

Sylvain Sardy

Sylvain Sardy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology · Machine Learning · Applications · physics.data-an

Catalog footprint

What is connected

8 works

4 topics

4 close collaborators


preprint · 2022 · arXiv

A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks

To fit sparse linear associations, a LASSO sparsity-inducing penalty with a single hyperparameter provably allows recovery of the important features (needles) with high probability in certain regimes, even if the sample size is smaller than the dimension of the input vector (haystack). More recently, learners known as artificial neural networks (ANNs) have shown great success in many machine learning tasks, in particular in fitting nonlinear associations. A small learning rate, a stochastic gradient descent algorithm and a large training set help to cope with the explosion in the number of parameters present in deep neural networks. Yet few ANN learners have been developed and studied to find needles in nonlinear haystacks. Driven by a single hyperparameter, our ANN learner, as in the sparse linear case, exhibits a phase transition in the probability of retrieving the needles, which we do not observe with other ANN learners. To select our penalty parameter, we generalize the universal threshold of Donoho and Johnstone (1994), which is a better rule than the conservative (too many false detections) and expensive cross-validation. In the spirit of simulated annealing, we propose a warm-start sparsity-inducing algorithm to solve the high-dimensional, non-convex and non-differentiable optimization problem. We perform precise Monte Carlo simulations to show the effectiveness of our approach.

preprint · 2022 · arXiv

Robust Lasso-Zero for sparse corruption and model selection with missing covariates

We propose Robust Lasso-Zero, an extension of the Lasso-Zero methodology, initially introduced for sparse linear models, to the sparse corruptions problem. We give theoretical guarantees on the sign recovery of the parameters for a slightly simplified version of the estimator, called Thresholded Justice Pursuit. The use of Robust Lasso-Zero is showcased for variable selection with missing values in the covariates. In addition to requiring neither a model for the covariates nor an estimate of their covariance matrix or of the noise variance, the method has the great advantage of handling missing-not-at-random values without specifying a parametric model. Numerical experiments and a medical application underline the relevance of Robust Lasso-Zero in such a context, where few competitors are available. The method is easy to use and is implemented in the R library lass0.

preprint · 2022 · arXiv

Sparse additive models in high dimensions with wavelets

In multivariate regression, when covariates are numerous, it is often reasonable to assume that only a small number of them have predictive information. In some medical applications, for instance, it is believed that only a few genes out of thousands are responsible for cancers. In that case, the aim is not only to propose a good fit, but also to select the relevant covariates (genes). We propose to perform model selection with additive models in high dimensions (sample size and number of covariates). Our approach is computationally efficient thanks to fast wavelet transforms; it does not rely on cross-validation, and it solves a convex optimization problem for a prescribed penalty parameter, called the quantile universal threshold. We also propose a second rule based on Stein unbiased risk estimation geared towards prediction. We use Monte Carlo simulations and real data to compare various methods based on false discovery rate (FDR), true positive rate (TPR) and mean squared error. Our approach is the only one to handle high dimensions, and has the best FDR-TPR trade-off.

preprint · 2020 · arXiv

What needles do sparse neural networks find in nonlinear haystacks

Using a sparsity-inducing penalty in artificial neural networks (ANNs) avoids over-fitting, especially in situations where noise is high and the training set is small in comparison to the number of features. For linear models, such an approach also provably recovers the important features with high probability in certain regimes for a well-chosen penalty parameter. The typical way of setting the penalty parameter is by splitting the data set and performing cross-validation, which is (1) computationally expensive and (2) not desirable when the data set is already too small to be split further (for example, whole-genome sequence data). In this study, we establish the theoretical foundation to select the penalty parameter without cross-validation, based on bounding with high probability the infinity norm of the gradient of the loss function at zero under the zero-feature assumption. Our approach is a generalization of the universal threshold of Donoho and Johnstone (1994) to nonlinear ANN learning. We perform a set of comprehensive Monte Carlo simulations on a simple model, and the numerical results show the effectiveness of the proposed approach.
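The selection rule described in this abstract can be illustrated in the linear special case only. The sketch below is not the paper's ANN procedure: it assumes the LASSO loss 0.5 * ||y - X b||^2 + lambda * ||b||_1, whose gradient at zero has infinity norm ||X^T y||_inf, and it assumes Gaussian noise with known standard deviation. The function names `zero_threshold` and `quantile_universal_threshold` and the parameters `alpha`, `n_sim` and `seed` are hypothetical, introduced for illustration.

```python
import numpy as np


def zero_threshold(X, y):
    """Infinity norm of the gradient of 0.5 * ||y - X b||^2 at b = 0.

    For the linear LASSO this is the smallest penalty for which the
    fitted coefficient vector is exactly zero.
    """
    return np.max(np.abs(X.T @ y))


def quantile_universal_threshold(X, sigma=1.0, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo estimate of the (1 - alpha) quantile of the zero threshold
    under the zero-feature model y = sigma * eps, eps ~ N(0, I).

    Assumption: Gaussian noise with known sigma (illustration only).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each row is one simulated pure-noise response vector.
    noise = sigma * rng.standard_normal((n_sim, n))
    # Row k of (noise @ X) equals X^T y_k, so this is ||X^T y_k||_inf per draw.
    lambdas = np.max(np.abs(noise @ X), axis=1)
    return np.quantile(lambdas, 1.0 - alpha)
```

Calling `quantile_universal_threshold(X, sigma, alpha)` returns a penalty that, under these assumptions, sets all coefficients to zero with probability about 1 - alpha when no feature is active. For an orthonormal design this is of the same order as the Donoho and Johnstone universal threshold sigma * sqrt(2 log n).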

preprint · 2016 · arXiv

Threshold Selection for Total Variation Denoising

Total variation (TV) denoising is a nonparametric smoothing method that has good properties for preserving sharp edges and contours in objects with spatial structures like natural images. The estimate is sparse in the sense that TV reconstruction leads to a piecewise constant function with a small number of jumps. A threshold parameter controls the number of jumps and the quality of the estimation. In practice, this threshold is often selected by minimizing a goodness-of-fit criterion like cross-validation, which can be costly as it requires solving the high-dimensional and non-differentiable TV optimization problem many times. We propose instead a two-step adaptive procedure via a connection to large deviations of stochastic processes. We also give conditions under which TV denoising achieves exact segmentation. We then apply our procedure to denoise a collection of 1D and 2D test signals, verifying the effectiveness of our approach in practice.
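For orientation, the standard 1D form of the TV optimization problem mentioned in this abstract can be written as follows. The notation ($y_i$ for the noisy observations, $f_i$ for the fitted values, $\lambda$ for the threshold) is assumed here and is not taken from the paper.

```latex
\hat{f}_{\lambda}
  = \operatorname*{arg\,min}_{f \in \mathbb{R}^{n}}
    \; \frac{1}{2} \sum_{i=1}^{n} (y_i - f_i)^2
    + \lambda \sum_{i=1}^{n-1} \lvert f_{i+1} - f_i \rvert
```

The second term is non-differentiable wherever consecutive values coincide, which is what makes the solution piecewise constant; a larger $\lambda$ yields fewer jumps.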

preprint · 2014 · arXiv

Adaptive Shrinkage of singular values

To recover a low-rank structure from a noisy matrix, truncated singular value decomposition has been extensively used and studied. Recent studies suggested that the signal can be better estimated by shrinking the singular values. We pursue this line of research and propose a new estimator offering a continuum of thresholding and shrinking functions. To avoid an unstable and costly cross-validation search, we propose new rules to select two thresholding and shrinking parameters from the data. In particular, we propose a generalized Stein unbiased risk estimation criterion that does not require knowledge of the variance of the noise and that is computationally fast. A Monte Carlo simulation reveals that our estimator outperforms the tested methods in terms of mean squared error on both low-rank and general signal matrices across different signal-to-noise ratio regimes. In addition, it accurately estimates the rank of the signal when it is detectable.
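A minimal sketch of what a continuum between thresholding and shrinking can look like is given below. It is one possible two-parameter family, assumed for illustration: each singular value s is multiplied by max(1 - (tau / s)^gamma, 0). The abstract does not state the estimator's formula, so the function name `shrink_singular_values` and the parameters `tau` and `gamma` are hypothetical and may differ from the paper's definition.

```python
import numpy as np


def shrink_singular_values(Y, tau, gamma=1.0):
    """Shrink the singular values of Y and rebuild the matrix.

    Each singular value s becomes s * max(1 - (tau / s)**gamma, 0).
    gamma = 1 gives soft thresholding (s - tau for s > tau);
    large gamma approaches hard thresholding (keep s if s > tau).
    This family is an illustrative assumption, not the paper's stated formula.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    keep = s > tau
    factor = np.zeros_like(s)
    # Only singular values above tau survive; the rest are set to zero.
    factor[keep] = 1.0 - (tau / s[keep]) ** gamma
    s_shrunk = s * factor
    # Scale the columns of U by the shrunk singular values, then recombine.
    return (U * s_shrunk) @ Vt, s_shrunk
```

The number of nonzero entries in the returned `s_shrunk` is the rank of the reconstructed matrix, so `tau` acts as the threshold and `gamma` controls how strongly the surviving singular values are shrunk.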

preprint · 2013 · arXiv

Blockwise and coordinatewise thresholding to combine tests of different natures in modern ANOVA

We derive new tests for fixed and random ANOVA based on a thresholded point estimate. The pivotal quantity is the threshold that sets all the coefficients of the null hypothesis to zero. Thresholding can be employed coordinatewise or blockwise, or both, which leads to tests with good power properties under alternative hypotheses that are either sparse or dense.

preprint · 2011 · arXiv

Smooth blockwise iterative thresholding: a smooth fixed point estimator based on the likelihood's block gradient

The proposed smooth blockwise iterative thresholding estimator (SBITE) is a model selection technique defined as a fixed point reached by iterating a likelihood gradient-based thresholding function. The smooth James-Stein thresholding function has two regularization parameters $\lambda$ and $\nu$, and a smoothness parameter $s$. It enjoys smoothness like ridge regression and selects variables like lasso. Focusing on Gaussian regression, we show that SBITE is uniquely defined, and that its Stein unbiased risk estimate is a smooth function of $\lambda$ and $\nu$, for better selection of the two regularization parameters. We perform a Monte Carlo simulation to investigate the predictive and oracle properties of this smooth version of adaptive lasso. The motivation is a gravitational wave burst detection problem from several concomitant time series. A nonparametric wavelet-based estimator is developed to combine information from all captors by block-thresholding multiresolution coefficients. We study how the smoothness parameter $s$ tempers the erraticity of the risk estimate, and derive a universal threshold, an information criterion and an oracle inequality in this canonical setting.
