Source author record

Samuel Vaiter

Samuel Vaiter appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, math.OC, Information Theory, math.IT, math.ST, Statistics Theory, Applications

Catalog footprint

What is connected

21 works

7 topics

4 close collaborators

preprint, 2026, arXiv

On the Hardness of Junking LLMs

Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit semantic structure. These attacks generally operate by fixing an adversarial instruction and optimizing small adversarial components (e.g., suffixes or prefixes). In this setting, prompt structure is fundamental for performance, and recent results show that even simple random search can achieve strong performance when combined with sophisticated prompt design. Recently, it has been observed that harmful behaviors can be elicited even without the adversarial prompt, relying solely on optimized token sequences. This suggests the existence of natural backdoors, i.e., token sequences that emerged naturally during LLM training and that trigger unsafe outputs without any meaningful instruction. However, despite these observations, this setting remains largely unexplored, and in particular the hardness of finding natural backdoors has not yet been assessed. In this work, we provide a first proof-of-concept study investigating the hardness of this task, which we refer to as the junking problem. We formalize it as the problem of finding token sequences that maximize the probability of generating a target prefix of harmful responses, and propose a greedy random-search method to assess whether such sequences can be discovered easily. Our results show that this problem is harder than standard jailbreak attacks, confirming the importance of semantic information in prompt design. At the same time, we find that our simple strategy is sufficient to solve it with a high success rate, suggesting that natural backdoors are present and easily recoverable. Finally, through perplexity analysis, we observe that the discovered token sequences lie in low-probability regions of the model distribution, supporting the hypothesis that they emerged implicitly from the training process.

preprint, 2026, arXiv

Proximal basin hopping: global optimization with guarantees

Global optimization is a challenging problem, with plenty of algorithms displaying empirical success, but scarce theoretical backing. In this work, we propose a new theoretical framework called Proximal Basin Hopping (PBH), carefully tailored to combine proximal optimization and local minimization. We use it to construct a practical algorithm that converges to the global minimizer with high probability when using a finite number of samples. Proximal Basin Hopping outperforms well-known algorithms with theoretical backing on standard synthetic hard functions and on real problems such as fitting scaling laws for deep learning. Furthermore, the higher the dimension, the larger the performance gap.

preprint, 2022, arXiv

Automatic differentiation of nonsmooth iterative algorithms

Differentiation along algorithms, i.e., piggyback propagation of derivatives, is now routinely used to differentiate iterative solvers in differentiable programming. Asymptotics is well understood for many smooth problems but the nondifferentiable case is hardly considered. Is there a limiting object for nonsmooth piggyback automatic differentiation (AD)? Does it have any variational meaning and can it be used effectively in machine learning? Is there a connection with classical derivatives? All these questions are addressed under appropriate nonexpansivity conditions in the framework of conservative derivatives, which has proved useful in understanding nonsmooth AD. We characterize the attractor set of nonsmooth piggyback iterations as a set-valued fixed point which remains in the conservative framework. This has various consequences, in particular almost everywhere convergence of classical derivatives. Our results are illustrated on parametric convex optimization problems with forward-backward, Douglas-Rachford and Alternating Direction Method of Multipliers (ADMM) algorithms as well as the Heavy-Ball method.

preprint, 2022, arXiv

Dual Extrapolation for Sparse Generalized Linear Models

Generalized Linear Models (GLM) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity inducing regularizations have proven to be useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables, or by solving a growing sequence of smaller problems. For both techniques, significant variables are identified thanks to convex duality arguments. In this paper, we show that the dual iterates of a GLM exhibit a Vector AutoRegressive (VAR) behavior after sign identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent. Exploiting this regularity, one can construct dual points that offer tighter certificates of optimality, enhancing the performance of screening rules and helping to design competitive working set algorithms.
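The optimality certificates mentioned in this abstract can be illustrated on the simplest case, the Lasso. The sketch below is not the dual extrapolation of the paper: it only shows the standard baseline in which a feasible dual point is obtained by rescaling the primal residual, and the duality gap then bounds the suboptimality. The objective 1/2 ||y - X beta||^2 + lam ||beta||_1, the function name and the toy data are assumptions chosen for illustration.

```python
def lasso_duality_gap(X, y, beta, lam):
    """Duality gap for the Lasso, using a rescaled residual as dual point.

    Assumed primal: 0.5 * ||y - X beta||^2 + lam * ||beta||_1
    Assumed dual:   0.5 * ||y||^2 - 0.5 * lam^2 * ||theta - y / lam||^2,
                    subject to ||X^T theta||_inf <= 1.
    """
    n, p = len(X), len(X[0])
    # Primal residual r = y - X beta
    r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    primal = 0.5 * sum(ri ** 2 for ri in r) + lam * sum(abs(b) for b in beta)
    # Rescale the residual so that the dual constraint holds
    corr = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
    scale = max(lam, max(abs(c) for c in corr))
    theta = [ri / scale for ri in r]
    dual = 0.5 * sum(yi ** 2 for yi in y) - 0.5 * lam ** 2 * sum(
        (theta[i] - y[i] / lam) ** 2 for i in range(n)
    )
    return primal - dual


# Toy problem with an identity design, where the solution is soft-thresholding
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
gap_at_optimum = lasso_duality_gap(X, y, [2.0, 0.0], lam=1.0)
gap_at_zero = lasso_duality_gap(X, y, [0.0, 0.0], lam=1.0)
```

The gap vanishes at the minimizer and is strictly positive elsewhere. A tighter dual point yields a smaller gap for the same primal iterate, which is what makes screening rules and working sets more effective.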

preprint, 2022, arXiv

Implicit differentiation for fast hyperparameter selection in non-smooth convex learning

Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but non-smooth. We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian. Using implicit differentiation, we show it is possible to leverage the non-smoothness of the inner problem to speed up the computation. Finally, we provide a bound on the error made on the hypergradient when the inner optimization problem is solved approximately. Results on regression and classification problems reveal computational benefits for hyperparameter optimization, especially when multiple hyperparameters are required.
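As a minimal illustration of forward-mode differentiation of proximal gradient descent, the sketch below treats a one-dimensional Lasso, 1/2 (y - x beta)^2 + lam |beta|, and propagates the derivative of the iterate with respect to lam alongside the iterate itself. The scalar setting, the step size and the function name are assumptions made to keep the example short; this is not the authors' implementation.

```python
def ista_with_jacobian(x, y, lam, n_iter=100):
    """Proximal gradient descent on a scalar Lasso, with d(beta)/d(lam).

    Returns the final iterate and its derivative with respect to lam,
    obtained by differentiating each update (forward mode).
    """
    step = 0.5 / (x * x)  # assumed step size, below 1 / Lipschitz constant
    beta, jac = 0.0, 0.0
    for _ in range(n_iter):
        # Gradient step and its derivative with respect to lam
        z = beta - step * x * (x * beta - y)
        dz = jac * (1.0 - step * x * x)
        # Soft-thresholding with threshold step * lam, and its derivative
        tau = step * lam
        if abs(z) > tau:
            sign = 1.0 if z > 0 else -1.0
            beta = z - sign * tau
            jac = dz - sign * step
        else:
            beta, jac = 0.0, 0.0
    return beta, jac


beta_hat, jac_hat = ista_with_jacobian(x=2.0, y=3.0, lam=1.0)
```

In this toy case the minimizer is available in closed form, beta = (x*y - lam) / x^2 when x*y > lam, so the propagated derivative can be checked against -1 / x^2.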

preprint, 2022, arXiv

The derivatives of Sinkhorn-Knopp converge

We show that the derivatives of the Sinkhorn-Knopp algorithm, or iterative proportional fitting procedure, converge towards the derivatives of the entropic regularization of the optimal transport problem with a locally uniform linear convergence rate.
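For readers unfamiliar with the algorithm, here is a minimal sketch of the Sinkhorn-Knopp iteration itself, for entropic optimal transport between two small histograms. It shows only the scaling updates, not the derivative analysis of the paper; the function name, the regularization value eps and the toy data are assumptions.

```python
import math


def sinkhorn(cost, a, b, eps=0.5, n_iter=200):
    """Entropic optimal transport by alternating row and column scaling."""
    n, m = len(a), len(b)
    # Gibbs kernel associated with the cost matrix
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows to match a, then columns to match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


cost = [[0.0, 1.0], [1.0, 0.0]]
a = [0.5, 0.5]
b = [0.3, 0.7]
plan = sinkhorn(cost, a, b)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

After enough iterations the row and column sums of the plan match the prescribed marginals a and b. Differentiating these updates with respect to the inputs gives the sequence of derivatives whose convergence the abstract refers to.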

preprint, 2022, arXiv

The Geometry of Sparse Analysis Regularization

Analysis sparsity is a common prior in inverse problems and machine learning, including special cases such as Total Variation regularization, Edge Lasso and Fused Lasso. We study the geometry of the solution set (a polyhedron) of the analysis l1-regularization (with l2 data fidelity term) when it is not reduced to a singleton, without any assumption on the analysis dictionary or the degradation operator. In contrast with most theoretical work, we do not focus on giving uniqueness and/or stability results, but rather describe a worst-case scenario where the solution set can be big in terms of dimension. Leveraging a fine analysis of the sub-level set of the regularizer itself, we draw a connection between the support of a solution and the minimal face containing it, and in particular prove that extremal points can be recovered thanks to an algebraic test. Moreover, we draw a connection between the sign pattern of a solution and the ambient dimension of the smallest face containing it. Finally, we show that any sub-polyhedron of the level set can be seen as a solution set of sparse analysis regularization with explicit parameters.

preprint, 2020, arXiv

Automated data-driven selection of the hyperparameters for Total-Variation based texture segmentation

Penalized Least Squares are widely used in signal and image processing. Yet they suffer from a major limitation, since they require fine-tuning of the regularization parameters. Under assumptions on the noise probability distribution, Stein-based approaches provide an unbiased estimator of the quadratic risk. The Generalized Stein Unbiased Risk Estimator is revisited to handle correlated Gaussian noise without requiring inversion of the covariance matrix. Then, in order to avoid an expensive grid search, it is necessary to design an algorithmic scheme minimizing the quadratic risk with respect to the regularization parameters. This work extends the Stein's Unbiased GrAdient estimator of the Risk of Deledalle et al. to the case of correlated Gaussian noise, deriving a general automatic tuning of regularization parameters. First, the theoretical asymptotic unbiasedness of the gradient estimator is demonstrated in the case of general correlated Gaussian noise. Then, the proposed parameter selection strategy is particularized to fractal texture segmentation, where the problem formulation naturally entails inter-scale and spatially correlated noise. Numerical assessment is provided, as well as a discussion of the practical issues.

preprint, 2020, arXiv

Implicit differentiation of Lasso-type models for hyperparameter optimization

Setting regularization parameters for Lasso-type estimators is notoriously difficult, though crucial in practice. The most popular hyperparameter optimization approach is grid-search using held-out validation data. Grid-search, however, requires choosing a predefined grid for each parameter, and its cost scales exponentially in the number of parameters. Another approach is to cast hyperparameter optimization as a bi-level optimization problem, which one can solve by gradient descent. The key challenge for these methods is the estimation of the gradient with respect to the hyperparameters. Computing this gradient via forward or backward automatic differentiation is possible yet usually suffers from high memory consumption. Alternatively, implicit differentiation typically involves solving a linear system, which can be prohibitive and numerically unstable in high dimension. In addition, implicit differentiation usually assumes smooth loss functions, which is not the case for Lasso-type problems. This work introduces an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems. Our approach scales to high-dimensional data by leveraging the sparsity of the solutions. Experiments demonstrate that the proposed method outperforms a large number of standard methods to optimize the error on held-out data, or the Stein Unbiased Risk Estimator (SURE).

preprint, 2020, arXiv

Sparse and Smooth: improved guarantees for Spectral Clustering in the Dynamic Stochastic Block Model

In this paper, we analyse classical variants of the Spectral Clustering (SC) algorithm in the Dynamic Stochastic Block Model (DSBM). Existing results show that, in the relatively sparse case where the expected degree grows logarithmically with the number of nodes, guarantees in the static case can be extended to the dynamic case and yield improved error bounds when the DSBM is sufficiently smooth in time, that is, the communities do not change too much between two time steps. We improve over these results by drawing a new link between the sparsity and the smoothness of the DSBM: the more regular the DSBM is, the more sparse it can be, while still guaranteeing consistent recovery. In particular, a mild condition on the smoothness allows one to treat the sparse case with bounded degree. We also extend these guarantees to the normalized Laplacian, and as a by-product of our analysis, we obtain what is, to our knowledge, the best spectral concentration bound available for the normalized Laplacian of matrices with independent Bernoulli entries.

preprint, 2016, arXiv

The Degrees of Freedom of Partly Smooth Regularizers

In this paper, we are concerned with regularized regression problems where the prior regularizer is a proper lower semicontinuous and convex function which is also partly smooth relative to a Riemannian submanifold. This encompasses as special cases several known penalties such as the Lasso ($\ell^1$-norm), the group Lasso ($\ell^1-\ell^2$-norm), the $\ell^\infty$-norm, and the nuclear norm. This also includes so-called analysis-type priors, i.e. composition of the previously mentioned penalties with linear operators, typical examples being the total variation or fused Lasso penalties. We study the sensitivity of any regularized minimizer to perturbations of the observations and provide its precise local parameterization. Our main sensitivity analysis result shows that the predictor moves locally stably along the same active submanifold as the observations undergo small perturbations. This local stability is a consequence of the smoothness of the regularizer when restricted to the active submanifold, which in turn plays a pivotal role to get a closed form expression for the variations of the predictor w.r.t. observations. We also show that, for a variety of regularizers, including polyhedral ones or the group Lasso and its analysis counterpart, this divergence formula holds Lebesgue almost everywhere. When the perturbation is random (with an appropriate continuous distribution), this allows us to derive an unbiased estimator of the degrees of freedom and of the risk of the estimator prediction. Our results hold true without requiring the design matrix to be full column rank. They generalize those already known in the literature such as the Lasso problem, the general Lasso problem (analysis $\ell^1$-penalty), or the group Lasso where existing results for the latter assume that the design is full column rank.

preprint, 2014, arXiv

Low Complexity Regularization of Linear Inverse Problems

Inverse problems and regularization theory form a central theme in contemporary signal processing, where the goal is to reconstruct an unknown signal from partial, indirect, and possibly noisy measurements of it. A now standard method for recovering the unknown signal is to solve a convex optimization problem that enforces some prior knowledge about its structure. This has proved efficient in many problems routinely encountered in imaging sciences, statistics and machine learning. This chapter delivers a review of recent advances in the field where the regularization prior promotes solutions conforming to some notion of simplicity/low-complexity. These priors encompass as popular examples sparsity and group sparsity (to capture the compressibility of natural signals and images), total variation and analysis sparsity (to promote piecewise regularity), and low-rank (as natural extension of sparsity to matrix-valued data). Our aim is to provide a unified treatment of all these regularizations under a single umbrella, namely the theory of partial smoothness. This framework is very general and accommodates all low-complexity regularizers just mentioned, as well as many others. Partial smoothness turns out to be the canonical way to encode low-dimensional models that can be linear spaces or more general smooth manifolds. This review is intended to serve as a one-stop shop toward the understanding of the theoretical properties of the so-regularized solutions. It covers a large spectrum including: (i) recovery guarantees and stability to noise, both in terms of $\ell^2$-stability and model (manifold) identification; (ii) sensitivity analysis to perturbations of the parameters involved (in particular the observations), with applications to unbiased risk estimation; (iii) convergence properties of the forward-backward proximal splitting scheme, which is particularly well suited to solve the corresponding large-scale regularized optimization problem.

preprint, 2014, arXiv

Model Consistency of Partly Smooth Regularizers

This paper studies least-squares regression penalized with partly smooth convex regularizers. This class of functions is very large and versatile, making it possible to promote solutions conforming to some notion of low-complexity. Indeed, they force solutions of variational problems to belong to a low-dimensional manifold (the so-called model) which is stable under small perturbations of the function. This property is crucial to make the underlying low-complexity model robust to small noise. We show that a generalized "irrepresentable condition" implies stable model selection under small noise perturbations in the observations and the design matrix, when the regularization parameter is tuned proportionally to the noise level. This condition is shown to be almost a necessary condition. We then show that this condition implies model consistency of the regularized estimator. That is, with a probability tending to one as the number of measurements increases, the regularized estimator belongs to the correct low-dimensional model manifold. This work unifies and generalizes several previous ones, where model consistency is known to hold for sparse, group sparse, total variation and low-rank regularizations.

preprint, 2014, arXiv

Model Selection with Low Complexity Priors

Regularization plays a pivotal role when facing the challenge of solving ill-posed inverse problems, where the number of observations is smaller than the ambient dimension of the object to be estimated. A line of recent work has studied regularization models with various types of low-dimensional structures. In such settings, the general approach is to solve a regularized optimization problem, which combines a data fidelity term and some regularization penalty that promotes the assumed low-dimensional/simple structure. This paper provides a general framework to capture this low-dimensional structure through what we coin partly smooth functions relative to a linear manifold. These are convex, non-negative, closed and finite-valued functions that will promote objects living on low-dimensional subspaces. This class of regularizers encompasses many popular examples such as the L1 norm, L1-L2 norm (group sparsity), as well as several others including the Linfty norm. We also show that the set of partly smooth functions relative to a linear manifold is closed under addition and pre-composition by a linear operator, which allows us to cover mixed regularization and the so-called analysis-type priors (e.g. total variation, fused Lasso, finite-valued polyhedral gauges). Our main result presents a unified sharp analysis of exact and robust recovery of the low-dimensional subspace model associated to the object to recover from partial measurements. This analysis is illustrated on a number of special and previously studied cases, and on an analysis of the performance of Linfty regularization in a compressed sensing scenario.

preprint, 2014, arXiv

Stein Unbiased GrAdient estimator of the Risk (SUGAR) for multiple parameter selection

Algorithms to solve variational regularization of ill-posed inverse problems usually involve operators that depend on a collection of continuous parameters. When these operators enjoy some (local) regularity, these parameters can be selected using the so-called Stein Unbiased Risk Estimate (SURE). While this selection is usually performed by exhaustive search, we address in this work the problem of using the SURE to efficiently optimize for a collection of continuous parameters of the model. When considering non-smooth regularizers, such as the popular l1-norm corresponding to soft-thresholding mapping, the SURE is a discontinuous function of the parameters preventing the use of gradient descent optimization techniques. Instead, we focus on an approximation of the SURE based on finite differences as proposed in (Ramani et al., 2008). Under mild assumptions on the estimation mapping, we show that this approximation is a weakly differentiable function of the parameters and its weak gradient, coined the Stein Unbiased GrAdient estimator of the Risk (SUGAR), provides an asymptotically (with respect to the data dimension) unbiased estimate of the gradient of the risk. Moreover, in the particular case of soft-thresholding, it is proved to be also a consistent estimator. This gradient estimate can then be used as a basis to perform a quasi-Newton optimization. The computation of the SUGAR relies on the closed-form (weak) differentiation of the non-smooth function. We provide its expression for a large class of iterative methods including proximal splitting ones and apply our strategy to regularizations involving non-smooth convex structured penalties. Illustrations on various image restoration and matrix completion problems are given.
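To make the starting point of this abstract concrete, the sketch below computes the classical SURE for soft-thresholding in a plain denoising model y = x0 + w with white Gaussian noise of known standard deviation sigma. It is the textbook closed form, not the SUGAR gradient estimator itself; the denoising setting, the function names and the sample values are assumptions.

```python
def soft_threshold(y, lam):
    """Componentwise soft-thresholding, the proximal map of lam * |.|."""
    return [max(abs(v) - lam, 0.0) * (1.0 if v > 0 else -1.0) for v in y]


def sure_soft_threshold(y, lam, sigma):
    """SURE of soft-thresholding under the model y = x0 + w, w ~ N(0, sigma^2 I).

    SURE = ||y - x_hat||^2 - n * sigma^2 + 2 * sigma^2 * dof,
    where dof counts the entries with |y_i| > lam.
    """
    n = len(y)
    x_hat = soft_threshold(y, lam)
    residual = sum((yi - xi) ** 2 for yi, xi in zip(y, x_hat))
    dof = sum(1 for yi in y if abs(yi) > lam)
    return residual - n * sigma ** 2 + 2.0 * sigma ** 2 * dof


y = [3.0, -0.5, 1.5, 0.2]
sure_value = sure_soft_threshold(y, lam=1.0, sigma=1.0)

# Exhaustive search over a small, arbitrary grid of thresholds
grid = [0.1 * k for k in range(1, 40)]
best_lam = min(grid, key=lambda lam: sure_soft_threshold(y, lam, sigma=1.0))
```

The degrees-of-freedom term jumps each time lam crosses some |y_i|, which is the discontinuity that prevents direct gradient descent on the SURE and motivates working with a weakly differentiable approximation instead.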

preprint, 2013, arXiv

Robust Polyhedral Regularization

In this paper, we establish robustness to noise perturbations of polyhedral regularization of linear inverse problems. We provide a sufficient condition that ensures that the polyhedral face associated to the true vector is equal to that of the recovered one. This criterion also implies that the $\ell^2$ recovery error is proportional to the noise level for a range of values of the regularization parameter. Our criterion is expressed in terms of the hyperplanes supporting the faces of the unit polyhedral ball of the regularization. This generalizes to an arbitrary polyhedral regularization results that are known to hold for sparse synthesis and analysis $\ell^1$ regularization, which are encompassed in this framework. As a byproduct, we obtain recovery guarantees for $\ell^\infty$ and $\ell^1-\ell^\infty$ regularization.

preprint, 2012, arXiv

Local Behavior of Sparse Analysis Regularization: Applications to Risk Estimation

In this paper, we aim at recovering an unknown signal x0 from noisy measurements y=Phi*x0+w, where Phi is an ill-conditioned or singular linear operator and w accounts for some noise. To regularize such an ill-posed inverse problem, we impose an analysis sparsity prior. More precisely, the recovery is cast as a convex optimization program where the objective is the sum of a quadratic data fidelity term and a regularization term formed of the L1-norm of the correlations between the sought after signal and atoms in a given (generally overcomplete) dictionary. The L1-sparsity analysis prior is weighted by a regularization parameter lambda>0. In this paper, we prove that any minimizer of this problem is a piecewise-affine function of the observations y and the regularization parameter lambda. As a byproduct, we exploit these properties to get an objectively guided choice of lambda. In particular, we develop an extension of the Generalized Stein Unbiased Risk Estimator (GSURE) and show that it is an unbiased and reliable estimator of an appropriately defined risk. The latter encompasses special cases such as the prediction risk, the projection risk and the estimation risk. We apply these risk estimators to the special case of L1-sparsity analysis regularization. We also discuss implementation issues and propose fast algorithms to solve the L1 analysis minimization problem and to compute the associated GSURE. We finally illustrate the applicability of our framework to parameter(s) selection on several imaging problems.

preprint, 2012, arXiv

Risk estimation for matrix recovery with spectral regularization

In this paper, we develop an approach to recursively estimate the quadratic risk for matrix recovery problems regularized with spectral functions. Toward this end, in the spirit of the SURE theory, a key step is to compute the (weak) derivative and divergence of a solution with respect to the observations. As such a solution is not available in closed form, but rather through a proximal splitting algorithm, we propose to recursively compute the divergence from the sequence of iterates. A second challenge that we address is the computation of the (weak) derivative of the proximity operator of a spectral function. To show the potential applicability of our approach, we exemplify it on a matrix completion problem to objectively and automatically select the regularization parameter.

preprint, 2012, arXiv

Robust Sparse Analysis Regularization

This paper investigates the theoretical guarantees of L1-analysis regularization when solving linear inverse problems. Most previous works in the literature have focused on the sparse synthesis prior, where the sparsity is measured as the L1 norm of the coefficients that synthesize the signal from a given dictionary. In contrast, the more general analysis regularization minimizes the L1 norm of the correlations between the signal and the atoms in the dictionary, where these correlations define the analysis support. The corresponding variational problem encompasses several well-known regularizations such as the discrete total variation and the Fused Lasso. Our main contributions consist in deriving sufficient conditions that guarantee exact or partial analysis support recovery of the true signal in presence of noise. More precisely, we give a sufficient condition to ensure that a signal is the unique solution of the L1-analysis regularization in the noiseless case. The same condition also guarantees exact analysis support recovery and L2-robustness of the L1-analysis minimizer with respect to sufficiently small noise in the measurements. This condition turns out to be sharp for the robustness of the analysis support. To show partial support recovery and L2-robustness to an arbitrary bounded noise, we introduce a stronger sufficient condition. When specialized to the L1-synthesis regularization, our results recover some corresponding recovery and robustness guarantees previously known in the literature. From this perspective, our work is a generalization of these results. We finally illustrate these theoretical findings on several examples to study the robustness of the 1-D total variation and Fused Lasso regularizations.

preprint, 2012, arXiv

The Degrees of Freedom of the Group Lasso

This paper studies the sensitivity to the observations of the block/group Lasso solution to an overdetermined linear regression model. Such a regularization is known to promote sparsity patterns structured as nonoverlapping groups of coefficients. Our main contribution provides a local parameterization of the solution with respect to the observations. As a byproduct, we give an unbiased estimate of the degrees of freedom of the group Lasso. Among other applications of such results, one can choose in a principled and objective way the regularization parameter of the Lasso through model selection criteria.

preprint, 2012, arXiv

The degrees of freedom of the Group Lasso for a General Design

In this paper, we are concerned with regression problems where covariates can be grouped in nonoverlapping blocks, and where only a few of them are assumed to be active. In such a situation, the group Lasso is an attractive method for variable selection since it promotes sparsity of the groups. We study the sensitivity of any group Lasso solution to the observations and provide its precise local parameterization. When the noise is Gaussian, this allows us to derive an unbiased estimator of the degrees of freedom of the group Lasso. This result holds true for any fixed design, no matter whether it is under- or overdetermined. With these results at hand, various model selection criteria, such as the Stein Unbiased Risk Estimator (SURE), are readily available which can provide an objectively guided choice of the optimal group Lasso fit.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint