Source author record

Yohann De Castro

Yohann De Castro appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.ST, Statistics Theory, Machine Learning, math.PR, Methodology, Information Theory, math.IT, math.OC, Social and Information Networks, Applications, Computation, eess.SP, math.FA, math.NA

Catalog footprint

What is connected

21 works

14 topics

4 close collaborators


preprint · 2026 · arXiv

Fast Spawn&Prune (FS&P): Global convergence of stochastic conic particle gradient descent via birth/death process

We investigate the global optimization of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO), over the space of measures. While Conic Particle Gradient Descent (CPGD) methods are computationally efficient, they may become trapped in local minima due to the non-convexity of the parameterization. To overcome this limitation, we introduce Fast Spawn&Prune (FS&P), a stochastic algorithm that extends FastPart, introduced in De Castro et al. (2025), and combines CPGD with a birth-death process. The birth mechanism ensures asymptotic global exploration by introducing particles in regions where first-order optimality conditions are violated, while the death process preserves computational efficiency by pruning non-informative particles. We provide the first theoretical guarantee of global convergence for this class of discrete-time stochastic algorithms, without requiring exponentially large initializations. Furthermore, we derive explicit convergence rates for the excess risk, which scale as $\mathcal{O}\big(\left(\log K / K\right)^{\frac{1}{2(2+d)}}\big)$, where $K$ denotes the number of iterations and $d$ the dimension of the domain, thereby quantifying the trade-off between global exploration and local refinement. Moreover, the sample complexity is $\mathcal{O}\big(N^{-\frac{1}{4(2+d)}}\big)$ (up to logarithmic factors). We also propose a horizon-free variant that does not require prior knowledge of the iteration budget.
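The birth step targets regions where the first-order optimality condition of the BLASSO fails. A minimal sketch of that check, assuming nonnegative measures on $[0, 1]$, the objective $\frac{1}{2}\|y - \Phi\mu\|^2 + \lambda\|\mu\|_{TV}$, and a hypothetical Gaussian convolution forward operator $\Phi$ sampled on a grid (the kernel width, grids and $\lambda$ are illustrative choices, not taken from the paper): it evaluates $\eta = \Phi^*(y - \Phi\mu)/\lambda$ and returns the grid points where $\eta > 1$. It is not FS&P itself, which also moves and reweights particles by CPGD and prunes them.

```python
import math

WIDTH = 0.05  # hypothetical kernel width

def kernel(u):
    # Hypothetical Gaussian convolution kernel phi(u).
    return math.exp(-u * u / (2.0 * WIDTH * WIDTH))

def forward(spikes, samples):
    # (Phi mu)_m = sum_i a_i * phi(s_m - t_i) for mu = sum_i a_i * delta_{t_i}.
    return [sum(a * kernel(s - t) for a, t in spikes) for s in samples]

def certificate(spikes, y, samples, lam, grid):
    # eta(x) = Phi^*(y - Phi mu)(x) / lam, evaluated at each candidate x of the grid.
    residual = [ym - fm for ym, fm in zip(y, forward(spikes, samples))]
    return [sum(r * kernel(s - x) for r, s in zip(residual, samples)) / lam for x in grid]

def birth_candidates(spikes, y, samples, lam, grid):
    # For nonnegative measures, optimality requires eta <= 1 everywhere;
    # grid points with eta > 1 are where new particles could be spawned.
    eta = certificate(spikes, y, samples, lam, grid)
    return [x for x, e in zip(grid, eta) if e > 1.0], eta

samples = [m / 100 for m in range(101)]
grid = [k / 200 for k in range(201)]
y = forward([(1.0, 0.3)], samples)  # noiseless data from one spike at 0.3
lam = 0.5

# From the empty measure, eta peaks at the true spike location.
cands, eta = birth_candidates([], y, samples, lam, grid)
best = grid[max(range(len(grid)), key=lambda k: eta[k])]
# Once the true spike is in place the residual vanishes and no birth is proposed.
cands_after, _ = birth_candidates([(1.0, 0.3)], y, samples, lam, grid)
```

Scanning a fixed grid keeps the sketch deterministic; a stochastic birth step would instead sample candidate locations.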

preprint · 2022 · arXiv

Concentration inequality for U-statistics of order two for uniformly ergodic Markov chains

We prove a new concentration inequality for U-statistics of order two for uniformly ergodic Markov chains. Working with bounded and $\pi$-canonical kernels, we show that we can recover the convergence rate of Arcones and Giné, who proved a concentration result for U-statistics of independent random variables and canonical kernels. Our result allows the kernels $h_{i,j}$ to depend on the indices in the sums, which prevents the use of standard blocking tools. Our proof relies on an inductive analysis where we use martingale techniques, uniform ergodicity, Nummelin splitting and a Bernstein-type inequality. Assuming further that the Markov chain starts from its invariant distribution, we prove a Bernstein-type concentration inequality that provides a sharper convergence rate for small variance terms.

preprint · 2022 · arXiv

Markov Random Geometric Graph (MRGG): A Growth Model for Temporal Dynamic Networks

We introduce Markov Random Geometric Graphs (MRGGs), a growth model for temporal dynamic networks. It is based on a Markovian latent space dynamic: consecutive latent points are sampled on the Euclidean sphere using an unknown Markov kernel, and two nodes are connected with a probability depending on an unknown function of their latent geodesic distance. More precisely, at each time stamp $k$ we add a latent point $X_k$ sampled by jumping from the previous one, $X_{k-1}$, in a uniformly chosen direction $Y_k$ and with a length $r_k$ drawn from an unknown distribution called the latitude function. The connection probability between each pair of nodes is given by the envelope function of the distance between their two latent points. We provide theoretical guarantees for the non-parametric estimation of the latitude and envelope functions. We propose an efficient algorithm that achieves these non-parametric estimation tasks, based on an ad hoc Hierarchical Agglomerative Clustering approach. As a by-product, we show how MRGGs can be used to detect dependence structure in growing graphs and to solve link prediction problems.
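To make the generative model concrete, here is a minimal simulation sketch. It assumes the sphere $\mathbb{S}^2$ and reads the jump length $r_k$ as a geodesic angle, so that $X_k = \cos(r_k) X_{k-1} + \sin(r_k) Y_k$ with $Y_k$ uniform among unit vectors orthogonal to $X_{k-1}$; the latitude law and envelope function below are hypothetical placeholders for the unknown functions that the paper estimates.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def uniform_direction(x, rng):
    # Uniform unit vector orthogonal to x: project a standard Gaussian vector.
    g = [rng.gauss(0.0, 1.0) for _ in x]
    proj = dot(g, x)
    return normalize([gi - proj * xi for gi, xi in zip(g, x)])

def geodesic(u, v):
    # Great-circle distance between unit vectors (clipped for rounding errors).
    return math.acos(max(-1.0, min(1.0, dot(u, v))))

def simulate_mrgg(n, draw_length, envelope, rng, dim=3):
    # Latent Markov walk on the unit sphere of R^dim: X_k is reached from X_{k-1}
    # by moving a geodesic length r_k = draw_length(rng) in direction Y_k.
    points = [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])]
    for _ in range(1, n):
        x, r = points[-1], draw_length(rng)
        y = uniform_direction(x, rng)
        points.append(normalize([math.cos(r) * a + math.sin(r) * b for a, b in zip(x, y)]))
    # Connect each pair i < j independently with probability envelope(geodesic distance).
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < envelope(geodesic(points[i], points[j]))}
    return points, edges

rng = random.Random(0)
points, edges = simulate_mrgg(
    50,
    draw_length=lambda g: g.uniform(0.0, math.pi / 4),  # hypothetical latitude law
    envelope=lambda d: math.exp(-2.0 * d),               # hypothetical envelope
    rng=rng,
)
```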

preprint · 2022 · arXiv

Minimax Estimation of Partially-Observed Vector AutoRegressions

High-dimensional time series are a core ingredient of the statistical modeling toolkit, for which numerous estimation methods are known. But when observations are scarce or corrupted, the learning task becomes much harder. The question is: how much harder? In this paper, we study the properties of a partially-observed Vector AutoRegressive process, which is a state-space model endowed with a stochastic observation mechanism. Our goal is to estimate its sparse transition matrix, but we only have access to a small and noisy subsample of the state components. Interestingly, the sampling process itself is random and can exhibit temporal correlations, a feature shared by many realistic data acquisition scenarios. We start by describing an estimator based on the Yule-Walker equation and the Dantzig selector, and we give an upper bound on its non-asymptotic error. Then, we provide a matching minimax lower bound, thus proving near-optimality of our estimator. The convergence rate we obtain sheds light on the role of several key parameters such as the sampling ratio, the amount of noise and the number of non-zero coefficients in the transition matrix. These theoretical findings are discussed and illustrated by numerical experiments on simulated data.
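The effect of the sampling ratio on the Yule-Walker equation can be seen on a toy version of the problem. A minimal sketch, assuming a scalar AR(1) process instead of a sparse VAR, no observation noise, and independent Bernoulli($p$) sampling (the paper also allows temporally correlated sampling): zero-filling the missing states shrinks the lag-0 and lag-1 autocovariances by factors $p$ and $p^2$, so the naive Yule-Walker ratio estimates $p\,a$, while rescaling each moment recovers $a$. This is not the Dantzig-selector estimator of the paper.

```python
import random

def simulate_masked_ar1(a, p, n, rng):
    # Stationary AR(1): x_{t+1} = a * x_t + e_t with e_t ~ N(0, 1).
    x = rng.gauss(0.0, (1.0 / (1.0 - a * a)) ** 0.5)
    obs = []
    for _ in range(n):
        # Each state is observed independently with probability p; missing entries are zero-filled.
        obs.append(x if rng.random() < p else 0.0)
        x = a * x + rng.gauss(0.0, 1.0)
    return obs

def yule_walker(obs, p=1.0):
    n = len(obs)
    g0 = sum(z * z for z in obs) / n                               # ~ p * gamma(0)
    g1 = sum(obs[t] * obs[t + 1] for t in range(n - 1)) / (n - 1)  # ~ p^2 * gamma(1)
    return (g1 / p ** 2) / (g0 / p)                                # ~ gamma(1) / gamma(0) = a

rng = random.Random(1)
obs = simulate_masked_ar1(a=0.6, p=0.5, n=200_000, rng=rng)
naive = yule_walker(obs)             # ignores the sampling: close to p * a = 0.3
corrected = yule_walker(obs, p=0.5)  # close to a = 0.6
```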

preprint · 2022 · arXiv

Random Geometric Graph: Some recent developments and perspectives

The Random Geometric Graph (RGG) is a random graph model for network data with an underlying spatial representation. Geometry endows RGGs with a rich dependence structure and often leads to desirable properties of real-world networks such as the small-world phenomenon and clustering. Originally introduced to model wireless communication networks, RGGs are now very popular, with applications ranging from network user profiling to protein-protein interactions in biology. RGGs are also of purely theoretical interest since the underlying geometry gives rise to challenging mathematical questions. Their resolution involves results from probability, statistics, combinatorics or information theory, placing RGGs at the intersection of a large span of research communities. This paper surveys the recent developments in RGGs through the lens of high-dimensional settings and non-parametric inference. We also explain how this model differs from classical community-based random graph models and we review recent works that try to take the best of both worlds. As a by-product, we expose the scope of the mathematical tools used in the proofs.

preprint · 2021 · arXiv

Forecasting Nonnegative Time Series via Sliding Mask Method (SMM) and Latent Clustered Forecast (LCF)

We consider the nonnegative time series forecasting framework. Based on recent advances in Nonnegative Matrix Factorization (NMF) and Archetypal Analysis, we introduce two procedures referred to as Sliding Mask Method (SMM) and Latent Clustered Forecast (LCF). SMM is a simple and powerful method based on time window prediction using Completion of Nonnegative Matrices. This new procedure combines low nonnegative rank decomposition and matrix completion, where the hidden values are the ones to be forecast. LCF is a two-stage procedure: it leverages archetypal analysis for dimension reduction and clustering of time series, then it uses any black-box supervised forecast solver on the clustered latent representation. Theoretical guarantees on uniqueness and robustness of the solution of NMF Completion-type problems are also provided for the first time. Finally, numerical experiments on real-world and synthetic datasets confirm the forecasting accuracy of both methodologies.

preprint · 2020 · arXiv

Adaptive Estimation of Nonparametric Geometric Graphs

This article studies the recovery of graphons when they are convolution kernels on compact (symmetric) metric spaces. This case is of particular interest since it covers the situation where the probability of an edge depends only on some unknown nonparametric function of the distance between latent points, referred to as Nonparametric Geometric Graphs (NGG). In this setting, adaptive estimation of NGG is possible using a spectral procedure combined with a Goldenshluger-Lepski adaptation method. The latent spaces covered by our framework encompass (among others) compact symmetric spaces of rank one, namely real spheres and projective spaces. For the latter, explicit computations of the eigenbasis and of the model complexity can be achieved, leading to quantitative non-asymptotic results. The time complexity of our method scales cubically in the size of the graph and exponentially in the regularity of the graphon. Hence, this paper offers an algorithmically and theoretically efficient procedure to estimate smooth NGG. As a by-product, this paper shows a non-asymptotic concentration result on the spectrum of integral operators defined by symmetric kernels (not necessarily positive).

preprint · 2020 · arXiv

SuperMix: Sparse Regularization for Mixtures

This paper investigates the statistical estimation of a discrete mixing measure $\mu_0$ involved in a kernel mixture model. Using some recent advances in $\ell_1$-regularization over the space of measures, we introduce a "data fitting and regularization" convex program for estimating $\mu_0$ in a grid-less manner from a sample of the mixture law; this method is referred to as Beurling-LASSO. Our contribution is twofold: we derive a lower bound on the bandwidth of our data fitting term, depending only on the support of $\mu_0$ and its so-called "minimum separation", that ensures quantitative support localization error bounds; and, under a so-called "non-degenerate source condition", we derive a non-asymptotic support stability property. The latter shows that for a sufficiently large sample size $n$, our estimator has exactly as many weighted Dirac masses as the target $\mu_0$, converging in amplitude and localization towards the true ones. Finally, we also introduce some tractable algorithms for solving this convex program based on "Sliding Frank-Wolfe" or "Conic Particle Gradient Descent". Statistical performances of this estimator are investigated by designing a so-called "dual certificate" appropriate to our setting. Some classical situations, such as mixtures of super-smooth distributions (e.g. Gaussian distributions) or ordinary-smooth distributions (e.g. Laplace distributions), are discussed at the end of the paper.

preprint · 2016 · arXiv

Adapting to unknown noise level in sparse deconvolution

In this paper, we study sparse spike deconvolution over the space of complex-valued measures when the input measure is a finite sum of Dirac masses. We introduce a modified version of the Beurling Lasso (BLasso), a semi-definite program that we refer to as the Concomitant Beurling Lasso (CBLasso). This new procedure estimates the target measure and the unknown noise level simultaneously. Contrary to previous estimators in the literature, the theory holds for a tuning parameter that depends only on the sample size, so that it can be used for problems with an unknown noise level. Consistency of the noise level estimate is proved with standard arguments. As for Radon measure estimation, the theoretical guarantees match the previous state-of-the-art results in Super-Resolution regarding minimax prediction and localization. The proofs are based on a bound on the noise level given by a new tail estimate of the supremum of a stationary non-Gaussian process through the Rice method.

preprint · 2016 · arXiv

Recovering Multiple Nonnegative Time Series From a Few Temporal Aggregates

Motivated by electricity consumption metering, we extend existing nonnegative matrix factorization (NMF) algorithms to use linear measurements as observations, instead of matrix entries. The objective is to estimate multiple time series at a fine temporal scale from temporal aggregates measured on each individual series. Furthermore, our algorithm is extended to take into account individual autocorrelation to provide better estimation, using a recent convex relaxation of quadratically constrained quadratic programs. Extensive experiments on synthetic and real-world electricity consumption datasets illustrate the effectiveness of our matrix recovery algorithms.

preprint · 2015 · arXiv

Consistent estimation of the filtering and marginal smoothing distributions in nonparametric hidden Markov models

In this paper, we consider the filtering and smoothing recursions in nonparametric finite state space hidden Markov models (HMMs) when the parameters of the model are unknown and replaced by estimators. We provide an explicit and time uniform control of the filtering and smoothing errors in total variation norm as a function of the parameter estimation errors. We prove that the risk for the filtering and smoothing errors may be uniformly upper bounded by the risk of the estimators. It has been proved very recently that statistical inference for finite state space nonparametric HMMs is possible. We study how the recent spectral methods developed in the parametric setting may be extended to the nonparametric framework and we give explicit upper bounds for the L2-risk of the nonparametric spectral estimators. When the observation space is compact, this provides explicit rates for the filtering and smoothing errors in total variation norm. The performance of the spectral method is assessed with simulated data for both the estimation of the (nonparametric) conditional distribution of the observations and the estimation of the marginal smoothing distributions.
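For reference, here is a minimal sketch of the filtering and marginal smoothing recursions themselves for a finite state space. The emission likelihoods are passed in as a table, so they may come from any estimator; the two-state Gaussian emissions in the usage lines are a hypothetical stand-in for estimated nonparametric densities. Plugging estimated parameters into these recursions is the situation whose errors the paper controls.

```python
import math

def forward_backward(init, trans, lik):
    # init[k]: initial law; trans[k][l]: P(X_{t+1} = l | X_t = k);
    # lik[t][k]: emission density of observation t under state k.
    K, T = len(init), len(lik)
    filt, prev = [], None
    for t in range(T):
        # Prediction step (the initial law at t = 0), then Bayes correction and normalization.
        pred = init if t == 0 else [sum(prev[k] * trans[k][l] for k in range(K)) for l in range(K)]
        unnorm = [pred[l] * lik[t][l] for l in range(K)]
        z = sum(unnorm)
        prev = [u / z for u in unnorm]
        filt.append(prev)
    # Backward pass: P(X_t = k | Y_1..Y_T) from the filter and the next smoothed law.
    smooth = [None] * T
    smooth[-1] = filt[-1]
    for t in range(T - 2, -1, -1):
        pred = [sum(filt[t][k] * trans[k][l] for k in range(K)) for l in range(K)]
        smooth[t] = [filt[t][k] * sum(trans[k][l] * smooth[t + 1][l] / pred[l] for l in range(K))
                     for k in range(K)]
    return filt, smooth

# Hypothetical two-state example with N(0, 1) and N(3, 1) emissions; the densities are
# left unnormalized because the 1/sqrt(2*pi) constant cancels in the normalization.
trans = [[0.9, 0.1], [0.1, 0.9]]
ys = [0.1, -0.2, 2.9, 3.1, 0.0]
lik = [[math.exp(-(y - m) ** 2 / 2) for m in (0.0, 3.0)] for y in ys]
filt, smooth = forward_backward([0.5, 0.5], trans, lik)
```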

preprint · 2015 · arXiv

Minimax adaptive estimation of nonparametric hidden Markov models

We consider stationary hidden Markov models with finite state space and nonparametric modeling of the emission distributions. Until very recently, it was not known that such models are identifiable. In this paper, we propose a new penalized least-squares estimator for the emission distributions which is statistically optimal and practically tractable. We prove a non-asymptotic oracle inequality for our nonparametric estimator of the emission distributions. A consequence is that this new estimator is rate minimax adaptive up to a logarithmic term. Our methodology is based on projections of the emission distributions onto nested subspaces of increasing complexity. The popular spectral estimators are unable to achieve the optimal rate but may be used as initial points in our procedure. Simulations show the improvement obtained when applying the least-squares minimization consecutively to the spectral estimation.

preprint · 2015 · arXiv

Non-uniform spline recovery from small degree polynomial approximation

We investigate the sparse spike deconvolution problem on spaces of algebraic polynomials. Our framework encompasses the measure reconstruction problem from a combination of noiseless and noisy moment measurements. We study a TV-norm regularization procedure to localize the support and estimate the weights of a target discrete measure in this framework. Furthermore, we derive quantitative bounds on the support recovery and the amplitude errors under a Chebyshev-type minimal separation condition on its support. Incidentally, we study the localization of the knots of non-uniform splines when a Gaussian perturbation of their inner products with a known polynomial basis is observed (i.e. a small degree polynomial approximation is known) and the boundary conditions are known. We prove that the knots can be recovered in a grid-free manner using semidefinite programming.

preprint · 2015 · arXiv

Power of the Spacing test for Least-Angle Regression

Recent advances in Post-Selection Inference have shown that conditional testing is relevant and tractable in high dimensions. In the Gaussian linear model, further works have derived unconditional test statistics such as the Kac-Rice Pivot for general penalized problems. In order to test the global null, a prominent offspring of this breakthrough is the spacing test, which accounts for the relative separation between the first two knots of the celebrated least-angle regression (LARS) algorithm. However, no results have been shown regarding the distribution of these test statistics under the alternative. For the first time, this paper addresses this important issue for the spacing test and shows that it is unconditionally unbiased. Furthermore, we provide the first extension of the spacing test to the framework of unknown noise variance. More precisely, we investigate the power of the spacing test for LARS and prove that it is unbiased: its power is always greater than or equal to the significance level $\alpha$. In particular, we describe the power of this test under various scenarios: we prove that its rejection region is optimal when the predictors are orthogonal; as the level $\alpha$ goes to zero, we show that the probability of getting a true positive is much greater than $\alpha$; and we give a detailed description of its power in the case of two predictors. Moreover, we numerically compare the spacing test for LARS with Pearson's chi-squared test (goodness of fit).
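With orthonormal predictors (one of the cases studied above) and a known noise level $\sigma$, the first two LARS knots $\lambda_1 \ge \lambda_2$ are the two largest values of $|X_j^\top y|$, and the spacing p-value takes the form $\bar\Phi(\lambda_1/\sigma)/\bar\Phi(\lambda_2/\sigma)$, with $\bar\Phi$ the standard Gaussian survival function. A minimal sketch under exactly that assumption (general designs need the actual second LARS knot, which is not computed here), with a Monte Carlo check of uniformity under the global null and a hypothetical one-signal alternative:

```python
import math
import random

def gaussian_survival(x):
    # 1 - Phi(x), computed through erfc to stay accurate in the upper tail.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def spacing_pvalue(correlations, sigma):
    # correlations = X^T y for an orthonormal design; the first two LARS knots
    # are then the two largest absolute correlations.
    lam1, lam2 = sorted((abs(c) for c in correlations), reverse=True)[:2]
    return gaussian_survival(lam1 / sigma) / gaussian_survival(lam2 / sigma)

rng = random.Random(2)
sigma, p, reps = 1.0, 10, 20_000
# Under the global null, X^T y ~ N(0, sigma^2 I) when X is orthonormal.
null_pvals = [spacing_pvalue([rng.gauss(0.0, sigma) for _ in range(p)], sigma) for _ in range(reps)]
# Hypothetical alternative: one predictor carries a signal of size 3 * sigma.
alt_pvals = [spacing_pvalue([rng.gauss(3.0 * sigma if j == 0 else 0.0, sigma) for j in range(p)], sigma)
             for _ in range(reps)]
level = 0.05
null_rejection = sum(pv <= level for pv in null_pvals) / reps
alt_rejection = sum(pv <= level for pv in alt_pvals) / reps
```

Uniformity follows because, given $\lambda_2$, the largest of i.i.d. values $|X_j^\top y|$ is distributed as one of them conditioned to exceed $\lambda_2$.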

preprint · 2014 · arXiv

Estimating the transition matrix of a Markov chain observed at random times

In this paper we develop a statistical estimation technique to recover the transition kernel $P$ of a Markov chain $X=(X_m)_{m \in \mathbb N}$ in the presence of censored data. We consider the situation where only a sub-sequence of $X$ is available and the time gaps between the observations are i.i.d. random variables. Under the assumption that neither the time gaps nor their distribution are known, we provide an estimation method which applies when some transitions in the initial Markov chain $X$ are known to be infeasible. A consistent estimator of $P$ is derived in closed form as a solution of a minimization problem. The asymptotic performance of the estimator is then discussed in theory and through numerical simulations.
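The difficulty is visible in the forward model: if the gaps between observations are i.i.d. with law $\pi$ on $\{1, 2, \dots\}$ and independent of $X$, the observed sub-sequence is itself a Markov chain with kernel $Q = \sum_{k \ge 1} \pi(k) P^k$, so $P$ must be recovered from a mixture of its powers with unknown weights. A minimal sketch computing $Q$, assuming a finitely supported gap law; it does not implement the paper's estimator.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def observed_kernel(P, gap_law):
    # gap_law[k - 1] = probability that two consecutive observations are k steps apart.
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    power = P  # P^k, starting with k = 1
    for prob in gap_law:
        for i in range(n):
            for j in range(n):
                Q[i][j] += prob * power[i][j]
        power = mat_mul(power, P)
    return Q

# A 3-state cycle 0 -> 1 -> 2 -> 0: transitions off the cycle are infeasible.
P = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
Q = observed_kernel(P, gap_law=[0.5, 0.5])  # gaps of 1 or 2 steps, equally likely
```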

preprint · 2014 · arXiv

Optimal designs for Lasso and Dantzig selector using Expander Codes

We investigate the high-dimensional regression problem using adjacency matrices of unbalanced expander graphs. In this framework, we prove that the $\ell_{2}$-prediction error and the $\ell_{1}$-risk of the lasso and the Dantzig selector are optimal up to an explicit multiplicative constant. Thus we can estimate a high-dimensional target vector with an error term similar to the one obtained in a situation where one knows the support of the largest coordinates in advance. Moreover, we show that these design matrices have an explicit restricted eigenvalue. Precisely, they satisfy the restricted eigenvalue assumption and the compatibility condition with an explicit constant. Finally, we capitalize on the recent construction of unbalanced expander graphs due to Guruswami, Umans, and Vadhan to provide a deterministic polynomial-time construction of these design matrices.
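The design matrices in question are (scaled) adjacency matrices of left-regular bipartite graphs: column $j$ is supported on the $d$ right vertices adjacent to left vertex $j$. A minimal sketch, assuming a random left-regular graph and setting each nonzero entry to $1/d$ (a normalization chosen here for illustration). Random graphs of this kind are unbalanced expanders with high probability for suitable $n$, $p$ and $d$, but this is not the deterministic construction of Guruswami, Umans and Vadhan, and expansion is not checked.

```python
import random

def left_regular_design(n, p, d, rng):
    # n x p matrix: each column (left vertex) picks d distinct rows (right vertices)
    # uniformly at random, and each chosen entry is set to 1/d.
    X = [[0.0] * p for _ in range(n)]
    for j in range(p):
        for i in rng.sample(range(n), d):
            X[i][j] = 1.0 / d
    return X

rng = random.Random(3)
X = left_regular_design(n=40, p=200, d=5, rng=rng)
```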

preprint · 2014 · arXiv

Randomized pick-freeze for sparse Sobol indices estimation in high dimension

This article investigates a new procedure to estimate the influence of each variable of a given function defined on a high-dimensional space. More precisely, we are concerned with describing a function $f$ of a large number $p$ of parameters that depends only on a small number $s$ of them. Our proposed method is an unconstrained $\ell_{1}$-minimization based on Sobol's method. We prove that, with only $\mathcal O(s\log p)$ evaluations of $f$, one can find which parameters are relevant.
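The building block is Sobol's pick-freeze identity $S_i = \operatorname{Cov}(f(X), f(X^{i}))/\operatorname{Var} f(X)$, where $X^{i}$ shares coordinate $i$ with $X$ and redraws the others independently. A minimal sketch of the classical, non-randomized estimator, assuming independent $\mathcal{U}(0,1)$ inputs and a hypothetical sparse test function. It spends fresh evaluations on every index, roughly $(p+1)n$ in total, which is the cost the randomized $\ell_1$ procedure above is designed to avoid.

```python
import random

def pick_freeze_index(f, dim, i, n, rng):
    # Classical pick-freeze estimate of the first-order Sobol index of input i.
    ys, yfs = [], []
    for _ in range(n):
        x = [rng.random() for _ in range(dim)]
        x_frozen = [x[k] if k == i else rng.random() for k in range(dim)]  # keep x_i only
        ys.append(f(x))
        yfs.append(f(x_frozen))
    m, mf = sum(ys) / n, sum(yfs) / n
    cov = sum(a * b for a, b in zip(ys, yfs)) / n - m * mf
    var = sum(a * a for a in ys) / n - m * m
    return cov / var

def f(x):
    # Hypothetical function of p = 5 inputs depending on s = 2 of them:
    # Var = 1/12 + 4/12, so S_0 = 0.2, S_1 = 0.8 and S_k = 0 otherwise.
    return x[0] + 2.0 * x[1]

rng = random.Random(4)
indices = [pick_freeze_index(f, 5, i, 20_000, rng) for i in range(5)]
```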

preprint · 2014 · arXiv

Spike detection from inaccurate samplings

This article investigates the support detection problem using the LASSO estimator in the space of measures. More precisely, we study the recovery of a discrete measure (spike train) from few noisy observations (Fourier samples, moments...) using an $\ell_{1}$-regularization procedure. In particular, we provide an explicit quantitative localization of the spikes.

preprint · 2012 · arXiv

A Remark on the Lasso and the Dantzig Selector

This article investigates a new parameter for high-dimensional regression with noise: the distortion. The latter has attracted a lot of attention recently with the appearance of new deterministic constructions of 'almost'-Euclidean sections of the L1-ball. It measures how far the intersection between the kernel of the design matrix and the unit L1-ball is from an L2-ball. We show that the distortion holds enough information to derive oracle inequalities (i.e. a comparison to an ideal situation where one knows the s largest coefficients of the target) for the lasso and the Dantzig selector.

preprint · 2012 · arXiv

Exact Reconstruction using Beurling Minimal Extrapolation

We show that measures with finite support on the real line are the unique solution to an algorithm, named generalized minimal extrapolation, involving only a finite number of generalized moments (which encompass the standard moments, the Laplace transform, the Stieltjes transform, etc.). Generalized minimal extrapolation shares related geometric properties with basis pursuit of Chen, Donoho and Saunders [CDS98]. Indeed, we also extend some standard results of compressed sensing (the dual polynomial, the nullspace property) to the signed measure framework. We express exact reconstruction in terms of a simple interpolation problem. We prove that every nonnegative measure supported by a set containing s points can be exactly recovered from only 2s + 1 generalized moments. This result leads to a new construction of deterministic sensing matrices for compressed sensing.

preprint · 2011 · arXiv

Quantitative Isoperimetric Inequalities on the Real Line

In a recent paper, A. Cianchi, N. Fusco, F. Maggi, and A. Pratelli have shown that, in the Gauss space, a set of given measure and almost minimal Gauss boundary measure is necessarily close to a half-space. Using only geometric tools, we extend their result to all symmetric log-concave measures $\mu$ on the real line. We give sharp quantitative isoperimetric inequalities and prove that among sets of given measure and given asymmetry (distance to half line, i.e. distance to sets of minimal perimeter), the intervals or complements of intervals have minimal perimeter.
