Source author record

Zacharie Naulet

Zacharie Naulet appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

math.ST, Statistics Theory, Machine Learning, Methodology

Catalog footprint

What is connected

6 works

4 topics

4 close collaborators

preprint, 2022, arXiv

Near-optimal estimation of the unseen under regularly varying tail populations

Given $n$ samples from a population of individuals belonging to different species, what is the number $U$ of hitherto unseen species that would be observed if $λn$ new samples were collected? This is an important problem in many scientific endeavors, and it has been the subject of recent works introducing non-parametric estimators of $U$ that are minimax near-optimal and consistent all the way up to $λ\asymp\log n$. These works do not rely on any assumption on the underlying unknown distribution $p$ of the population, and therefore, while providing a theory in its greatest generality, worst-case distributions may severely hamper the estimation of $U$ in concrete applications. In this paper, we consider the problem of strengthening the non-parametric framework for estimating $U$. Inspired by the estimation of rare probabilities in extreme value theory, and motivated by the ubiquitous power-law type distributions in many natural and social phenomena, we make use of a semi-parametric assumption of regular variation of index $α\in (0,1)$ for the tail behaviour of $p$. Under this assumption, we introduce an estimator of $U$ that is simple, linear in the sampling information, computationally efficient, and scalable to massive datasets. Then, uniformly over our class of regularly varying tail distributions, we show that the proposed estimator has provable guarantees: i) it is minimax near-optimal, up to a power of $\log n$ factor; ii) it is consistent all of the way up to $\log λ\asymp n^{α/2}/\sqrt{\log n}$, and this range is the best possible. This work presents the first study on the estimation of the unseen under regularly varying tail distributions. A numerical illustration of our methodology is presented for synthetic data and real data.
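For readers new to the unseen-species problem, the following is a minimal sketch of the classical Good-Toulmin estimator, a standard non-parametric baseline. It is not the estimator introduced in this paper, which the abstract does not spell out. The function name `good_toulmin` and the toy sample are illustrative assumptions, and the sketch is restricted to $λ \le 1$, where the plain series is well behaved.

```python
from collections import Counter


def good_toulmin(samples, lam):
    """Classical Good-Toulmin estimate of U, the number of new species
    expected in lam * n further samples, given the n observed samples.

    Illustrative baseline only; NOT the estimator proposed in the paper.
    """
    if not 0 <= lam <= 1:
        raise ValueError("this sketch only covers 0 <= lam <= 1")
    # species -> number of times it was observed
    species_counts = Counter(samples)
    # i -> Phi_i, the number of species observed exactly i times
    prevalences = Counter(species_counts.values())
    # U_GT = - sum_{i >= 1} (-lam)^i * Phi_i
    return -sum((-lam) ** i * phi for i, phi in prevalences.items())


# Toy sample (hypothetical): two singletons, one doubleton, one tripleton.
samples = ["a", "a", "b", "c", "c", "c", "d"]
print(good_toulmin(samples, 0.5))
```

The estimate is linear in the prevalence counts $Φ_i$, which is the sense in which such estimators are "linear in the sampling information".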

preprint, 2021, arXiv

Bootstrap estimators for the tail-index and for the count statistics of graphex processes

Graphex processes resolve some pathologies in traditional random graph models, notably by providing models that are both projective and allow sparsity. Most of the literature on graphex processes studies them from a probabilistic point of view. Techniques for inferring the parameter of these processes, the so-called graphon, are still marginal; exceptions are a few papers considering parametric families of graphons. Nonparametric estimation remains unconsidered. In this paper, we propose estimators for a selected choice of functionals of the graphon. Our estimators originate from the subsampling theory for graphex processes, and hence can be seen as a form of bootstrap procedure.

preprint, 2020, arXiv

Risk of the Least Squares Minimum Norm Estimator under the Spike Covariance Model

We study the risk of the minimum norm linear least squares estimator when the number of parameters $d$ depends on $n$, and $\frac{d}{n} \rightarrow \infty$. We assume that the data has an underlying low-rank structure by restricting ourselves to spike covariance matrices, where a fixed finite number of eigenvalues grow with $n$ and are much larger than the rest of the eigenvalues, which are (asymptotically) of the same order. We show that in this setting the risk of the minimum norm least squares estimator vanishes in comparison to the risk of the null estimator. We give asymptotic and non-asymptotic upper bounds for this risk, and also leverage the assumption of the spike model to give an analysis of the bias that leads to tighter bounds compared to previous works.
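As an illustration of the objects named in this abstract, here is a minimal NumPy sketch of the minimum norm least squares estimator $\hat β = X^{+} y$ under a toy spike covariance. The dimensions, spike sizes, noise level and true parameter are all assumptions chosen for illustration, not values from the paper, and the sketch does not reproduce the paper's bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: far more parameters than samples (d >> n).
n, d = 20, 200

# Toy spike covariance: two large eigenvalues, the rest equal to 1.
eigenvalues = np.ones(d)
eigenvalues[:2] = [100.0, 50.0]

# Rows of X are centred Gaussian with covariance diag(eigenvalues).
X = rng.standard_normal((n, d)) * np.sqrt(eigenvalues)

# Assumed true parameter, supported on the spike directions.
beta_true = np.zeros(d)
beta_true[:2] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Minimum norm least squares estimator via the Moore-Penrose pseudoinverse.
beta_hat = np.linalg.pinv(X) @ y


def excess_risk(beta):
    """Prediction risk (beta - beta_true)^T Sigma (beta - beta_true)."""
    diff = beta - beta_true
    return float(diff @ (eigenvalues * diff))


risk_min_norm = excess_risk(beta_hat)
risk_null = excess_risk(np.zeros(d))  # null estimator: predict with beta = 0
print(risk_min_norm, risk_null)
```

Because $d > n$ here, the estimator interpolates the training data; the two printed risks allow a comparison with the null estimator in this one simulated draw.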

preprint, 2016, arXiv

Bayesian nonparametric estimation for Quantum Homodyne Tomography

We estimate the quantum state of a light beam from the results of noisy quantum homodyne tomography measurements performed on identically prepared quantum systems. We propose two Bayesian nonparametric approaches. The first approach is based on mixture models and is illustrated through simulation examples. The second approach is based on random basis expansions. We study the theoretical performance of the second approach by quantifying the rate of contraction of the posterior distribution around the true quantum state in the $L^2$ metric.

preprint, 2016, arXiv

Some aspects of symmetric Gamma process mixtures

In this article, we present some specific aspects of symmetric Gamma process mixtures for use in regression models. We propose a new Gibbs sampler for simulating the posterior and we establish adaptive posterior rates of convergence related to the Gaussian mean regression problem.

preprint, 2016, arXiv

Tails assumptions and posterior concentration rates for mixtures of Gaussians

Currently, in density estimation, posterior rates of convergence for location and location-scale mixtures of Gaussians are only known under light-tail assumptions, with better rates achieved by location mixtures. It is conjectured, but not proved, that the situation should be reversed under heavy-tail assumptions. The conjecture is based on the intuition that there is no need to achieve a good order of approximation in regions with few data (say, in the tails), which favors location-scale mixtures, since they allow for a spatially varying order of approximation. Here we test this argument on the Gaussian errors mean regression model with random design, for which the light-tail assumption is not required for the proofs. Although we cannot invalidate the conjecture due to the lack of a lower bound, we find that even under heavy-tail assumptions, location-scale mixtures apparently always perform worse than location mixtures. However, the proofs suggest introducing hybrid location-scale mixtures, which are found to outperform both location and location-scale mixtures, whatever the nature of the tails. Finally, we show that all tail assumptions can be dropped at the price of making the prior distribution covariate dependent.
