Source author record

Stefano Favaro

Stefano Favaro appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: math.ST, Statistics Theory, math.PR, Machine Learning, Methodology, Applications

Catalog footprint

What is connected

29 works

6 topics

4 close collaborators

preprint, 2026, arXiv

A Gaussian process limit for the self-normalized Ewens-Pitman process

For an integer $n\geq1$, consider a random partition $Π_{n}$ of $\{1,\ldots,n\}$ into $K_{n}$ partition sets with $K_{r,n}$ partition subsets of size $r=1,\ldots,n$, and assume $Π_{n}$ distributed according to the Ewens-Pitman model with parameters $α\in]0,1[$ and $θ>-α$. Although the large-$n$ asymptotic behaviors of $K_{n}$ and $K_{r,n}$ are well understood in terms of almost sure convergence and Gaussian fluctuations, much less is known about the asymptotic behavior of $P_{r,n}=K_{r,n}/K_n$ and of the self-normalized Ewens-Pitman process $(P_{1,n},P_{2,n},\dots)$. Motivated by the almost sure convergence of $(P_{1,n},P_{2,n},\dots)$ to the Sibuya distribution $p_α=(p_α(1),p_α(2),\ldots)$, where $p_α(r)$ is the probability mass at $r=1,2,\ldots$, we establish the $\ell^{2}$ distributional convergence \begin{displaymath} \sqrt{K_{n}}((P_{1,n},\,P_{2,n},\ldots)-p_α)\underset{n\rightarrow+\infty}{\overset{\mathcal{L}}{\longrightarrow}}\mathcal{G}(Γ_α), \end{displaymath} where $\mathcal{G}(Γ_α)$ stands for a centered Gaussian process with covariance matrix $Γ_α=\operatorname{diag}(p_α) - p_α p_α^T$. We apply our result to the estimation of the parameter
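As a rough numerical illustration of the almost sure convergence mentioned in this abstract, the sketch below simulates an Ewens-Pitman partition through its standard sequential (Chinese restaurant) construction and compares the self-normalized counts $P_{r,n}$ with the Sibuya masses $p_α(r)=α\,Γ(r-α)/(Γ(1-α)\,r!)$. The function names, the seed and the parameter values are assumptions chosen for the example; they are not taken from the paper, and the code does not reproduce the paper's Gaussian limit result.

```python
import random
from collections import Counter


def sibuya_pmf(alpha, r_max):
    """Return [p_alpha(1), ..., p_alpha(r_max)] for the Sibuya distribution."""
    probs = [alpha]  # p_alpha(1) = alpha
    for r in range(1, r_max):
        # Recursion p_alpha(r + 1) = p_alpha(r) * (r - alpha) / (r + 1).
        probs.append(probs[-1] * (r - alpha) / (r + 1))
    return probs


def sample_block_sizes(n, alpha, theta, rng):
    """Sample the block sizes of an Ewens-Pitman partition of {1, ..., n}, n >= 1."""
    sizes = [1]  # the first element always opens a new block
    for i in range(1, n):  # i elements have been placed so far
        u = rng.random() * (theta + i)
        u -= theta + alpha * len(sizes)  # weight of opening a new block
        if u < 0:
            sizes.append(1)
            continue
        for j, s in enumerate(sizes):
            u -= s - alpha  # weight of joining existing block j
            if u < 0:
                sizes[j] += 1
                break
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


def self_normalized(sizes, r_max):
    """Return [P_{1,n}, ..., P_{r_max,n}] with P_{r,n} = K_{r,n} / K_n."""
    counts = Counter(sizes)
    k_n = len(sizes)
    return [counts.get(r, 0) / k_n for r in range(1, r_max + 1)]


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical seed, for reproducibility only
    sizes = sample_block_sizes(20000, alpha=0.5, theta=1.0, rng=rng)
    empirical = self_normalized(sizes, 5)
    limit = sibuya_pmf(0.5, 5)
    for r, (p_hat, p) in enumerate(zip(empirical, limit), start=1):
        print(r, round(p_hat, 3), round(p, 3))
```

For moderate $n$ the two columns printed by the script should be close but not identical; the abstract's result describes the size of that gap on the $\sqrt{K_n}$ scale.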

preprint, 2023, arXiv

Large-width asymptotics for ReLU neural networks with $α$-Stable initializations

There is a recent and growing literature on large-width asymptotic properties of Gaussian neural networks (NNs), namely NNs whose weights are initialized as Gaussian distributions. Two popular problems are: i) the study of the large-width distributions of NNs, which characterizes the infinitely wide limit of a rescaled NN in terms of a Gaussian stochastic process; ii) the study of the large-width training dynamics of NNs, which characterizes the infinitely wide dynamics in terms of a deterministic kernel, referred to as the neural tangent kernel (NTK), and shows that, for a sufficiently large width, the gradient descent achieves zero training error at a linear rate. In this paper, we consider these problems for $α$-Stable NNs, namely NNs whose weights are initialized as $α$-Stable distributions with $α\in(0,2]$. First, for $α$-Stable NNs with a ReLU activation function, we show that if the NN's width goes to infinity then a rescaled NN converges weakly to an $α$-Stable stochastic process. As a difference with respect to the Gaussian setting, our result shows that the choice of the activation function affects the scaling of the NN, that is: to achieve the infinitely wide $α$-Stable process, the ReLU activation requires an additional logarithmic term in the scaling with respect to sub-linear activations. Then, we study the large-width training dynamics of $α$-Stable ReLU-NNs, characterizing the infinitely wide dynamics in terms of a random kernel, referred to as the $α$-Stable NTK, and showing that, for a sufficiently large width, the gradient descent achieves zero training error at a linear rate. The randomness of the $α$-Stable NTK is a further difference with respect to the Gaussian setting, that is: within the $α$-Stable setting, the randomness of the NN at initialization does not vanish in the large-width regime of the training.

preprint, 2022, arXiv

A new approach to posterior contraction rates via Wasserstein dynamics

This paper presents a new approach to the classical problem of quantifying posterior contraction rates (PCRs) in Bayesian statistics. Our approach relies on the Wasserstein distance, and it leads to two main contributions which improve on the existing literature on PCRs. The first contribution exploits the dynamic formulation of the Wasserstein distance, referred to for short as Wasserstein dynamics, in order to establish PCRs under dominated Bayesian statistical models. As a novelty with respect to existing approaches to PCRs, Wasserstein dynamics allows us to circumvent the use of sieves in both stating and proving PCRs, and it sets forth a natural connection between PCRs and three well-known classical problems in statistics and probability theory: the speed of mean Glivenko-Cantelli convergence, the estimation of weighted Poincaré-Wirtinger constants and the Sanov large deviation principle for the Wasserstein distance. The second contribution combines the use of the Wasserstein distance with a suitable sieve construction to establish PCRs under fully Bayesian nonparametric models. As a novelty with respect to the existing literature on PCRs, our second result provides the first treatment of PCRs under non-dominated Bayesian models. Applications of our results are presented for some classical Bayesian statistical models, e.g., regular parametric models, infinite-dimensional exponential families, linear regression in infinite dimension and nonparametric models under Dirichlet process priors.

preprint, 2022, arXiv

Deep Stable neural networks: large-width asymptotics and convergence rates

In modern deep learning, there is a recent and growing literature on the interplay between large-width asymptotic properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed weights, and Gaussian stochastic processes (SPs). Such an interplay has proved to be critical in Bayesian inference under Gaussian SP priors, kernel regression for infinitely wide deep NNs trained via gradient descent, and information propagation within infinitely wide NNs. Motivated by empirical analyses that show the potential of replacing Gaussian distributions with Stable distributions for the NN's weights, in this paper we present a rigorous analysis of the large-width asymptotic behaviour of (fully connected) feed-forward deep Stable NNs, i.e. deep NNs with Stable-distributed weights. We show that as the width goes to infinity jointly over the NN's layers, i.e. the "joint growth" setting, a rescaled deep Stable NN converges weakly to a Stable SP whose distribution is characterized recursively through the NN's layers. Because of the non-triangular structure of the NN, this is a non-standard asymptotic problem, to which we propose an inductive approach of independent interest. Then, we establish sup-norm convergence rates of the rescaled deep Stable NN to the Stable SP, under the "joint growth" and a "sequential growth" of the width over the NN's layers. Such a result quantifies the difference between the "joint growth" and the "sequential growth" settings, showing that the former leads to a slower rate than the latter, depending on the depth of the layer and the number of inputs of the NN. Our work extends some recent results on infinitely wide limits for deep Gaussian NNs to the more general deep Stable NNs, providing the first result on convergence rates in the "joint growth" setting.

preprint, 2022, arXiv

Learning-augmented count-min sketches via Bayesian nonparametrics

The count-min sketch (CMS) is a time and memory efficient randomized data structure that provides estimates of tokens' frequencies in a data stream of tokens, i.e. point queries, based on random hashed data. A learning-augmented version of the CMS, referred to as CMS-DP, has been proposed by Cai, Mitzenmacher and Adams (NeurIPS 2018), and it relies on Bayesian nonparametric (BNP) modeling of the data stream of tokens via a Dirichlet process (DP) prior, with estimates of a point query being obtained as suitable mean functionals of the posterior distribution of the point query, given the hashed data. While the CMS-DP has proved to improve on some aspects of CMS, it has the major drawback of arising from a "constructive" proof that builds upon arguments tailored to the DP prior, namely arguments that are not usable for other nonparametric priors. In this paper, we present a "Bayesian" proof of the CMS-DP that has the main advantage of building upon arguments that are usable, in principle, within a broad class of nonparametric priors arising from normalized completely random measures. This result leads us to develop a novel learning-augmented CMS under power-law data streams, referred to as CMS-PYP, which relies on BNP modeling of the data stream of tokens via a Pitman-Yor process (PYP) prior. Under this more general framework, we apply the arguments of the "Bayesian" proof of the CMS-DP, suitably adapted to the PYP prior, in order to compute the posterior distribution of a point query, given the hashed data. Applications to synthetic data and real textual data show that the CMS-PYP outperforms the CMS and the CMS-DP in estimating low-frequency tokens, which are known to be of critical interest in textual data, and it is competitive with respect to a variation of the CMS designed for low-frequency tokens. An extension of our BNP approach to more general queries is also discussed.
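For readers unfamiliar with the data structure, here is a minimal sketch of the classical count-min sketch only; it does not implement the CMS-DP or CMS-PYP estimators discussed in the abstract. The class name, the table dimensions and the use of a salted BLAKE2 hash as a stand-in for the random hash functions are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Classical count-min sketch: depth hashed rows of width counters each."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, token):
        # One salted hash per row stands in for an independent random hash function.
        digest = hashlib.blake2b(
            token.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, token, count=1):
        # Each token increments exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, token)] += count

    def query(self, token):
        # Point query: hash collisions can only inflate a counter,
        # so the minimum over rows never underestimates the true frequency.
        return min(
            self.table[row][self._bucket(row, token)]
            for row in range(self.depth)
        )


if __name__ == "__main__":
    cms = CountMinSketch(width=64, depth=4)
    for token in ["a"] * 5 + ["b"] * 3 + ["c"]:
        cms.update(token)
    print(cms.query("a"), cms.query("b"), cms.query("c"))
```

The point query returned here is the plain minimum over rows; the learning-augmented versions in the abstract replace it with functionals of a posterior distribution given the same hashed counts.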

preprint, 2022, arXiv

Near-optimal estimation of the unseen under regularly varying tail populations

Given $n$ samples from a population of individuals belonging to different species, what is the number $U$ of hitherto unseen species that would be observed if $λn$ new samples were collected? This is an important problem in many scientific endeavors, and it has been the subject of recent works introducing non-parametric estimators of $U$ that are minimax near-optimal and consistent all the way up to $λ\asymp\log n$. These works do not rely on any assumption on the underlying unknown distribution $p$ of the population, and therefore, while providing a theory in its greatest generality, worst-case distributions may severely hamper the estimation of $U$ in concrete applications. In this paper, we consider the problem of strengthening the non-parametric framework for estimating $U$. Inspired by the estimation of rare probabilities in extreme value theory, and motivated by the ubiquitous power-law type distributions in many natural and social phenomena, we make use of a semi-parametric assumption of regular variation of index $α\in (0,1)$ for the tail behaviour of $p$. Under this assumption, we introduce an estimator of $U$ that is simple, linear in the sampling information, computationally efficient, and scalable to massive datasets. Then, uniformly over our class of regularly varying tail distributions, we show that the proposed estimator has provable guarantees: i) it is minimax near-optimal, up to a power of $\log n$ factor; ii) it is consistent all the way up to $\logλ\asymp n^{α/2}/\sqrt{\log n}$, and this range is the best possible. This work presents the first study on the estimation of the unseen under regularly varying tail distributions. A numerical illustration of our methodology is presented for synthetic data and real data.

preprint, 2022, arXiv

The power of private likelihood-ratio tests for goodness-of-fit in frequency tables

Privacy-protecting data analysis investigates statistical methods under privacy constraints. This is a rising challenge in modern statistics, as the achievement of confidentiality guarantees, which typically occurs through suitable perturbations of the data, may determine a loss in the statistical utility of the data. In this paper, we consider privacy-protecting tests for goodness-of-fit in frequency tables, this being arguably the most common form of releasing data, and present a rigorous analysis of the large sample behaviour of a private likelihood-ratio (LR) test. Under the framework of $(\varepsilon,δ)$-differential privacy for perturbed data, our main contribution is the power analysis of the private LR test, which characterizes the trade-off between confidentiality, measured via the differential privacy parameters $(\varepsilon,δ)$, and statistical utility, measured via the power of the test. This is obtained through a Bahadur-Rao large deviation expansion for the power of the private LR test, bringing out a critical quantity, as a function of the sample size, the dimension of the table and $(\varepsilon,δ)$, that determines a loss in the power of the test. Such a result is then applied to characterize the impact of the sample size and the dimension of the table, in connection with the parameters $(\varepsilon,δ)$, on the loss of the power of the private LR test. In particular, we determine the (sample) cost of $(\varepsilon,δ)$-differential privacy in the private LR test, namely the additional sample size that is required to recover the power of the Multinomial LR test in the absence of perturbation. Our power analysis relies on a non-standard large deviation analysis for the LR, as well as the development of a novel (sharp) large deviation principle for sums of i.i.d. random vectors, which is of independent interest.

preprint, 2022, arXiv

Wasserstein posterior contraction rates in non-dominated Bayesian nonparametric models

Posterior contraction rates (PCRs) strengthen the notion of Bayesian consistency, quantifying the speed at which the posterior distribution concentrates on arbitrarily small neighborhoods of the true model, with probability tending to 1 or almost surely, as the sample size goes to infinity. Under the Bayesian nonparametric framework, a common assumption in the study of PCRs is that the model is dominated for the observations; that is, it is assumed that the posterior can be written through the Bayes formula. In this paper, we consider the problem of establishing PCRs in Bayesian nonparametric models where the posterior distribution is not available through the Bayes formula, and hence models that are non-dominated for the observations. By means of the Wasserstein distance and a suitable sieve construction, our main result establishes PCRs in Bayesian nonparametric models where the posterior is available through a more general disintegration than the Bayes formula. To the best of our knowledge, this is the first general approach to provide PCRs in non-dominated Bayesian nonparametric models, and it relies on minimal modeling assumptions and on a suitable continuity assumption for the posterior distribution. Some refinements of our result are presented under additional assumptions on the prior distribution, and applications are given with respect to the Dirichlet process prior and the normalized extended Gamma process prior.

preprint, 2021, arXiv

A Bayesian nonparametric approach to count-min sketch under power-law data streams

The count-min sketch (CMS) is a randomized data structure that provides estimates of tokens' frequencies in a large data stream using a compressed representation of the data by random hashing. In this paper, we rely on a recent Bayesian nonparametric (BNP) view on the CMS to develop a novel learning-augmented CMS under power-law data streams. We assume that tokens in the stream are drawn from an unknown discrete distribution, which is endowed with a normalized inverse Gaussian process (NIGP) prior. Then, using distributional properties of the NIGP, we compute the posterior distribution of a token's frequency in the stream, given the hashed data, and in turn the corresponding BNP estimates. Applications to synthetic and real data show that our approach achieves a remarkable performance in the estimation of low-frequency tokens. This is known to be a desirable feature in natural language processing, where power-law behaviour of the data is indeed common.

preprint, 2021, arXiv

More for less: Predicting and maximizing genetic variant discovery via Bayesian nonparametrics

While the cost of sequencing genomes has decreased dramatically in recent years, this expense often remains non-trivial. Under a fixed budget, then, scientists face a natural trade-off between quantity and quality; they can spend resources to sequence a greater number of genomes (quantity) or spend resources to sequence genomes with increased accuracy (quality). Our goal is to find the optimal allocation of resources between quantity and quality. Optimizing resource allocation promises to reveal as many new variations in the genome as possible, and thus as many new scientific insights as possible. In this paper, we consider the common setting where scientists have already conducted a pilot study to reveal variants in a genome and are contemplating a follow-up study. We introduce a Bayesian nonparametric methodology to predict the number of new variants in the follow-up study based on the pilot study. When experimental conditions are kept constant between the pilot and follow-up, we demonstrate on real data from the gnomAD project that our prediction is more accurate than three recent proposals, and competitive with a more classic proposal. Unlike existing methods, though, our method allows practitioners to change experimental conditions between the pilot and the follow-up. We demonstrate how this distinction allows our method to be used for (i) more realistic predictions and (ii) optimal allocation of a fixed budget between quality and quantity.

preprint, 2020, arXiv

A Berry-Esseen theorem for Pitman's $α$-diversity

This paper is concerned with the study of the random variable $K_n$ denoting the number of distinct elements in a random sample $(X_1, \dots, X_n)$ of exchangeable random variables driven by the two parameter Poisson-Dirichlet distribution, $PD(α,θ)$. For $α\in(0,1)$, Theorem 3.8 in Pitman (2006) shows that $\frac{K_n}{n^α}\stackrel{\text{a.s.}}{\longrightarrow} S_{α,θ}$ as $n\rightarrow+\infty$. Here, $S_{α,θ}$ is a random variable distributed according to the so-called scaled Mittag-Leffler distribution. Our main result states that $$ \sup_{x \geq 0} \Big| \mathsf{P}\Big[\frac{K_n}{n^α} \leq x \Big] - \mathsf{P}[S_{α,θ} \leq x] \Big| \leq \frac{C(α, θ)}{n^α} $$ holds with an explicit constant $C(α, θ)$. The key ingredients of the proof are a novel probabilistic representation of $K_n$ as a compound distribution and new, refined versions of certain quantitative bounds for the Poisson approximation and the compound Poisson distribution.

preprint, 2020, arXiv

A Good-Turing estimator for feature allocation models

Feature allocation models generalize species sampling models by allowing every observation to belong to more than one species, now called features. Under the popular Bernoulli product model for feature allocation, given $n$ samples, we study the problem of estimating the missing mass $M_{n}$, namely the expected number of hitherto unseen features that would be observed if one additional individual was sampled. This is motivated by numerous applied problems where the sampling procedure is expensive, in terms of time and/or financial resources allocated, and further samples can only be motivated by the possibility of recording new unobserved features. We introduce a simple, robust and theoretically sound nonparametric estimator $\hat{M}_{n}$ of $M_{n}$. $\hat{M}_{n}$ turns out to have the same analytic form as the popular Good-Turing estimator of the missing mass in species sampling models, with the difference that the two estimators have different ranges. We show that $\hat{M}_{n}$ admits a natural interpretation both as a jackknife estimator and as a nonparametric empirical Bayes estimator, we give provable guarantees for the performance of $\hat{M}_{n}$ in terms of minimax rate optimality, and we provide an interesting connection between $\hat{M}_{n}$ and the Good-Turing estimator for species sampling. Finally, we derive non-asymptotic confidence intervals for $\hat{M}_{n}$, which are easily computable and do not rely on any asymptotic approximation. Our approach is illustrated with synthetic data and SNP data from the ENCODE sequencing genome project.

preprint, 2020, arXiv

Approximating predictive probabilities of Gibbs-type priors

Gibbs-type random probability measures, or Gibbs-type priors, are arguably the most "natural" generalization of the celebrated Dirichlet prior. Among them the two parameter Poisson-Dirichlet prior certainly stands out for the mathematical tractability and interpretability of its predictive probabilities, which made it the natural candidate in several applications. Given a sample of size $n$, in this paper we show that the predictive probabilities of any Gibbs-type prior admit a large $n$ approximation, with an error term vanishing as $o(1/n)$, which maintains the same desirable features as the predictive probabilities of the two parameter Poisson-Dirichlet prior.

preprint, 2020, arXiv

De Finetti's theorem: rate of convergence in Kolmogorov distance

This paper provides a quantitative version of de Finetti's law of large numbers. Given an infinite sequence $\{X_n\}_{n \geq 1}$ of exchangeable Bernoulli variables, it is well-known that $\frac{1}{n} \sum_{i = 1}^n X_i \stackrel{a.s.}{\longrightarrow} Y$, for a suitable random variable $Y$ taking values in $[0,1]$. Here, we consider the rate of convergence in law of $\frac{1}{n} \sum_{i = 1}^n X_i$ towards $Y$, with respect to the Kolmogorov distance. After showing that any rate of the type of $1/n^α$ can be obtained for any $α\in (0,1]$, we find a sufficient condition on the probability distribution of $Y$ for the achievement of the optimal rate of convergence, that is $1/n$. Our main result improves on the existing literature: in particular, with respect to \cite{MPS}, we study a stronger metric while, with respect to \cite{Mna}, we weaken the regularity hypothesis on the probability distribution of $Y$.
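A small worked case may help fix ideas about the $1/n$ rate. Assume, purely for illustration, that $Y$ is uniform on $[0,1]$; this choice is not taken from the paper. Then $S_n=\sum_{i=1}^n X_i$ is uniform on $\{0,\ldots,n\}$, because $\binom{n}{k}\int_0^1 y^k(1-y)^{n-k}\,dy = 1/(n+1)$, so the Kolmogorov distance between the law of $S_n/n$ and the law of $Y$ can be computed exactly. The function name below is hypothetical.

```python
from fractions import Fraction


def kolmogorov_distance_uniform(n):
    """Exact Kolmogorov distance between the law of S_n / n and Uniform(0, 1).

    Assumes the mixing variable Y is uniform, so P[S_n = k] = 1 / (n + 1).
    The distribution function of S_n / n is a step function and that of Y is
    the identity on [0, 1], so the supremum is attained at a jump point k / n,
    either at the value taken there or at the left limit.
    """
    best = Fraction(0)
    for k in range(n + 1):
        x = Fraction(k, n)
        at_jump = Fraction(k + 1, n + 1)   # P[S_n / n <= k / n]
        left_limit = Fraction(k, n + 1)    # P[S_n / n <  k / n]
        best = max(best, abs(at_jump - x), abs(left_limit - x))
    return best


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, kolmogorov_distance_uniform(n))
```

In this special case the distance equals $1/(n+1)$ for every $n$, which is of the optimal order $1/n$ mentioned in the abstract.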

preprint, 2020, arXiv

Infinitely deep neural networks as diffusion processes

When the parameters are independently and identically distributed at initialization, neural networks exhibit undesirable properties that emerge as the number of layers increases, e.g. a vanishing dependency on the input and a concentration on restrictive families of functions including constant functions. We consider parameter distributions that shrink as the number of layers increases in order to recover well-behaved stochastic processes in the limit of infinite depth. This establishes a link between infinitely deep residual networks and solutions to stochastic differential equations, i.e. diffusion processes. We show that these limiting processes do not suffer from the aforementioned issues and investigate their properties.

preprint, 2020, arXiv

Nonparametric Bayesian multi-armed bandits for single cell experiment design

The problem of maximizing cell type discovery under budget constraints is a fundamental challenge for the collection and analysis of single-cell RNA-sequencing (scRNA-seq) data. In this paper, we introduce a simple, computationally efficient, and scalable Bayesian nonparametric sequential approach to optimize the budget allocation when designing a large scale experiment for the collection of scRNA-seq data for the purpose of, but not limited to, creating cell atlases. Our approach relies on the following tools: i) a hierarchical Pitman-Yor prior that recapitulates biological assumptions regarding cellular differentiation, and ii) a Thompson sampling multi-armed bandit strategy that balances exploitation and exploration to prioritize experiments across a sequence of trials. Posterior inference is performed by using a sequential Monte Carlo approach, which allows us to fully exploit the sequential nature of our species sampling problem. We empirically show that our approach outperforms state-of-the-art methods and achieves near-Oracle performance on simulated and scRNA-seq data alike. HPY-TS code is available at https://github.com/fedfer/HPYsinglecell.

preprint, 2020, arXiv

Stable behaviour of infinitely wide deep neural networks

We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed as symmetric centered stable distributions. Then, we show that the infinitely wide limit of the NN, under suitable scaling on the weights, is a stochastic process whose finite-dimensional distributions are multivariate stable distributions. The limiting process is referred to as the stable process, and it generalizes the class of Gaussian processes recently obtained as infinitely wide limits of NNs (Matthews et al., 2018b). Parameters of the stable process can be computed via an explicit recursion over the layers of the network. Our result contributes to the theory of fully connected feed-forward deep NNs, and it paves the way to expand recent lines of research that rely on Gaussian infinitely wide limits.

preprint, 2016, arXiv

A characterization of product-form exchangeable feature probability functions

We characterize the class of exchangeable feature allocations assigning probability $V_{n,k}\prod_{l=1}^{k}W_{m_{l}}U_{n-m_{l}}$ to a feature allocation of $n$ individuals, displaying $k$ features with counts $(m_{1},\ldots,m_{k})$ for these features. Each element of this class is parametrized by a countable matrix $V$ and two sequences $U$ and $W$ of non-negative weights. Moreover, a consistency condition is imposed to guarantee that the distribution for feature allocations of $n-1$ individuals is recovered from that of $n$ individuals, when the last individual is integrated out. In Theorem 1.1, we prove that the only members of this class satisfying the consistency condition are mixtures of the Indian Buffet Process over its mass parameter $γ$ and mixtures of the Beta--Bernoulli model over its dimensionality parameter $N$. Hence, we provide a characterization of these two models as the only, up to randomization of the parameters, consistent exchangeable feature allocations having the required product form.

preprint, 2016, arXiv

Frequency of Frequencies Distributions and Size Dependent Exchangeable Random Partitions

Motivated by the fundamental problem of modeling the frequency of frequencies (FoF) distribution, this paper introduces the concept of a cluster structure to define a probability function that governs the joint distribution of a random count and its exchangeable random partitions. A cluster structure, naturally arising from a completely random measure mixed Poisson process, allows the probability distribution of the random partitions of a subset of a population to be dependent on the population size, a distinct and motivated feature that makes it more flexible than a partition structure. This allows it to model an entire FoF distribution whose structural properties change as the population size varies. A FoF vector can be simulated by drawing an infinite number of Poisson random variables, or by a stick-breaking construction with a finite random number of steps. A generalized negative binomial process model is proposed to generate a cluster structure, where in the prior the number of clusters is finite and Poisson distributed, and the cluster sizes follow a truncated negative binomial distribution. We propose a simple Gibbs sampling algorithm to extrapolate the FoF vector of a population given the FoF vector of a sample taken without replacement from the population. We illustrate our results and demonstrate the advantages of the proposed models through the analysis of real text, genomic, and survey data.

preprint, 2016, arXiv

Moderate deviations for Ewens-Pitman exchangeable random partitions

Consider a population of individuals belonging to an infinite number of types, and assume that type proportions follow the two-parameter Poisson-Dirichlet distribution. A sample of size n is selected from the population. The total number of different types and the number of types appearing in the sample with a fixed frequency are important statistics. In this paper we establish moderate deviation principles for these quantities. The corresponding rate functions are explicitly identified, which helps to reveal a critical scale and to understand the exact role of the parameters. Conditional, or posterior, counterparts of the moderate deviation principles are also established.

preprint, 2015, arXiv

Looking-backward probabilities for Gibbs-type exchangeable random partitions

Gibbs-type random probability measures and the exchangeable random partitions they induce represent the subject of a rich and active literature. They provide a probabilistic framework for a wide range of theoretical and applied problems that are typically referred to as species sampling problems. In this paper, we consider the class of looking-backward species sampling problems introduced in Lijoi et al. (Ann. Appl. Probab. 18 (2008) 1519-1547) in Bayesian nonparametrics. Specifically, given some information on the random partition induced by an initial sample from a Gibbs-type random probability measure, we study the conditional distributions of statistics related to the old species, namely those species detected in the initial sample and possibly re-observed in an additional sample. The proposed results contribute to the analysis of conditional properties of Gibbs-type exchangeable random partitions, so far focused mainly on statistics related to those species generated by the additional sample and not already detected in the initial sample.

preprint, 2015, arXiv

Rediscovery of Good-Turing estimators via Bayesian nonparametrics

The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this paper we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
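To make the frequentist side of this comparison concrete, the sketch below computes the classical (unsmoothed) Good-Turing estimate of the $l$-discovery probability, $(l+1)\,m_{l+1,n}/n$, where $m_{l,n}$ is the number of species observed exactly $l$ times in a sample of size $n$. It does not implement the Bayesian nonparametric estimators or the smoothing studied in the paper, and the function name and toy sample are assumptions for the example.

```python
from collections import Counter


def good_turing(sample):
    """Return a function l -> classical Good-Turing l-discovery estimate."""
    n = len(sample)
    species_counts = Counter(sample)                  # species -> frequency
    freq_of_freq = Counter(species_counts.values())   # l -> m_{l,n}

    def discovery(l):
        # Estimated probability that the next observation belongs to a
        # species seen exactly l times so far; l = 0 means a new species.
        return (l + 1) * freq_of_freq.get(l + 1, 0) / n

    return discovery


if __name__ == "__main__":
    # Toy sample: frequencies a=3, b=2, c=1, d=1, so m_1=2, m_2=1, m_3=1.
    estimate = good_turing(["a", "a", "a", "b", "b", "c", "d"])
    for l in range(4):
        print(l, estimate(l))
```

The estimate for $l$ depends only on $m_{l+1,n}$, which makes it erratic when that count is zero or small; this is one reason smoothed versions are used in practice.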

preprint, 2014, arXiv

Large deviation principles for the Ewens-Pitman sampling model

Let $M_{l,n}$ be the number of blocks with frequency $l$ in the exchangeable random partition induced by a sample of size $n$ from the Ewens-Pitman sampling model. We show that, as $n$ tends to infinity, $n^{-1}M_{l,n}$ satisfies a large deviation principle and we characterize the corresponding rate function. A conditional counterpart of this large deviation principle is also presented. Specifically, given an initial sample of size $n$ from the Ewens-Pitman sampling model, we consider an additional sample of size $m$. For any fixed $n$ and as $m$ tends to infinity, we establish a large deviation principle for the conditional number of blocks with frequency $l$ in the enlarged sample, given the initial sample. Interestingly, the conditional and unconditional large deviation principles coincide, namely there is no long lasting impact of the given initial sample. Potential applications of our results are discussed in the context of Bayesian nonparametric inference for discovery probabilities.

preprint2013arXiv

Alpha-diversity processes and normalized inverse-Gaussian diffusions

The infinitely-many-neutral-alleles model has recently been extended to a class of diffusion processes associated with Gibbs partitions of two-parameter Poisson-Dirichlet type. This paper introduces a family of infinite-dimensional diffusions associated with a different subclass of Gibbs partitions, induced by normalized inverse-Gaussian random probability measures. Such diffusions describe the evolution of the frequencies of infinitely-many types together with the dynamics of the time-varying mutation rate, which is driven by an alpha-diversity diffusion. Constructed as a dynamic version, relative to this framework, of the corresponding notion for Gibbs partitions, the latter is explicitly derived from an underlying population model and shown to coincide, in a special case, with the diffusion approximation of a critical Galton-Watson branching process. The class of infinite-dimensional processes is characterized in terms of its infinitesimal generator on an appropriate domain, and shown to be the limit in distribution of a certain sequence of Feller diffusions with finitely-many types. Moreover, a discrete representation is provided by means of appropriately transformed Moran-type particle processes, where the particles are samples from a normalized inverse-Gaussian random probability measure. The relationship between the limit diffusion and the two-parameter model is also discussed.

preprint2013arXiv

Bayesian nonparametric analysis of reversible Markov chains

We introduce a three-parameter random walk with reinforcement, called the $(θ,α,β)$ scheme, which generalizes the linearly edge reinforced random walk to uncountable spaces. The parameter $β$ smoothly tunes the $(θ,α,β)$ scheme between this edge reinforced random walk and the classical exchangeable two-parameter Hoppe urn scheme, while the parameters $α$ and $θ$ modulate how many states are typically visited. Resorting to de Finetti's theorem for Markov chains, we use the $(θ,α,β)$ scheme to define a nonparametric prior for Bayesian analysis of reversible Markov chains. The prior is applied in Bayesian nonparametric inference for species sampling problems with data generated from a reversible Markov chain with an unknown transition kernel. As a real example, we analyze data from molecular dynamics simulations of protein folding.

preprint2013arXiv

Conditional formulae for Gibbs-type exchangeable random partitions

Gibbs-type random probability measures and the exchangeable random partitions they induce represent an important framework from both a theoretical and an applied point of view. In the present paper, motivated by species sampling problems, we investigate some properties concerning the conditional distribution of the number of blocks with a certain frequency generated by Gibbs-type random partitions. The general results are then specialized to three noteworthy examples yielding completely explicit expressions of their distributions, moments and asymptotic behaviors. Such expressions can be interpreted as Bayesian nonparametric estimators of the rare species variety and their performance is tested on some real genomic data.

preprint2013arXiv

MCMC for Normalized Random Measure Mixture Models

This paper concerns the use of Markov chain Monte Carlo methods for posterior sampling in Bayesian nonparametric mixture models with normalized random measure priors. Making use of some recent posterior characterizations for the class of normalized random measures, we propose novel Markov chain Monte Carlo methods of both marginal type and conditional type. The proposed marginal samplers are generalizations of Neal's well-regarded Algorithm 8 for Dirichlet process mixture models, whereas the conditional sampler is a variation of those recently introduced in the literature. For both the marginal and conditional methods, we consider as a running example a mixture model with an underlying normalized generalized Gamma process prior, and describe comparative simulation results demonstrating the efficacies of the proposed methods.

preprint2012arXiv

A class of measure-valued Markov chains and Bayesian nonparametrics

Measure-valued Markov chains have raised interest in Bayesian nonparametrics since the seminal paper by (Math. Proc. Cambridge Philos. Soc. 105 (1989) 579--585) where a Markov chain having the law of the Dirichlet process as unique invariant measure has been introduced. In the present paper, we propose and investigate a new class of measure-valued Markov chains defined via exchangeable sequences of random variables. Asymptotic properties for this new class are derived and applications related to Bayesian nonparametric mixture modeling, and to a generalization of the Markov chain proposed by (Math. Proc. Cambridge Philos. Soc. 105 (1989) 579--585), are discussed. These results and their applications highlight once again the interplay between Bayesian nonparametrics and the theory of measure-valued Markov chains.

preprint2012arXiv

Asymptotics for a Bayesian nonparametric estimator of species variety

In Bayesian nonparametric inference, random discrete probability measures are commonly used as priors within hierarchical mixture models for density estimation and for inference on the clustering of the data. Recently, it has been shown that they can also be exploited in species sampling problems: indeed they are natural tools for modeling the random proportions of species within a population thus allowing for inference on various quantities of statistical interest. For applications that involve large samples, the exact evaluation of the corresponding estimators becomes impracticable and, therefore, asymptotic approximations are sought. In the present paper, we study the limiting behaviour of the number of new species to be observed from further sampling, conditional on observed data, assuming the observations are exchangeable and directed by a normalized generalized gamma process prior. Such an asymptotic study highlights a connection between the normalized generalized gamma process and the two-parameter Poisson-Dirichlet process that was previously known only in the unconditional case.

29 published item(s)

A Gaussian process limit for the self-normalized Ewens-Pitman process

Large-width asymptotics for ReLU neural networks with $α$-Stable initializations

A new approach to posterior contraction rates via Wasserstein dynamics

Deep Stable neural networks: large-width asymptotics and convergence rates

Learning-augmented count-min sketches via Bayesian nonparametrics

Near-optimal estimation of the unseen under regularly varying tail populations

The power of private likelihood-ratio tests for goodness-of-fit in frequency tables

Wasserstein posterior contraction rates in non-dominated Bayesian nonparametric models

A Bayesian nonparametric approach to count-min sketch under power-law data streams

More for less: Predicting and maximizing genetic variant discovery via Bayesian nonparametrics

A Berry-Esseen theorem for Pitman's $α$-diversity

A Good-Turing estimator for feature allocation models

Approximating predictive probabilities of Gibbs-type priors

De Finetti's theorem: rate of convergence in Kolmogorov distance

Infinitely deep neural networks as diffusion processes

Nonparametric Bayesian multi-armed bandits for single cell experiment design

Stable behaviour of infinitely wide deep neural networks

A characterization of product-form exchangeable feature probability functions

Frequency of Frequencies Distributions and Size Dependent Exchangeable Random Partitions

Moderate deviations for Ewens-Pitman exchangeable random partitions

Looking-backward probabilities for Gibbs-type exchangeable random partitions

Rediscovery of Good-Turing estimators via Bayesian nonparametrics

Large deviation principles for the Ewens-Pitman sampling model

Alpha-diversity processes and normalized inverse-Gaussian diffusions

Bayesian nonparametric analysis of reversible Markov chains

Conditional formulae for Gibbs-type exchangeable random partitions

MCMC for Normalized Random Measure Mixture Models

A class of measure-valued Markov chains and Bayesian nonparametrics

Asymptotics for a Bayesian nonparametric estimator of species variety