Source author record

Tommaso Rigon

Tommaso Rigon appears in the imported research catalog. Authorship, coauthor, and topic links are available, although ownership of the profile has not yet been claimed.

Researcher · Unclaimed source record

Methodology · Applications · Machine Learning · Computation · math.ST · Statistics Theory

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators


Preprint · 2022 · arXiv

Conjugate priors and bias reduction for logistic regression models

Logistic regression models for binomial responses are routinely used in statistical practice. However, the maximum likelihood estimate may not exist due to data separability. We address this issue by considering a conjugate prior penalty which always produces finite estimates. Such a specification has a clear Bayesian interpretation and enjoys several invariance properties, making it an appealing prior choice. We show that the proposed method leads to an accurate approximation of the reduced-bias approach of Firth (1993), resulting in estimators with smaller asymptotic bias than the maximum likelihood estimator and whose existence is always guaranteed. Moreover, the considered penalized likelihood can be expressed as a genuine likelihood, in which the original data are replaced with a collection of pseudo-counts. Hence, our approach may leverage well-established and scalable algorithms for logistic regression. We compare our estimator with alternative reduced-bias methods, substantially improving on their computational performance and achieving appealing inferential results.
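The pseudo-count idea above means a penalized fit can reuse an ordinary binomial logistic regression routine. The sketch below is illustrative only: the pseudo-count rule in `conjugate_penalized_fit` (add h/2 successes and h trials to every observation, with h = p/n) is an assumption made for this example, and the helper names are hypothetical. Consult the paper for the exact conjugate prior hyperparameters.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def fit_logistic(X, y, m, iters=50, tol=1e-10):
    """Newton-Raphson for a binomial logit model: y[i] successes out of m[i] trials.
    Counts may be non-integer, so pseudo-counts are accepted."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        H = [[0.0] * p for _ in range(p)]  # observed information matrix
        for xi, yi, mi in zip(X, y, m):
            eta = sum(b * v for b, v in zip(beta, xi))
            pi = 1.0 / (1.0 + math.exp(-eta))
            w = mi * pi * (1.0 - pi)
            for j in range(p):
                grad[j] += (yi - mi * pi) * xi[j]
                for k in range(p):
                    H[j][k] += w * xi[j] * xi[k]
        step = solve(H, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def conjugate_penalized_fit(X, y, m):
    """Hypothetical pseudo-count rule (assumption, not taken from the paper):
    add h/2 successes and h trials to every observation, with h = p / n."""
    n, p = len(X), len(X[0])
    h = p / n
    y_star = [yi + h / 2 for yi in y]
    m_star = [mi + h for mi in m]
    return fit_logistic(X, y_star, m_star)
```

On perfectly separated data such as `X = [[1, -2], [1, -1], [1, 1], [1, 2]]`, `y = [0, 0, 1, 1]`, `m = [1, 1, 1, 1]`, the unpenalized maximum likelihood estimate does not exist, whereas `conjugate_penalized_fit` returns a finite slope, because every pseudo-observation now has both successes and failures.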

Preprint · 2022 · arXiv

Enriched Pitman-Yor processes

In Bayesian nonparametrics there exists a rich variety of discrete priors, including the Dirichlet process and its generalizations, which are nowadays well-established tools. Despite these remarkable advances, few proposals are tailored for modeling observations lying on product spaces, such as $\mathbb{R}^p$. Indeed, for multivariate random measures, most available priors lack flexibility and do not allow for separate partition structures among the spaces. We introduce a discrete nonparametric prior, termed enriched Pitman-Yor process (EPY), aimed at addressing these issues. Theoretical properties of this novel prior are extensively investigated. We discuss its formal link with the enriched Dirichlet process and normalized random measures, describe a square-breaking representation, and obtain closed-form expressions for the posterior law and the involved urn schemes. Second, we show that several existing approaches, including Dirichlet processes with a spike-and-slab base measure and mixture of mixtures models, implicitly rely on special cases of the EPY, which therefore constitutes a unified probabilistic framework for many Bayesian nonparametric priors. Interestingly, this unifying formulation allows us to naturally extend these models while preserving their analytical tractability. As an illustration, we employ the EPY for a species sampling problem in ecology and for functional clustering in an e-commerce application.
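The key structural feature described above is that observations on a product space get separate, nested partitions: a partition of the first coordinate, and within each of its clusters, a partition of the second coordinate. The sketch below simulates such a nested partition with two layers of Pitman-Yor urns. It is a simplified illustration under stated assumptions (independent Pitman-Yor urns with hypothetical parameters `sx, tx` and `sy, ty` at each level), not the EPY urn scheme derived in the paper.

```python
import random

def py_draw(counts, sigma, theta, rng):
    """One Pitman-Yor (Chinese-restaurant-style) urn draw.
    Returns an index into counts, or len(counts) to open a new cluster."""
    k = len(counts)
    if k == 0:
        return 0  # the first observation always opens a new cluster
    # existing cluster j has weight n_j - sigma; a new cluster has weight theta + sigma * k
    weights = [c - sigma for c in counts] + [theta + sigma * k]
    return rng.choices(range(k + 1), weights=weights)[0]

def nested_py_partition(n, sx, tx, sy, ty, seed=0):
    """Simulate labels (j, l): j is the cluster on the first space,
    l is the subcluster on the second space nested inside cluster j."""
    rng = random.Random(seed)
    x_counts = []  # sizes of first-space clusters
    y_counts = []  # y_counts[j]: sizes of second-space subclusters inside cluster j
    labels = []
    for _ in range(n):
        j = py_draw(x_counts, sx, tx, rng)
        if j == len(x_counts):
            x_counts.append(0)
            y_counts.append([])
        x_counts[j] += 1
        # each first-space cluster runs its own, independent urn on the second space
        l = py_draw(y_counts[j], sy, ty, rng)
        if l == len(y_counts[j]):
            y_counts[j].append(0)
        y_counts[j][l] += 1
        labels.append((j, l))
    return labels, x_counts, y_counts
```

By construction, the second-space subclusters always sum to the size of their parent cluster, which is the "separate partition structures" property the abstract refers to.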

Preprint · 2022 · arXiv

Extended Stochastic Block Models with Application to Criminal Networks

Reliably learning group structures among nodes in network data is challenging in several applications. We are particularly motivated by studying covert networks that encode relationships among criminals. These data are subject to measurement errors and exhibit a complex combination of an unknown number of core-periphery, assortative and disassortative structures that may unveil key architectures of the criminal organization. The coexistence of these noisy block patterns limits the reliability of routinely used community detection algorithms, and requires extensions of model-based solutions to realistically characterize the node partition process, incorporate information from node attributes, and provide improved strategies for estimation and uncertainty quantification. To fill these gaps, we develop a new class of extended stochastic block models (ESBM) that infer groups of nodes having common connectivity patterns via Gibbs-type priors on the partition process. This choice encompasses many realistic priors for criminal networks, covering solutions with a fixed, random or infinite number of possible groups, and facilitates the inclusion of node attributes in a principled manner. Among the new alternatives in our class, we focus on the Gnedin process as a realistic prior that allows the number of groups to be finite, random and subject to a reinforcement process coherent with criminal networks. A collapsed Gibbs sampler is proposed for the whole ESBM class, and refined strategies for estimation, prediction, uncertainty quantification and model selection are outlined. The ESBM performance is illustrated in realistic simulations and in an application to an Italian mafia network, where we unveil key complex block structures that are mostly hidden from state-of-the-art alternatives.
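To make the Gnedin process prior concrete, the sketch below computes its predictive (urn) probabilities and uses them to simulate a node partition. It assumes the form we recall for this process with parameter `gamma` in (0, 1): given V nodes in H groups with sizes n_h, the next node joins group h with probability proportional to (n_h + 1)(V - H + gamma) and opens a new group with probability proportional to H^2 - H gamma, with normalizing constant V(V + gamma). The function names are hypothetical; check the formula against the paper before relying on it.

```python
import random

def gnedin_probs(counts, gamma):
    """Predictive probabilities for node V+1 under the Gnedin process (assumed form).
    The last entry is the probability of opening a new group."""
    V, H = sum(counts), len(counts)
    if V == 0:
        return [1.0]  # the first node always opens a group
    norm = V * (V + gamma)
    probs = [(nh + 1) * (V - H + gamma) / norm for nh in counts]
    probs.append((H * H - H * gamma) / norm)
    return probs

def sample_gnedin_partition(n, gamma, seed=0):
    """Sequentially assign n nodes to groups using the urn above."""
    rng = random.Random(seed)
    counts, z = [], []
    for _ in range(n):
        probs = gnedin_probs(counts, gamma)
        h = rng.choices(range(len(probs)), weights=probs)[0]
        if h == len(counts):
            counts.append(0)
        counts[h] += 1
        z.append(h)
    return z, counts
```

For group sizes `[3, 1, 2]` and `gamma = 0.3`, the new-group probability is (9 - 0.9) / 37.8, about 0.214. Because the new-group weight grows like H^2 while the total grows like V^2, the number of groups stays finite but random, consistent with the reinforcement behavior described above.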

Preprint · 2022 · arXiv

Inferring taxonomic placement from DNA barcoding allowing discovery of new taxa

In ecology, it has become common to apply DNA barcoding to biological samples, leading to datasets containing a large number of nucleotide sequences. The focus is then on inferring the taxonomic placement of each of these sequences by leveraging existing databases containing reference sequences of known taxa. This is highly challenging because (i) sequencing is typically available only for a relatively small region of the genome due to cost considerations; and (ii) many of the sequences come from organisms that are either unknown to science or have no reference sequences available. These issues can lead to substantial classification uncertainty, particularly in inferring new taxa. To address these challenges, we propose a new class of Bayesian nonparametric taxonomic classifiers, BayesANT, which use species sampling model priors to allow new taxa to be discovered at each taxonomic rank. Using a simple product multinomial likelihood with conjugate Dirichlet priors at the lowest rank, a highly efficient algorithm is developed to provide a probabilistic prediction of the taxonomic placement of each sequence at each rank. BayesANT is shown to have excellent performance on real data, including when many sequences in the test set belong to taxa unobserved in training.
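The product multinomial likelihood with conjugate Dirichlet priors has a closed-form predictive: at each aligned site, the probability of nucleotide b is (count of b + alpha) / (site total + 4 alpha). The sketch below scores a query sequence against each reference taxon and against an unseen taxon, which (in this simplification) has zero counts and therefore reduces to the prior predictive. The prior taxon weights, proportional to taxon size for known taxa and to `theta` for a new one, are a Dirichlet-process-style assumption for illustration. This is a single-rank toy, not the BayesANT algorithm, and all names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"

def site_counts(sequences, length):
    """Per-site nucleotide counts for a set of aligned sequences of a given length."""
    counts = [{b: 0 for b in NUCLEOTIDES} for _ in range(length)]
    for s in sequences:
        for i, b in enumerate(s):
            counts[i][b] += 1
    return counts

def log_predictive(seq, counts, alpha=1.0):
    """Log Dirichlet-multinomial predictive of seq, assuming independent sites."""
    total = 0.0
    for i, b in enumerate(seq):
        n_i = sum(counts[i].values())
        total += math.log((counts[i][b] + alpha) / (n_i + len(NUCLEOTIDES) * alpha))
    return total

def classify(seq, reference, theta=1.0, alpha=1.0):
    """Return the highest-scoring taxon name, or "NEW" for an unseen taxon.
    Prior weights (assumption): n_t for known taxon t, theta for a new taxon."""
    L = len(seq)
    scores = {}
    for taxon, seqs in reference.items():
        scores[taxon] = math.log(len(seqs)) + log_predictive(seq, site_counts(seqs, L), alpha)
    scores["NEW"] = math.log(theta) + log_predictive(seq, site_counts([], L), alpha)
    return max(scores, key=scores.get)
```

For example, with `reference = {"taxon_A": ["ACGT", "ACGA"], "taxon_B": ["TTGT", "TTGA"]}`, the query `"ACGT"` is assigned to `taxon_A`, while `"GGCC"`, which mismatches every reference site, is assigned to `"NEW"`.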

Preprint · 2020 · arXiv

A generalized Bayes framework for probabilistic clustering

Loss-based clustering methods, such as k-means and its variants, are standard tools for finding groups in data. However, their lack of uncertainty quantification for the estimated clusters is a disadvantage. Model-based clustering based on mixture models provides an alternative, but such methods face computational problems and are highly sensitive to the choice of kernel. This article proposes a generalized Bayes framework that bridges these two paradigms through the use of Gibbs posteriors. In conducting Bayesian updating, the log likelihood is replaced by a loss function for clustering, leading to a rich family of clustering methods. The Gibbs posterior represents a coherent updating of Bayesian beliefs without needing to specify a likelihood for the data, and can be used for characterizing uncertainty in clustering. We consider losses based on Bregman divergences and pairwise similarities, and develop efficient deterministic algorithms for point estimation along with sampling algorithms for uncertainty quantification. Several existing clustering algorithms, including k-means, can be interpreted as generalized Bayes estimators under our framework, and hence we provide a method of uncertainty quantification for these approaches.
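The Gibbs posterior replaces the log likelihood with a negative loss: for a labeling c, it is proportional to exp(-lam * loss(c)). The sketch below uses the k-means loss (squared Euclidean distance to cluster means, one instance of a Bregman divergence) with a uniform prior over labelings into `K` groups, and samples labelings by single-site Gibbs updates. The learning rate `lam`, the uniform prior and the naive full-loss recomputation are assumptions chosen for clarity, not the efficient algorithms developed in the article.

```python
import math
import random

def kmeans_loss(X, labels, K):
    """Sum of squared Euclidean distances of points to their cluster means."""
    loss = 0.0
    for k in range(K):
        pts = [x for x, c in zip(X, labels) if c == k]
        if not pts:
            continue
        d = len(pts[0])
        mu = [sum(p[j] for p in pts) / len(pts) for j in range(d)]
        loss += sum(sum((p[j] - mu[j]) ** 2 for j in range(d)) for p in pts)
    return loss

def gibbs_posterior_sampler(X, K, lam, n_sweeps=200, seed=0):
    """Single-site Gibbs sampler for pi(c | X) proportional to exp(-lam * kmeans_loss)."""
    rng = random.Random(seed)
    labels = [rng.randrange(K) for _ in X]
    draws = []
    for _ in range(n_sweeps):
        for i in range(len(X)):
            logw = []
            for k in range(K):  # evaluate the loss with point i moved to group k
                labels[i] = k
                logw.append(-lam * kmeans_loss(X, labels, K))
            m = max(logw)  # subtract the max for numerical stability
            w = [math.exp(v - m) for v in logw]
            labels[i] = rng.choices(range(K), weights=w)[0]
        draws.append(labels[:])
    return draws
```

Because labels are only identified up to permutation, uncertainty is best summarized by co-clustering frequencies, such as the fraction of draws in which two points share a label. As `lam` grows, the Gibbs posterior concentrates on loss-minimizing labelings, which is how k-means arises as a point estimate in this framework.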

Preprint · 2020 · arXiv

Tractable Bayesian Density Regression via Logit Stick-Breaking Priors

There is a growing interest in learning how the distribution of a response variable changes with a set of predictors. Bayesian nonparametric dependent mixture models provide a flexible approach to address this goal. However, several formulations require computationally demanding algorithms for posterior inference. Motivated by this issue, we study a class of predictor-dependent infinite mixture models, which relies on a simple representation of the stick-breaking prior via sequential logistic regressions. This formulation retains the desirable properties of popular predictor-dependent stick-breaking priors, and leverages a recent Pólya-gamma data augmentation to facilitate the implementation of several computational methods for posterior inference. These routines include Markov chain Monte Carlo via Gibbs sampling, expectation-maximization algorithms, and mean-field variational Bayes for scalable inference, thereby encouraging wider use of Bayesian density regression by practitioners. The algorithms associated with these methods are presented in detail and tested in a toxicology study.
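The "sequential logistic regressions" representation can be written as pi_h(x) = nu_h(x) * prod_{l<h} (1 - nu_l(x)), where nu_h(x) = logistic(x' alpha_h) is the probability of stopping at component h given that earlier components were passed over. The sketch below computes these predictor-dependent weights under a finite truncation (the last component takes the remaining mass, an assumption for illustration) and plugs them into a conditional density. Gaussian kernels with fixed means and scales are also our assumption; the names are hypothetical.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def lsbp_weights(x, alphas):
    """Predictor-dependent mixture weights from a truncated logit stick-breaking prior.
    alphas holds H-1 coefficient vectors; the H-th component takes the leftover mass."""
    weights, remaining = [], 1.0
    for a in alphas:
        nu = logistic(sum(ai * xi for ai, xi in zip(a, x)))  # stopping probability at h
        weights.append(nu * remaining)
        remaining *= 1.0 - nu
    weights.append(remaining)
    return weights

def conditional_density(y, x, alphas, mus, sigmas):
    """f(y | x) = sum_h pi_h(x) N(y; mu_h, sigma_h^2), with Gaussian kernels (assumption)."""
    w = lsbp_weights(x, alphas)
    return sum(
        wh * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for wh, m, s in zip(w, mus, sigmas)
    )
```

Each nu_h(x) is a logistic regression, which is what lets Pólya-gamma augmentation turn the weight updates into conditionally Gaussian steps; the sketch only evaluates the prior structure and does not perform posterior inference.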
