Source author record

Giacomo Zanella

Giacomo Zanella appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation, math.ST, Methodology, Statistics Theory, Machine Learning, Applications, math.PR

Catalog footprint

What is connected

8 works

7 topics

4 close collaborators

preprint, 2026, arXiv

Scalability of Metropolis-within-Gibbs schemes for high-dimensional Bayesian models

We study general coordinate-wise MCMC schemes (such as Metropolis-within-Gibbs samplers), which are commonly used to fit Bayesian non-conjugate hierarchical models. We relate their convergence properties to those of the corresponding (potentially not implementable) Gibbs sampler through the notion of conditional conductance. This allows us to study the performance of popular Metropolis-within-Gibbs schemes for non-conjugate hierarchical models, in high-dimensional regimes where both the number of datapoints and the number of parameters increase. Given random data-generating assumptions, we establish dimension-free convergence results, which are in close accordance with numerical evidence. Applications to Bayesian models for binary regression with unknown hyperparameters and discretely observed diffusions are also discussed. Motivated by such statistical applications, auxiliary results of independent interest on approximate conductances and perturbation of Markov operators are provided.
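To make the coordinate-wise scheme concrete, here is a minimal sketch of a Metropolis-within-Gibbs sampler, not the construction analysed in the paper. The toy target (a correlated bivariate Gaussian), the random-walk proposal, and the names `log_target`, `metropolis_within_gibbs`, `step` and `rho` are all assumptions chosen for illustration.

```python
import math
import random


def log_target(x, rho=0.5):
    """Log-density (up to a constant) of a bivariate Gaussian with unit
    variances and correlation rho. Hypothetical stand-in for a posterior."""
    a, b = x
    return -(a * a - 2.0 * rho * a * b + b * b) / (2.0 * (1.0 - rho * rho))


def metropolis_within_gibbs(log_pi, x0, n_iter, step=1.0, seed=0):
    """Sweep over the coordinates, updating each one with a random-walk
    Metropolis step while all other coordinates are held fixed."""
    rng = random.Random(seed)
    x = list(x0)
    current = log_pi(x)
    samples, accepted = [], 0
    for _ in range(n_iter):
        for i in range(len(x)):
            old = x[i]
            x[i] = old + step * rng.gauss(0.0, 1.0)  # symmetric proposal
            proposed = log_pi(x)
            # Only coordinate i changed, so the joint log-density ratio
            # equals the ratio of full conditionals for coordinate i.
            if math.log(1.0 - rng.random()) < proposed - current:
                current = proposed
                accepted += 1
            else:
                x[i] = old  # reject: restore the previous value
        samples.append(tuple(x))
    return samples, accepted / (n_iter * len(x))


if __name__ == "__main__":
    draws, rate = metropolis_within_gibbs(log_target, [0.0, 0.0], 2000)
    print("acceptance rate:", round(rate, 3))
```

An exact Gibbs sampler would draw each coordinate directly from its full conditional; the Metropolis step replaces that draw when the conditional cannot be sampled in closed form.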

preprint, 2022, arXiv

Optimal design of the Barker proposal and other locally-balanced Metropolis-Hastings algorithms

We study the class of first-order locally-balanced Metropolis-Hastings algorithms introduced in Livingstone & Zanella (2021). To choose a specific algorithm within the class the user must select a balancing function $g:\mathbb{R} \to \mathbb{R}$ satisfying $g(t) = tg(1/t)$, and a noise distribution for the proposal increment. Popular choices within the class are the Metropolis-adjusted Langevin algorithm and the recently introduced Barker proposal. We first establish a universal limiting optimal acceptance rate of 57% and scaling of $n^{-1/3}$ as the dimension $n$ tends to infinity among all members of the class under mild smoothness assumptions on $g$ and when the target distribution for the algorithm is of the product form. In particular we obtain an explicit expression for the asymptotic efficiency of an arbitrary algorithm in the class, as measured by expected squared jumping distance. We then consider how to optimise this expression under various constraints. We derive an optimal choice of noise distribution for the Barker proposal, optimal choice of balancing function under a Gaussian noise distribution, and optimal choice of first-order locally-balanced algorithm among the entire class, which turns out to depend on the specific target distribution. Numerical simulations confirm our theoretical findings and in particular show that a bi-modal choice of noise distribution in the Barker proposal gives rise to a practical algorithm that is consistently more efficient than the original Gaussian version.
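The balancing condition $g(t) = tg(1/t)$ can be checked numerically. The sketch below is illustrative only: the three candidate functions, the test points (restricted to $t > 0$), and the helper name `is_balancing` are assumptions, not taken from the abstract.

```python
import math


def barker(t):
    """g(t) = t / (1 + t)."""
    return t / (1.0 + t)


def square_root(t):
    """g(t) = sqrt(t)."""
    return math.sqrt(t)


def metropolis(t):
    """g(t) = min(1, t)."""
    return min(1.0, t)


def is_balancing(g, points=(0.1, 0.5, 1.0, 2.0, 7.5)):
    """Check g(t) == t * g(1/t) at a few positive test points.
    Passing this check is evidence, not a proof, of the identity."""
    return all(math.isclose(g(t), t * g(1.0 / t), rel_tol=1e-9) for t in points)


if __name__ == "__main__":
    for g in (barker, square_root, metropolis):
        print(g.__name__, is_balancing(g))
    # g(t) = t^2 fails the condition: t * g(1/t) = 1/t, which differs from t^2.
    print("square", is_balancing(lambda t: t * t))
```

For the first candidate the identity follows by hand: $t\,g(1/t) = t \cdot \frac{1/t}{1 + 1/t} = \frac{t}{1 + t} = g(t)$.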

preprint, 2022, arXiv

Scalable and Accurate Variational Bayes for High-Dimensional Binary Regression Models

Modern methods for Bayesian regression beyond the Gaussian response setting are often computationally impractical or inaccurate in high dimensions. In fact, as discussed in recent literature, bypassing such a trade-off is still an open problem even in routine binary regression models, and there is limited theory on the quality of variational approximations in high-dimensional settings. To address this gap, we study the approximation accuracy of routinely-used mean-field variational Bayes solutions in high-dimensional probit regression with Gaussian priors, obtaining novel and practically relevant results on the pathological behavior of such strategies in uncertainty quantification, point estimation and prediction. Motivated by these results, we further develop a new partially-factorized variational approximation for the posterior of the probit coefficients which leverages a representation with global and local variables but, unlike for classical mean-field assumptions, it avoids a fully factorized approximation, and instead assumes a factorization only for the local variables. We prove that the resulting approximation belongs to a tractable class of unified skew-normal densities that crucially incorporates skewness and, unlike for state-of-the-art mean-field solutions, converges to the exact posterior density as p goes to infinity. To solve the variational optimization problem, we derive a tractable CAVI algorithm that easily scales to p in the tens of thousands, and provably requires a number of iterations converging to 1 as p goes to infinity. Such findings are also illustrated in extensive empirical studies where our novel solution is shown to improve the approximation accuracy of mean-field variational Bayes for any n and p, with the magnitude of these gains being remarkable in those high-dimensional p>n settings where state-of-the-art methods are computationally impractical.

preprint, 2020, arXiv

Random Partition Models for Microclustering Tasks

Traditional Bayesian random partition models assume that the size of each cluster grows linearly with the number of data points. While this is appealing for some applications, this assumption is not appropriate for other tasks such as entity resolution, modeling of sparse networks, and DNA sequencing tasks. Such applications require models that yield clusters whose sizes grow sublinearly with the total number of data points -- the microclustering property. Motivated by these issues, we propose a general class of random partition models that satisfy the microclustering property with well-characterized theoretical properties. Our proposed models overcome major limitations in the existing literature on microclustering models, namely a lack of interpretability, identifiability, and full characterization of model asymptotic properties. Crucially, we drop the classical assumption of having an exchangeable sequence of data points, and instead assume an exchangeable sequence of clusters. In addition, our framework provides flexibility in terms of the prior distribution of cluster sizes, computational tractability, and applicability to a large number of microclustering tasks. We establish theoretical properties of the resulting class of priors, where we characterize the asymptotic behavior of the number of clusters and of the proportion of clusters of a given size. Our framework allows a simple and efficient Markov chain Monte Carlo algorithm to perform statistical inference. We illustrate our proposed methodology on the microclustering task of entity resolution, where we provide a simulation study and real experiments on survey panel data.

preprint, 2020, arXiv

The Barker proposal: combining robustness and efficiency in gradient-based MCMC

There is a tension between robustness and efficiency when designing Markov chain Monte Carlo (MCMC) sampling algorithms. Here we focus on robustness with respect to tuning parameters, showing that more sophisticated algorithms tend to be more sensitive to the choice of step-size parameter and less robust to heterogeneity of the distribution of interest. We characterise this phenomenon by studying the behaviour of spectral gaps as an increasingly poor step-size is chosen for the algorithm. Motivated by these considerations, we propose a novel and simple gradient-based MCMC algorithm, inspired by the classical Barker accept-reject rule, with improved robustness properties. Extensive theoretical results, dealing with robustness to tuning, geometric ergodicity and scaling with dimension, suggest that the novel scheme combines the robustness of simple schemes with the efficiency of gradient-based ones. We show numerically that this type of robustness is particularly beneficial in the context of adaptive MCMC, giving examples where our proposed scheme significantly outperforms state-of-the-art alternatives.
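As a rough illustration of a gradient-based proposal built on the Barker accept-reject rule, the sketch below implements one common formulation: each coordinate draws a Gaussian increment, keeps its sign with a probability that depends on the gradient, and the move is then corrected by a Metropolis-Hastings step. The abstract does not spell out the algorithm, so treat the details as assumptions; the target (independent Gaussians with scales `SCALES`), the step size `sigma`, and all function names are hypothetical.

```python
import math
import random

SCALES = (1.0, 2.0)  # hypothetical target: independent N(0, s^2) coordinates


def log_pi(x):
    """Log-density of the toy target, up to an additive constant."""
    return -0.5 * sum((xi / s) ** 2 for xi, s in zip(x, SCALES))


def grad_log_pi(x):
    """Gradient of log_pi, one entry per coordinate."""
    return [-xi / (s * s) for xi, s in zip(x, SCALES)]


def log1pexp(a):
    """Numerically stable log(1 + exp(a))."""
    return a + math.log1p(math.exp(-a)) if a > 0 else math.log1p(math.exp(a))


def barker_step(x, sigma, rng):
    """One proposal plus accept-reject correction; returns (state, accepted)."""
    gx = grad_log_pi(x)
    y = []
    for xi, gi in zip(x, gx):
        z = rng.gauss(0.0, sigma)
        # Keep the sign of z with probability 1 / (1 + exp(-z * gi)),
        # otherwise flip it, so moves along the gradient are favoured.
        p_keep = math.exp(-log1pexp(-z * gi))
        y.append(xi + z if rng.random() < p_keep else xi - z)
    gy = grad_log_pi(y)
    # Log Metropolis-Hastings ratio for this skewed proposal.
    log_ratio = log_pi(y) - log_pi(x)
    for xi, yi, gxi, gyi in zip(x, y, gx, gy):
        log_ratio += log1pexp((xi - yi) * gxi) - log1pexp((yi - xi) * gyi)
    if math.log(1.0 - rng.random()) < log_ratio:
        return y, True
    return x, False


def run_barker(x0, n_iter, sigma=1.0, seed=0):
    """Run the chain and return the samples with the acceptance rate."""
    rng = random.Random(seed)
    x, chain, accepted = list(x0), [], 0
    for _ in range(n_iter):
        x, ok = barker_step(x, sigma, rng)
        accepted += ok
        chain.append(tuple(x))
    return chain, accepted / n_iter


if __name__ == "__main__":
    chain, rate = run_barker([0.0, 0.0], 5000)
    print("acceptance rate:", round(rate, 3))
```

Because the gradient only enters through a bounded probability, a poorly chosen `sigma` skews the direction of the move without producing the very large jumps a Langevin-type drift term can cause.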

preprint, 2016, arXiv

Flexible Models for Microclustering with Application to Entity Resolution

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets.

preprint, 2015, arXiv

Bayesian complementary clustering, MCMC and Anglo-Saxon placenames

Common cluster models for multi-type point processes model the aggregation of points of the same type. In complete contrast, in the study of Anglo-Saxon settlements it is hypothesized that administrative clusters involving complementary names tend to appear. We investigate the evidence for such a hypothesis by developing a Bayesian Random Partition Model based on clusters formed by points of different types (complementary clustering). As a result we obtain an intractable posterior distribution on the space of matchings contained in a k-partite hypergraph. We apply the Metropolis-Hastings (MH) algorithm to sample from this posterior. We consider the problem of choosing an efficient MH proposal distribution and we obtain consistent mixing improvements compared to the choices found in the literature. Simulated Tempering techniques can be used to overcome multimodality and a multiple proposal scheme is developed to allow for parallel programming. Finally, we discuss results arising from the careful use of convergence diagnostic techniques. This allows us to study a dataset including locations and placenames of 1316 Anglo-Saxon settlements dated approximately around 750-850 AD. Without strong prior knowledge, the model allows for explicit estimation of the number of clusters, the average intra-cluster dispersion and the level of interaction among placenames. The results support the hypothesis of organization of settlements into administrative clusters based on complementary names.

preprint, 2015, arXiv

Branching-stable point processes

The notion of stability can be generalised to point processes by defining the scaling operation in a randomised way: scaling a configuration by $t$ corresponds to letting such a configuration evolve according to a Markov branching particle system for time $-\log t$. We prove that these are the only stochastic operations satisfying basic associativity and distributivity properties and we thus introduce the notion of branching-stable point processes. We characterise stable distributions with respect to local branching as thinning-stable point processes with multiplicities given by the quasi-stationary (or Yaglom) distribution of the branching process under consideration. Finally we extend branching-stability to random variables with the help of continuous branching (CB) processes, and we show that, at least in some frameworks, $\mathcal{F}$-stable integer random variables are exactly Cox (doubly stochastic Poisson) random variables driven by corresponding CB-stable continuous random variables.
