Source author record

Anthony C. Davison

Anthony C. Davison appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology · Applications · math.ST · Statistics Theory · Machine Learning · math.PR

Catalog footprint

What is connected

12 works

6 topics

4 close collaborators


Preprint · 2022 · arXiv

Bayesian nonparametric mixture inconsistency for the number of components: How worried should we be in practice?

We consider the Bayesian mixture of finite mixtures (MFMs) and Dirichlet process mixture (DPM) models for clustering. Recent asymptotic theory has established that DPMs overestimate the number of clusters for large samples and that estimators from both classes of models are inconsistent for the number of clusters under misspecification, but the implications for finite sample analyses are unclear. The final reported estimate after fitting these models is often a single representative clustering obtained using an MCMC summarisation technique, but it is unknown how well such a summary estimates the number of clusters. Here we investigate these practical considerations through simulations and an application to gene expression data, and find that (i) DPMs overestimate the number of clusters even in finite samples, but only to a limited degree that may be correctable using appropriate summaries, and (ii) misspecification can lead to considerable overestimation of the number of clusters in both DPMs and MFMs, but results are nevertheless often still interpretable. We provide recommendations on MCMC summarisation and suggest that although the more appealing asymptotic properties of MFMs provide strong motivation to prefer them, results obtained using MFMs and DPMs are often very similar in practice.
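A simpler fact related to the behaviour of DPMs is that the prior number of clusters keeps growing with the sample size. Under a Dirichlet process with concentration $\alpha$, observation $i$ opens a new cluster with probability $\alpha/(\alpha+i-1)$, so the prior mean number of occupied clusters among $n$ observations is $\sum_{i=1}^{n}\alpha/(\alpha+i-1)$, which grows like $\alpha\log n$. The sketch below computes only this prior quantity; it is not the posterior analysis of the paper, MFMs (which put a prior directly on the number of components) behave differently, and the function name is illustrative.

```python
import math

def crp_expected_clusters(n: int, alpha: float) -> float:
    """Prior mean number of occupied clusters among n observations under a
    Dirichlet process with concentration alpha (Chinese restaurant process).

    Observation i opens a new cluster with probability alpha / (alpha + i - 1),
    so E[K_n] = sum over i = 1..n of alpha / (alpha + i - 1).
    """
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))

# For alpha = 1 this is the harmonic number H_n, which grows like log(n).
for n in (10, 100, 1_000, 10_000):
    print(n, round(crp_expected_clusters(n, alpha=1.0), 2), round(math.log(n), 2))
```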

Preprint · 2022 · arXiv

Functional Peaks-over-threshold Analysis

Peaks-over-threshold analysis using the generalized Pareto distribution is widely applied in modelling tails of univariate random variables, but much information may be lost when complex extreme events are studied using univariate results. In this paper, we extend peaks-over-threshold analysis to extremes of functional data. Threshold exceedances defined using a functional $r$ are modelled by the generalized $r$-Pareto process, a functional generalization of the generalized Pareto distribution that covers the three classical regimes for the decay of tail probabilities, and that is the only possible continuous limit for $r$-exceedances of a properly rescaled process. We give construction rules, simulation algorithms and inference procedures for generalized $r$-Pareto processes, discuss model validation, and use the new methodology to study extreme European windstorms and heavy spatial rainfall.
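The first step of a peaks-over-threshold analysis based on a risk functional is to select the curves whose value of $r$ exceeds a high threshold $u$; the excesses $r(X)-u$ of the selected curves can then be examined with a univariate generalized Pareto fit. The sketch below assumes that $r$ is the mean over a grid and uses simulated, hypothetical curves. It fits only the univariate GPD with SciPy and does not implement the generalized $r$-Pareto process or its inference procedures; `r_exceedances` is an illustrative name.

```python
import numpy as np
from scipy import stats

def r_exceedances(curves: np.ndarray, r, prob: float = 0.95):
    """Select the curves (rows) whose risk functional r exceeds its empirical
    `prob` quantile u; return u, the selected curves and their excesses r - u."""
    risk = np.apply_along_axis(r, 1, curves)
    u = np.quantile(risk, prob)
    mask = risk > u
    return u, curves[mask], risk[mask] - u

# Hypothetical data: 5000 curves on a 50-point grid, each a heavy-tailed
# random amplitude times a fixed profile, plus small noise.
rng = np.random.default_rng(1)
n, m = 5_000, 50
grid = np.linspace(0.0, 1.0, m)
amplitude = stats.genpareto.rvs(c=0.2, size=n, random_state=rng)
curves = (amplitude[:, None] * np.sin(np.pi * grid)[None, :]
          + 0.01 * rng.standard_normal((n, m)))

# Assumed risk functional: the mean over the grid.
u, extreme_curves, excess = r_exceedances(curves, r=np.mean)

# Univariate POT step on the r-values: GPD fit with location fixed at 0.
xi_hat, _, sigma_hat = stats.genpareto.fit(excess, floc=0.0)
print(len(extreme_curves), round(xi_hat, 2), round(sigma_hat, 2))
```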

Preprint · 2022 · arXiv

The Tangent Exponential Model

The likelihood function is central to both frequentist and Bayesian formulations of parametric statistical inference, and large-sample approximations to the sampling distributions of estimators and test statistics, and to posterior densities, are widely used in practice. Improved approximations have been widely studied and can provide highly accurate inferences when samples are small or there are many nuisance parameters. This article reviews improved approximations based on the tangent exponential model developed in a series of articles by D. A. S. Fraser and co-workers, attempting to explain the theoretical basis of this model and to provide a guide to the associated literature, including a partially annotated bibliography.
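A typical output of these higher-order approximations is the modified likelihood root $r^* = r + r^{-1}\log(q/r)$, where $r$ is the signed likelihood root. In the simplest case, a scalar full exponential family, $q$ reduces to the Wald statistic on the canonical scale. The sketch below assumes an exponential sample with rate $\lambda$ and compares $\Phi(r)$ and $\Phi(r^*)$ with the exact tail probability. It does not cover nuisance parameters or the general tangent exponential model construction, and the names are illustrative.

```python
import math
from scipy import stats

def r_and_rstar(n: int, s: float, lam: float):
    """Likelihood root r and modified likelihood root r* for the rate lam of
    an exponential sample of size n with sum s. In this scalar canonical
    exponential family, q is the Wald statistic on the canonical scale.
    Undefined at lam == n / s, where r = q = 0."""
    lam_hat = n / s                                   # maximum likelihood estimate

    def loglik(l: float) -> float:
        return n * math.log(l) - l * s

    r = math.copysign(math.sqrt(2.0 * (loglik(lam_hat) - loglik(lam))), lam_hat - lam)
    q = (lam_hat - lam) * math.sqrt(n) / lam_hat      # observed information j(lam_hat) = n / lam_hat**2
    return r, r + math.log(q / r) / r

n, s, lam = 5, 3.0, 1.0
r, rstar = r_and_rstar(n, s, lam)
# Exact tail probability: lam * S ~ Gamma(n, 1); Phi(r) and Phi(r*) approximate Pr(S >= s).
exact = stats.gamma.sf(lam * s, a=n)
print(round(stats.norm.cdf(r), 4), round(stats.norm.cdf(rstar), 4), round(exact, 4))
```

Even with five observations, $\Phi(r^*)$ should be much closer to the exact value than the first-order $\Phi(r)$.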

Preprint · 2021 · arXiv

Improved inference on risk measures for univariate extremes

We discuss the use of likelihood asymptotics for inference on risk measures in univariate extreme value problems, focusing on estimation of high quantiles and similar summaries of risk for uncertainty quantification. We study whether higher-order approximation based on the tangent exponential model can provide improved inferences, and conclude that inference based on maxima is generally robust to mild model misspecification and that profile likelihood-based confidence intervals will often be adequate, whereas inferences based on threshold exceedances can be badly biased but may be improved by higher-order methods, at least for moderate sample sizes. We use the methods to shed light on catastrophic rainfall in Venezuela, flooding in Venice, and the lifetimes of Italian semi-supercentenarians.
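For inference based on maxima, a high quantile such as the $1/p$ return level is a function of the GEV parameters: $z_p = \mu - \sigma\xi^{-1}\{1 - y_p^{-\xi}\}$ with $y_p = -\log(1-p)$, and $z_p = \mu - \sigma\log y_p$ when $\xi = 0$. The sketch below evaluates only this point value from given parameters, with an illustrative function name. The profile-likelihood and higher-order intervals discussed in the paper are not implemented. SciPy's `genextreme` uses the opposite sign convention for the shape, $c = -\xi$.

```python
import math
from scipy import stats

def gev_return_level(mu: float, sigma: float, xi: float, p: float) -> float:
    """Level exceeded by a block maximum with probability p under a
    GEV(mu, sigma, xi) distribution (the 1/p-block return level)."""
    y = -math.log1p(-p)                      # y_p = -log(1 - p)
    if abs(xi) < 1e-12:                      # Gumbel case, xi = 0
        return mu - sigma * math.log(y)
    return mu - sigma / xi * (1.0 - y ** (-xi))

# Cross-check against SciPy, whose genextreme shape is c = -xi.
z100 = gev_return_level(10.0, 2.0, 0.1, 0.01)
print(round(z100, 3), round(stats.genextreme.ppf(0.99, c=-0.1, loc=10.0, scale=2.0), 3))
```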

Preprint · 2020 · arXiv

A global-local approach for detecting hotspots in multiple-response regression

We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, i.e., predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expression of many genes and may initiate decisive functional mechanisms underlying disease endpoints. Existing hierarchical regression approaches designed to model hotspots suffer from two limitations: their discrimination of hotspots is sensitive to the choice of top-level scale parameters for the propensity of predictors to be hotspots, and they do not scale to large predictor and response vectors, e.g., of dimensions $10^3$ to $10^5$ in genetic applications. We address these shortcomings by introducing a flexible hierarchical regression framework that is tailored to the detection of hotspots and scalable to the above dimensions. Our proposal implements a fully Bayesian model for hotspots based on the horseshoe shrinkage prior. Its global-local formulation shrinks noise globally and hence accommodates the highly sparse nature of genetic analyses, while being robust to individual signals, thus leaving the effects of hotspots unshrunk. Inference is carried out using a fast variational algorithm coupled with a novel simulated annealing procedure that allows efficient exploration of multimodal distributions.
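The global-local mechanism can be seen in the standard horseshoe prior: $\beta_j \mid \lambda_j,\tau \sim N(0,\lambda_j^2\tau^2)$ with half-Cauchy local scales $\lambda_j \sim C^+(0,1)$ and a shared global scale $\tau$. In a normal-means setting with unit noise variance, the conditional posterior mean of $\beta_j$ is $(1-\kappa_j)y_j$ with $\kappa_j = 1/(1+\lambda_j^2\tau^2)$. A small $\tau$ therefore shrinks most coefficients strongly, while the heavy-tailed $\lambda_j$ let a few escape. The sketch below draws from this standard prior only. The paper's hierarchical hotspot formulation and its variational inference are not reproduced, and the function name is illustrative.

```python
from typing import Optional, Tuple
import numpy as np

def sample_horseshoe(p: int, rng: np.random.Generator,
                     tau: Optional[float] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Draw p coefficients from the horseshoe prior and return them with
    their shrinkage weights kappa_j = 1 / (1 + lambda_j^2 tau^2).
    If tau is not given it is drawn from a half-Cauchy C+(0, 1)."""
    if tau is None:
        tau = abs(rng.standard_cauchy())
    lam = np.abs(rng.standard_cauchy(p))            # local scales lambda_j ~ C+(0, 1)
    beta = rng.normal(0.0, lam * tau)               # beta_j ~ N(0, lambda_j^2 tau^2)
    kappa = 1.0 / (1.0 + lam**2 * tau**2)           # near 1: strong shrinkage; near 0: little
    return beta, kappa

rng = np.random.default_rng(42)
beta, kappa = sample_horseshoe(1_000, rng, tau=0.01)
# Most weights are close to 1, but heavy-tailed local scales let a few escape.
print(round(float(np.median(kappa)), 4), int(np.sum(kappa < 0.5)))
```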

Preprint · 2016 · arXiv

Bayesian Uncertainty Management in Temporal Dependence of Extremes

Both marginal and dependence features must be described when modelling the extremes of a stationary time series. There are standard approaches to marginal modelling, but long- and short-range dependence of extremes may both appear. In applications, an assumption of long-range independence often seems reasonable, but short-range dependence, i.e., the clustering of extremes, needs attention. The extremal index $0<\theta\le 1$ is a natural limiting measure of clustering, but for wide classes of dependent processes, including all stationary Gaussian processes, it cannot distinguish dependent processes from independent processes with $\theta=1$. Eastoe and Tawn (2012) exploit methods from multivariate extremes to treat the subasymptotic extremal dependence structure of stationary time series, covering both $0<\theta<1$ and $\theta=1$, through the introduction of a threshold-based extremal index. Inference for their dependence models uses an inefficient stepwise procedure that has various weaknesses and has no reliable assessment of uncertainty. We overcome these issues using a Bayesian semiparametric approach. Simulations and the analysis of a UK daily river flow time series show that the new approach provides improved efficiency for estimating properties of functionals of clusters.
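A stationary Gaussian AR(1) series illustrates the problem. Its classical extremal index is 1, yet at finite thresholds its exceedances still arrive in short clusters. The sketch below computes an empirical version of one threshold-based quantity, assumed here to be $\theta(u,m) = \Pr\{\max(X_2,\ldots,X_m)\le u \mid X_1>u\}$. It is a simple counting estimator, not the Bayesian semiparametric approach of the paper. The function name and simulated series are illustrative.

```python
import numpy as np

def empirical_theta(x, u: float, m: int) -> float:
    """Empirical estimate of Pr{max(X_{t+1}, ..., X_{t+m-1}) <= u | X_t > u}:
    the share of exceedances of u that are not followed by another
    exceedance within the next m - 1 observations."""
    x = np.asarray(x, dtype=float)
    # exceedance times that still have a complete look-ahead window
    idx = np.flatnonzero(x[: len(x) - (m - 1)] > u)
    if idx.size == 0:
        raise ValueError("no exceedances of u with a complete window")
    no_follow_up = [bool(np.all(x[t + 1 : t + m] <= u)) for t in idx]
    return float(np.mean(no_follow_up))

# Hypothetical comparison: independent noise vs a unit-variance Gaussian AR(1).
rng = np.random.default_rng(0)
n, phi = 20_000, 0.8
e = rng.standard_normal(n)
ar1 = np.empty(n)
ar1[0] = e[0]
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + np.sqrt(1.0 - phi**2) * e[t]
iid = rng.standard_normal(n)

theta_iid = empirical_theta(iid, np.quantile(iid, 0.98), m=5)
theta_ar1 = empirical_theta(ar1, np.quantile(ar1, 0.98), m=5)
print(round(theta_iid, 2), round(theta_ar1, 2))
```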

Preprint · 2016 · arXiv

Extremes on river networks

Max-stable processes are the natural extension of the classical extreme-value distributions to the functional setting, and they are increasingly widely used to estimate probabilities of complex extreme events. In this paper we broaden them from the usual situation in which dependence varies according to functions of Euclidean distance to situations in which extreme river discharges at two locations on a river network may be dependent because the locations are flow-connected or because of common meteorological events. In the former case dependence is a function of river distance, and in the latter it is a function of the hydrological distance between the locations, either of which may be very different from their Euclidean distance. Inference for the model parameters is performed using a multivariate threshold likelihood, which is shown by simulation to work well. The ideas are illustrated with data from the upper Danube basin.
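Flow-connectedness and river distance can both be computed from the network topology. The sketch below assumes a hypothetical network stored as a map from each gauging station to its downstream neighbour and the channel length to it. Two stations are flow-connected when one lies downstream of the other, and their river distance is the channel length from each to their first common downstream point. The code does not compute the paper's hydrological distance or fit any max-stable model. All names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical network: station -> (downstream neighbour, channel length in km);
# the outlet has no downstream neighbour.
Network = Dict[str, Tuple[Optional[str], float]]

def path_to_outlet(net: Network, site: str) -> List[Tuple[str, float]]:
    """Stations from `site` down to the outlet, with river distance from `site`."""
    path, dist = [], 0.0
    node: Optional[str] = site
    while node is not None:
        path.append((node, dist))
        nxt, length = net[node]
        dist += length
        node = nxt
    return path

def river_relation(net: Network, a: str, b: str) -> Tuple[float, bool]:
    """River distance between a and b, and whether they are flow-connected."""
    down_a = dict(path_to_outlet(net, a))
    for node, dist_b in path_to_outlet(net, b):
        if node in down_a:                    # first common downstream point
            return down_a[node] + dist_b, node in (a, b)
    raise ValueError("stations are not on the same network")

net: Network = {"A": ("C", 10.0), "B": ("C", 4.0), "C": ("D", 7.0), "D": (None, 0.0)}
print(river_relation(net, "A", "B"), river_relation(net, "A", "D"))
```

Here A and B lie on different tributaries joining at C, so they are 14 km apart along the river but not flow-connected, whereas A and the outlet D are flow-connected.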

Preprint · 2016 · arXiv

Statistical regionalization for estimation of extreme river discharges

Regionalization methods have long been used to estimate high return levels of river discharges at ungauged locations on a river network. In these methods, the recorded discharge measurements from a group of similar gauged stations are used to estimate high quantiles at the target catchment, which has no observations. This group is called the region of influence, and its similarity to the ungauged location is measured in terms of physical and meteorological catchment attributes. We develop a statistical method for estimation of high return levels based on regionalizing the parameters of a generalized extreme value distribution. The region of influence is chosen in an optimal way, ensuring similarity and in-group homogeneity. Our method is applied to discharge data from the Rhine basin in Switzerland, and its performance at ungauged locations is compared to that of classical regionalization methods. For gauged locations we show how our approach reduces the estimation uncertainty for long return periods by combining local measurements with those from the region of influence.

Preprint · 2015 · arXiv

Likelihood estimators for multivariate extremes

The main approach to inference for multivariate extremes consists in approximating the joint upper tail of the observations by a parametric family arising in the limit for extreme events. The latter may be expressed in terms of componentwise maxima, high threshold exceedances or point processes, yielding different but related asymptotic characterizations and estimators. The present paper clarifies the connections between the main likelihood estimators, and assesses their practical performance. We investigate their ability to estimate the extremal dependence structure and to predict future extremes, using exact calculations and simulation, in the case of the logistic model.
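The logistic model has a closed-form joint distribution. With unit Fréchet margins, the bivariate version is $G(x,y) = \exp\{-(x^{-1/\alpha}+y^{-1/\alpha})^{\alpha}\}$ for $0<\alpha\le 1$. Since $G(z,z) = \exp(-2^{\alpha}/z)$, its extremal coefficient is $2^{\alpha}$, which ranges from complete dependence ($\alpha\to 0$) to independence ($\alpha=1$). The sketch below only evaluates these formulas, with illustrative function names. The likelihood estimators compared in the paper are not implemented.

```python
import math

def logistic_cdf(x: float, y: float, alpha: float) -> float:
    """Bivariate logistic extreme-value distribution with unit Fréchet margins:
    G(x, y) = exp{-(x^(-1/alpha) + y^(-1/alpha))^alpha}, 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return math.exp(-((x ** (-1.0 / alpha) + y ** (-1.0 / alpha)) ** alpha))

def extremal_coefficient(alpha: float) -> float:
    """theta such that G(z, z) = exp(-theta / z); for the logistic model theta = 2**alpha."""
    return 2.0 ** alpha

for a in (0.1, 0.5, 1.0):
    print(a, round(extremal_coefficient(a), 3), round(logistic_cdf(3.0, 3.0, a), 4))
```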

Preprint · 2011 · arXiv

Bayesian Inference from Composite Likelihoods, with an Application to Spatial Extremes

Composite likelihoods are increasingly used in applications where the full likelihood is analytically unknown or computationally prohibitive. Although the maximum composite likelihood estimator has frequentist properties akin to those of the usual maximum likelihood estimator, Bayesian inference based on composite likelihoods has yet to be explored. In this paper we investigate the use of the Metropolis-Hastings algorithm to compute a pseudo-posterior distribution based on the composite likelihood. Two methodologies for adjusting the algorithm are presented, and their performance in approximating the true posterior distribution is investigated using simulated data sets and real data on spatial extremes of rainfall.
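Why an adjustment is needed can be seen in a toy model. The sketch below assumes equicorrelated Gaussian vectors with a common mean $\mu$ and uses the independence composite likelihood, which ignores the correlation. A plain random-walk Metropolis-Hastings sampler targeting $\exp\{w\,c\ell(\mu)\}$ with $w=1$ gives a pseudo-posterior that is too narrow. Here a scalar "magnitude" weight $w = 1/\{1+(d-1)\rho\}$, the ratio of the sensitivity to the variability of the composite score in this model, restores the correct spread. This weighting is an assumption for illustration and is not necessarily either of the paper's two adjustments. The data and names are hypothetical.

```python
import numpy as np

def composite_loglik(mu: float, y: np.ndarray) -> float:
    """Independence composite log-likelihood for a common mean mu: the sum of
    N(mu, 1) marginal log-densities over all entries of y, ignoring the
    correlation within each row (additive constants dropped)."""
    return -0.5 * float(np.sum((y - mu) ** 2))

def mh_pseudo_posterior(y: np.ndarray, w: float, n_iter: int = 20_000,
                        step: float = 0.05, seed: int = 0) -> np.ndarray:
    """Random-walk Metropolis-Hastings draws from the pseudo-posterior
    proportional to exp(w * composite_loglik(mu, y)) under a flat prior."""
    rng = np.random.default_rng(seed)
    mu = float(y.mean())
    cur = w * composite_loglik(mu, y)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = mu + step * rng.standard_normal()
        new = w * composite_loglik(prop, y)
        if np.log(rng.uniform()) < new - cur:   # Metropolis acceptance step
            mu, cur = prop, new
        draws[i] = mu
    return draws

# Hypothetical data: n equicorrelated Gaussian vectors of length d, unit variances.
rng = np.random.default_rng(7)
n, d, rho, mu_true = 200, 5, 0.6, 1.0
shared = rng.standard_normal((n, 1))
y = mu_true + np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.standard_normal((n, d))

w_adj = 1.0 / (1.0 + (d - 1) * rho)            # known in closed form for this toy model
draws_raw = mh_pseudo_posterior(y, w=1.0)[1_000:]
draws_adj = mh_pseudo_posterior(y, w=w_adj)[1_000:]
# Compare with the full-likelihood posterior standard deviation.
print(draws_raw.std(), draws_adj.std(), np.sqrt((1 + (d - 1) * rho) / (n * d)))
```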

Preprint · 2011 · arXiv

Spatial modeling of extreme snow depth

The spatial modeling of extreme snow is important for adequate risk management in Alpine and high-altitude countries. A natural approach to such modeling is through the theory of max-stable processes, an infinite-dimensional extension of multivariate extreme value theory. In this paper we describe the application of such processes in modeling the spatial dependence of extreme snow depth in Switzerland, based on data for the winters 1966-2008 at 101 stations. The models we propose rely on a climate transformation that allows us to account for the presence of climate regions and for directional effects, resulting from synoptic weather patterns. Estimation is performed through pairwise likelihood inference and the models are compared using penalized likelihood criteria. The max-stable models provide a much better fit to the joint behavior of the extremes than do independence or full dependence models.

Preprint · 2010 · arXiv

Model misspecification in peaks over threshold analysis

Classical peaks over threshold analysis is widely used for statistical modeling of sample extremes, and can be supplemented by a model for the sizes of clusters of exceedances. Under mild conditions a compound Poisson process model allows the estimation of the marginal distribution of threshold exceedances and of the mean cluster size, but requires the choice of a threshold and of a run parameter, $K$, that determines how exceedances are declustered. We extend a class of estimators of the reciprocal mean cluster size, known as the extremal index, establish consistency and asymptotic normality, and use the compound Poisson process to derive misspecification tests of model validity and of the choice of run parameter and threshold. Simulated examples and real data on temperatures and rainfall illustrate the ideas, both for estimating the extremal index in nonstandard situations and for assessing the validity of extremal models.
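The classical runs estimator shows the role of the run parameter $K$. A cluster ends once at least $K$ consecutive observations fall at or below the threshold, and the extremal index is estimated as the number of clusters divided by the number of exceedances, the reciprocal of the mean cluster size. The sketch below implements that classical estimator only. It does not implement the extended class of estimators or the misspecification tests developed in the paper, and the function name is illustrative.

```python
import numpy as np

def runs_extremal_index(x, u: float, K: int) -> float:
    """Runs estimator of the extremal index: exceedances of u separated by
    fewer than K consecutive non-exceedances belong to the same cluster,
    and theta_hat = (number of clusters) / (number of exceedances)."""
    times = np.flatnonzero(np.asarray(x, dtype=float) > u)
    if times.size == 0:
        raise ValueError("no exceedances of u")
    gaps = np.diff(times) - 1                 # non-exceedances between successive exceedances
    n_clusters = 1 + int(np.sum(gaps >= K))   # a run of at least K non-exceedances starts a new cluster
    return n_clusters / times.size

x = [0, 5, 5, 0, 0, 5, 0, 0, 0, 5]
for K in (1, 2, 3, 4):
    print(K, runs_extremal_index(x, u=1.0, K=K))
```

Larger $K$ merges more exceedances into each cluster and so lowers the estimate, which is why the choice of run parameter, like the choice of threshold, needs checking.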
