Researcher profile

Andrew C. Parnell

Andrew C. Parnell contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
9 works
0 followers
5 topics
4 close collaborators

Published work

9 published items

preprint, 2022, arXiv

Variational Inference for Additive Main and Multiplicative Interaction Effects Models

In plant breeding the presence of a genotype by environment (GxE) interaction has a strong impact on cultivation decision making and the introduction of new crop cultivars. The combination of linear and bilinear terms has been shown to be very useful in modelling this type of data. A widely-used approach to identify GxE is the Additive Main Effects and Multiplicative Interaction Effects (AMMI) model. However, as data frequently can be high-dimensional, Markov chain Monte Carlo (MCMC) approaches can be computationally infeasible. In this article, we consider a variational inference approach for such a model. We derive variational approximations for estimating the parameters and we compare the approximations to MCMC using both simulated and real data. The new inferential framework we propose is on average two times faster whilst maintaining the same predictive performance as MCMC.
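
For orientation, the combination of linear and bilinear terms mentioned above is commonly written as follows. This is a sketch of the standard AMMI form, not the paper's exact parameterisation; the symbols ($\mu$, $g_i$, $e_j$, $\lambda_q$, $\gamma_{iq}$, $\delta_{jq}$, $Q$) are notation assumed here rather than taken from the abstract.

```latex
% y_{ij}: response of genotype i in environment j
% \mu: grand mean; g_i, e_j: additive genotype and environment main effects
% \lambda_q \gamma_{iq} \delta_{jq}: Q bilinear (multiplicative) GxE interaction terms
y_{ij} = \mu + g_i + e_j + \sum_{q=1}^{Q} \lambda_q \gamma_{iq} \delta_{jq} + \epsilon_{ij},
\qquad \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Identifiability constraints (for example, orthonormality of the $\gamma$ and $\delta$ vectors) are usually imposed on the bilinear terms; the specific constraints and priors used in the paper are not stated in the abstract.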

preprint, 2021, arXiv

Bayesian Additive Regression Trees with Model Trees

Bayesian Additive Regression Trees (BART) is a tree-based machine learning method that has been successfully applied to regression and classification problems. BART assumes regularisation priors on a set of trees that work as weak learners and is very flexible for predicting in the presence of non-linearity and high-order interactions. In this paper, we introduce an extension of BART, called Model Trees BART (MOTR-BART), that considers piecewise linear functions at node levels instead of piecewise constants. In MOTR-BART, rather than having a unique value at node level for the prediction, a linear predictor is estimated considering the covariates that have been used as the split variables in the corresponding tree. In our approach, local linearities are captured more efficiently and fewer trees are required to achieve equal or better performance than BART. Via simulation studies and real data applications, we compare MOTR-BART to its main competitors. R code for MOTR-BART implementation is available at https://github.com/ebprado/MOTR-BART.

preprint, 2015, arXiv

A Bayesian Hierarchical Model for Reconstructing Sea Levels: From Raw Data to Rates of Change

We present a holistic Bayesian hierarchical model for reconstructing the continuous and dynamic evolution of relative sea-level (RSL) change with fully quantified uncertainty. The reconstruction is produced from biological (foraminifera) and geochemical (δ13C) sea-level indicators preserved in dated cores of salt-marsh sediment. Our model is comprised of three modules: (1) A Bayesian transfer function for the calibration of foraminifera into tidal elevation, which is flexible enough to formally accommodate additional proxies (in this case bulk-sediment δ13C values); (2) A chronology developed from an existing Bchron age-depth model, and (3) An existing errors-in-variables integrated Gaussian process (EIV-IGP) model for estimating rates of sea-level change. We illustrate our approach using a case study of Common Era sea-level variability from New Jersey, U.S.A. We develop a new Bayesian transfer function (B-TF), with and without the δ13C proxy and compare our results to those from a widely-used weighted-averaging transfer function (WA-TF). The formal incorporation of a second proxy into the B-TF model results in smaller vertical uncertainties and improved accuracy for reconstructed RSL. The vertical uncertainty from the multi-proxy B-TF is ~28% smaller on average compared to the WA-TF. When evaluated against historic tide-gauge measurements, the multi-proxy B-TF most accurately reconstructs the RSL changes observed in the instrumental record (MSE = 0.003). The holistic model provides a single, unifying framework for reconstructing and analysing sea level through time. This approach is suitable for reconstructing other paleoenvironmental variables using biological proxies.

preprint, 2015, arXiv

Bayesian Additive Regression Trees using Bayesian Model Averaging

Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for data sets where the number of variables $p$ is large (e.g. $p>5,000$) the algorithm can become prohibitively expensive computationally. Another method which is popular for high dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, as it is not a statistical model, it cannot produce probabilistic estimates or predictions. We propose an alternative algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to produce a model which is much more efficient than BART for datasets with large $p$. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small $n$ large $p$" scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments: one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results to their main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git

preprint, 2015, arXiv

Modeling sea-level change using errors-in-variables integrated Gaussian processes

We perform Bayesian inference on historical and late Holocene (last 2000 years) rates of sea-level change. The input data to our model are tide-gauge measurements and proxy reconstructions from cores of coastal sediment. These data are complicated by multiple sources of uncertainty, some of which arise as part of the data collection exercise. Notably, the proxy reconstructions include temporal uncertainty from dating of the sediment core using techniques such as radiocarbon. The model we propose places a Gaussian process prior on the rate of sea-level change, which is then integrated and set in an errors-in-variables framework to take account of age uncertainty. The resulting model captures the continuous and dynamic evolution of sea-level change with full consideration of all sources of uncertainty. We demonstrate the performance of our model using two real (and previously published) example data sets. The global tide-gauge data set indicates that sea-level rise increased from a rate with a posterior mean of 1.13 mm$/$yr in 1880 AD (0.89 to 1.28 mm$/$yr 95% credible interval for the posterior mean) to a posterior mean rate of 1.92 mm$/$yr in 2009 AD (1.84 to 2.03 mm$/$yr 95% credible interval for the posterior mean). The proxy reconstruction from North Carolina (USA) after correction for land-level change shows the 2000 AD rate of rise to have a posterior mean of 2.44 mm$/$yr (1.91 to 3.01 mm$/$yr 95% credible interval). This is unprecedented in at least the last 2000 years.

preprint, 2014, arXiv

Frequency behaviour for multinomial counts of fisheries discards via a nested wavelet zero and N inflated binomial model

In this paper we identify the changing frequency behaviour of multinomial counts of fish species discarded by vessels in the Irish Sea. We use a Bayesian hierarchical model which captures dynamic frequency changes via a shrinkage model applied to wavelet basis functions. Wavelets are known for capturing data features at different temporal scales; we use a recently-proposed shrinkage prior from the factor analysis literature so that features at the finest levels of detail exhibit the greatest shrinkage. Rather than using a multinomial distribution for monitoring the changes in discards over time, which can be slow to fit and inflexible, we use a nested zero-and-N inflated (ZaNI) binomial distribution which enables much faster computation with no obvious deterioration in model flexibility. Our results show that seasonal behaviour in these data is not regular and occurs at different frequencies. We also show that the nested ZaNI binomial distribution is a good fit to multinomial count data of this sort when an informative nested structure is applied.

preprint, 2014, arXiv

Joint Inference of Misaligned Irregular Time Series with Application to Greenland Ice Core Data

Ice cores provide insight into the past climate over many millennia. Due to ice compaction, the raw data for any single core are irregular in time. Multiple cores have different irregularities; jointly these series are misaligned. After processing, such data are made available to researchers as regular time series: a data product. Typically, these cores are independently processed. In this paper, we consider a fast Bayesian method for the joint processing of multiple irregular series. This is shown to be more efficient. Further, our approach permits a realistic modelling of the impact of the multiple sources of uncertainty. The methodology is illustrated with the analysis of a pair of ice cores (GISP2 and GRIP). Our data products, in the form of marginal posterior distributions on an arbitrary temporal grid, are finite Gaussian mixtures. We can also produce sample paths from the joint posterior distribution to study non-linear functionals of interest. More generally, the concept of joint analysis via hierarchical Gaussian process model can be widely extended as the models used can be viewed within the larger context of continuous space-time processes.

preprint, 2012, arXiv

Bayesian Stable Isotope Mixing Models

In this paper we review recent advances in Stable Isotope Mixing Models (SIMMs) and place them into an over-arching Bayesian statistical framework which allows for several useful extensions. SIMMs are used to quantify the proportional contributions of various sources to a mixture. The most widely used application is quantifying the diet of organisms based on the food sources they have been observed to consume. At the centre of the multivariate statistical model we propose is a compositional mixture of the food sources corrected for various metabolic factors. The compositional component of our model is based on the isometric log ratio (ilr) transform of Egozcue (2003). Through this transform we can apply a range of time series and non-parametric smoothing relationships. We illustrate our models with 3 case studies based on real animal dietary behaviour.
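
As a rough illustration of the isometric log ratio (ilr) transform named above, the sketch below maps a composition (proportions summing to one) to unconstrained real coordinates and back. The abstract does not say which orthonormal basis the authors use, so the Helmert-style basis here is an assumption, and the function names `closure`, `ilr` and `ilr_inverse` are hypothetical.

```python
import math


def closure(x):
    """Rescale positive parts so that they sum to one."""
    total = sum(x)
    return [v / total for v in x]


def ilr(x):
    """Map a D-part composition to D-1 real coordinates.

    Assumed basis: coordinate i contrasts the geometric mean of the
    first i parts with part i+1, scaled by sqrt(i / (i + 1)).
    """
    logs = [math.log(v) for v in closure(x)]
    z = []
    for i in range(1, len(logs)):
        mean_log = sum(logs[:i]) / i
        z.append(math.sqrt(i / (i + 1)) * (mean_log - logs[i]))
    return z


def ilr_inverse(z):
    """Map D-1 real coordinates back to a D-part composition."""
    d = len(z) + 1
    clr = [0.0] * d  # centred log-ratio coordinates, which sum to zero
    for i in range(1, d):
        coef = z[i - 1] * math.sqrt(i / (i + 1))
        for j in range(i):
            clr[j] += coef / i
        clr[i] -= coef
    return closure([math.exp(v) for v in clr])


if __name__ == "__main__":
    diet = [0.2, 0.3, 0.5]       # hypothetical dietary proportions
    coords = ilr(diet)           # two unconstrained real numbers
    print(coords)
    print(ilr_inverse(coords))   # recovers the proportions up to rounding
```

Because the coordinates are unconstrained, ordinary time series or smoothing models can be placed on them and the results mapped back to proportions, which is the role the abstract assigns to the transform.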

preprint, 2012, arXiv

On Bayesian Modelling of the Uncertainties in Palaeoclimate Reconstruction

We outline a model and algorithm to perform inference on the palaeoclimate and palaeoclimate volatility from pollen proxy data. We use a novel multivariate non-linear non-Gaussian state space model consisting of an observation equation linking climate to proxy data and an evolution equation driving climate change over time. The link from climate to proxy data is defined by a pre-calibrated forward model, as developed in Salter-Townshend and Haslett (2012) and Sweeney (2012). Climatic change is represented by a temporally-uncertain Normal-Inverse Gaussian Levy process, being able to capture large jumps in multivariate climate whilst remaining temporally consistent. The pre-calibrated nature of the forward model allows us to cut feedback between the observation and evolution equations and thus integrate out the state variable entirely whilst making minimal simplifying assumptions. A key part of this approach is the creation of mixtures of marginal data posteriors representing the information obtained about climate from each individual time point. Our approach allows for an extremely efficient MCMC algorithm, which we demonstrate with a pollen core from Sluggan Bog, County Antrim, Northern Ireland.