Source author record

Chris J. Oates

Chris J. Oates appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Methodology, Machine Learning, math.NA, Numerical Analysis, math.ST, Statistics Theory, Computation, eess.SP

Catalog footprint

17 works, 8 topics, 4 close collaborators


Preprint (2022, arXiv)

Regularized Zero-Variance Control Variates

Zero-variance control variates (ZV-CV) are a post-processing method to reduce the variance of Monte Carlo estimators of expectations using the derivatives of the log target. Once the derivatives are available, the only additional computational effort lies in solving a linear regression problem. Significant variance reductions have been achieved with this method in low-dimensional examples, but the number of covariates in the regression rapidly increases with the dimension of the target. In this paper, we present compelling empirical evidence that the use of penalized regression techniques in the selection of high-dimensional control variates provides performance gains over the classical least squares method. Another type of regularization, based on using subsets of derivatives (which we refer to in this paper as a priori regularization), is also proposed to reduce computational and storage requirements. Several examples showing the utility and limitations of regularized ZV-CV for Bayesian inference are given. The methods proposed in this paper are accessible through the R package ZVCV.
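To make the regression step concrete, here is a minimal sketch of first-order ZV-CV, assuming the polynomial is of degree one so that the control variates reduce to the score, the gradient of the log target. The function name `zvcv_first_order` and its `ridge` argument are hypothetical: `ridge` is a simple stand-in for penalized regression and is not the interface of the ZVCV R package or the exact penalties studied in the paper.

```python
import numpy as np

def zvcv_first_order(f_vals, scores, ridge=0.0):
    """First-order ZV-CV estimate of E[f(X)] (illustrative sketch).

    f_vals: shape (n,), integrand evaluated at samples from the target.
    scores: shape (n, d), gradient of the log target at the same samples.
    ridge:  penalty on the regression coefficients (0.0 gives least squares);
            the intercept is never penalized because the data are centred.
    """
    s_mean = scores.mean(axis=0)
    f_mean = f_vals.mean()
    S = scores - s_mean          # centred covariates
    y = f_vals - f_mean          # centred response
    d = S.shape[1]
    beta = np.linalg.solve(S.T @ S + ridge * np.eye(d), S.T @ y)
    # The score has expectation zero under the target, so the fitted
    # intercept of f ~ alpha + beta . score estimates E[f(X)].
    return f_mean - s_mean @ beta
```

For a Gaussian target N(mu, I) the score is -(x - mu), so a linear integrand such as f(x) = x_1 + x_2 is an exact linear function of the score and the estimate recovers the true mean up to rounding error, whatever the Monte Carlo sample.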

Preprint (2022, arXiv)

Statistical Properties of the Probabilistic Numeric Linear Solver BayesCG

We analyse the calibration of BayesCG under the Krylov prior, a probabilistic numeric extension of the Conjugate Gradient (CG) method for solving systems of linear equations with a symmetric positive definite coefficient matrix. Calibration refers to the statistical quality of the posterior covariances produced by a solver. Since BayesCG is not calibrated in the strict sense of the existing definition, we propose instead two test statistics that are necessary but not sufficient for calibration: the Z-statistic and the new S-statistic. We show analytically and experimentally that under low-rank approximate Krylov posteriors, BayesCG exhibits desirable properties of a calibrated solver, is only slightly optimistic, and is computationally competitive with CG.
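For reference, the classical CG iteration that BayesCG extends can be sketched as follows. This is only the deterministic solver; the Krylov prior, the posterior covariance and the Z- and S-statistics are not shown, and the function name and tolerance defaults are illustrative choices.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Classical CG for a symmetric positive definite matrix A (sketch)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                # current residual
    p = r.copy()                 # current search direction
    rs_old = r @ r
    max_iter = 10 * n if max_iter is None else max_iter
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break                # residual small enough
        Ap = A @ p
        alpha = rs_old / (p @ Ap)        # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p    # A-conjugate next direction
        rs_old = rs_new
    return x
```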

Preprint (2022, arXiv)

Testing whether a Learning Procedure is Calibrated

A learning procedure takes as input a dataset and performs inference for the parameters $θ$ of a model that is assumed to have given rise to the dataset. Here we consider learning procedures whose output is a probability distribution, representing uncertainty about $θ$ after seeing the dataset. Bayesian inference is a prime example of such a procedure, but one can also construct other learning procedures that return distributional output. This paper studies conditions for a learning procedure to be considered calibrated, in the sense that the true data-generating parameters are plausible as samples from its distributional output. A learning procedure whose inferences and predictions are systematically over- or under-confident will fail to be calibrated. On the other hand, a learning procedure that is calibrated need not be statistically efficient. A hypothesis-testing framework is developed in order to assess, using simulation, whether a learning procedure is calibrated. Several vignettes are presented to illustrate different aspects of the framework.
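The idea of checking calibration by simulation can be illustrated with a simpler check than the paper's hypothesis test: credible-interval coverage. The sketch below assumes a conjugate normal model, theta ~ N(0, 1) and y_i | theta ~ N(theta, 1), where the exact posterior is known. A calibrated procedure should cover the true theta about 90% of the time with a central 90% interval, while an over-confident one (here, posterior standard deviation halved via the hypothetical `sd_scale` argument) covers it less often.

```python
import numpy as np

def coverage_of_normal_posterior(n_reps, n_obs, sd_scale=1.0, seed=0):
    """Fraction of simulated datasets whose true theta lies in the central
    90% credible interval of a conjugate normal posterior (sketch).

    Assumed model: theta ~ N(0, 1), y_i | theta ~ N(theta, 1).
    sd_scale < 1 mimics an over-confident learning procedure.
    """
    rng = np.random.default_rng(seed)
    z = 1.6448536269514722                                 # 95th percentile of N(0, 1)
    theta = rng.normal(size=n_reps)                        # true parameters
    y = theta[:, None] + rng.normal(size=(n_reps, n_obs))  # simulated datasets
    post_prec = 1.0 + n_obs                                # posterior precision
    post_mean = y.sum(axis=1) / post_prec
    post_sd = sd_scale / np.sqrt(post_prec)
    inside = np.abs(theta - post_mean) < z * post_sd
    return inside.mean()
```

With `sd_scale=0.5` the expected coverage drops to roughly 0.59, which a simulation with a few thousand replicates detects easily.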

Preprint (2022, arXiv)

The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks

Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is proposed that enables the user to posit an appropriate Gaussian process covariance function for the task at hand. Our approach constructs a prior distribution for the parameters of the network, called a ridgelet prior, that approximates the posited Gaussian process in the output space of the network. In contrast to existing work on the connection between neural networks and Gaussian processes, our analysis is non-asymptotic, with finite sample-size error bounds provided. This establishes the universality property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular. Our experimental assessment is limited to a proof-of-concept, where we demonstrate that the ridgelet prior can outperform an unstructured prior on regression problems for which a suitable Gaussian process prior can be provided.

Preprint (2021, arXiv)

Probabilistic Iterative Methods for Linear Systems

This paper presents a probabilistic perspective on iterative methods for approximating the solution $\mathbf{x}_* \in \mathbb{R}^d$ of a nonsingular linear system $\mathbf{A} \mathbf{x}_* = \mathbf{b}$. In this approach, a standard iterative method on $\mathbb{R}^d$ is lifted to act on the space of probability distributions $\mathcal{P}(\mathbb{R}^d)$. Classically, an iterative method produces a sequence $\mathbf{x}_m$ of approximations that converge to $\mathbf{x}_*$. The output of the iterative methods proposed in this paper is, instead, a sequence of probability distributions $μ_m \in \mathcal{P}(\mathbb{R}^d)$. The distributional output both provides a "best guess" for $\mathbf{x}_*$, for example as the mean of $μ_m$, and also probabilistic uncertainty quantification for the value of $\mathbf{x}_*$ when it has not been exactly determined. Theoretical analysis is provided in the prototypical case of a stationary linear iterative method. In this setting we characterise both the rate of contraction of $μ_m$ to an atomic measure on $\mathbf{x}_*$ and the nature of the uncertainty quantification being provided. We conclude with an empirical illustration that highlights the insight into solution uncertainty that can be provided by probabilistic iterative methods.
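One simple way to see a stationary iteration acting on distributions is to push a Gaussian through it. The sketch below assumes the Jacobi splitting, x_{m+1} = G x_m + f with G = I - D^{-1}A and f = D^{-1}b, and a Gaussian initial distribution; because the map is affine, each iterate stays Gaussian with mean G m + f and covariance G Σ G^T. This is an illustration of the general idea, not necessarily the construction used in the paper, and `jacobi_pushforward` is a hypothetical name.

```python
import numpy as np

def jacobi_pushforward(A, b, mean0, cov0, iters):
    """Push N(mean0, cov0) through `iters` Jacobi steps (sketch).

    Returns the final mean, final covariance, and the trace of the
    covariance after each step as a simple measure of contraction.
    """
    D_inv = np.diag(1.0 / np.diag(A))
    G = np.eye(len(b)) - D_inv @ A     # Jacobi iteration matrix
    f = D_inv @ b
    mean, cov = mean0.astype(float).copy(), cov0.astype(float).copy()
    history = []
    for _ in range(iters):
        mean = G @ mean + f            # affine map applied to the mean
        cov = G @ cov @ G.T            # and to the covariance
        history.append(float(np.trace(cov)))
    return mean, cov, history
```

For a diagonally dominant matrix the spectral radius of G is below one, so the mean converges to the solution and the covariance contracts towards a point mass, at a rate governed by that spectral radius.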

Preprint (2020, arXiv)

Discussion of "Unbiased Markov chain Monte Carlo with couplings" by Pierre E. Jacob, John O'Leary and Yves F. Atchadé

This is a contribution for the discussion on "Unbiased Markov chain Monte Carlo with couplings" by Pierre E. Jacob, John O'Leary and Yves F. Atchadé to appear in the Journal of the Royal Statistical Society Series B.

Preprint (2020, arXiv)

Improved Calibration of Numerical Integration Error in Sigma-Point Filters

The sigma-point filters, such as the UKF, which exploit numerical quadrature to obtain an additional order of accuracy in the moment transformation step, are popular alternatives to the ubiquitous EKF. The classical quadrature rules used in the sigma-point filters are motivated via polynomial approximation of the integrand; however, in applied contexts these assumptions cannot always be justified. As a result, quadrature error can introduce bias into estimated moments, for which there is no compensatory mechanism in the classical sigma-point filters. This can in turn lead to estimates and predictions that are poorly calibrated. In this article, we investigate the Bayes-Sard quadrature method in the context of sigma-point filters, which enables uncertainty due to quadrature error to be formalised within a probabilistic model. Our first contribution is to derive the well-known classical quadratures as special cases of the Bayes-Sard quadrature method. Then a general-purpose moment transform is developed and utilised in the design of novel sigma-point filters, so that uncertainty due to quadrature error is explicitly quantified. Numerical experiments on a challenging tracking example with misspecified initial conditions show that the additional uncertainty quantification built into our method leads to better-calibrated state estimates with improved RMSE.
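For readers unfamiliar with the moment transformation step, here is a minimal sketch of the classical unscented transform with the basic κ weighting, one of the quadrature-based transforms used in sigma-point filters. The Bayes-Sard moment transform proposed in the article, which adds a term for quadrature-error uncertainty, is not shown; the function name and default κ are illustrative.

```python
import numpy as np

def unscented_transform(func, mean, cov, kappa=1.0):
    """Approximate the mean and covariance of func(X), X ~ N(mean, cov) (sketch)."""
    n = mean.shape[0]
    L = np.linalg.cholesky((n + kappa) * cov)   # scaled matrix square root
    # 2n + 1 sigma points: the mean plus symmetric offsets along columns of L
    points = [mean] + [mean + L[:, i] for i in range(n)] + [mean - L[:, i] for i in range(n)]
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)            # weights sum to one
    ys = np.array([func(p) for p in points])
    y_mean = weights @ ys
    diffs = ys - y_mean
    y_cov = (weights[:, None] * diffs).T @ diffs  # weighted outer products
    return y_mean, y_cov
```

The rule is exact when func is affine; for nonlinear integrands the quadrature error is what the classical filters leave unaccounted for.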

Preprint (2020, arXiv)

Maximum likelihood estimation and uncertainty quantification for Gaussian process approximation of deterministic functions

Despite the ubiquity of the Gaussian process regression model, few theoretical results are available that account for the fact that parameters of the covariance kernel typically need to be estimated from the dataset. This article provides one of the first theoretical analyses in the context of Gaussian process regression with a noiseless dataset. Specifically, we consider the scenario where the scale parameter of a Sobolev kernel (such as a Matérn kernel) is estimated by maximum likelihood. We show that the maximum likelihood estimation of the scale parameter alone provides significant adaptation against misspecification of the Gaussian process model in the sense that the model can become "slowly" overconfident at worst, regardless of the difference between the smoothness of the data-generating function and that expected by the model. The analysis is based on a combination of techniques from nonparametric regression and scattered data interpolation. Empirical results are provided in support of the theoretical findings.
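For the model y ~ N(0, σ² K) with a noiseless dataset, the maximum likelihood estimate of the scale parameter has a closed form, σ̂² = yᵀ K⁻¹ y / n, obtained by setting the derivative of the log-likelihood in σ² to zero. The sketch below assumes a zero prior mean, a one-dimensional Matérn-3/2 kernel with fixed lengthscale, and a small jitter for numerical stability; the function names are hypothetical.

```python
import numpy as np

def matern32(x1, x2, lengthscale=1.0):
    """Matérn-3/2 kernel matrix for one-dimensional inputs."""
    r = np.abs(x1[:, None] - x2[None, :]) / lengthscale
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def mle_scale(x, y, lengthscale=1.0, jitter=1e-10):
    """Closed-form MLE of sigma^2 for y ~ N(0, sigma^2 K) (sketch)."""
    K = matern32(x, x, lengthscale) + jitter * np.eye(len(x))
    return float(y @ np.linalg.solve(K, y)) / len(x)
```

Because σ̂² scales with the squared size of the data, the posterior variance σ̂² times the kernel's conditional variance adapts to the observed function, which is the mechanism behind the adaptation described above.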

Preprint (2016, arXiv)

Control functionals for Monte Carlo integration

A non-parametric extension of control variates is presented. This extension leverages gradient information on the sampling density to achieve substantial variance reduction. It is not required that the sampling density be normalised. The novel contribution of this work is based on two important insights: (i) a trade-off between random sampling and deterministic approximation and (ii) a new gradient-based function space derived from Stein's identity. Unlike classical control variates, our estimators achieve super-root-$n$ convergence, often requiring orders of magnitude fewer simulations to achieve a fixed level of precision. Theoretical and empirical results are presented, the latter focusing on integration problems arising in hierarchical models and models based on non-linear ordinary differential equations.
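The gradient-based function space from Stein's identity can be sketched in one dimension. Assuming a standard normal target (score u(x) = -x) and a Gaussian base kernel, the code below builds the Langevin-Stein kernel k0, whose functions integrate to zero under the target. It then models f as a constant c plus a zero-mean Gaussian process with kernel k0 and returns the generalised least-squares estimate of c. This is a simplified illustration in the spirit of control functionals; the paper's estimators (including sample splitting) differ in detail, and the names and the `nugget` regulariser are assumptions.

```python
import numpy as np

def stein_kernel_gauss_target(x, lengthscale=1.0):
    """Langevin-Stein kernel k0 for a N(0, 1) target built on the base kernel
    k(x, y) = exp(-(x - y)^2 / (2 l^2)) (sketch)."""
    d = x[:, None] - x[None, :]
    l2 = lengthscale ** 2
    k = np.exp(-d ** 2 / (2.0 * l2))
    dk_dx = -d / l2 * k                       # derivative in the first argument
    dk_dy = d / l2 * k                        # derivative in the second argument
    d2k = (1.0 / l2 - d ** 2 / l2 ** 2) * k   # mixed second derivative
    u = -x                                    # score of the N(0, 1) density
    return d2k + u[:, None] * dk_dy + u[None, :] * dk_dx + u[:, None] * u[None, :] * k

def control_functional_estimate(x, f_vals, lengthscale=1.0, nugget=1e-6):
    """GLS estimate of the constant c in f = c + g, g ~ GP(0, k0) (sketch)."""
    K0 = stein_kernel_gauss_target(x, lengthscale) + nugget * np.eye(len(x))
    ones = np.ones(len(x))
    w = np.linalg.solve(K0, ones)             # K0^{-1} 1 (K0 is symmetric)
    return float(w @ f_vals) / float(w @ ones)
```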

Preprint (2016, arXiv)

Discussion of "Causal inference using invariant prediction: identification and confidence intervals" by Peters, Bühlmann and Meinshausen

Contribution to the discussion of the paper "Causal inference using invariant prediction: identification and confidence intervals" by Peters, Bühlmann and Meinshausen, to appear in the Journal of the Royal Statistical Society, Series B.

Preprint (2014, arXiv)

Estimating causal structure using conditional DAG models

This paper considers inference of causal structure in a class of graphical models called "conditional DAGs". These are directed acyclic graph (DAG) models with two kinds of variables, primary and secondary. The secondary variables are used to aid in estimation of causal relationships between the primary variables. We give causal semantics for this model class and prove that, under certain assumptions, the direction of causal influence is identifiable from the joint observational distribution of the primary and secondary variables. A score-based approach is developed for estimation of causal structure using these models and consistency results are established. Empirical results demonstrate gains compared with formulations that treat all variables on an equal footing, or that ignore secondary variables. The methodology is motivated by applications in molecular biology and is illustrated here using simulated data and in an analysis of proteomic data from the Cancer Genome Atlas.

Preprint (2014, arXiv)

Exact Estimation of Multiple Directed Acyclic Graphs

This paper considers the problem of estimating the structure of multiple related directed acyclic graph (DAG) models. Building on recent developments in exact estimation of DAGs using integer linear programming (ILP), we present an ILP approach for joint estimation over multiple DAGs, which does not require that the vertices in each DAG share a common ordering. Furthermore, we also allow for (potentially unknown) dependency structure between the DAGs. Results are presented on both simulated data and fMRI data obtained from multiple subjects.

Preprint (2014, arXiv)

Joint estimation of multiple related biological networks

Graphical models are widely used to make inferences concerning interplay in multivariate systems. In many applications, data are collected from multiple related but nonidentical units whose underlying networks may differ but are likely to share features. Here we present a hierarchical Bayesian formulation for joint estimation of multiple networks in this nonidentically distributed setting. The approach is general: given a suitable class of graphical models, it uses an exchangeability assumption on networks to provide a corresponding joint formulation. Motivated by emerging experimental designs in molecular biology, we focus on time-course data with interventions, using dynamic Bayesian networks as the graphical models. We introduce a computationally efficient, deterministic algorithm for exact joint inference in this setting. We provide an upper bound on the gains that joint estimation offers relative to separate estimation for each network and empirical results that support and extend the theory, including an extensive simulation study and an application to proteomic data from human cancer cell lines. Finally, we describe approximations that are still more computationally efficient than the exact algorithm and that also demonstrate good empirical performance.

Preprint (2014, arXiv)

Joint Structure Learning of Multiple Non-Exchangeable Networks

Several methods have recently been developed for joint structure learning of multiple (related) graphical models or networks. These methods treat individual networks as exchangeable, such that each pair of networks is equally encouraged to have similar structures. However, in many practical applications, exchangeability in this sense may not hold, as some pairs of networks may be more closely related than others, for example due to group and sub-group structure in the data. Here we present a novel Bayesian formulation that generalises joint structure learning beyond the exchangeable case. In addition to a general framework for joint learning, we (i) provide a novel default prior over the joint structure space that requires no user input; (ii) allow for latent networks; (iii) give an efficient, exact algorithm for the case of time series data and dynamic Bayesian networks. We present empirical results on non-exchangeable populations, including a real data example from biology, where cell-line-specific networks are related according to genomic features.

Preprint (2014, arXiv)

Quantifying the Multi-Scale Performance of Network Inference Algorithms

Graphical models are widely used to study complex multivariate biological systems. Network inference algorithms aim to reverse-engineer such models from noisy experimental data. It is common to assess such algorithms using techniques from classifier analysis. These metrics, based on the ability to correctly infer individual edges, possess a number of appealing features, including invariance to rank-preserving transformation. However, regulation in biological systems occurs on multiple scales and existing metrics do not take into account the correctness of higher-order network structure. In this paper, novel performance scores are presented that share the appealing properties of existing scores, whilst capturing the ability to uncover regulation on multiple scales. Theoretical results confirm that the performance of a network inference algorithm depends crucially on the scale at which inferences are to be made; in particular, strong local performance does not guarantee accurate reconstruction of higher-order topology. Applying these scores to a large corpus of data from the DREAM5 challenge, we undertake a data-driven assessment of estimator performance. We find that the "wisdom of crowds" network, which demonstrated superior local performance in the DREAM5 challenge, is also among the best performing methodologies for inference of regulation on multiple length scales. MATLAB R2013b code "net_assess" is provided as Supplement.
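As an example of the edge-level classifier metrics mentioned here, the area under the ROC curve can be computed as the probability that a true edge receives a higher score than a non-edge. The minimal sketch below works on a flat list of candidate edges, with hypothetical inputs `scores` (edge confidences) and `labels` (1 for a true edge); it only compares scores, which is why any rank-preserving transformation leaves it unchanged. The multi-scale scores proposed in the paper are not reproduced here.

```python
def auroc(scores, labels):
    """Area under the ROC curve for edge scores (sketch).

    Computed as the probability that a randomly chosen true edge outranks a
    randomly chosen non-edge, with ties counted as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with labels [1, 0, 1, 0, 0] and scores [0.9, 0.85, 0.3, 0.8, 0.1], the two true edges win 4 of the 6 pairwise comparisons, giving an AUROC of 2/3.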

Preprint (2014, arXiv)

The Controlled Thermodynamic Integral for Bayesian Model Comparison

Bayesian model comparison relies upon the model evidence, yet for many models of interest the model evidence is unavailable in closed form and must be approximated. Many of the estimators for evidence that have been proposed in the Monte Carlo literature suffer from high variability. This paper considers the reduction of variance that can be achieved by exploiting control variates in this setting. Our methodology is based on thermodynamic integration and applies whenever the gradient of both the log-likelihood and the log-prior with respect to the parameters can be efficiently evaluated. Results obtained on regression models and popular benchmark datasets demonstrate a significant and sometimes dramatic reduction in estimator variance and provide insight into the wider applicability of control variates to Bayesian model comparison.
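Thermodynamic integration rests on the identity log Z = ∫₀¹ E_t[log L(θ)] dt, where the power posterior is p_t(θ) ∝ L(θ)^t π(θ). The sketch below shows plain thermodynamic integration only, without the control variates that are this paper's contribution. It assumes a conjugate model, y_i ~ N(θ, σ²) with θ ~ N(0, τ²), in which each power posterior is Gaussian so E_t[log L] has a closed form; in practice it would be estimated by MCMC at each temperature. The power-5 temperature ladder and trapezoidal rule are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def log_evidence_ti(y, sigma=1.0, tau=1.0, n_temps=200, power=5):
    """Thermodynamic integration for y_i ~ N(theta, sigma^2), theta ~ N(0, tau^2)."""
    n = len(y)
    ts = np.linspace(0.0, 1.0, n_temps + 1) ** power     # temperature ladder
    prec = 1.0 / tau ** 2 + ts * n / sigma ** 2          # power-posterior precision
    mean = (ts * y.sum() / sigma ** 2) / prec            # power-posterior mean
    # E_t[sum (y_i - theta)^2] = sum (y_i - mean_t)^2 + n * var_t
    sq = ((y[None, :] - mean[:, None]) ** 2).sum(axis=1)
    e_loglik = -0.5 * n * np.log(2 * np.pi * sigma ** 2) - (sq + n / prec) / (2 * sigma ** 2)
    # Trapezoidal rule over the non-uniform ladder
    return float(np.sum(np.diff(ts) * (e_loglik[1:] + e_loglik[:-1]) / 2.0))

def log_evidence_exact(y, sigma=1.0, tau=1.0):
    """Exact log evidence: marginally y ~ N(0, sigma^2 I + tau^2 11^T)."""
    n = len(y)
    C = sigma ** 2 * np.eye(n) + tau ** 2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(C)
    return float(-0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y)))
```

Comparing the two functions on simulated data checks the quadrature over temperatures; with Monte Carlo estimates of E_t[log L], the variance of those estimates is what control variates aim to reduce.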

Preprint (2014, arXiv)

Towards a Multi-Subject Analysis of Neural Connectivity

Directed acyclic graphs (DAGs) and associated probability models are widely used to model neural connectivity and communication channels. In many experiments, data are collected from multiple subjects whose connectivities may differ but are likely to share many features. In such circumstances it is natural to leverage similarity between subjects to improve statistical efficiency. The first exact algorithm for estimation of multiple related DAGs was recently proposed by Oates et al. (2014); in this letter we present examples and discuss implications of the methodology as applied to the analysis of fMRI data from a multi-subject experiment. Elicitation of tuning parameters requires care and we illustrate how this may proceed retrospectively based on technical replicate data. In addition to joint learning of subject-specific connectivity, we allow for heterogeneous collections of subjects and simultaneously estimate relationships between the subjects themselves. This letter aims to highlight the potential for exact estimation in the multi-subject setting.
