Source author record

Mark D. Risser

Mark D. Risser appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Methodology, Applications, Computation, Machine Learning, math.PR

Catalog footprint

What is connected

6 works

5 topics

4 close collaborators

preprint · 2022 · arXiv

Exact Gaussian Processes for Massive Datasets via Non-Stationary Sparsity-Discovering Kernels

A Gaussian Process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. Its success is largely attributed to the GP's analytical tractability, robustness, non-parametric structure, and natural inclusion of uncertainty quantification. Unfortunately, the use of exact GPs is prohibitively expensive for large datasets due to their unfavorable numerical complexity of $O(N^3)$ in computation and $O(N^2)$ in storage. All existing methods addressing this issue utilize some form of approximation -- usually considering subsets of the full dataset or finding representative pseudo-points that render the covariance matrix well-structured and sparse. These approximate methods can lead to inaccuracies in function approximations and often limit the user's flexibility in designing expressive kernels. Instead of inducing sparsity via data-point geometry and structure, we propose to take advantage of naturally occurring sparsity by allowing the kernel to discover -- instead of induce -- sparse structure. The premise of this paper is that GPs, in their most native form, are often naturally sparse, but commonly used kernels do not allow us to exploit this sparsity. The core concept of exact and, at the same time, sparse GPs relies on kernel definitions that provide enough flexibility to learn and encode not only non-zero but also zero covariances. This principle of ultra-flexible, compactly-supported, and non-stationary kernels, combined with high-performance computing (HPC) and constrained optimization, lets us scale exact GPs well beyond 5 million data points.
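To make the storage argument concrete: a dense double-precision covariance matrix for 5 million points has 2.5 × 10^13 entries, roughly 200 TB. The sketch below illustrates why a compactly supported kernel helps. It uses a stationary Wendland kernel in one dimension as a stand-in; this is an assumption for illustration only and is not the non-stationary, sparsity-discovering kernel the paper proposes. The function names, grid, and radius are hypothetical.

```python
def wendland_kernel(x1, x2, radius=1.0, signal_var=1.0):
    """Compactly supported Wendland kernel in 1-D.

    Returns exactly 0.0 once the scaled distance |x1 - x2| / radius
    reaches 1, so distant pairs contribute no covariance entry at all.
    """
    d = abs(x1 - x2) / radius
    if d >= 1.0:
        return 0.0
    return signal_var * (1.0 - d) ** 4 * (4.0 * d + 1.0)


def sparse_covariance(points, radius):
    """Build the covariance matrix, keeping only nonzero entries.

    Entries are stored as {(row, col): value}; zeros are never stored.
    The double loop is O(N^2) and only suitable for a small demo.
    """
    entries = {}
    for i, xi in enumerate(points):
        for j, xj in enumerate(points):
            value = wendland_kernel(xi, xj, radius)
            if value != 0.0:
                entries[(i, j)] = value
    return entries


# Hypothetical demo data: 200 evenly spaced points, support radius 0.5.
points = [i / 10 for i in range(200)]
cov = sparse_covariance(points, radius=0.5)

# Fraction of the N x N matrix that actually needs to be stored.
density = len(cov) / len(points) ** 2
print(f"stored entries: {len(cov)}, density: {density:.3f}")
```

Because the kernel is exactly zero beyond the support radius, the matrix stays exact while only a small fraction of its entries is stored. A production implementation would avoid the full pairwise loop and use a sparse linear algebra library.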

preprint · 2020 · arXiv

Bayesian inference for high-dimensional nonstationary Gaussian processes

In spite of the diverse literature on nonstationary spatial modeling and approximate Gaussian process (GP) methods, there are no general approaches for conducting fully Bayesian inference for moderately sized nonstationary spatial data sets on a personal laptop. For statisticians and data scientists who wish to learn about spatially-referenced data and conduct posterior inference and prediction with appropriate uncertainty quantification, the lack of such approaches and corresponding software is a significant limitation. In this paper, we develop methodology for implementing formal Bayesian inference for a general class of nonstationary GPs. Our novel approach uses pre-existing frameworks for characterizing nonstationarity in a new way that is applicable for small to moderately sized data sets via modern GP likelihood approximations. Posterior sampling is implemented using flexible MCMC methods, with nonstationary posterior prediction conducted as a post-processing step. We demonstrate our novel methods on two data sets, ranging from several hundred to several thousand locations, and compare our methodology with related statistical methods that provide off-the-shelf software. All of our methods are implemented in the freely available BayesNSGP software package for R.

preprint · 2020 · arXiv

The effect of geographic sampling on evaluation of extreme precipitation in high resolution climate models

Traditional approaches for comparing global climate models and observational data products typically fail to account for the geographic location of the underlying weather station data. For modern high-resolution models, this is an oversight since there are likely grid cells where the physical output of a climate model is compared with a statistically interpolated quantity instead of actual measurements of the climate system. In this paper, we quantify the impact of geographic sampling on the relative performance of high-resolution climate models' representation of precipitation extremes in boreal winter (DJF) over the contiguous United States (CONUS), comparing model output from five early submissions to the HighResMIP subproject of the CMIP6 experiment. We find that properly accounting for the geographic sampling of weather stations can significantly change the assessment of model performance. Across the models considered, failing to account for sampling impacts the different metrics (extreme bias, spatial pattern correlation, and spatial variability) in different ways (both increasing and decreasing). We argue that the geographic sampling of weather stations should be accounted for in order to yield a more straightforward and appropriate comparison between models and observational data sets, particularly for high-resolution models. While we focus on the CONUS in this paper, our results have important implications for other global land regions where the sampling problem is more severe.

preprint · 2019 · arXiv

Detected changes in precipitation extremes at their native scales derived from in situ measurements

The gridding of daily accumulated precipitation -- especially extremes -- from ground-based station observations is problematic due to the fractal nature of precipitation, and therefore long-period return values and their changes are generally underestimated when derived from such gridded daily data sets. In this paper, we characterize high-resolution changes in observed extreme precipitation from 1950 to 2017 for the contiguous United States (CONUS) based on in situ measurements only. Our analysis utilizes spatial statistical methods that allow us to derive gridded estimates that do not smooth extreme daily measurements and are consistent with statistics from the original station data while increasing the resulting signal-to-noise ratio. Furthermore, we use a robust statistical technique to identify significant pointwise changes in the climatology of extreme precipitation while carefully controlling the rate of false positives. We present and discuss seasonal changes in the statistics of extreme precipitation: the largest and most spatially coherent pointwise changes are in fall (SON), with approximately 33% of CONUS exhibiting significant changes (in an absolute sense). Other seasons display very few meaningful pointwise changes (in either a relative or absolute sense), illustrating the difficulty in detecting pointwise changes in extreme precipitation based on in situ measurements. While our main result involves seasonal changes, we also present and discuss annual changes in the statistics of extreme precipitation. In this paper we only seek to detect changes over time and leave attribution of the underlying causes of these changes for future work.

preprint · 2016 · arXiv

Review: Nonstationary Spatial Modeling, with Emphasis on Process Convolution and Covariate-Driven Approaches

In many environmental applications involving spatially-referenced data, limitations on the number and locations of observations motivate the need for practical and efficient models for spatial interpolation, or kriging. A key component of models for continuously-indexed spatial data is the covariance function, which is traditionally assumed to belong to a parametric class of stationary models. While convenient, the assumption of stationarity is rarely realistic; as a result, there is a rich literature on alternative methodologies which capture and model the nonstationarity present in most environmental processes. This review document provides a rigorous and concise description of the existing literature on nonstationary methods, paying particular attention to process convolution (also called kernel smoothing or moving average) approaches. A summary is also provided of more recent methods which leverage covariate information and yield both interpretational and computational benefits. Note: the article is borrowed from Chapters 1 and 2 of the author's Ph.D. dissertation, joint with Catherine A. Calder.

preprint · 2015 · arXiv

Regression-based covariance functions for nonstationary spatial modeling

In many environmental applications involving spatially-referenced data, limitations on the number and locations of observations motivate the need for practical and efficient models for spatial interpolation, or kriging. A key component of models for continuously-indexed spatial data is the covariance function, which is traditionally assumed to belong to a parametric class of stationary models. However, stationarity is rarely a realistic assumption. Alternative methods which more appropriately model the nonstationarity present in environmental processes often involve high-dimensional parameter spaces, which lead to difficulties in model fitting and interpretability. To overcome this issue, we build on the growing literature of covariate-driven nonstationary spatial modeling. Using process convolution techniques, we propose a Bayesian model for continuously-indexed spatial data based on a flexible parametric covariance regression structure for a convolution-kernel covariance matrix. The resulting model is a parsimonious representation of the kernel process, and we explore properties of the implied model, including a description of the resulting nonstationary covariance function and the interpretational benefits in the kernel parameters. Furthermore, we demonstrate that our model provides a practical compromise between stationary and highly parameterized nonstationary spatial covariance functions that do not perform well in practice. We illustrate our approach through an analysis of annual precipitation data.
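As a rough illustration of how process convolution with covariate-driven kernels produces a nonstationary covariance function, the sketch below evaluates the well-known closed form obtained by convolving Gaussian kernels whose length-scale varies by location, restricted to one dimension. The log-linear link between a covariate and the length-scale, the coefficients `beta0` and `beta1`, and the demo data are all assumptions made for illustration; they are not the covariance regression structure or the precipitation data used in the paper.

```python
import math


def length_scale(covariate, beta0=-1.0, beta1=1.0):
    """Hypothetical log-linear model: log l(s) = beta0 + beta1 * x(s)."""
    return math.exp(beta0 + beta1 * covariate)


def nonstationary_cov(s1, s2, l1, l2, sigma2=1.0):
    """1-D covariance implied by convolving Gaussian kernels.

    l1 and l2 are the local kernel length-scales at sites s1 and s2.
    When l1 == l2 == l this reduces to the stationary Gaussian
    covariance sigma2 * exp(-(s1 - s2)^2 / l^2).
    """
    avg = 0.5 * (l1 ** 2 + l2 ** 2)          # averaged kernel variance
    prefactor = math.sqrt(l1 * l2 / avg)     # at most 1, equals 1 if l1 == l2
    return sigma2 * prefactor * math.exp(-((s1 - s2) ** 2) / avg)


# Hypothetical demo: three sites with a covariate (e.g. scaled elevation).
sites = [0.0, 0.5, 1.0]
covariates = [0.0, 0.5, 1.0]
scales = [length_scale(x) for x in covariates]

# Covariance matrix: correlation range grows where the covariate is larger.
C = [
    [nonstationary_cov(si, sj, li, lj) for sj, lj in zip(sites, scales)]
    for si, li in zip(sites, scales)
]
for row in C:
    print(["%.4f" % value for value in row])
```

Tying the kernel parameters to a small number of regression coefficients is what keeps such a model parsimonious: the whole spatially varying kernel is described by `beta0` and `beta1` here rather than by one free parameter per location.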