Source author record

Alan E. Gelfand

Alan E. Gelfand appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Methodology Applications Computation math.ST Statistics Theory

Catalog footprint

What is connected

17 works

5 topics

4 close collaborators

preprint 2022 arXiv

Preferential Sampling for Bivariate Spatial Data

Preferential sampling provides a formal modeling specification to capture the effect of bias in a set of sampling locations on inference when a geostatistical model is used to explain observed responses at the sampled locations. In particular, it enables modification of spatial prediction adjusted for the bias. Its original presentation in the literature addressed assessment of the presence of such sampling bias while follow on work focused on regression specification to improve spatial interpolation under such bias. All of the work in the literature to date considers the case of a univariate response variable at each location, either continuous or modeled through a latent continuous variable. The contribution here is to extend the notion of preferential sampling to the case of bivariate response at each location. This exposes sampling scenarios where both responses are observed at a given location as well as scenarios where, for some locations, only one of the responses is recorded. That is, there may be different sampling bias for one response than for the other. It leads to assessing the impact of such bias on co-kriging. It also exposes the possibility that preferential sampling can bias inference regarding dependence between responses at a location. We develop the idea of bivariate preferential sampling through various model specifications and illustrate the effect of these specifications on prediction and dependence behavior. We do this both through simulation examples as well as with a forestry dataset that provides mean diameter at breast height (MDBH) and trees per hectare (TPH) as the point-referenced bivariate responses.

preprint 2022 arXiv

Spatial modeling of day-within-year temperature time series: an examination of daily maximum temperatures in Aragón, Spain

Acknowledging a considerable literature on modeling daily temperature data, we propose a multi-level spatio-temporal model which introduces several innovations in order to explain the daily maximum temperature in the summer period over 60 years in a region containing Aragón, Spain. The model operates over continuous space but adopts two discrete temporal scales, year and day within year. It captures temporal dependence through autoregression on days within year and also on years. Spatial dependence is captured through spatial process modeling of intercepts, slope coefficients, variances, and autocorrelations. The model is expressed in a form which separates fixed effects from random effects and also separates space, years, and days for each type of effect. Motivated by exploratory data analysis, fixed effects to capture the influence of elevation, seasonality and a linear trend are employed. Pure errors are introduced for years, for locations within years, and for locations at days within years. The performance of the model is checked using a leave-one-out cross-validation. Applications of the model are presented including prediction of the daily temperature series at unobserved or partially observed sites and inference to investigate climate change comparison.

preprint 2022 arXiv

Zero-inflated Beta distribution regression modeling

A frequent challenge encountered with ecological data is how to interpret, analyze, or model data having a high proportion of zeros. Much attention has been given to zero-inflated count data, whereas models for non-negative continuous data with an abundance of 0s are lacking. We consider zero-inflated data on the unit interval and provide modeling to capture two types of 0s in the context of the Beta regression model. We model 0s due to missing by chance through left censoring of a latent regression, and 0s due to unsuitability using an independent Bernoulli specification to create a point mass at 0. We first develop the model as a spatial regression in environmental features and then extend to introduce spatial random effects. We specify models hierarchically, employing latent variables, fit them within a Bayesian framework, and present new model comparison tools. Our motivating dataset consists of percent cover abundance of two plant species at a collection of sites in the Cape Floristic Region of South Africa. We find that environmental features enable learning about the incidence of both types of 0s as well as the positive percent covers. We also show that the spatial random effects model improves predictive performance. The proposed modeling enables ecologists, using environmental regressors, to extract a better understanding of the presence/absence of species in terms of absence due to unsuitability vs. missingness by chance, as well as abundance when present.

preprint 2020 arXiv

Approximate Bayesian inference for a spatial point process model exhibiting regularity and random aggregation

In this paper, we propose a doubly stochastic spatial point process model with both aggregation and repulsion. This model combines the ideas behind Strauss processes and log Gaussian Cox processes. The likelihood for this model is not expressible in closed form but it is easy to simulate realisations under the model. We therefore explain how to use approximate Bayesian computation (ABC) to carry out statistical inference for this model. We suggest a method for model validation based on posterior predictions and global envelopes. We illustrate the ABC procedure and model validation approach using both simulated point patterns and a real data example.
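The ABC idea described above (no closed-form likelihood, but easy simulation) can be sketched with a generic rejection sampler. This is only an illustration under stated assumptions: the function `abc_rejection`, the toy simulator `simulate_count`, the uniform prior, the absolute-difference distance, and the tolerance are all hypothetical stand-ins, not the point process model or the summary statistics used in the paper.

```python
import random


def abc_rejection(observed, prior_sample, simulate, distance, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary lands within tolerance of observed."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy simulator (assumption): the summary is a count of successes in 20 trials.
# A real application would simulate a point pattern and compute summaries of it.
def simulate_count(p, rng, trials=20):
    return sum(rng.random() < p for _ in range(trials))


rng = random.Random(0)
posterior_draws = abc_rejection(
    observed=14,
    prior_sample=lambda r: r.random(),      # uniform prior on (0, 1), an assumption
    simulate=simulate_count,
    distance=lambda a, b: abs(a - b),
    tolerance=1,
    n_draws=2000,
    rng=rng,
)
```

The accepted draws approximate the posterior; a smaller tolerance tightens the approximation at the cost of fewer accepted draws.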

preprint 2020 arXiv

Long-term Spatial Modeling for Characteristics of Extreme Heat Events

There is increasing evidence that global warming manifests itself in more frequent warm days and that heat waves will become more frequent. Presently, a formal definition of a heat wave is not agreed upon in the literature. To avoid this debate, we consider extreme heat events, which, at a given location, are well-defined as a run of consecutive days above an associated local threshold. Characteristics of EHEs are of primary interest, such as incidence and duration, as well as the magnitude of the average exceedance and maximum exceedance above the threshold during the EHE. Using approximately 60-year time series of daily maximum temperature data collected at 18 locations in a given region, we propose a spatio-temporal model to study the characteristics of EHEs over time. The model enables prediction of the behavior of EHE characteristics at unobserved locations within the region. Specifically, our approach employs a two-state space-time model for EHEs with local thresholds where one state defines above threshold daily maximum temperatures and the other below threshold temperatures. We show that our model is able to recover the EHE characteristics of interest and outperforms a corresponding autoregressive model that ignores thresholds based on out-of-sample prediction.
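The definition of an extreme heat event given above (a run of consecutive days above a local threshold) and the listed characteristics (duration, average exceedance, maximum exceedance) can be illustrated with a short sketch. The names `HeatEvent` and `extract_heat_events`, the strict inequality, and the `min_run` parameter are assumptions made for illustration; the abstract does not specify them.

```python
from typing import NamedTuple


class HeatEvent(NamedTuple):
    start: int              # index of the first day above the threshold
    duration: int           # number of consecutive days above the threshold
    mean_exceedance: float  # average of (temperature - threshold) over the run
    max_exceedance: float   # largest (temperature - threshold) in the run


def extract_heat_events(tmax, threshold, min_run=1):
    """Return runs of consecutive days with tmax strictly above threshold."""
    temps = list(tmax)
    events = []
    run_start = None
    # A trailing sentinel closes a run that reaches the end of the series.
    for day, temp in enumerate(temps + [float("-inf")]):
        if temp > threshold:
            if run_start is None:
                run_start = day
        elif run_start is not None:
            excess = [t - threshold for t in temps[run_start:day]]
            if len(excess) >= min_run:
                events.append(HeatEvent(run_start, len(excess),
                                        sum(excess) / len(excess), max(excess)))
            run_start = None
    return events
```

For example, `extract_heat_events([30, 36, 37, 33, 38, 39, 40, 31], 35)` finds two events: one of two days starting at index 1 and one of three days starting at index 4.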

preprint 2019 arXiv

Generalized Evolutionary Point Processes: Model Specifications and Model Comparison

Generalized evolutionary point processes offer a class of point process models that allows for either excitation or inhibition based upon the history of the process. In this regard, we propose modeling which comprises generalization of the nonlinear Hawkes process. Working within a Bayesian framework, model fitting is implemented through Markov chain Monte Carlo. This entails discussion of computation of the likelihood for such point patterns. Furthermore, for this class of models, we discuss strategies for model comparison. Using simulation, we illustrate how well we can distinguish these models from point pattern specifications with conditionally independent event times, e.g., Poisson processes. Specifically, we demonstrate that these models can correctly identify true relationships (i.e., excitation or inhibition/control). Then, we consider a novel extension of the log Gaussian Cox process that incorporates evolutionary behavior and illustrate that our model comparison approach prefers the evolutionary log Gaussian Cox process compared to simpler models. We also examine a real dataset consisting of violent crime events from the 11th police district in Chicago from the year 2018. This data exhibits strong daily seasonality and changes across the year. After we account for these data attributes, we find significant but mild self-excitation, implying that event occurrence increases the intensity of future events.
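To make excitation versus inhibition concrete, here is a minimal sketch of a nonlinear Hawkes conditional intensity. The abstract does not give the paper's specification, so everything here is an assumption: the exponential decay kernel, the softplus link, and the names `softplus`, `conditional_intensity`, `mu`, `alpha`, and `beta`. The link keeps the intensity positive even when past events push it down.

```python
import math


def softplus(x):
    """Smooth, strictly positive link: log(1 + exp(x)), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def conditional_intensity(t, history, mu, alpha, beta, link=softplus):
    """Intensity at time t given earlier event times.

    alpha > 0 raises the intensity after each event (excitation);
    alpha < 0 lowers it (inhibition). beta controls how fast the effect decays.
    """
    drive = mu + sum(alpha * math.exp(-beta * (t - s)) for s in history if s < t)
    return link(drive)
```

With events at times 1.0 and 2.0, the intensity shortly after 2.0 exceeds the baseline `softplus(mu)` when `alpha` is positive and falls below it, while staying positive, when `alpha` is negative.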

preprint 2019 arXiv

Multivariate Functional Data Modeling with Time-varying Clustering

We consider the situation where multivariate functional data has been collected over time at each of a set of sites. Our illustrative setting is bivariate, monitoring ozone and PM$_{10}$ levels as a function of time over the course of a year at a set of monitoring sites. The data we work with is from 24 monitoring sites in Mexico City which record hourly ozone and PM$_{10}$ levels. We use the data for the year 2017. Hence, we have 48 functions to work with. Our objective is to implement model-based clustering of the functions across the sites. Using our example, such clustering can be considered for ozone and PM$_{10}$ individually or jointly. It may occur differentially for the two pollutants. More importantly for us, we allow that such clustering can vary with time. We model the multivariate functions across sites using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a stochastic process specification for the distribution of the collection of multivariate functions over the say $n$ sites. Furthermore, to cluster the functions, either individually by component or jointly with all components, we use the Dirichlet process which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise in continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ a partitioning of the time scale to capture time-varying clustering.

preprint 2018 arXiv

Pollution State Modeling for Mexico City

Ground-level ozone and particulate matter pollutants are associated with a variety of health issues and increased mortality. For this reason, Mexican environmental agencies regulate pollutant levels. In addition, Mexico City defines pollution emergencies using thresholds that rely on regional maxima for ozone and particulate matter with diameter less than 10 micrometers ($\text{PM}_{10}$). To predict local pollution emergencies and to assess compliance to Mexican ambient air quality standards, we analyze hourly ozone and $\text{PM}_{10}$ measurements from 24 stations across Mexico City from 2017 using a bivariate spatiotemporal model. Using this model, we predict future pollutant levels using current weather conditions and recent pollutant concentrations. Using hourly pollutant projections, we predict regional maxima needed to estimate the probability of future pollution emergencies. We discuss how predicted compliance to legislated pollution limits varies across regions within Mexico City in 2017. We find that predicted probability of pollution emergencies is limited to a few time periods. In contrast, we show that predicted exceedance of Mexican ambient air quality standards is a common, nearly daily occurrence.

preprint 2016 arXiv

Disease Mapping with Generative Models

Disease mapping focuses on learning about areal units presenting high relative risk. Disease mapping models for disease counts specify Poisson regressions in relative risks compared with the expected counts. These models typically incorporate spatial random effects to accomplish spatial smoothing. Fitting of these models customarily computes expected disease counts via internal standardization. This places the data on both sides of the model, i.e., the counts are on the left side but they are also used to obtain the expected counts on the right side. As a result, these internally standardized models are incoherent and not generative; probabilistically, they could not produce the observed data. Here, we argue for adopting the direct generative model for disease counts. We model disease incidence instead of relative risks, using a generalized logistic regression. We extract relative risks post model fitting. We also extend the generative model to dynamic settings. We compare the generative models with internally standardized models through simulated datasets and a well-examined lung cancer morbidity dataset in Ohio. Each model is a spatial smoother and they smooth the data similarly with regard to relative risks. However, the generative models tend to provide tighter credible intervals. Since the generative specification is no more difficult to fit, is coherent, and is at least as good inferentially, we suggest it should be the model of choice for spatial disease mapping.
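The point about internal standardization placing the data on both sides of the model can be seen in a small sketch. It assumes the simplest unstratified form, where the expected count for area i is its population times the overall observed rate; the function names and the example numbers are hypothetical, and applied work often standardizes within age or sex strata instead.

```python
def internal_expected_counts(cases, populations):
    """Expected counts E_i = n_i * (total cases / total population).

    The observed cases enter through the overall rate, so the same data
    appear in both the response and the expected counts.
    """
    overall_rate = sum(cases) / sum(populations)
    return [n * overall_rate for n in populations]


def standardized_ratios(cases, populations):
    """Crude relative-risk estimates y_i / E_i for each area."""
    expected = internal_expected_counts(cases, populations)
    return [y / e for y, e in zip(cases, expected)]


cases = [10, 20, 30]                # hypothetical observed counts
populations = [1000, 1000, 2000]    # hypothetical populations at risk
expected = internal_expected_counts(cases, populations)
ratios = standardized_ratios(cases, populations)
```

By construction the expected counts sum to the total observed count, which is one way to see why such a specification could not have generated the data it is fitted to.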

preprint 2016 arXiv

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets

Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This manuscript develops a class of highly scalable Nearest Neighbor Gaussian Process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The number of floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive United States Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods.
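The abstract does not spell out where the sparsity comes from. One common way to describe it, offered here as an assumption rather than as the paper's algorithm, is that each location conditions only on a small set of nearest neighbors among the locations that precede it in a fixed ordering. The sketch below builds those neighbor sets by brute force; the function name `nngp_neighbor_sets` and the parameter `m` are hypothetical, and a practical implementation would use a faster neighbor search.

```python
import math


def nngp_neighbor_sets(coords, m):
    """For each location i, return indices of its (at most) m nearest earlier locations."""
    neighbors = []
    for i, point in enumerate(coords):
        # Only locations earlier in the ordering are eligible neighbors.
        earlier = sorted(range(i), key=lambda j: math.dist(point, coords[j]))
        neighbors.append(earlier[:m])
    return neighbors


coords = [(0, 0), (1, 0), (0, 2), (5, 5), (4, 4)]   # hypothetical locations
neighbor_sets = nngp_neighbor_sets(coords, m=2)
```

Each location has at most `m` neighbors, so the total number of conditioning relationships grows linearly with the number of locations instead of quadratically.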

preprint 2016 arXiv

Inference for log Gaussian Cox processes using an approximate marginal posterior

The log Gaussian Cox process is a flexible class of point pattern models for capturing spatial and spatio-temporal dependence for point patterns. Model fitting requires approximation of stochastic integrals which is implemented through discretization of the domain of interest. With fine scale discretization, inference based on Markov chain Monte Carlo is computationally heavy because of the cost of repeated iteration or inversion or Cholesky decomposition (cubic order) of high dimensional covariance matrices associated with latent Gaussian variables. Furthermore, hyperparameters for latent Gaussian variables have strong dependence with sampled latent Gaussian variables. Altogether, standard Markov chain Monte Carlo strategies are inefficient and not well behaved. In this paper, we propose an efficient computational strategy for fitting and inferring with spatial log Gaussian Cox processes. The proposed algorithm is based on a pseudo-marginal Markov chain Monte Carlo approach. We estimate an approximate marginal posterior for parameters of log Gaussian Cox processes and propose a comprehensive model inference strategy. We provide details for all of the above along with some simulation investigation for the univariate and multivariate settings. As an example, we present an analysis of a point pattern of locations of three tree species, exhibiting positive and negative interaction between different species.

preprint 2016 arXiv

Space and circular time log Gaussian Cox processes with application to crime event data

We view the locations and times of a collection of crime events as a space-time point pattern. So, with either a nonhomogeneous Poisson process or with a more general Cox process, we need to specify a space-time intensity. For the latter, we need a random intensity which we model as a realization of a spatio-temporal log Gaussian process. Importantly, we view time as circular not linear, necessitating valid separable and nonseparable covariance functions over a bounded spatial region crossed with circular time. In addition, crimes are classified by crime type. Furthermore, each crime event is recorded by day of the year which we convert to day of the week marks. The contribution here is to develop models to accommodate such data. Our specifications take the form of hierarchical models which we fit within a Bayesian framework. In this regard, we consider model comparison between the nonhomogeneous Poisson process and the log Gaussian Cox process. We also compare separable vs. nonseparable covariance specifications. Our motivating dataset is a collection of crime events for the city of San Francisco during the year 2012. We have location, hour, day of the year, and crime type for each event. We investigate models to enhance our understanding of the set of incidences.
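Two of the data-handling steps mentioned above, converting day of the year to a day-of-the-week mark and treating time as circular, can be sketched with the standard library. The function names are hypothetical, and the circular distance shown is only the basic ingredient; the paper's covariance functions over circular time are not reproduced here.

```python
from datetime import date, timedelta


def day_of_week(year, day_of_year):
    """Map a 1-based day of the year to 0=Monday ... 6=Sunday."""
    return (date(year, 1, 1) + timedelta(days=day_of_year - 1)).weekday()


def circular_distance(a, b, period):
    """Shortest separation between two times on a circle of the given period."""
    gap = abs(a - b) % period
    return min(gap, period - gap)
```

On a 24-hour circle, 23:00 and 01:00 are 2 hours apart, not 22, which is the behavior a linear time axis would miss.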

preprint 2015 arXiv

Spatial Process Gradients and Their Use in Sensitivity Analysis for Environmental Processes

This paper develops methodology for local sensitivity analysis based on directional derivatives associated with spatial processes. Formal gradient analysis for spatial processes was elaborated in previous papers, focusing on distribution theory for directional derivatives associated with a response variable assumed to follow a Gaussian process model. In the current work, these ideas are extended to additionally accommodate a continuous covariate whose directional derivatives are also of interest and to relate the behavior of the directional derivatives of the response surface to those of the covariate surface. It is of interest to assess whether, in some sense, the gradients of the response follow those of the explanatory variable. The joint Gaussian structure of all variables, including the directional derivatives, allows for explicit distribution theory and, hence, kriging across the spatial region using multivariate normal theory. Working within a Bayesian hierarchical modeling framework, posterior samples enable all gradient analysis to occur post model fitting. As a proof of concept, we show how our methodology can be applied to a standard geostatistical modeling setting using a simulation example. For a real data illustration, we work with point pattern data, deferring our gradient analysis to the intensity surface, adopting a log-Gaussian Cox process model. In particular, we relate elevation data to point patterns associated with several tree species in Duke Forest.

preprint 2013 arXiv

Scaling Integral Projection Models for Analyzing Size Demography

Historically, matrix projection models (MPMs) have been employed to study population dynamics with regard to size, age or structure. To work with continuous traits, in the past decade, integral projection models (IPMs) have been proposed. Following the path for MPMs, currently, IPMs are handled first with a fitting stage, then with a projection stage. Model fitting has, so far, been done only with individual-level transition data. These data are used in the fitting stage to estimate the demographic functions (survival, growth, fecundity) that comprise the kernel of the IPM specification. The estimated kernel is then iterated from an initial trait distribution to obtain what is interpreted as steady state population behavior. Such projection results in inference that does not align with observed temporal distributions. This might be expected; a model for population level projection should be fitted with population level transitions.

preprint 2013 arXiv

spBayes for large univariate and multivariate point-referenced spatio-temporal data models

In this paper we detail the reformulation and rewrite of core functions in the spBayes R package. These efforts have focused on improving computational efficiency, flexibility, and usability for point-referenced data models. Attention is given to algorithm and computing developments that result in improved sampler convergence rate and efficiency by reducing parameter space; decreased sampler run-time by avoiding expensive matrix computations; and increased scalability to large datasets by implementing a class of predictive process models that attempt to overcome computational hurdles by representing spatial processes in terms of lower-dimensional realizations. Beyond these general computational improvements for existing model functions, we detail new functions for modeling data indexed in both space and time. These new functions implement a class of dynamic spatio-temporal models for settings where space is viewed as continuous and time is taken as discrete.

preprint 2010 arXiv

A bivariate space-time downscaler under space and time misalignment

Ozone and particulate matter PM2.5 are co-pollutants that have long been associated with increased public health risks. Information on concentration levels for both pollutants comes from two sources: monitoring sites and output from complex numerical models that produce concentration surfaces over large spatial regions. In this paper, we offer a fully model-based approach for fusing these two sources of information for the pair of co-pollutants which is computationally feasible over large spatial regions and long periods of time. Due to the association between concentration levels of the two environmental contaminants, it is expected that information regarding one will help to improve prediction of the other. Misalignment is an obvious issue since the monitoring networks for the two contaminants only partly intersect and because the collection rate for PM2.5 is typically less frequent than that for ozone. Extending previous work in Berrocal et al. (2010), we introduce a bivariate downscaler that provides a flexible class of bivariate space-time assimilation models. We discuss computational issues for model fitting and analyze a dataset for ozone and PM2.5 for the ozone season during year 2002. We show a modest improvement in predictive performance, not surprising in a setting where we can anticipate only a small gain.

preprint 2010 arXiv

Modeling large scale species abundance with latent spatial processes

Modeling species abundance patterns using local environmental features is an important, current problem in ecology. The Cape Floristic Region (CFR) in South Africa is a global hot spot of diversity and endemism, and provides a rich class of species abundance data for such modeling. Here, we propose a multi-stage Bayesian hierarchical model for explaining species abundance over this region. Our model is specified at areal level, where the CFR is divided into roughly $37{,}000$ one minute grid cells; species abundance is observed at some locations within some cells. The abundance values are ordinally categorized. Environmental and soil-type factors, likely to influence the abundance pattern, are included in the model. We formulate the empirical abundance pattern as a degraded version of the potential pattern, with the degradation effect accomplished in two stages. First, we adjust for land use transformation and then we adjust for measurement error, hence misclassification error, to yield the observed abundance classifications. An important point in this analysis is that only $28\%$ of the grid cells have been sampled and that, for sampled grid cells, the number of sampled locations ranges from one to more than one hundred. Still, we are able to develop potential and transformed abundance surfaces over the entire region. In the hierarchical framework, categorical abundance classifications are induced by continuous latent surfaces. The degradation model above is built on the latent scale. On this scale, an areal level spatial regression model was used for modeling the dependence of species abundance on the environmental factors.
