Researcher profile

Alan E. Gelfand

Alan E. Gelfand contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
9 works
0 followers
3 topics
4 close collaborators

Published work

9 published items

Preprint | 2022 | arXiv

Preferential Sampling for Bivariate Spatial Data

Preferential sampling provides a formal modeling specification to capture the effect of bias in a set of sampling locations on inference when a geostatistical model is used to explain observed responses at the sampled locations. In particular, it enables modification of spatial prediction adjusted for the bias. Its original presentation in the literature addressed assessment of the presence of such sampling bias while follow-on work focused on regression specification to improve spatial interpolation under such bias. All of the work in the literature to date considers the case of a univariate response variable at each location, either continuous or modeled through a latent continuous variable. The contribution here is to extend the notion of preferential sampling to the case of a bivariate response at each location. This exposes sampling scenarios where both responses are observed at a given location as well as scenarios where, for some locations, only one of the responses is recorded. That is, there may be different sampling bias for one response than for the other. It leads to assessing the impact of such bias on co-kriging. It also exposes the possibility that preferential sampling can bias inference regarding dependence between responses at a location. We develop the idea of bivariate preferential sampling through various model specifications and illustrate the effect of these specifications on prediction and dependence behavior. We do this both through simulation examples and with a forestry dataset that provides mean diameter at breast height (MDBH) and trees per hectare (TPH) as the point-referenced bivariate responses.

Preprint | 2022 | arXiv

Spatial modeling of day-within-year temperature time series: an examination of daily maximum temperatures in Aragón, Spain

Acknowledging a considerable literature on modeling daily temperature data, we propose a multi-level spatio-temporal model which introduces several innovations in order to explain the daily maximum temperature in the summer period over 60 years in a region containing Aragón, Spain. The model operates over continuous space but adopts two discrete temporal scales, year and day within year. It captures temporal dependence through autoregression on days within year and also on years. Spatial dependence is captured through spatial process modeling of intercepts, slope coefficients, variances, and autocorrelations. The model is expressed in a form which separates fixed effects from random effects and also separates space, years, and days for each type of effect. Motivated by exploratory data analysis, fixed effects to capture the influence of elevation, seasonality and a linear trend are employed. Pure errors are introduced for years, for locations within years, and for locations at days within years. The performance of the model is checked using a leave-one-out cross-validation. Applications of the model are presented including prediction of the daily temperature series at unobserved or partially observed sites and inference to investigate climate change comparison.

Preprint | 2022 | arXiv

Zero-inflated Beta distribution regression modeling

A frequent challenge encountered with ecological data is how to interpret, analyze, or model data having a high proportion of zeros. Much attention has been given to zero-inflated count data, whereas models for non-negative continuous data with an abundance of 0s are lacking. We consider zero-inflated data on the unit interval and provide modeling to capture two types of 0s in the context of the Beta regression model. We model 0s due to missingness by chance through left censoring of a latent regression, and 0s due to unsuitability using an independent Bernoulli specification to create a point mass at 0. We first develop the model as a spatial regression in environmental features and then extend to introduce spatial random effects. We specify models hierarchically, employing latent variables, fit them within a Bayesian framework, and present new model comparison tools. Our motivating dataset consists of percent cover abundance of two plant species at a collection of sites in the Cape Floristic Region of South Africa. We find that environmental features enable learning about the incidence of both types of 0s as well as the positive percent covers. We also show that the spatial random effects model improves predictive performance. The proposed modeling enables ecologists, using environmental regressors, to extract a better understanding of the presence/absence of species in terms of absence due to unsuitability vs. missingness by chance, as well as abundance when present.

Preprint | 2020 | arXiv

Approximate Bayesian inference for a spatial point process model exhibiting regularity and random aggregation

In this paper, we propose a doubly stochastic spatial point process model with both aggregation and repulsion. This model combines the ideas behind Strauss processes and log Gaussian Cox processes. The likelihood for this model is not expressible in closed form but it is easy to simulate realisations under the model. We therefore explain how to use approximate Bayesian computation (ABC) to carry out statistical inference for this model. We suggest a method for model validation based on posterior predictions and global envelopes. We illustrate the ABC procedure and model validation approach using both simulated point patterns and a real data example.
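
The ABC idea in this abstract (simulate from the model when the likelihood is unavailable, and keep parameter draws whose simulations resemble the data) can be illustrated with a minimal rejection sampler. This is only a sketch under stated assumptions and not the paper's procedure: the toy model is a homogeneous Poisson count rather than the Strauss/log Gaussian Cox combination, and the names `abc_rejection`, `simulate_count`, and `sample_prior`, the uniform prior, and the tolerance are all hypothetical choices made for illustration.

```python
import random
from typing import Callable, List


def simulate_count(theta: float, rng: random.Random) -> int:
    """Toy simulator (assumption): number of rate-theta arrivals in [0, 1]."""
    count = 0
    t = rng.expovariate(theta)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(theta)
    return count


def sample_prior(rng: random.Random) -> float:
    """Hypothetical prior: intensity uniform on [1, 100]."""
    return rng.uniform(1.0, 100.0)


def abc_rejection(
    observed_stat: float,
    simulate_stat: Callable[[float, random.Random], float],
    prior: Callable[[random.Random], float],
    tolerance: float,
    n_draws: int,
    seed: int = 0,
) -> List[float]:
    """Keep prior draws whose simulated summary is within tolerance of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior(rng)
        if abs(simulate_stat(theta, rng) - observed_stat) <= tolerance:
            accepted.append(theta)
    return accepted


# Approximate posterior draws for the intensity given 50 observed points.
posterior = abc_rejection(50, simulate_count, sample_prior, tolerance=2, n_draws=5000)
```

The accepted draws approximate the posterior only as well as the summary statistic and tolerance allow; for a point pattern with interaction, a single count would not be informative about aggregation or repulsion, so richer summaries would be needed.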

Preprint | 2020 | arXiv

Long-term Spatial Modeling for Characteristics of Extreme Heat Events

There is increasing evidence that global warming manifests itself in more frequent warm days and that heat waves will become more frequent. Presently, a formal definition of a heat wave is not agreed upon in the literature. To avoid this debate, we consider extreme heat events (EHEs), which, at a given location, are well-defined as a run of consecutive days above an associated local threshold. Characteristics of EHEs are of primary interest, such as incidence and duration, as well as the magnitude of the average exceedance and maximum exceedance above the threshold during the EHE. Using approximately 60-year time series of daily maximum temperature data collected at 18 locations in a given region, we propose a spatio-temporal model to study the characteristics of EHEs over time. The model enables prediction of the behavior of EHE characteristics at unobserved locations within the region. Specifically, our approach employs a two-state space-time model for EHEs with local thresholds where one state defines above threshold daily maximum temperatures and the other below threshold temperatures. We show that our model is able to recover the EHE characteristics of interest and, based on out-of-sample prediction, outperforms a corresponding autoregressive model that ignores thresholds.
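
The definition of an extreme heat event used above (a run of consecutive days above a local threshold, summarized by duration, average exceedance, and maximum exceedance) is simple enough to sketch directly. The following is an illustrative helper for extracting those characteristics from a single daily maximum temperature series, not the paper's two-state space-time model; the function name `extreme_heat_events` and the `min_duration` parameter are assumptions introduced here.

```python
from typing import Dict, List


def _summarize(start: int, exceedances: List[float]) -> Dict[str, float]:
    """Summarize one run of above-threshold days."""
    return {
        "start": start,
        "duration": len(exceedances),
        "mean_exceedance": sum(exceedances) / len(exceedances),
        "max_exceedance": max(exceedances),
    }


def extreme_heat_events(
    tmax: List[float], threshold: float, min_duration: int = 1
) -> List[Dict[str, float]]:
    """Return one record per run of consecutive days with tmax above threshold.

    min_duration is a hypothetical knob: runs shorter than it are dropped.
    """
    events = []
    run: List[float] = []  # exceedances of the run currently being built
    start = 0
    for day, temp in enumerate(tmax):
        if temp > threshold:
            if not run:
                start = day
            run.append(temp - threshold)
        elif run:
            if len(run) >= min_duration:
                events.append(_summarize(start, run))
            run = []
    if run and len(run) >= min_duration:  # series ends inside a run
        events.append(_summarize(start, run))
    return events


events = extreme_heat_events([30, 36, 37, 33, 38, 30], threshold=35)
# Two events: days 1-2 (exceedances 1 and 2) and day 4 (exceedance 3).
```

Incidence over a period is then the number of records returned, and the other characteristics can be read off each record.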

Preprint | 2019 | arXiv

Generalized Evolutionary Point Processes: Model Specifications and Model Comparison

Generalized evolutionary point processes offer a class of point process models that allows for either excitation or inhibition based upon the history of the process. In this regard, we propose modeling which comprises a generalization of the nonlinear Hawkes process. Working within a Bayesian framework, model fitting is implemented through Markov chain Monte Carlo. This entails discussion of computation of the likelihood for such point patterns. Furthermore, for this class of models, we discuss strategies for model comparison. Using simulation, we illustrate how well we can distinguish these models from point pattern specifications with conditionally independent event times, e.g., Poisson processes. Specifically, we demonstrate that these models can correctly identify true relationships (i.e., excitation or inhibition/control). Then, we consider a novel extension of the log Gaussian Cox process that incorporates evolutionary behavior and illustrate that our model comparison approach prefers the evolutionary log Gaussian Cox process over simpler models. We also examine a real dataset consisting of violent crime events from the 11th police district in Chicago from the year 2018. This data exhibits strong daily seasonality and changes across the year. After we account for these data attributes, we find significant but mild self-excitation, implying that event occurrence increases the intensity of future events.
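
To make the excitation versus inhibition distinction concrete, here is a minimal sketch of one common nonlinear Hawkes intensity: a baseline plus exponentially decaying contributions from past events, truncated at zero. It is an assumption chosen for illustration and not the generalized specification proposed in the paper; the function name `hawkes_intensity`, the exponential kernel, and the parameters `mu`, `alpha`, and `beta` are hypothetical.

```python
import math
from typing import Sequence


def hawkes_intensity(
    t: float, history: Sequence[float], mu: float, alpha: float, beta: float
) -> float:
    """Conditional intensity at time t given strictly earlier event times.

    alpha > 0 gives excitation (past events raise the intensity);
    alpha < 0 gives inhibition (past events lower it).
    The max(0, .) link keeps the intensity non-negative under inhibition.
    """
    contribution = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t
    )
    return max(0.0, mu + contribution)


baseline = hawkes_intensity(1.0, [], mu=0.5, alpha=0.8, beta=1.0)
excited = hawkes_intensity(1.0, [0.5], mu=0.5, alpha=0.8, beta=1.0)
inhibited = hawkes_intensity(1.0, [0.5], mu=0.5, alpha=-0.3, beta=1.0)
```

Here `excited` exceeds `baseline` and `inhibited` falls below it, which is the qualitative behavior that the model comparison described in the abstract aims to detect from data.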

Preprint | 2019 | arXiv

Multivariate Functional Data Modeling with Time-varying Clustering

We consider the situation where multivariate functional data has been collected over time at each of a set of sites. Our illustrative setting is bivariate, monitoring ozone and PM$_{10}$ levels as a function of time over the course of a year at a set of monitoring sites. The data we work with is from 24 monitoring sites in Mexico City which record hourly ozone and PM$_{10}$ levels. We use the data for the year 2017. Hence, we have 48 functions to work with. Our objective is to implement model-based clustering of the functions across the sites. Using our example, such clustering can be considered for ozone and PM$_{10}$ individually or jointly. It may occur differentially for the two pollutants. More importantly for us, we allow that such clustering can vary with time. We model the multivariate functions across sites using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a stochastic process specification for the distribution of the collection of multivariate functions over the, say, $n$ sites. Furthermore, to cluster the functions, either individually by component or jointly with all components, we use the Dirichlet process which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise in continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ a partitioning of the time scale to capture time-varying clustering.

Preprint | 2018 | arXiv

Pollution State Modeling for Mexico City

Ground-level ozone and particulate matter pollutants are associated with a variety of health issues and increased mortality. For this reason, Mexican environmental agencies regulate pollutant levels. In addition, Mexico City defines pollution emergencies using thresholds that rely on regional maxima for ozone and particulate matter with diameter less than 10 micrometers ($\text{PM}_{10}$). To predict local pollution emergencies and to assess compliance with Mexican ambient air quality standards, we analyze hourly ozone and $\text{PM}_{10}$ measurements from 24 stations across Mexico City from 2017 using a bivariate spatiotemporal model. Using this model, we predict future pollutant levels using current weather conditions and recent pollutant concentrations. Using hourly pollutant projections, we predict regional maxima needed to estimate the probability of future pollution emergencies. We discuss how predicted compliance with legislated pollution limits varies across regions within Mexico City in 2017. We find that predicted probability of pollution emergencies is limited to a few time periods. In contrast, we show that predicted exceedance of Mexican ambient air quality standards is a common, nearly daily occurrence.

Preprint | 2010 | arXiv

Modeling large scale species abundance with latent spatial processes

Modeling species abundance patterns using local environmental features is an important, current problem in ecology. The Cape Floristic Region (CFR) in South Africa is a global hot spot of diversity and endemism, and provides a rich class of species abundance data for such modeling. Here, we propose a multi-stage Bayesian hierarchical model for explaining species abundance over this region. Our model is specified at areal level, where the CFR is divided into roughly $37{,}000$ one-minute grid cells; species abundance is observed at some locations within some cells. The abundance values are ordinally categorized. Environmental and soil-type factors, likely to influence the abundance pattern, are included in the model. We formulate the empirical abundance pattern as a degraded version of the potential pattern, with the degradation effect accomplished in two stages. First, we adjust for land use transformation and then we adjust for measurement error, hence misclassification error, to yield the observed abundance classifications. An important point in this analysis is that only $28\%$ of the grid cells have been sampled and that, for sampled grid cells, the number of sampled locations ranges from one to more than one hundred. Still, we are able to develop potential and transformed abundance surfaces over the entire region. In the hierarchical framework, categorical abundance classifications are induced by continuous latent surfaces. The degradation model above is built on the latent scale. On this scale, an areal level spatial regression model is used for modeling the dependence of species abundance on the environmental factors.