Source author record

Stefano Castruccio

Stefano Castruccio appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Applications, Computation

Catalog footprint

What is connected

7 works

2 topics

4 close collaborators

preprint, 2022, arXiv

Calibration of Spatio-Temporal Forecasts from Citizen Science Urban Air Pollution Data with Sparse Recurrent Neural Networks

With their continued increase in coverage and quality, data collected from personal air quality monitors have become an increasingly valuable tool to complement existing public health monitoring systems over urban areas. However, the potential of using such 'citizen science data' for automatic early warning systems is hampered by the lack of models able to capture the high resolution, nonlinear spatio-temporal features stemming from local emission sources such as traffic, residential heating and commercial activities. In this work, we propose a machine learning approach to forecast high frequency spatial fields which has two distinctive advantages over standard neural network methods in time: 1) sparsity of the neural network via a spike-and-slab prior, and 2) a small parametric space. The introduction of stochastic neural networks generates additional uncertainty, and in this work we propose a fast approach to ensure that the forecast is correctly assessed (calibration), both marginally and spatially. We focus on assessing exposure to urban air pollution in San Francisco, and our results suggest an improvement of over 58% in the mean squared error over a standard time series approach with a calibrated forecast for up to 5 days.
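The notion of marginal calibration mentioned in this abstract can be illustrated with a generic coverage check. This is a minimal sketch, not the authors' procedure: the function names, the toy Gaussian data and the 90% level are all assumptions chosen for illustration. It compares how often observations fall inside the central interval of a forecast ensemble when the ensemble spread is correct and when it is too narrow.

```python
import random


def empirical_coverage(ensembles, observations, level=0.9):
    """Share of observations inside the central `level` interval of their ensemble."""
    lower_q = (1.0 - level) / 2.0
    upper_q = 1.0 - lower_q
    hits = 0
    for members, observed in zip(ensembles, observations):
        ordered = sorted(members)
        last = len(ordered) - 1
        # Crude empirical quantiles taken directly from the sorted members.
        lower = ordered[int(lower_q * last)]
        upper = ordered[int(round(upper_q * last))]
        hits += lower <= observed <= upper
    return hits / len(observations)


def simulate(spread, n_cases=2000, n_members=100, seed=0):
    """Toy setting (assumption): truth is N(0, 1), ensemble members are N(0, spread)."""
    rng = random.Random(seed)
    observations = [rng.gauss(0.0, 1.0) for _ in range(n_cases)]
    ensembles = [
        [rng.gauss(0.0, spread) for _ in range(n_members)]
        for _ in range(n_cases)
    ]
    return empirical_coverage(ensembles, observations)


well_calibrated = simulate(spread=1.0)  # ensemble spread matches the truth
overconfident = simulate(spread=0.5)    # ensemble spread is too narrow
print(f"coverage with correct spread: {well_calibrated:.3f}")
print(f"coverage with narrow spread:  {overconfident:.3f}")
```

In this toy setting, a calibrated ensemble should cover roughly the nominal share of observations, while the overconfident one covers far fewer. A spatial calibration check would additionally need to look at joint behavior across locations, which this sketch does not attempt.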

preprint, 2022, arXiv

Sensitivity Analysis of Wind Energy Resources with Bayesian non-Gaussian and nonstationary Functional ANOVA

The transition from non-renewable to renewable energies represents a global societal challenge, and developing a sustainable energy portfolio is an especially daunting task for developing countries where little to no information is available regarding the abundance of renewable resources such as wind. Weather model simulations are key to obtaining such information when observational data are scarce and sparse over a country as large and geographically diverse as Saudi Arabia. However, output from such models is uncertain, as it depends on inputs such as the parametrization of the physical processes and the spatial resolution of the simulated domain. In such situations, a sensitivity analysis must be performed and the input may have a spatially heterogeneous influence on wind. In this work, we propose a latent Gaussian functional analysis of variance (ANOVA) model that relies on a nonstationary Gaussian Markov random field approximation of a continuous latent process. The proposed approach is able to capture the local sensitivity of Gaussian and non-Gaussian wind characteristics such as speed and threshold exceedances over a large simulation domain, and a continuous underlying process also allows us to assess the effect of different spatial resolutions. Our results indicate that (1) the non-local planetary boundary layer scheme and high spatial resolution are both instrumental in capturing wind speed and energy (especially over complex mountainous terrain), and (2) the impact of planetary boundary layer scheme and resolution on Saudi Arabia's planned wind farms is small (at most 1.4%). Thus, our results lend support to the construction of these wind farms in the next decade.

preprint, 2020, arXiv

Improving Bayesian Local Spatial Models in Large Data Sets

Environmental processes resolved at a sufficiently small scale in space and time will inevitably display non-stationary behavior. Such processes are both challenging to model and computationally expensive when the data size is large. Instead of modeling the global non-stationarity explicitly, local models can be applied to disjoint regions of the domain. The choice of the size of these regions is dictated by a bias-variance trade-off: large regions will have smaller variance and larger bias, whereas small regions will have higher variance and smaller bias. From both the modeling and computational point of view, small regions are preferable to better accommodate the non-stationarity. However, in practice, large regions are necessary to control the variance. We propose a novel Bayesian three-step approach that allows for smaller regions without incurring the increase in variance that would otherwise follow. We are able to propagate the uncertainty from one step to the next without issues caused by reusing the data. The improvement in inference also results in improved prediction, as our simulated example shows. We illustrate this new approach on a data set of simulated high-resolution wind speed data over Saudi Arabia.
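The bias-variance trade-off described in this abstract can be seen in a small simulation. This is a minimal sketch under stated assumptions, not the three-step Bayesian approach of the paper: the one-dimensional domain, the sinusoidal mean function, the noise level and all function names are hypothetical. A local mean is estimated at one location using either a small or a large region around it.

```python
import math
import random
import statistics


def true_mean(s):
    """Hypothetical spatially varying mean on the unit interval (assumption)."""
    return math.sin(2.0 * math.pi * s)


def local_estimate(rng, center, half_width, n_sites=400, noise_sd=1.0):
    """Average the noisy observations whose site lies within `half_width` of `center`."""
    sites = (i / (n_sites - 1) for i in range(n_sites))
    values = [
        true_mean(s) + rng.gauss(0.0, noise_sd)
        for s in sites
        if abs(s - center) <= half_width
    ]
    return statistics.fmean(values)


def bias_and_variance(half_width, center=0.25, n_reps=500, seed=1):
    """Monte Carlo bias and variance of the local estimate over repeated data sets."""
    rng = random.Random(seed)
    estimates = [local_estimate(rng, center, half_width) for _ in range(n_reps)]
    bias = statistics.fmean(estimates) - true_mean(center)
    return bias, statistics.variance(estimates)


small_bias, small_var = bias_and_variance(half_width=0.02)  # small region
large_bias, large_var = bias_and_variance(half_width=0.20)  # large region
print(f"small region: bias={small_bias:+.3f}, variance={small_var:.4f}")
print(f"large region: bias={large_bias:+.3f}, variance={large_var:.4f}")
```

The small region tracks the local mean closely but averages few observations, so its estimate is noisy. The large region averages many observations but mixes in locations where the mean differs, which shows up as bias.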

preprint, 2016, arXiv

A Multi-Resolution Spatio-Temporal Model for Brain Activation and Connectivity in fMRI Data

Functional Magnetic Resonance Imaging (fMRI) is a primary modality for studying brain activity. Modeling spatial dependence of imaging data at different scales is one of the main challenges of contemporary neuroimaging, and it could allow for accurate testing for significance in neural activity. The high dimensionality of this type of data (on the order of hundreds of thousands of voxels) poses serious modeling challenges and considerable computational constraints. For the sake of feasibility, standard models typically reduce dimensionality by modeling covariance among regions of interest (ROIs) -- coarser or larger spatial units -- rather than among voxels. However, ignoring spatial dependence at different scales could drastically reduce our ability to detect activation patterns in the brain and hence produce misleading results. To overcome these problems, we introduce a multi-resolution spatio-temporal model and a computationally efficient methodology to estimate cognitive control related activation and whole-brain connectivity. The proposed model allows for testing voxel-specific activation while accounting for non-stationary local spatial dependence within anatomically defined ROIs, as well as regional dependence (between-ROIs). Furthermore, the model allows for detection of interpretable connectivity patterns among ROIs using the graphical Least Absolute Shrinkage and Selection Operator (LASSO). The model is used in a motor-task fMRI study to investigate brain activation and connectivity patterns aimed at identifying associations between these patterns and regaining motor functionality following a stroke.

preprint, 2016, arXiv

An Evolutionary Spectrum Approach to Incorporate Large-scale Geographical Descriptors on Global Processes

We introduce a nonstationary spatio-temporal statistical model for gridded data on the sphere. The model specifies a computationally convenient covariance structure that depends on heterogeneous geography. Widely used statistical models on a spherical domain are nonstationary for different latitudes, but stationary at the same latitude (axial symmetry). This assumption has been acknowledged to be too restrictive for quantities such as surface temperature, whose statistical behavior is influenced by large-scale geographical descriptors such as land and ocean. We propose an evolutionary spectrum approach that is able to account for different regimes across the Earth's geography, and results in a more general and flexible class of models that vastly outperforms axially symmetric models and captures longitudinal patterns that would otherwise be assumed constant. The model can be estimated with a multi-step conditional likelihood approximation that preserves the nonstationary features while allowing for easily distributed computations: we show how a data set of more than 20 million data points can be fit in less than one day on a state-of-the-art workstation. Once the parameters are estimated, it is possible to instantaneously generate surrogate runs from a common laptop. Further, the resulting estimates from the statistical model can be regarded as a synthetic description (i.e. a compression) of the space-time characteristics of an entire initial condition ensemble. Compared to traditional algorithms aiming at compressing the bit-by-bit information on each climate model run, the proposed approach achieves vastly superior compression rates.

preprint, 2015, arXiv

High-order Composite Likelihood Inference for Max-Stable Distributions and Processes

In multivariate or spatial extremes, inference for max-stable processes observed at a large collection of locations is among the most challenging problems in computational statistics, and current approaches typically rely on less expensive composite likelihoods constructed from small subsets of data. In this work, we explore the limits of modern state-of-the-art computational facilities to perform full likelihood inference and to efficiently evaluate high-order composite likelihoods. With extensive simulations, we assess the loss of information of composite likelihood estimators with respect to a full likelihood approach for some widely-used multivariate or spatial extreme models, we discuss how to choose composite likelihood truncation to improve the efficiency, and we also provide recommendations for practitioners.
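To give a sense of why composite likelihoods built from small subsets are cheaper, the sketch below counts likelihood terms. It rests on assumptions that are not part of the abstract: a composite likelihood of order k is taken to use every subset of k sites (one term each), and, as general background on max-stable models, the full likelihood at D sites is taken to involve a sum over all partitions of the D sites, whose count is the Bell number. The function names and the choice of 20 sites are hypothetical.

```python
from math import comb


def composite_term_count(n_sites, order):
    """Number of size-`order` subsets of sites, i.e. terms if every subset is used."""
    return comb(n_sites, order)


def bell_number(n):
    """Number of partitions of a set of n elements, computed with the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


n_sites = 20  # hypothetical number of locations
term_counts = {order: composite_term_count(n_sites, order) for order in (2, 3, 4, 5)}
for order, count in term_counts.items():
    print(f"order {order}: {count} subsets of {n_sites} sites")

# Partitions of the sites grow much faster than the subset counts above.
for d in (5, 10, 15):
    print(f"D = {d}: {bell_number(d)} partitions")
```

Truncating the composite likelihood, for example by keeping only subsets of nearby sites, would reduce these counts further. How to choose such a truncation is one of the questions the abstract says the paper discusses.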

preprint, 2013, arXiv

Global space-time models for climate ensembles

Global climate models aim to reproduce physical processes on a global scale and predict quantities such as temperature given some forcing inputs. We consider climate ensembles made of collections of such runs with different initial conditions and forcing scenarios. The purpose of this work is to show how the simulated temperatures in the ensemble can be reproduced (emulated) with a global space/time statistical model that addresses the issue of capturing nonstationarities in latitude more effectively than current alternatives in the literature. The model we propose leads to a computationally efficient estimation procedure and, by exploiting the gridded geometry of the data, we can fit massive data sets with millions of simulated data within a few hours. Given a training set of runs, the model efficiently emulates temperature for very different scenarios and therefore is an appealing tool for impact assessment.