Source author record

Marc G. Genton

Marc G. Genton appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Methodology; Applications; Computation; math.ST; Statistics Theory; Distributed, Parallel, and Cluster Computing; physics.ao-ph

Catalog footprint

What is connected

31 works

7 topics

4 close collaborators

preprint, 2026, arXiv

Fisher Scoring for Exact Matérn Covariance Estimation through Stable Smoothness Optimization

Gaussian Random Fields (GRFs) with Matérn covariance functions have emerged as a powerful framework for modeling spatial processes due to their flexibility in capturing different features of the spatial field. However, because of numerical instability, the smoothness parameter is challenging to estimate by maximum likelihood estimation (MLE), which involves evaluating the likelihood based on the full covariance matrix of the GRF. Moreover, MLE remains computationally prohibitive for large spatial datasets. To address this challenge, we propose the Fisher-BackTracking (Fisher-BT) method, which integrates the Fisher scoring algorithm with a backtracking line search strategy and adopts a series approximation for the modified Bessel function. This method enables efficient MLE for spatial datasets using the ExaGeoStat high-performance computing framework. Our proposed method not only reduces the number of iterations and accelerates convergence compared to derivative-free optimization methods but also improves the numerical stability of the smoothness parameter estimation. Through simulations and real-data analysis using a soil moisture dataset covering the Mississippi River Basin, we show that the proposed Fisher-BT method achieves accuracy comparable to existing approaches while significantly outperforming derivative-free algorithms such as BOBYQA and Nelder-Mead in terms of computational efficiency and numerical stability.
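
As a rough illustration of how Fisher scoring combines with a backtracking line search, the sketch below fits a toy one-parameter model rather than the Matérn setting of the abstract. It assumes i.i.d. Cauchy data with unit scale and unknown location, for which the expected Fisher information is n/2. The function names (`loglik`, `score`, `fisher_bt`) and the tuning constants are hypothetical and are not taken from the paper or from ExaGeoStat.

```python
import math

def loglik(theta, xs):
    # Log-likelihood of i.i.d. Cauchy(location=theta, scale=1) data.
    return -sum(math.log(math.pi * (1.0 + (x - theta) ** 2)) for x in xs)

def score(theta, xs):
    # Derivative of the log-likelihood with respect to theta.
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def fisher_bt(xs, theta0, tol=1e-6, max_iter=200, shrink=0.5, c=1e-4):
    # Expected Fisher information for the Cauchy location model (unit scale).
    info = len(xs) / 2.0
    theta = theta0
    trace = [loglik(theta, xs)]
    for _ in range(max_iter):
        g = score(theta, xs)
        direction = g / info          # Fisher scoring direction
        if abs(direction) < tol:
            break
        step = 1.0
        # Backtracking: shrink the step until an Armijo-type increase holds.
        while step > 1e-12 and (
            loglik(theta + step * direction, xs)
            < trace[-1] + c * step * g * direction
        ):
            step *= shrink
        theta += step * direction
        trace.append(loglik(theta, xs))
    return theta, trace

xs = [-1.2, 0.3, 0.8, 1.1, 1.9, 2.4, 15.0]   # made-up data with one outlier
theta_hat, trace = fisher_bt(xs, theta0=0.0)
print(theta_hat, len(trace) - 1)
```

The line search guarantees that every accepted step increases the log-likelihood, which is the stabilizing role the abstract attributes to backtracking; the real method additionally handles several covariance parameters and the Bessel-function derivatives.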

preprint, 2022, arXiv

Are You All Normal? It Depends!

The assumption of normality has underlain much of the development of statistics, including spatial statistics, and many tests have been proposed. In this work, we focus on the multivariate setting and first review the recent advances in multivariate normality tests for i.i.d. data, with emphasis on the skewness and kurtosis approaches. We show through simulation studies that some of these tests cannot be used directly for testing normality of spatial data. We further review briefly the few existing univariate tests under dependence (time or space), and then propose a new multivariate normality test for spatial data by accounting for the spatial dependence. The new test utilizes the union-intersection principle to decompose the null hypothesis into intersections of univariate normality hypotheses for projection data, and it rejects multivariate normality if any individual hypothesis is rejected. The individual hypotheses of univariate normality are tested using a Jarque-Bera type test statistic that accounts for the spatial dependence in the data. We also show in simulation studies that the new test has good control of the type I error and high empirical power, especially for large sample sizes. We further illustrate our test on bivariate wind data over the Arabian Peninsula.
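
For orientation, the classical Jarque-Bera statistic for i.i.d. univariate data is JB = (n/6)(S^2 + (K - 3)^2 / 4), where S is the sample skewness and K the sample kurtosis. The sketch below computes only this textbook i.i.d. form; it does not include the spatial-dependence adjustment that the abstract describes, and the function name is a placeholder.

```python
def jarque_bera(xs):
    # Classical i.i.d. Jarque-Bera statistic (no correction for dependence).
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

print(jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0]))
```

Under independence and normality the statistic is asymptotically chi-squared with two degrees of freedom; with spatially dependent data that reference distribution no longer applies directly, which is the gap the proposed test addresses.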

preprint, 2022, arXiv

Functional Time Series Analysis Based on Records

In many phenomena, data are collected on a large scale and at different frequencies. In this context, functional data analysis (FDA) has become an important statistical methodology for analyzing and modeling such data. The approach of FDA is to assume that data are continuous functions and that each continuous function is considered as a single observation. Thus, FDA deals with large-scale and complex data. However, visualization and exploratory data analysis, which are very important in practice, can be challenging due to the complexity of the continuous functions. Here we propose some nonparametric tools for functional data observed over time (functional time series), based on the concept of records. We study the properties of the trajectory of the number of record curves under different scenarios. We also propose a unit root test based on the number of records. The trajectory of the number of records over time and the unit root test can be used as visualization and exploratory data analysis tools. We illustrate the advantages of our proposal through a Monte Carlo simulation study. We also illustrate our method on two different datasets: annual mortality rates in France and daily wind speed curves at Yanbu, Saudi Arabia. Overall, we can identify the type of functional time series being studied based on the number of record curves observed.
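
The idea of tracking the number of records over time can be sketched for a scalar series. Reducing each curve to a single number is an assumption made here for illustration only; the paper works with record curves, and its definition may differ. The helper names are hypothetical.

```python
def record_counts(values):
    # Running count of upper records: an observation is a record
    # when it strictly exceeds every earlier observation.
    counts, best, total = [], float("-inf"), 0
    for v in values:
        if v > best:
            best = v
            total += 1
        counts.append(total)
    return counts

def expected_records_iid(n):
    # For i.i.d. continuous data the expected number of records
    # after n observations is the harmonic number 1 + 1/2 + ... + 1/n.
    return sum(1.0 / k for k in range(1, n + 1))

print(record_counts([3, 1, 4, 1, 5, 9, 2, 6]))
print(expected_records_iid(8))
```

Comparing the observed trajectory with the slowly growing harmonic benchmark gives a quick visual cue: a series with a trend or a unit root tends to accumulate records much faster than a stationary i.i.d.-like series.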

preprint, 2022, arXiv

Multivariate Functional Outlier Detection using the FastMUOD Indices

We present definitions and properties of the fast massive unsupervised outlier detection (FastMUOD) indices, used for outlier detection (OD) in functional data. FastMUOD detects outliers by computing, for each curve, an amplitude, magnitude and shape index meant to target the corresponding types of outliers. Some methods adapting FastMUOD to outlier detection in multivariate functional data are then proposed. These include applying FastMUOD on the components of the multivariate data and using random projections. Moreover, these techniques are tested on various simulated and real multivariate functional datasets. Compared with the state of the art in multivariate functional OD, the use of random projections showed the most effective results with similar, and in some cases improved, OD performance.
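
One way to picture the three indices is to regress each curve on a reference curve and read off the fit. The sketch below assumes the reference is the pointwise median, with the shape index taken as one minus the Pearson correlation, the amplitude index as the absolute deviation of the slope from 1, and the magnitude index as the absolute intercept. These formulas are stated here as an assumption about the general idea, not as the paper's exact definitions, and `muod_like_indices` is a hypothetical name.

```python
import math
from statistics import mean, median

def muod_like_indices(curves):
    # Reference curve: pointwise median across all curves (assumed choice).
    m = len(curves[0])
    ref = [median(c[t] for c in curves) for t in range(m)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    out = []
    for c in curves:
        c_mean = mean(c)
        c_ss = sum((a - c_mean) ** 2 for a in c)
        cross = sum((a - c_mean) * (r - ref_mean) for a, r in zip(c, ref))
        slope = cross / ref_ss                  # regression of curve on reference
        intercept = c_mean - slope * ref_mean
        corr = cross / math.sqrt(c_ss * ref_ss) if c_ss > 0 else 0.0
        out.append({
            "shape": 1.0 - corr,
            "amplitude": abs(slope - 1.0),
            "magnitude": abs(intercept),
        })
    return out

base = [0.0, 1.0, 2.0, 3.0, 4.0]
curves = [base[:], base[:], base[:], base[:],
          [v + 5.0 for v in base],      # shifted curve: magnitude outlier
          [3.0 * v for v in base],      # scaled curve: amplitude outlier
          base[::-1]]                   # reversed curve: shape outlier
idx = muod_like_indices(curves)
print(idx[4], idx[5], idx[6])
```

Each index reacts mainly to one kind of deviation, which is why the curves can be flagged and labeled by outlier type using separate thresholds on the three indices.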

preprint, 2022, arXiv

Nonseparable Space-Time Stationary Covariance Functions on Networks cross Time

The advent of data science has provided an increasing number of challenges with high data complexity. This paper addresses the challenge of space-time data where the spatial domain is not a planar surface, a sphere, or a linear network, but a generalized network (termed a graph with Euclidean edges). Additionally, data are repeatedly measured over different temporal instants. We provide new classes of nonseparable space-time stationary covariance functions where space can be a generalized network, a Euclidean tree, or a linear network, and where time can be linear or circular (seasonal). Because the construction principles are technical, we focus on illustrations that guide the reader through the construction of statistically interpretable examples. A simulation study demonstrates that we can recover the correct model when compared to misspecified models. In addition, our simulation studies show that we effectively recover simulation parameters. In our data analysis, we consider a traffic accident dataset that shows improved model performance based on covariance specifications and network-based metrics.

preprint, 2022, arXiv

Scalable computation of predictive probabilities in probit models with Gaussian process priors

Predictive models for binary data are fundamental in various fields, and the growing complexity of modern applications has motivated several flexible specifications for modeling the relationship between the observed predictors and the binary responses. A widely-implemented solution is to express the probability parameter via a probit mapping of a Gaussian process indexed by predictors. However, unlike for continuous settings, there is a lack of closed-form results for predictive distributions in binary models with Gaussian process priors. Markov chain Monte Carlo methods and approximation strategies provide common solutions to this problem, but state-of-the-art algorithms are either computationally intractable or inaccurate in moderate-to-high dimensions. In this article, we aim to cover this gap by deriving closed-form expressions for the predictive probabilities in probit Gaussian processes that rely either on cumulative distribution functions of multivariate Gaussians or on functionals of multivariate truncated normals. To evaluate these quantities we develop novel scalable solutions based on tile-low-rank Monte Carlo methods for computing multivariate Gaussian probabilities, and on mean-field variational approximations of multivariate truncated normals. Closed-form expressions for the marginal likelihood and for the posterior distribution of the Gaussian process are also discussed. As shown in simulated and real-world empirical studies, the proposed methods scale to dimensions where state-of-the-art solutions are impractical.

preprint, 2022, arXiv

Sensitivity Analysis of Wind Energy Resources with Bayesian non-Gaussian and nonstationary Functional ANOVA

The transition from non-renewable to renewable energies represents a global societal challenge, and developing a sustainable energy portfolio is an especially daunting task for developing countries where little to no information is available regarding the abundance of renewable resources such as wind. Weather model simulations are key to obtaining such information when observational data are scarce and sparse over a country as large and geographically diverse as Saudi Arabia. However, output from such models is uncertain, as it depends on inputs such as the parametrization of the physical processes and the spatial resolution of the simulated domain. In such situations, a sensitivity analysis must be performed, and the inputs may have a spatially heterogeneous influence on wind. In this work, we propose a latent Gaussian functional analysis of variance (ANOVA) model that relies on a nonstationary Gaussian Markov random field approximation of a continuous latent process. The proposed approach is able to capture the local sensitivity of Gaussian and non-Gaussian wind characteristics such as speed and threshold exceedances over a large simulation domain, and a continuous underlying process also allows us to assess the effect of different spatial resolutions. Our results indicate that (1) the non-local planetary boundary layer scheme and high spatial resolution are both instrumental in capturing wind speed and energy (especially over complex mountainous terrain), and (2) the impact of planetary boundary layer scheme and resolution on Saudi Arabia's planned wind farms is small (at most 1.4%). Thus, our results lend support for the construction of these wind farms in the next decade.

preprint, 2022, arXiv

Sparse Functional Boxplots for Multivariate Curves

This paper introduces the sparse functional boxplot and the intensity sparse functional boxplot as practical exploratory tools. Besides being available for complete functional data, they can be used in sparse univariate and multivariate functional data. The sparse functional boxplot, based on the functional boxplot, displays sparseness proportions within the 50% central region. The intensity sparse functional boxplot indicates the relative intensity of fitted sparse point patterns in the central region. The two-stage functional boxplot, which derives from the functional boxplot to detect outliers, is furthermore extended to its sparse form. We also contribute to sparse data fitting improvement and sparse multivariate functional data depth. In a simulation study, we evaluate the goodness of data fitting, several depth proposals for sparse multivariate functional data, and compare the results of outlier detection between the sparse functional boxplot and its two-stage version. The practical applications of the sparse functional boxplot and intensity sparse functional boxplot are illustrated with two public health datasets. Supplementary materials and codes are available for readers to apply our visualization tools and replicate the analysis.

preprint, 2022, arXiv

Spatio-Temporal Cross-Covariance Functions under the Lagrangian Framework with Multiple Advections

When analyzing the spatio-temporal dependence in most environmental and earth sciences variables such as pollutant concentrations at different levels of the atmosphere, a special property is observed: the covariances and cross-covariances are stronger in certain directions. This property is attributed to the presence of natural forces, such as wind, which cause the transport and dispersion of these variables. These spatio-temporal dynamics prompted the use of the Lagrangian reference frame alongside any Gaussian spatio-temporal geostatistical model. Under this modeling framework, a new class emerged, known as the class of spatio-temporal covariance functions under the Lagrangian framework, with several developments already established in the univariate setting, in both stationary and nonstationary formulations, but less so in the multivariate case. Despite the many advances in this modeling approach, efforts have yet to be directed to probing the case for the use of multiple advections, especially when several variables are involved. Accounting for multiple advections would make the Lagrangian framework a more viable approach in modeling realistic multivariate transport scenarios. In this work, we establish a class of Lagrangian spatio-temporal cross-covariance functions with multiple advections, study its properties, and demonstrate its use on a bivariate pollutant dataset of particulate matter in Saudi Arabia.

preprint, 2022, arXiv

Sub-dimensional Mardia measures of multivariate skewness and kurtosis

The Mardia measures of multivariate skewness and kurtosis summarize the respective characteristics of a multivariate distribution with two numbers. However, these measures do not reflect the sub-dimensional features of the distribution. Consequently, testing procedures based on these measures may fail to detect skewness or kurtosis present in a sub-dimension of the multivariate distribution. We introduce sub-dimensional Mardia measures of multivariate skewness and kurtosis, and investigate the information they convey about all sub-dimensional distributions of some symmetric and skewed families of multivariate distributions. The maxima of the sub-dimensional Mardia measures of multivariate skewness and kurtosis are considered, as these reflect the maximum skewness and kurtosis present in the distribution, and also allow us to identify the sub-dimension bearing the highest skewness and kurtosis. Asymptotic distributions of the vectors of sub-dimensional Mardia measures of multivariate skewness and kurtosis are derived, based on which testing procedures for the presence of skewness and of deviation from Gaussian kurtosis are developed. The performances of these tests are compared with some existing tests in the literature on simulated and real datasets.

preprint, 2021, arXiv

Conditional Normal Extreme-Value Copulas

We propose a new class of extreme-value copulas which are extreme-value limits of conditional normal models. Conditional normal models are generalizations of conditional independence models, where the dependence among observed variables is modeled using one unobserved factor. Conditional on this factor, the distribution of these variables is given by the Gaussian copula. This structure allows one to build flexible and parsimonious models for data with complex dependence structures, such as data with spatial dependence or factor structure. We study the extreme-value limits of these models and show some interesting special cases of the proposed class of copulas. We develop estimation methods for the proposed models and conduct a simulation study to assess the performance of these algorithms. Finally, we apply these copula models to analyze data on monthly wind maxima and stock return minima.

preprint, 2021, arXiv

Tractable Bayes of Skew-Elliptical Link Models for Correlated Binary Data

Correlated binary response data with covariates are ubiquitous in longitudinal or spatial studies. Among the existing statistical models the most well-known one for this type of data is the multivariate probit model, which uses a Gaussian link to model dependence at the latent level. However, a symmetric link may not be appropriate if the data are highly imbalanced. Here, we propose a multivariate skew-elliptical link model for correlated binary responses, which includes the multivariate probit model as a special case. Furthermore, we perform Bayesian inference for this new model and prove that the regression coefficients have a closed-form unified skew-elliptical posterior. The new methodology is illustrated by application to COVID-19 pandemic data from three different counties of the state of California, USA. By jointly modeling extreme spikes in weekly new cases, our results show that the spatial dependence cannot be neglected. Furthermore, the results also show that the skewed latent structure of our proposed model improves the flexibility of the multivariate probit model and provides better fit to our highly imbalanced dataset.

preprint, 2021, arXiv

Vector Autoregressive Models with Spatially Structured Coefficients for Time Series on a Spatial Grid

We propose a parsimonious spatiotemporal model for time series data on a spatial grid. Our model is capable of dealing with high-dimensional time series data that may be collected at hundreds of locations and capturing the spatial non-stationarity. In essence, our model is a vector autoregressive model that utilizes the spatial structure to achieve parsimony of autoregressive matrices at two levels. The first level ensures the sparsity of the autoregressive matrices using a lagged-neighborhood scheme. The second level performs a spatial clustering of the non-zero autoregressive coefficients such that nearby locations share similar coefficients. This model is interpretable and can be used to identify geographical subregions, within each of which, the time series share similar dynamical behavior with homogeneous autoregressive coefficients. The model parameters are obtained using the penalized maximum likelihood with an adaptive fused Lasso penalty. The estimation procedure is easy to implement and can be tailored to the need of a modeler. We illustrate the performance of the proposed estimation algorithm in a simulation study. We apply our model to a wind speed time series dataset generated from a climate model over Saudi Arabia to illustrate its usefulness. Limitations and possible extensions of our method are also discussed.

preprint, 2020, arXiv

A Pairwise Hotelling Method for Testing High-Dimensional Mean Vectors

For high-dimensional small sample size data, Hotelling's $T^2$ test is not applicable for testing mean vectors due to the singularity problem in the sample covariance matrix. To overcome the problem, there are three main approaches in the literature. Note, however, that each of the existing approaches may have serious limitations and only works well in certain situations. Inspired by this, we propose a pairwise Hotelling method for testing high-dimensional mean vectors, which, in essence, provides a good balance between the existing approaches. To effectively utilize the correlation information, we construct the new test statistics as the summation of Hotelling's test statistics for the covariate pairs with strong correlations and the squared $t$ statistics for the individual covariates that have little correlation with others. We further derive the asymptotic null distributions and power functions for the proposed Hotelling tests under some regularity conditions. Numerical results show that our new tests are able to control the type I error rates, and can achieve a higher statistical power compared to existing methods, especially when the covariates are highly correlated. Two real data examples are also analyzed and they both demonstrate the efficacy of our pairwise Hotelling tests.
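
The two building blocks named in the abstract, a Hotelling statistic for one pair of covariates and a squared t statistic for a single covariate, can be sketched in the one-sample setting as follows. How pairs are selected and how the pieces are summed and calibrated is specific to the paper and is not reproduced here; the function names and the toy data are hypothetical.

```python
def hotelling_t2_pair(pairs, mu0=(0.0, 0.0)):
    # One-sample Hotelling T^2 for a bivariate sample, using the
    # closed-form inverse of the 2x2 sample covariance matrix.
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mx) ** 2 for p in pairs) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in pairs) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = mx - mu0[0], my - mu0[1]
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return n * quad

def squared_t(xs, mu0=0.0):
    # Squared one-sample t statistic for a single covariate.
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
    return n * (m - mu0) ** 2 / s2

sample = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
print(hotelling_t2_pair(sample))            # pair treated jointly
print(squared_t([p[0] for p in sample]))    # first covariate on its own
```

Because only 2x2 covariance blocks are inverted, the pairwise statistic stays well defined even when the number of covariates exceeds the sample size, which is where the full $T^2$ test breaks down.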

preprint, 2020, arXiv

Functional Outlier Detection and Taxonomy by Sequential Transformations

Functional data analysis can be seriously impaired by abnormal observations, which can be classified as either magnitude or shape outliers based on their way of deviating from the bulk of data. Identifying magnitude outliers is relatively easy, while detecting shape outliers is much more challenging. We propose turning the shape outliers into magnitude outliers through data transformation and detecting them using the functional boxplot. Besides easing the detection procedure, applying several transformations sequentially provides a reasonable taxonomy for the flagged outliers. A joint functional ranking, which consists of several transformations, is also defined here. Simulation studies are carried out to evaluate the performance of the proposed method using different functional depth notions. Interesting results are obtained in several practical applications.

preprint, 2020, arXiv

Geostatistical Modeling and Prediction Using Mixed-Precision Tile Cholesky Factorization

Geostatistics represents one of the most challenging classes of scientific applications due to the desire to incorporate an ever increasing number of geospatial locations to accurately model and predict environmental phenomena. For example, the evaluation of the Gaussian log-likelihood function, which constitutes the main computational phase, involves solving systems of linear equations with a large dense symmetric and positive definite covariance matrix. Cholesky, the standard algorithm, requires O(n^3) floating point operations and has an O(n^2) memory footprint, where n is the number of geographical locations. Here, we present a mixed-precision tile algorithm to accelerate the Cholesky factorization during the log-likelihood function evaluation. Under an appropriate ordering, it operates with double-precision arithmetic on tiles around the diagonal, while reducing to single-precision arithmetic for tiles sufficiently far off. This translates into an improvement of the performance without any deterioration of the numerical accuracy of the application. We rely on the StarPU dynamic runtime system to schedule the tasks and to overlap them with data movement. To assess the performance and the accuracy of the proposed mixed-precision algorithm, we use synthetic and real datasets on various shared and distributed-memory systems possibly equipped with hardware accelerators. We compare our mixed-precision Cholesky factorization against the double-precision reference implementation as well as an independent block approximation method. We obtain an average of 1.6X performance speedup on massively parallel architectures while maintaining the accuracy necessary for modeling and prediction.
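
The precision layout described above (double precision near the diagonal, single precision far from it) can be pictured with a small bookkeeping sketch. It only assigns a precision label to each lower-triangular tile and estimates storage; it performs no factorization. The `band` threshold and both function names are hypothetical parameters chosen for illustration, not the paper's rule for deciding which tiles are "sufficiently far off".

```python
def tile_precision_map(num_tiles, band):
    # Lower-triangular tile grid: tiles within `band` of the diagonal
    # keep double precision, the rest are stored in single precision.
    return [
        ["double" if i - j <= band else "single" for j in range(i + 1)]
        for i in range(num_tiles)
    ]

def storage_bytes(num_tiles, band, tile_size):
    # 8 bytes per double entry, 4 bytes per single entry.
    per_entry = {"double": 8, "single": 4}
    entries = tile_size * tile_size
    return sum(
        entries * per_entry[p]
        for row in tile_precision_map(num_tiles, band)
        for p in row
    )

for row in tile_precision_map(4, band=1):
    print(row)
print(storage_bytes(4, band=1, tile_size=2))
```

With a good ordering of the locations, covariance entries far from the diagonal correspond to distant, weakly correlated points, which is why lowering their precision can save time and memory with little effect on the likelihood.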

preprint, 2020, arXiv

Improving Bayesian Local Spatial Models in Large Data Sets

Environmental processes resolved at a sufficiently small scale in space and time will inevitably display non-stationary behavior. Such processes are both challenging to model and computationally expensive when the data size is large. Instead of modeling the global non-stationarity explicitly, local models can be applied to disjoint regions of the domain. The choice of the size of these regions is dictated by a bias-variance trade-off; large regions will have smaller variance and larger bias, whereas small regions will have higher variance and smaller bias. From both the modeling and computational point of view, small regions are preferable to better accommodate the non-stationarity. However, in practice, large regions are necessary to control the variance. We propose a novel Bayesian three-step approach that allows for smaller regions without incurring the increase in variance that would otherwise follow. We are able to propagate the uncertainty from one step to the next without issues caused by reusing the data. The improvement in inference also results in improved prediction, as our simulated example shows. We illustrate this new approach on a data set of simulated high-resolution wind speed data over Saudi Arabia.

preprint, 2020, arXiv

Nonparametric Trend Estimation in Functional Time Series with Application to Annual Mortality Rates

Here, we address the problem of trend estimation for functional time series. Existing contributions either deal with detecting a functional trend or assume a simple model. They consider neither the estimation of a general functional trend nor the analysis of functional time series with a functional trend component. Similarly to univariate time series, we propose an alternative methodology to analyze functional time series, taking into account a functional trend component. We propose to estimate the functional trend by using a tensor product surface that is easy to implement and interpret, and that allows control of the smoothness properties of the estimator. Through a Monte Carlo study, we simulate different scenarios of functional processes to show that our estimator accurately identifies the functional trend component. We also show that the dependency structure of the estimated stationary time series component is not significantly affected by the approximation error of the functional trend component. We apply our methodology to annual mortality rates in France.

preprint, 2020, arXiv

Recent Developments in Complex and Spatially Correlated Functional Data

As high-dimensional and high-frequency data are being collected on a large scale, the development of new statistical models is being pushed forward. Functional data analysis provides the required statistical methods to deal with large-scale and complex data by assuming that data are continuous functions, e.g., a realization of a continuous process (curves) or continuous random fields (surfaces), and that each curve or surface is considered as a single observation. Here, we provide an overview of functional data analysis when data are complex and spatially correlated. We provide definitions and estimators of the first and second moments of the corresponding functional random variable. We present two main approaches: The first assumes that data are realizations of a functional random field, i.e., each observation is a curve with a spatial component. We call them 'spatial functional data'. The second approach assumes that data are continuous deterministic fields observed over time. In this case, one observation is a surface or manifold, and we call them 'surface time series'. For the two approaches, we describe software available for the statistical analysis. We also present a data illustration, using a high-resolution wind speed simulated dataset, as an example of the two approaches. The functional data approach offers a new paradigm of data analysis, where the continuous processes or random fields are considered as a single entity. We consider this approach to be very valuable in the context of big data.

preprint, 2019, arXiv

Spatial Blind Source Separation

Recently a blind source separation model was suggested for spatial data together with an estimator based on the simultaneous diagonalisation of two scatter matrices. The asymptotic properties of this estimator are derived here and a new estimator, based on the joint diagonalisation of more than two scatter matrices, is proposed. The asymptotic properties and merits of the novel estimator are verified in simulation studies. A real data example illustrates the method.

preprint, 2018, arXiv

ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems

We present ExaGeoStat, a high performance framework for geospatial statistics in climate and environment modeling. In contrast to simulation based on partial differential equations derived from first-principles modeling, ExaGeoStat employs a statistical model based on the evaluation of the Gaussian log-likelihood function, which operates on a large dense covariance matrix. Generated by the parametrizable Matérn covariance function, the resulting matrix is symmetric and positive definite. The computational tasks involved during the evaluation of the Gaussian log-likelihood function become daunting as the number n of geographical locations grows, as O(n^2) storage and O(n^3) operations are required. While many approximation methods have been devised from the side of statistical modeling to ameliorate these polynomial complexities, we are interested here in the complementary approach of evaluating the exact algebraic result by exploiting advances in solution algorithms and many-core computer architectures. Using state-of-the-art high performance dense linear algebra libraries associated with various leading edge parallel architectures (Intel KNLs, NVIDIA GPUs, and distributed-memory systems), ExaGeoStat raises the game for statistical applications from climate and environmental science. ExaGeoStat provides a reference evaluation of statistical parameters, with which to assess the validity of the various approaches based on approximation. The framework takes a first step in the merger of large-scale data analytics and extreme computing for geospatial statistical applications, to be followed by additional complexity reducing improvements from the solver side that can be implemented under the same interface. Thus, a single uncompromised statistical model can ultimately be executed in a wide variety of emerging exascale environments.
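
To make the cost structure concrete, the sketch below evaluates the exact Gaussian log-likelihood for a zero-mean field through a dense Cholesky factorization, which is where the O(n^2) storage and O(n^3) operations arise. It is a plain-Python toy, not ExaGeoStat code: it assumes the exponential covariance (the Matérn family with smoothness 1/2), and the names `exp_cov`, `cholesky`, `gaussian_loglik`, `sigma2`, and `beta` are illustrative choices.

```python
import math

def exp_cov(locs, sigma2, beta):
    # Exponential covariance: sigma2 * exp(-distance / beta),
    # i.e. the Matérn covariance with smoothness 1/2.
    n = len(locs)
    return [[sigma2 * math.exp(-math.dist(locs[i], locs[j]) / beta)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    # Dense lower-triangular Cholesky factor, O(n^3) operations.
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def gaussian_loglik(z, locs, sigma2, beta):
    # Zero-mean Gaussian log-likelihood:
    # -0.5 * (n*log(2*pi) + log det(Sigma) + z' Sigma^{-1} z).
    n = len(z)
    low = cholesky(exp_cov(locs, sigma2, beta))
    y = []
    for i in range(n):  # forward substitution: solve low * y = z
        y.append((z[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(v * v for v in y)
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)

locs = [(0.0, 0.0), (1.0, 0.0)]
print(gaussian_loglik([1.0, 2.0], locs, sigma2=1.0, beta=1.0))
```

An optimizer calls a function like this once per candidate parameter vector, so each likelihood evaluation repeats the full factorization; that repeated dense solve is the workload the framework distributes across many-core hardware.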

preprint, 2016, arXiv

A Multi-Resolution Spatio-Temporal Model for Brain Activation and Connectivity in fMRI Data

Functional Magnetic Resonance Imaging (fMRI) is a primary modality for studying brain activity. Modeling spatial dependence of imaging data at different scales is one of the main challenges of contemporary neuroimaging, and it could allow for accurate testing for significance in neural activity. The high dimensionality of this type of data (on the order of hundreds of thousands of voxels) poses serious modeling challenges and considerable computational constraints. For the sake of feasibility, standard models typically reduce dimensionality by modeling covariance among regions of interest (ROIs) -- coarser or larger spatial units -- rather than among voxels. However, ignoring spatial dependence at different scales could drastically reduce our ability to detect activation patterns in the brain and hence produce misleading results. To overcome these problems, we introduce a multi-resolution spatio-temporal model and a computationally efficient methodology to estimate cognitive control related activation and whole-brain connectivity. The proposed model allows for testing voxel-specific activation while accounting for non-stationary local spatial dependence within anatomically defined ROIs, as well as regional dependence (between ROIs). Furthermore, the model allows for detection of interpretable connectivity patterns among ROIs using the graphical Least Absolute Shrinkage and Selection Operator (LASSO). The model is used in a motor-task fMRI study to investigate brain activation and connectivity patterns aimed at identifying associations between these patterns and regaining motor functionality following a stroke.

preprint, 2016, arXiv

Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis

We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general non-informative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Web Material.

preprint2016arXiv

Multi-Level Restricted Maximum Likelihood Covariance Estimation and Kriging for Large Non-Gridded Spatial Datasets

We develop a multi-level restricted Gaussian maximum likelihood method for estimating the covariance function parameters and computing the best unbiased predictor. Our approach produces a new set of multi-level contrasts where the deterministic parameters of the model are filtered out, thus enabling the estimation of the covariance parameters to be decoupled from the deterministic component. Moreover, the multi-level covariance matrix of the contrasts exhibits fast decay that is dependent on the smoothness of the covariance function. Due to the fast decay of the multi-level covariance matrix coefficients, only a small set is computed with a level-dependent criterion. We demonstrate our approach on problems of up to 512,000 observations with a Matérn covariance function and highly irregular placements of the observations. In addition, these problems are numerically unstable and hard to solve with traditional methods.

preprint2016arXiv

Non-Stationary Dependence Structures for Spatial Extremes

Max-stable processes are natural models for spatial extremes because they provide suitable asymptotic approximations to the distribution of maxima of random fields. In the recent past, several parametric families of stationary max-stable models have been developed, and fitted to various types of data. However, a recurrent problem is the modeling of non-stationarity. In this paper, we develop non-stationary max-stable dependence structures in which covariates can be easily incorporated. Inference is performed using pairwise likelihoods, and its performance is assessed by an extensive simulation study based on a non-stationary locally isotropic extremal $t$ model. Evidence that unknown parameters are well estimated is provided, and estimation of spatial return level curves is discussed. The methodology is demonstrated with temperature maxima recorded over a complex topography. Models are shown to satisfactorily capture extremal dependence.

preprint2015arXiv

Cross-Covariance Functions for Multivariate Geostatistics

Continuously indexed datasets with multiple variables have become ubiquitous in the geophysical, ecological, environmental and climate sciences, and pose substantial analysis challenges to scientists and statisticians. For many years, scientists developed models that aimed at capturing the spatial behavior for an individual process; only within the last few decades has it become commonplace to model multiple processes jointly. The key difficulty is in specifying the cross-covariance function, that is, the function responsible for the relationship between distinct variables. Indeed, these cross-covariance functions must be chosen to be consistent with marginal covariance functions in such a way that the second-order structure always yields a nonnegative definite covariance matrix. We review the main approaches to building cross-covariance models, including the linear model of coregionalization, convolution methods, the multivariate Matérn, and nonstationary and space-time extensions of these, among others. We additionally cover specialized constructions, including those designed for asymmetry, compact support and spherical domains, with a review of physics-constrained models. We illustrate select models on a bivariate regional climate model output example for temperature and pressure, along with a bivariate minimum and maximum temperature observational dataset; we compare models by likelihood value as well as via cross-validation co-kriging studies. The article closes with a discussion of unsolved problems.

preprint2015arXiv

Efficient Maximum Approximated Likelihood Inference for Tukey's g-and-h Distribution

Tukey's $g$-and-$h$ distribution has been a powerful tool for data exploration and modeling since its introduction. However, two long-standing challenges associated with this distribution family have remained unsolved until this day: how to find an optimal estimation procedure and how to make valid statistical inference on unknown parameters. To overcome these two challenges, a computationally efficient estimation procedure based on maximizing an approximated likelihood function of Tukey's $g$-and-$h$ distribution is proposed and is shown to have the same estimation efficiency as the maximum likelihood estimator under mild conditions. The asymptotic distribution of the proposed estimator is derived and a series of approximated likelihood ratio test statistics are developed to conduct hypothesis tests involving the two shape parameters of Tukey's $g$-and-$h$ distribution. Simulation examples and an analysis of air pollution data are used to demonstrate the effectiveness of the proposed estimation and testing procedures.
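
For readers unfamiliar with the family, the following is a minimal sketch of the standard $g$-and-$h$ transformation of a standard normal variable, Y = a + b * ((exp(g z) - 1) / g) * exp(h z^2 / 2). It illustrates only the textbook definition of the distribution, not the approximated-likelihood estimator proposed in the paper. The function names `tukey_gh` and `sample_gh`, and the location and scale arguments `a` and `b`, are illustrative assumptions.

```python
import math
import random


def tukey_gh(z, g=0.0, h=0.0):
    """Apply Tukey's g-and-h transform to a standard normal value z.

    g controls skewness and h (assumed >= 0 here) controls tail heaviness.
    With g = 0 the skewing factor reduces to z (the limit as g -> 0).
    """
    if g == 0.0:
        skewed = z
    else:
        # expm1 keeps precision when g * z is close to zero.
        skewed = math.expm1(g * z) / g
    return skewed * math.exp(h * z * z / 2.0)


def sample_gh(n, g=0.0, h=0.0, a=0.0, b=1.0, seed=0):
    """Draw n values of a + b * tukey_gh(Z), with Z standard normal.

    a, b and seed are hypothetical convenience arguments for this sketch.
    """
    rng = random.Random(seed)
    return [a + b * tukey_gh(rng.gauss(0.0, 1.0), g, h) for _ in range(n)]
```

Because the density of the transformed variable has no closed form in general, likelihood-based fitting requires inverting this transform numerically, which is the computational difficulty the abstract refers to.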

preprint2015arXiv

Likelihood estimators for multivariate extremes

The main approach to inference for multivariate extremes consists in approximating the joint upper tail of the observations by a parametric family arising in the limit for extreme events. The latter may be expressed in terms of componentwise maxima, high threshold exceedances or point processes, yielding different but related asymptotic characterizations and estimators. The present paper clarifies the connections between the main likelihood estimators, and assesses their practical performance. We investigate their ability to estimate the extremal dependence structure and to predict future extremes, using exact calculations and simulation, in the case of the logistic model.

preprint2015arXiv

On nomenclature for, and the relative merits of, two formulations of skew distributions

We examine some distributions used extensively within the model-based clustering literature in recent years, paying special attention to claims that have been made about their relative efficacy. Theoretical arguments are provided as well as real data examples.

preprint2015arXiv

Rejoinder of "Cross-Covariance Functions for Multivariate Geostatistics"

Rejoinder of "Cross-Covariance Functions for Multivariate Geostatistics" by Genton and Kleiber [arXiv:1507.08017].

preprint2014arXiv

Incorporating geostrophic wind information for improved space-time short-term wind speed forecasting

Accurate short-term wind speed forecasting is needed for the rapid development and efficient operation of wind energy resources. This is, however, a very challenging problem. Although on the large scale, the wind speed is related to atmospheric pressure, temperature, and other meteorological variables, no improvement in forecasting accuracy was found by incorporating air pressure and temperature directly into an advanced space-time statistical forecasting model, the trigonometric direction diurnal (TDD) model. This paper proposes to incorporate the geostrophic wind as a new predictor in the TDD model. The geostrophic wind captures the physical relationship between wind and pressure through the observed approximate balance between the pressure gradient force and the Coriolis acceleration due to the Earth's rotation. Based on our numerical experiments with data from West Texas, our new method produces more accurate forecasts than does the TDD model using air pressure and temperature for 1- to 6-hour-ahead forecasts based on three different evaluation criteria. Furthermore, forecasting errors can be further reduced by using moving average hourly wind speeds to fit the diurnal pattern. For example, our new method obtains between 13.9% and 22.4% overall mean absolute error reduction relative to persistence in 2-hour-ahead forecasts, and between 5.3% and 8.2% reduction relative to the best previous space-time methods in this setting.
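
The geostrophic balance mentioned in the abstract can be sketched numerically. The block below uses the textbook relations u_g = -(1 / (rho f)) dp/dy and v_g = (1 / (rho f)) dp/dx, with Coriolis parameter f = 2 Omega sin(latitude). It does not reproduce how the paper derives the predictor from its West Texas data; the function names, the default air density of 1.2 kg/m^3, and the cutoff near the equator are assumptions made for illustration.

```python
import math

OMEGA = 7.2921e-5  # Earth's rotation rate in rad/s


def coriolis_parameter(lat_deg):
    """Return f = 2 * Omega * sin(latitude) in 1/s."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def geostrophic_wind(dp_dx, dp_dy, lat_deg, rho=1.2):
    """Return (u_g, v_g) in m/s from horizontal pressure gradients in Pa/m.

    dp_dx is the eastward gradient and dp_dy the northward gradient.
    rho is an assumed near-surface air density in kg/m^3.
    """
    f = coriolis_parameter(lat_deg)
    if abs(f) < 1e-10:
        # The balance is undefined where the Coriolis term vanishes.
        raise ValueError("geostrophic balance breaks down at the equator")
    u_g = -dp_dy / (rho * f)
    v_g = dp_dx / (rho * f)
    return u_g, v_g


def wind_speed(u, v):
    """Return the scalar wind speed from its two components."""
    return math.hypot(u, v)
```

For instance, `geostrophic_wind(0.0, 1e-3, 30.0)` corresponds to pressure increasing northward by 1 hPa per 100 km at 30 degrees north, which yields a purely westward (negative `u_g`) geostrophic flow in this sign convention.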

31 published items

Fisher Scoring for Exact Matérn Covariance Estimation through Stable Smoothness Optimization

Are You All Normal? It Depends!

Functional Time Series Analysis Based on Records

Multivariate Functional Outlier Detection using the FastMUOD Indices

Nonseparable Space-Time Stationary Covariance Functions on Networks cross Time

Scalable computation of predictive probabilities in probit models with Gaussian process priors

Sensitivity Analysis of Wind Energy Resources with Bayesian non-Gaussian and nonstationary Functional ANOVA

Sparse Functional Boxplots for Multivariate Curves

Spatio-Temporal Cross-Covariance Functions under the Lagrangian Framework with Multiple Advections

Sub-dimensional Mardia measures of multivariate skewness and kurtosis

Conditional Normal Extreme-Value Copulas

Tractable Bayes of Skew-Elliptical Link Models for Correlated Binary Data

Vector Autoregressive Models with Spatially Structured Coefficients for Time Series on a Spatial Grid

A Pairwise Hotelling Method for Testing High-Dimensional Mean Vectors

Functional Outlier Detection and Taxonomy by Sequential Transformations

Geostatistical Modeling and Prediction Using Mixed-Precision Tile Cholesky Factorization

Improving Bayesian Local Spatial Models in Large Data Sets

Nonparametric Trend Estimation in Functional Time Series with Application to Annual Mortality Rates

Recent Developments in Complex and Spatially Correlated Functional Data

Spatial Blind Source Separation

ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems

A Multi-Resolution Spatio-Temporal Model for Brain Activation and Connectivity in fMRI Data

Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis

Multi-Level Restricted Maximum Likelihood Covariance Estimation and Kriging for Large Non-Gridded Spatial Datasets

Non-Stationary Dependence Structures for Spatial Extremes

Cross-Covariance Functions for Multivariate Geostatistics

Efficient Maximum Approximated Likelihood Inference for Tukey's g-and-h Distribution

Likelihood estimators for multivariate extremes

On nomenclature for, and the relative merits of, two formulations of skew distributions

Rejoinder of "Cross-Covariance Functions for Multivariate Geostatistics"

Incorporating geostrophic wind information for improved space-time short-term wind speed forecasting