Source author record

Gustau Camps-Valls

Gustau Camps-Valls appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision physics.geo-ph eess.IV physics.comp-ph Applications math.ST Methodology physics.ao-ph Statistics Theory

Catalog footprint

What is connected

17works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Inference over radiative transfer models using variational and expectation maximization methods

Earth observation from satellites offers the possibility to monitor our planet with unprecedented accuracy. Radiative transfer models (RTMs) encode the energy transfer through the atmosphere, and are used to model and understand the Earth system, as well as to estimate the parameters that describe the status of the Earth from satellite observations by inverse modeling. However, performing inference over such simulators is a challenging problem. RTMs are nonlinear, non-differentiable and computationally costly codes, which adds a high level of difficulty in inference. In this paper, we introduce two computational techniques to infer not only point estimates of biophysical parameters but also their joint distribution. One of them is based on a variational autoencoder approach and the second one is based on a Monte Carlo Expectation Maximization (MCEM) scheme. We compare and discuss benefits and drawbacks of each approach. We also provide numerical comparisons in synthetic simulations and the real PROSAIL model, a popular RTM that combines land vegetation leaf and canopy modeling. We analyze the performance of the two approaches for modeling and inferring the distribution of three key biophysical parameters for quantifying the terrestrial biosphere.

preprint2022arXiv

The Kernelized Taylor Diagram

This paper presents the kernelized Taylor diagram, a graphical framework for visualizing similarities between data populations. The kernelized Taylor diagram builds on the widely used Taylor diagram, which is used to visualize similarities between populations. However, the Taylor diagram has several limitations such as not capturing non-linear relationships and sensitivity to outliers. To address such limitations, we propose the kernelized Taylor diagram. Our proposed kernelized Taylor diagram is capable of visualizing similarities between populations with minimal assumptions of the data distributions. The kernelized Taylor diagram relates the maximum mean discrepancy and the kernel mean embedding in a single diagram, a construction that, to the best of our knowledge, have not been devised prior to this work. We believe that the kernelized Taylor diagram can be a valuable tool in data visualization.

preprint2022arXiv

Unsupervised Anomaly and Change Detection with Multivariate Gaussianization

Anomaly detection is a field of intense research. Identifying low probability events in data/images is a challenging problem given the high-dimensionality of the data, especially when no (or little) information about the anomaly is available a priori. While plenty of methods are available, the vast majority of them do not scale well to large datasets and require the choice of some (very often critical) hyperparameters. Therefore, unsupervised and computationally efficient detection methods become strictly necessary. We propose an unsupervised method for detecting anomalies and changes in remote sensing images by means of a multivariate Gaussianization methodology that allows to estimate multivariate densities accurately, a long-standing problem in statistics and machine learning. The methodology transforms arbitrarily complex multivariate data into a multivariate Gaussian distribution. Since the transformation is differentiable, by applying the change of variables formula one can estimate the probability at any point of the original domain. The assumption is straightforward: pixels with low estimated probability are considered anomalies. Our method can describe any multivariate distribution, makes an efficient use of memory and computational resources, and is parameter-free. We show the efficiency of the method in experiments involving both anomaly detection and change detection in different remote sensing image sets. Results show that our approach outperforms other linear and nonlinear methods in terms of detection power in both anomaly and change detection scenarios, showing robustness and scalability to dimensionality and sample sizes.

preprint2021arXiv

Graph Embedding via High Dimensional Model Representation for Hyperspectral Images

Learning the manifold structure of remote sensing images is of paramount relevance for modeling and understanding processes, as well as to encapsulate the high dimensionality in a reduced set of informative features for subsequent classification, regression, or unmixing. Manifold learning methods have shown excellent performance to deal with hyperspectral image (HSI) analysis but, unless specifically designed, they cannot provide an explicit embedding map readily applicable to out-of-sample data. A common assumption to deal with the problem is that the transformation between the high-dimensional input space and the (typically low) latent space is linear. This is a particularly strong assumption, especially when dealing with hyperspectral images due to the well-known nonlinear nature of the data. To address this problem, a manifold learning method based on High Dimensional Model Representation (HDMR) is proposed, which enables to present a nonlinear embedding function to project out-of-sample samples into the latent space. The proposed method is compared to manifold learning methods along with its linear counterparts and achieves promising performance in terms of classification accuracy of a representative set of hyperspectral images.

preprint2021arXiv

Integrating Domain Knowledge in Data-driven Earth Observation with Process Convolutions

The modelling of Earth observation data is a challenging problem, typically approached by either purely mechanistic or purely data-driven methods. Mechanistic models encode the domain knowledge and physical rules governing the system. Such models, however, need the correct specification of all interactions between variables in the problem and the appropriate parameterization is a challenge in itself. On the other hand, machine learning approaches are flexible data-driven tools, able to approximate arbitrarily complex functions, but lack interpretability and struggle when data is scarce or in extrapolation regimes. In this paper, we argue that hybrid learning schemes that combine both approaches can address all these issues efficiently. We introduce Gaussian process (GP) convolution models for hybrid modelling in Earth observation (EO) problems. We specifically propose the use of a class of GP convolution models called latent force models (LFMs) for EO time series modelling, analysis and understanding. LFMs are hybrid models that incorporate physical knowledge encoded in differential equations into a multioutput GP model. LFMs can transfer information across time-series, cope with missing observations, infer explicit latent functions forcing the system, and learn parameterizations which are very helpful for system analysis and interpretability. We consider time series of soil moisture from active (ASCAT) and passive (SMOS, AMSR2) microwave satellites. We show how assuming a first order differential equation as governing equation, the model automatically estimates the e-folding time or decay rate related to soil moisture persistence and discovers latent forces related to precipitation. The proposed hybrid methodology reconciles the two main approaches in remote sensing parameter estimation by blending statistical learning and mechanistic modeling.

preprint2021arXiv

Retrieval of Coloured Dissolved Organic Matter with Machine Learning Methods

The coloured dissolved organic matter (CDOM) concentration is the standard measure of humic substance in natural waters. CDOM measurements by remote sensing is calculated using the absorption coefficient (a) at a certain wavelength (e.g. 440nm). This paper presents a comparison of four machine learning methods for the retrieval of CDOM from remote sensing signals: regularized linear regression (RLR), random forest (RF), kernel ridge regression (KRR) and Gaussian process regression (GPR). Results are compared with the established polynomial regression algorithms. RLR is revealed as the simplest and most efficient method, followed closely by its nonlinear counterpart KRR.

preprint2020arXiv

A Perspective on Gaussian Processes for Earth Observation

Earth observation (EO) by airborne and satellite remote sensing and in-situ observations play a fundamental role in monitoring our planet. In the last decade, machine learning and Gaussian processes (GPs) in particular has attained outstanding results in the estimation of bio-geo-physical variables from the acquired images at local and global scales in a time-resolved manner. GPs provide not only accurate estimates but also principled uncertainty estimates for the predictions, can easily accommodate multimodal data coming from different sensors and from multitemporal acquisitions, allow the introduction of physical knowledge, and a formal treatment of uncertainty quantification and error propagation. Despite great advances in forward and inverse modelling, GP models still have to face important challenges that are revised in this perspective paper. GP models should evolve towards data-driven physics-aware models that respect signal characteristics, be consistent with elementary laws of physics, and move from pure regression to observational causal inference.

preprint2020arXiv

Accounting for Input Noise in Gaussian Process Parameter Retrieval

Gaussian processes (GPs) are a class of Kernel methods that have shown to be very useful in geoscience and remote sensing applications for parameter retrieval, model inversion, and emulation. They are widely used because they are simple, flexible, and provide accurate estimates. GPs are based on a Bayesian statistical framework which provides a posterior probability function for each estimation. Therefore, besides the usual prediction (given in this case by the mean function), GPs come equipped with the possibility to obtain a predictive variance (i.e., error bars, confidence intervals) for each prediction. Unfortunately, the GP formulation usually assumes that there is no noise in the inputs, only in the observations. However, this is often not the case in earth observation problems where an accurate assessment of the measuring instrument error is typically available, and where there is huge interest in characterizing the error propagation through the processing pipeline. In this letter, we demonstrate how one can account for input noise estimates using a GP model formulation which propagates the error terms using the derivative of the predictive mean function. We analyze the resulting predictive variance term and show how they more accurately represent the model error in a temperature prediction problem from infrared sounding data.

preprint2020arXiv

Efficient Nonlinear RX Anomaly Detectors

Current anomaly detection algorithms are typically challenged by either accuracy or efficiency. More accurate nonlinear detectors are typically slow and not scalable. In this letter, we propose two families of techniques to improve the efficiency of the standard kernel Reed-Xiaoli (RX) method for anomaly detection by approximating the kernel function with either {\em data-independent} random Fourier features or {\em data-dependent} basis with the Nyström approach. We compare all methods for both real multi- and hyperspectral images. We show that the proposed efficient methods have a lower computational cost and they perform similar (or outperform) the standard kernel RX algorithm thanks to their implicit regularization effect. Last but not least, the Nyström approach has an improved power of detection.

preprint2020arXiv

Kernel Methods and their derivatives: Concept and perspectives for the Earth system sciences

Kernel methods are powerful machine learning techniques which implement generic non-linear functions to solve complex tasks in a simple way. They Have a solid mathematical background and exhibit excellent performance in practice. However, kernel machines are still considered black-box models as the feature mapping is not directly accessible and difficult to interpret.The aim of this work is to show that it is indeed possible to interpret the functions learned by various kernel methods is intuitive despite their complexity. Specifically, we show that derivatives of these functions have a simple mathematical formulation, are easy to compute, and can be applied to many different problems. We note that model function derivatives in kernel machines is proportional to the kernel function derivative. We provide the explicit analytic form of the first and second derivatives of the most common kernel functions with regard to the inputs as well as generic formulas to compute higher order derivatives. We use them to analyze the most used supervised and unsupervised kernel learning methods: Gaussian Processes for regression, Support Vector Machines for classification, Kernel Entropy Component Analysis for density estimation, and the Hilbert-Schmidt Independence Criterion for estimating the dependency between random variables. For all cases we expressed the derivative of the learned function as a linear combination of the kernel function derivative. Moreover we provide intuitive explanations through illustrative toy examples and show how to improve the interpretation of real applications in the context of spatiotemporal Earth system data cubes. This work reflects on the observation that function derivatives may play a crucial role in kernel methods analysis and understanding.

preprint2020arXiv

Nonlinear PCA for Spatio-Temporal Analysis of Earth Observation Data

Remote sensing observations, products and simulations are fundamental sources of information to monitor our planet and its climate variability. Uncovering the main modes of spatial and temporal variability in Earth data is essential to analyze and understand the underlying physical dynamics and processes driving the Earth System. Dimensionality reduction methods can work with spatio-temporal datasets and decompose the information efficiently. Principal Component Analysis (PCA), also known as Empirical Orthogonal Functions (EOF) in geophysics, has been traditionally used to analyze climatic data. However, when nonlinear feature relations are present, PCA/EOF fails. In this work, we propose a nonlinear PCA method to deal with spatio-temporal Earth System data. The proposed method, called Rotated Complex Kernel PCA (ROCK-PCA for short), works in reproducing kernel Hilbert spaces to account for nonlinear processes, operates in the complex kernel domain to account for both space and time features, and adds an extra rotation for improved flexibility. The result is an explicitly resolved spatio-temporal decomposition of the Earth data cube. The method is unsupervised and computationally very efficient.We illustrate its ability to uncover spatio-temporal patterns using synthetic experiments and real data. Results of the decomposition of three essential climate variables are shown: satellite-based global Gross Primary Productivity (GPP) and Soil Moisture (SM), and reanalysis Sea Surface Temperature (SST) data. The ROCK-PCA method allows identifying their annual and seasonal oscillations, as well as their non-seasonal trends and spatial variability patterns.

preprint2016arXiv

Dimensionality Reduction via Regression in Hyperspectral Imagery

This paper introduces a new unsupervised method for dimensionality reduction via regression (DRR). The algorithm belongs to the family of invertible transforms that generalize Principal Component Analysis (PCA) by using curvilinear instead of linear features. DRR identifies the nonlinear features through multivariate regression to ensure the reduction in redundancy between he PCA coefficients, the reduction of the variance of the scores, and the reduction in the reconstruction error. More importantly, unlike other nonlinear dimensionality reduction methods, the invertibility, volume-preservation, and straightforward out-of-sample extension, makes DRR interpretable and easy to apply. The properties of DRR enable learning a more broader class of data manifolds than the recently proposed Non-linear Principal Components Analysis (NLPCA) and Principal Polynomial Analysis (PPA). We illustrate the performance of the representation in reducing the dimensionality of remote sensing data. In particular, we tackle two common problems: processing very high dimensional spectral information such as in hyperspectral image sounding data, and dealing with spatial-spectral image patches of multispectral images. Both settings pose collinearity and ill-determination problems. Evaluation of the expressive power of the features is assessed in terms of truncation error, estimating atmospheric variables, and surface land cover classification error. Results show that DRR outperforms linear PCA and recently proposed invertible extensions based on neural networks (NLPCA) and univariate regressions (PPA).

preprint2016arXiv

Optimized Kernel Entropy Components

This work addresses two main issues of the standard Kernel Entropy Component Analysis (KECA) algorithm: the optimization of the kernel decomposition and the optimization of the Gaussian kernel parameter. KECA roughly reduces to a sorting of the importance of kernel eigenvectors by entropy instead of by variance as in Kernel Principal Components Analysis. In this work, we propose an extension of the KECA method, named Optimized KECA (OKECA), that directly extracts the optimal features retaining most of the data entropy by means of compacting the information in very few features (often in just one or two). The proposed method produces features which have higher expressive power. In particular, it is based on the Independent Component Analysis (ICA) framework, and introduces an extra rotation to the eigen-decomposition, which is optimized via gradient ascent search. This maximum entropy preservation suggests that OKECA features are more efficient than KECA features for density estimation. In addition, a critical issue in both methods is the selection of the kernel parameter since it critically affects the resulting performance. Here we analyze the most common kernel length-scale selection criteria. Results of both methods are illustrated in different synthetic and real problems. Results show that 1) OKECA returns projections with more expressive power than KECA, 2) the most successful rule for estimating the kernel parameter is based on maximum likelihood, and 3) OKECA is more robust to the selection of the length-scale parameter in kernel density estimation.

preprint2016arXiv

Principal Polynomial Analysis

This paper presents a new framework for manifold learning based on a sequence of principal polynomials that capture the possibly nonlinear nature of the data. The proposed Principal Polynomial Analysis (PPA) generalizes PCA by modeling the directions of maximal variance by means of curves, instead of straight lines. Contrarily to previous approaches, PPA reduces to performing simple univariate regressions, which makes it computationally feasible and robust. Moreover, PPA shows a number of interesting analytical properties. First, PPA is a volume-preserving map, which in turn guarantees the existence of the inverse. Second, such an inverse can be obtained in closed form. Invertibility is an important advantage over other learning methods, because it permits to understand the identified features in the input domain where the data has physical meaning. Moreover, it allows to evaluate the performance of dimensionality reduction in sensible (input-domain) units. Volume preservation also allows an easy computation of information theoretic quantities, such as the reduction in multi-information after the transform. Third, the analytical nature of PPA leads to a clear geometrical interpretation of the manifold: it allows the computation of Frenet-Serret frames (local features) and of generalized curvatures at any point of the space. And fourth, the analytical Jacobian allows the computation of the metric induced by the data, thus generalizing the Mahalanobis distance. These properties are demonstrated theoretically and illustrated experimentally. The performance of PPA is evaluated in dimensionality and redundancy reduction, in both synthetic and real datasets from the UCI repository.

preprint2016arXiv

Sensitivity Maps of the Hilbert-Schmidt Independence Criterion

Kernel dependence measures yield accurate estimates of nonlinear relations between random variables, and they are also endorsed with solid theoretical properties and convergence rates. Besides, the empirical estimates are easy to compute in closed form just involving linear algebra operations. However, they are hampered by two important problems: the high computational cost involved, as two kernel matrices of the sample size have to be computed and stored, and the interpretability of the measure, which remains hidden behind the implicit feature map. We here address these two issues. We introduce the Sensitivity Maps (SMs) for the Hilbert-Schmidt independence criterion (HSIC). Sensitivity maps allow us to explicitly analyze and visualize the relative relevance of both examples and features on the dependence measure. We also present the randomized HSIC (RHSIC) and its corresponding sensitivity maps to cope with large scale problems. We build upon the framework of random features and the Bochner's theorem to approximate the involved kernels in the canonical HSIC. The power of the RHSIC measure scales favourably with the number of samples, and it approximates HSIC and the sensitivity maps efficiently. Convergence bounds of both the measure and the sensitivity map are also provided. Our proposal is illustrated in synthetic examples, and challenging real problems of dependence estimation, feature selection, and causal inference from empirical data.

preprint2015arXiv

Kernel Manifold Alignment

We introduce a kernel method for manifold alignment (KEMA) and domain adaptation that can match an arbitrary number of data sources without needing corresponding pairs, just few labeled examples in all domains. KEMA has interesting properties: 1) it generalizes other manifold alignment methods, 2) it can align manifolds of very different complexities, performing a sort of manifold unfolding plus alignment, 3) it can define a domain-specific metric to cope with multimodal specificities, 4) it can align data spaces of different dimensionality, 5) it is robust to strong nonlinear feature deformations, and 6) it is closed-form invertible which allows transfer across-domains and data synthesis. We also present a reduced-rank version for computational efficiency and discuss the generalization performance of KEMA under Rademacher principles of stability. KEMA exhibits very good performance over competing methods in synthetic examples, visual object recognition and recognition of facial expressions tasks.

preprint2015arXiv

Unsupervised Deep Feature Extraction for Remote Sensing Image Classification

This paper introduces the use of single layer and deep convolutional networks for remote sensing data analysis. Direct application to multi- and hyper-spectral imagery of supervised (shallow or deep) convolutional networks is very challenging given the high input data dimensionality and the relatively small amount of available labeled data. Therefore, we propose the use of greedy layer-wise unsupervised pre-training coupled with a highly efficient algorithm for unsupervised learning of sparse features. The algorithm is rooted on sparse representations and enforces both population and lifetime sparsity of the extracted features, simultaneously. We successfully illustrate the expressive power of the extracted representations in several scenarios: classification of aerial scenes, as well as land-use classification in very high resolution (VHR), or land-cover classification from multi- and hyper-spectral images. The proposed algorithm clearly outperforms standard Principal Component Analysis (PCA) and its kernel counterpart (kPCA), as well as current state-of-the-art algorithms of aerial classification, while being extremely computationally efficient at learning representations of data. Results show that single layer convolutional networks can extract powerful discriminative features only when the receptive field accounts for neighboring pixels, and are preferred when the classification requires high resolution and detailed results. However, deep architectures significantly outperform single layers variants, capturing increasing levels of abstraction and complexity throughout the feature hierarchy.

Gustau Camps-Valls

What is connected

Connect this record

See the researcher in context

Building this map preview

17 published item(s)

Inference over radiative transfer models using variational and expectation maximization methods

The Kernelized Taylor Diagram

Unsupervised Anomaly and Change Detection with Multivariate Gaussianization

Graph Embedding via High Dimensional Model Representation for Hyperspectral Images

Integrating Domain Knowledge in Data-driven Earth Observation with Process Convolutions

Retrieval of Coloured Dissolved Organic Matter with Machine Learning Methods

A Perspective on Gaussian Processes for Earth Observation

Accounting for Input Noise in Gaussian Process Parameter Retrieval

Efficient Nonlinear RX Anomaly Detectors

Kernel Methods and their derivatives: Concept and perspectives for the Earth system sciences

Nonlinear PCA for Spatio-Temporal Analysis of Earth Observation Data

Dimensionality Reduction via Regression in Hyperspectral Imagery

Optimized Kernel Entropy Components

Principal Polynomial Analysis

Sensitivity Maps of the Hilbert-Schmidt Independence Criterion

Kernel Manifold Alignment

Unsupervised Deep Feature Extraction for Remote Sensing Image Classification