Source author record

Daniel Hernández-Lobato

Daniel Hernández-Lobato appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Applications astro-ph.HE Computation Methodology physics.geo-ph

Catalog footprint

What is connected

18works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Conditional Diffusion Sampling

Sampling from unnormalized multimodal distributions with limited density evaluations remains a fundamental challenge in machine learning and natural sciences. Successful approaches construct a bridge between a tractable reference and the target distribution. Parallel Tempering (PT) serves as the gold standard, while recent diffusion-based approaches offer a continuous alternative at the cost of neural training. In this work, we introduce Conditional Diffusion Sampling (CDS), a framework that combines these two paradigms. To this end, we derive Conditional Interpolants, a class of stochastic processes whose transport dynamics are governed by an exact, closed-form stochastic differential equation (SDE), requiring no neural approximation. Although these dynamics require sampling from a non-trivial initialization distribution, we show both theoretically and empirically that the cost of this initialization diminishes for sufficiently short diffusion times. CDS leverages this by a two-stage procedure: (1) PT is used to efficiently sample the initial distribution, and then (2) samples are transported via the transport SDE. This combination couples the robust global exploration of PT with efficient local transport. Experiments suggest that CDS has the potential to achieve a superior trade-off between sample quality and density evaluation cost compared to state-of-the-art samplers.

preprint2022arXiv

Correcting Model Bias with Sparse Implicit Processes

Model selection in machine learning (ML) is a crucial part of the Bayesian learning procedure. Model choice may impose strong biases on the resulting predictions, which can hinder the performance of methods such as Bayesian neural networks and neural samplers. On the other hand, newly proposed approaches for Bayesian ML exploit features of approximate inference in function space with implicit stochastic processes (a generalization of Gaussian processes). The approach of Sparse Implicit Processes (SIP) is particularly successful in this regard, since it is fully trainable and achieves flexible predictions. Here, we expand on the original experiments to show that SIP is capable of correcting model bias when the data generating mechanism differs strongly from the one implied by the model. We use synthetic datasets to show that SIP is capable of providing predictive distributions that reflect the data better than the exact predictions of the initial, but wrongly assumed model.

preprint2022arXiv

Efficient Transformed Gaussian Processes for Non-Stationary Dependent Multi-class Classification

This work introduces the Efficient Transformed Gaussian Process (ETGP), a new way of creating C stochastic processes characterized by: 1) the C processes are non-stationary, 2) the C processes are dependent by construction without needing a mixing matrix, 3) training and making predictions is very efficient since the number of Gaussian Processes (GP) operations (e.g. inverting the inducing point's covariance matrix) do not depend on the number of processes. This makes the ETGP particularly suited for multi-class problems with a very large number of classes, which are the problems studied in this work. ETGPs exploit the recently proposed Transformed Gaussian Process (TGP), a stochastic process specified by transforming a Gaussian Process using an invertible transformation. However, unlike TGPs, ETGPs are constructed by transforming a single sample from a GP using C invertible transformations. We derive an efficient sparse variational inference algorithm for the proposed model and demonstrate its utility in 5 classification tasks which include low/medium/large datasets and a different number of classes, ranging from just a few to hundreds. Our results show that ETGPs, in general, outperform state-of-the-art methods for multi-class classification based on GPs, and have a lower computational cost (around one order of magnitude smaller).

preprint2022arXiv

Function-space Inference with Sparse Implicit Processes

Implicit Processes (IPs) represent a flexible framework that can be used to describe a wide variety of models, from Bayesian neural networks, neural samplers and data generators to many others. IPs also allow for approximate inference in function-space. This change of formulation solves intrinsic degenerate problems of parameter-space approximate inference concerning the high number of parameters and their strong dependencies in large models. For this, previous works in the literature have attempted to employ IPs both to set up the prior and to approximate the resulting posterior. However, this has proven to be a challenging task. Existing methods that can tune the prior IP result in a Gaussian predictive distribution, which fails to capture important data patterns. By contrast, methods producing flexible predictive distributions by using another IP to approximate the posterior process cannot tune the prior IP to the observed data. We propose here the first method that can accomplish both goals. For this, we rely on an inducing-point representation of the prior IP, as often done in the context of sparse Gaussian processes. The result is a scalable method for approximate inference with IPs that can tune the prior IP parameters to the data, and that provides accurate non-Gaussian predictive distributions.

preprint2022arXiv

Gaussian Processes for Missing Value Imputation

Missing values are common in many real-life datasets. However, most of the current machine learning methods can not handle missing values. This means that they should be imputed beforehand. Gaussian Processes (GPs) are non-parametric models with accurate uncertainty estimates that combined with sparse approximations and stochastic variational inference scale to large data sets. Sparse GPs can be used to compute a predictive distribution for missing data. Here, we present a hierarchical composition of sparse GPs that is used to predict missing values at each dimension using all the variables from the other dimensions. We call the approach missing GP (MGP). MGP can be trained simultaneously to impute all observed missing values. Specifically, it outputs a predictive distribution for each missing value that is then used in the imputation of other missing values. We evaluate MGP in one private clinical data set and four UCI datasets with a different percentage of missing values. We compare the performance of MGP with other state-of-the-art methods for imputing missing values, including variants based on sparse GPs and deep GPs. The results obtained show a significantly better performance of MGP.

preprint2022arXiv

Inference over radiative transfer models using variational and expectation maximization methods

Earth observation from satellites offers the possibility to monitor our planet with unprecedented accuracy. Radiative transfer models (RTMs) encode the energy transfer through the atmosphere, and are used to model and understand the Earth system, as well as to estimate the parameters that describe the status of the Earth from satellite observations by inverse modeling. However, performing inference over such simulators is a challenging problem. RTMs are nonlinear, non-differentiable and computationally costly codes, which adds a high level of difficulty in inference. In this paper, we introduce two computational techniques to infer not only point estimates of biophysical parameters but also their joint distribution. One of them is based on a variational autoencoder approach and the second one is based on a Monte Carlo Expectation Maximization (MCEM) scheme. We compare and discuss benefits and drawbacks of each approach. We also provide numerical comparisons in synthetic simulations and the real PROSAIL model, a popular RTM that combines land vegetation leaf and canopy modeling. We analyze the performance of the two approaches for modeling and inferring the distribution of three key biophysical parameters for quantifying the terrestrial biosphere.

preprint2020arXiv

Multi-class Gaussian Process Classification with Noisy Inputs

It is a common practice in the machine learning community to assume that the observed data are noise-free in the input attributes. Nevertheless, scenarios with input noise are common in real problems, as measurements are never perfectly accurate. If this input noise is not taken into account, a supervised machine learning method is expected to perform sub-optimally. In this paper, we focus on multi-class classification problems and use Gaussian processes (GPs) as the underlying classifier. Motivated by a data set coming from the astrophysics domain, we hypothesize that the observed data may contain noise in the inputs. Therefore, we devise several multi-class GP classifiers that can account for input noise. Such classifiers can be efficiently trained using variational inference to approximate the posterior distribution of the latent variables of the model. Moreover, in some situations, the amount of noise can be known before-hand. If this is the case, it can be readily introduced in the proposed methods. This prior information is expected to lead to better performance results. We have evaluated the proposed methods by carrying out several experiments, involving synthetic and real data. These include several data sets from the UCI repository, the MNIST data set and a data set coming from astrophysics. The results obtained show that, although the classification error is similar across methods, the predictive distribution of the proposed methods is better, in terms of the test log-likelihood, than the predictive distribution of a classifier based on GPs that ignores input noise.

preprint2017arXiv

Dealing with Integer-valued Variables in Bayesian Optimization with Gaussian Processes

Bayesian optimization (BO) methods are useful for optimizing functions that are expensive to evaluate, lack an analytical expression and whose evaluations can be contaminated by noise. These methods rely on a probabilistic model of the objective function, typically a Gaussian process (GP), upon which an acquisition function is built. This function guides the optimization process and measures the expected utility of performing an evaluation of the objective at a new point. GPs assume continous input variables. When this is not the case, such as when some of the input variables take integer values, one has to introduce extra approximations. A common approach is to round the suggested variable value to the closest integer before doing the evaluation of the objective. We show that this can lead to problems in the optimization process and describe a more principled approach to account for input variables that are integer-valued. We illustrate in both synthetic and a real experiments the utility of our approach, which significantly improves the results of standard BO methods on problems involving integer-valued variables.

preprint2016arXiv

Black-box $α$-divergence Minimization

Black-box alpha (BB-$α$) is a new approximate inference method based on the minimization of $α$-divergences. BB-$α$ scales to large datasets because it can be implemented using stochastic gradient descent. BB-$α$ can be applied to complex probabilistic models with little effort since it only requires as input the likelihood function and its gradients. These gradients can be easily obtained using automatic differentiation. By changing the divergence parameter $α$, the method is able to interpolate between variational Bayes (VB) ($α\rightarrow 0$) and an algorithm similar to expectation propagation (EP) ($α= 1$). Experiments on probit regression and neural network regression and classification problems show that BB-$α$ with non-standard settings of $α$, such as $α= 0.5$, usually produces better predictions than with $α\rightarrow 0$ (VB) or $α= 1$ (EP).

preprint2016arXiv

Deep Gaussian Processes for Regression using Approximate Expectation Propagation

Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. This paper develops a new approximate Bayesian learning scheme that enables DGPs to be applied to a range of medium to large scale regression problems for the first time. The new method uses an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm for learning. We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks.

preprint2016arXiv

Non-linear Causal Inference using Gaussianity Measures

We provide theoretical and empirical evidence for a type of asymmetry between causes and effects that is present when these are related via linear models contaminated with additive non-Gaussian noise. Assuming that the causes and the effects have the same distribution, we show that the distribution of the residuals of a linear fit in the anti-causal direction is closer to a Gaussian than the distribution of the residuals in the causal direction. This Gaussianization effect is characterized by reduction of the magnitude of the high-order cumulants and by an increment of the differential entropy of the residuals. The problem of non-linear causal inference is addressed by performing an embedding in an expanded feature space, in which the relation between causes and effects can be assumed to be linear. The effectiveness of a method to discriminate between causes and effects based on this type of asymmetry is illustrated in a variety of experiments using different measures of Gaussianity. The proposed method is shown to be competitive with state-of-the-art techniques for causal inference.

preprint2016arXiv

Predictive Entropy Search for Multi-objective Bayesian Optimization

We present PESMO, a Bayesian method for identifying the Pareto set of multi-objective optimization problems, when the functions are expensive to evaluate. The central idea of PESMO is to choose evaluation points so as to maximally reduce the entropy of the posterior distribution over the Pareto set. Critically, the PESMO multi-objective acquisition function can be decomposed as a sum of objective-specific acquisition functions, which enables the algorithm to be used in \emph{decoupled} scenarios in which the objectives can be evaluated separately and perhaps with different costs. This decoupling capability also makes it possible to identify difficult objectives that require more evaluations. PESMO also offers gains in efficiency, as its cost scales linearly with the number of objectives, in comparison to the exponential cost of other methods. We compare PESMO with other related methods for multi-objective Bayesian optimization on synthetic and real-world problems. The results show that PESMO produces better recommendations with a smaller number of evaluations of the objectives, and that a decoupled evaluation can lead to improvements in performance, particularly when the number of objectives is large.

preprint2015arXiv

Scalable Gaussian Process Classification via Expectation Propagation

Variational methods have been recently considered for scaling the training process of Gaussian process classifiers to large datasets. As an alternative, we describe here how to train these classifiers efficiently using expectation propagation. The proposed method allows for handling datasets with millions of data instances. More precisely, it can be used for (i) training in a distributed fashion where the data instances are sent to different nodes in which the required computations are carried out, and for (ii) maximizing an estimate of the marginal likelihood using a stochastic approximation of the gradient. Several experiments indicate that the method described is competitive with the variational approach.

preprint2015arXiv

Stochastic Expectation Propagation for Large Scale Gaussian Process Classification

A method for large scale Gaussian process classification has been recently proposed based on expectation propagation (EP). Such a method allows Gaussian process classifiers to be trained on very large datasets that were out of the reach of previous deployments of EP and has been shown to be competitive with related techniques based on stochastic variational inference. Nevertheless, the memory resources required scale linearly with the dataset size, unlike in variational methods. This is a severe limitation when the number of instances is very large. Here we show that this problem is avoided when stochastic EP is used to train the model.

preprint2015arXiv

Training Deep Gaussian Processes using Stochastic Expectation Propagation and Probabilistic Backpropagation

Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are probabilistic and non-parametric and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. The focus of this paper is scalable approximate Bayesian learning of these networks. The paper develops a novel and efficient extension of probabilistic backpropagation, a state-of-the-art method for training Bayesian neural networks, that can be used to train DGPs. The new method leverages a recently proposed method for scaling Expectation Propagation, called stochastic Expectation Propagation. The method is able to automatically discover useful input warping, expansion or compression, and it is therefore is a flexible form of Bayesian kernel design. We demonstrate the success of the new method for supervised learning on several real-world datasets, showing that it typically outperforms GP regression and is never much worse.

preprint2014arXiv

Mind the Nuisance: Gaussian Process Classification using Privileged Noise

The learning with privileged information setting has recently attracted a lot of attention within the machine learning community, as it allows the integration of additional knowledge into the training process of a classifier, even when this comes in the form of a data modality that is not available at test time. Here, we show that privileged information can naturally be treated as noise in the latent function of a Gaussian Process classifier (GPC). That is, in contrast to the standard GPC setting, the latent function is not just a nuisance but a feature: it becomes a natural measure of confidence about the training data by modulating the slope of the GPC sigmoid likelihood function. Extensive experiments on public datasets show that the proposed GPC method using privileged noise, called GPC+, improves over a standard GPC without privileged knowledge, and also over the current state-of-the-art SVM-based method, SVM+. Moreover, we show that advanced neural networks and deep learning methods can be compressed as privileged information.

preprint2013arXiv

Gaussian Process Conditional Copulas with Applications to Financial Time Series

The estimation of dependencies between multiple variables is a central problem in the analysis of financial time series. A common approach is to express these dependencies in terms of a copula function. Typically the copula function is assumed to be constant but this may be inaccurate when there are covariates that could have a large influence on the dependence structure of the data. To account for this, a Bayesian framework for the estimation of conditional copulas is proposed. In this framework the parameters of a copula are non-linearly related to some arbitrary conditioning variables. We evaluate the ability of our method to predict time-varying dependencies on several equities and currencies and observe consistent performance gains compared to static copula models and other time-varying copula methods.

preprint2011arXiv

Convergent Expectation Propagation in Linear Models with Spike-and-slab Priors

Exact inference in the linear regression model with spike and slab priors is often intractable. Expectation propagation (EP) can be used for approximate inference. However, the regular sequential form of EP (R-EP) may fail to converge in this model when the size of the training set is very small. As an alternative, we propose a provably convergent EP algorithm (PC-EP). PC-EP is proved to minimize an energy function which, under some constraints, is bounded from below and whose stationary points coincide with the solution of R-EP. Experiments with synthetic data indicate that when R-EP does not converge, the approximation generated by PC-EP is often better. By contrast, when R-EP converges, both methods perform similarly.

Daniel Hernández-Lobato

What is connected

Connect this record

See the researcher in context

Building this map preview

18 published item(s)

Conditional Diffusion Sampling

Correcting Model Bias with Sparse Implicit Processes

Efficient Transformed Gaussian Processes for Non-Stationary Dependent Multi-class Classification

Function-space Inference with Sparse Implicit Processes

Gaussian Processes for Missing Value Imputation

Inference over radiative transfer models using variational and expectation maximization methods

Multi-class Gaussian Process Classification with Noisy Inputs

Dealing with Integer-valued Variables in Bayesian Optimization with Gaussian Processes

Black-box $α$-divergence Minimization

Deep Gaussian Processes for Regression using Approximate Expectation Propagation

Non-linear Causal Inference using Gaussianity Measures

Predictive Entropy Search for Multi-objective Bayesian Optimization

Scalable Gaussian Process Classification via Expectation Propagation

Stochastic Expectation Propagation for Large Scale Gaussian Process Classification

Training Deep Gaussian Processes using Stochastic Expectation Propagation and Probabilistic Backpropagation

Mind the Nuisance: Gaussian Process Classification using Privileged Noise

Gaussian Process Conditional Copulas with Applications to Financial Time Series

Convergent Expectation Propagation in Linear Models with Spike-and-slab Priors