Researcher profile

Pablo M. Olmos

Pablo M. Olmos contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Order-Agnostic Autoregressive Modelling with Missing Data

Order-Agnostic autoregressive models have demonstrated strong performance in deep generative modeling, yet their use in settings with incomplete data remains largely unexplored. In this work, we reinterpret them through the lens of missing data. First, we show that their standard training procedure on fully observed data implicitly performs imputation under a missing completely at random mechanism, resulting in robust out-of-sample imputation performance in settings with high missingness. Second, we introduce the first principled framework for training them directly on incomplete datasets under general missingness mechanisms. Third, we leverage their amortized conditional density estimation to perform active information acquisition, i.e., sequentially selecting the most informative missing variables for downstream prediction or inference. Across a suite of real-world benchmarks, our Missingness-Aware Order-Agnostic Autoregressive Model (MO-ARM) consistently outperforms established imputation baselines.

preprint2022arXiv

Multi-task longitudinal forecasting with missing values on Alzheimer's Disease

Machine learning techniques typically applied to dementia forecasting lack in their capabilities to jointly learn several tasks, handle time dependent heterogeneous data and missing values. In this paper, we propose a framework using the recently presented SSHIBA model for jointly learning different tasks on longitudinal data with missing values. The method uses Bayesian variational inference to impute missing values and combine information of several views. This way, we can combine different data-views from different time-points in a common latent space and learn the relations between each time-point while simultaneously modelling and predicting several output variables. We apply this model to predict together diagnosis, ventricle volume, and clinical scores in dementia. The results demonstrate that SSHIBA is capable of learning a good imputation of the missing values and outperforming the baselines while simultaneously predicting three different tasks.

preprint2022arXiv

PyHHMM: A Python Library for Heterogeneous Hidden Markov Models

We introduce PyHHMM, an object-oriented open-source Python implementation of Heterogeneous-Hidden Markov Models (HHMMs). In addition to HMM's basic core functionalities, such as different initialization algorithms and classical observations models, i.e., continuous and multinoulli, PyHHMM distinctively emphasizes features not supported in similar available frameworks: a heterogeneous observation model, missing data inference, different model order selection criterias, and semi-supervised training. These characteristics result in a feature-rich implementation for researchers working with sequential data. PyHHMM relies on the numpy, scipy, scikit-learn, and seaborn Python packages, and is distributed under the Apache-2.0 License. PyHHMM's source code is publicly available on Github (https://github.com/fmorenopino/HeterogeneousHMM) to facilitate adoptions and future contributions. A detailed documentation (https://pyhhmm.readthedocs.io/en/latest), which covers examples of use and models' theoretical explanation, is available. The package can be installed through the Python Package Index (PyPI), via 'pip install pyhhmm'.

preprint2021arXiv

Bayesian Sparse Factor Analysis with Kernelized Observations

Multi-view problems can be faced with latent variable models since they are able to find low-dimensional projections that fairly capture the correlations among the multiple views that characterise each datum. On the other hand, high-dimensionality and non-linear issues are traditionally handled by kernel methods, inducing a (non)-linear function between the latent projection and the data itself. However, they usually come with scalability issues and exposition to overfitting. Here, we propose merging both approaches into single model so that we can exploit the best features of multi-view latent models and kernel methods and, moreover, overcome their limitations. In particular, we combine probabilistic factor analysis with what we refer to as kernelized observations, in which the model focuses on reconstructing not the data itself, but its relationship with other data points measured by a kernel function. This model can combine several types of views (kernelized or not), and it can handle heterogeneous data and work in semi-supervised settings. Additionally, by including adequate priors, it can provide compact solutions for the kernelized observations -- based in a automatic selection of Bayesian Relevance Vectors (RVs) -- and can include feature selection capabilities. Using several public databases, we demonstrate the potential of our approach (and its extensions) w.r.t. common multi-view learning models such as kernel canonical correlation analysis or manifold relevance determination.

preprint2020arXiv

Handling Incomplete Heterogeneous Data using VAEs

Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate for capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs can still not directly handle data that are heterogenous (mixed continuous and discrete) or incomplete (with missing data at random), which is indeed common in real-world applications. In this paper, we propose a general framework to design VAEs suitable for fitting incomplete heterogenous data. The proposed HI-VAE includes likelihood models for real-valued, positive real valued, interval, categorical, ordinal and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE presents competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data.

preprint2020arXiv

Improved BiGAN training with marginal likelihood equalization

We propose a novel training procedure for improving the performance of generative adversarial networks (GANs), especially to bidirectional GANs. First, we enforce that the empirical distribution of the inverse inference network matches the prior distribution, which favors the generator network reproducibility on the seen samples. Second, we have found that the marginal log-likelihood of the samples shows a severe overrepresentation of a certain type of samples. To address this issue, we propose to train the bidirectional GAN using a non-uniform sampling for the mini-batch selection, resulting in improved quality and variety in generated samples measured quantitatively and by visual inspection. We illustrate our new procedure with the well-known CIFAR10, Fashion MNIST and CelebA datasets.

preprint2020arXiv

Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis

The Bayesian approach to feature extraction, known as factor analysis (FA), has been widely studied in machine learning to obtain a latent representation of the data. An adequate selection of the probabilities and priors of these bayesian models allows the model to better adapt to the data nature (i.e. heterogeneity, sparsity), obtaining a more representative latent space. The objective of this article is to propose a general FA framework capable of modelling any problem. To do so, we start from the Bayesian Inter-Battery Factor Analysis (BIBFA) model, enhancing it with new functionalities to be able to work with heterogeneous data, include feature selection, and handle missing values as well as semi-supervised problems. The performance of the proposed model, Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis (SSHIBA) has been tested on 4 different scenarios to evaluate each one of its novelties, showing not only a great versatility and an interpretability gain, but also outperforming most of the state-of-the-art algorithms.