Researcher profile

Pablo Lemos

Pablo Lemos contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
12 works
0 followers
8 topics
4 close collaborators

Published work

12 published items

preprint | 2026 | arXiv

MIRA: A Score for Conditional Distribution Accuracy and Model Comparison

We introduce Mira, a sample-based score for assessing the accuracy of a candidate conditional distribution using only joint samples from the true data-generating process. Relying on the principle that distributions coincide if they assign equal probability mass to all regions, we derive an analytic expression for the Mira statistic, whose average defines the Mira score. This formulation further allows us to compute theoretical reference values and uncertainty estimates when the candidate distribution matches the true one. This framework enables model comparison by quantifying the alignment between the conditional distribution of a candidate model and the true data-generating process. Consequently, Mira enables Bayesian model comparison through direct posterior validation, bypassing the challenging evidence computation. We demonstrate its effectiveness across several toy problems and Bayesian inference tasks.

preprint | 2022 | arXiv

Cosmology with one galaxy?

Galaxies can be characterized by many internal properties such as stellar mass, gas metallicity, and star-formation rate. We quantify the amount of cosmological and astrophysical information that the internal properties of individual galaxies and their host dark matter halos contain. We train neural networks using hundreds of thousands of galaxies from 2,000 state-of-the-art hydrodynamic simulations with different cosmologies and astrophysical models of the CAMELS project to perform likelihood-free inference on the value of the cosmological and astrophysical parameters. We find that knowing the internal properties of a single galaxy allows our models to infer the value of $Ω_{\rm m}$, at fixed $Ω_{\rm b}$, with $\sim10\%$ precision, while no constraint can be placed on $σ_8$. Our results hold for any type of galaxy, central or satellite, massive or dwarf, at all considered redshifts, $z\leq3$, and they incorporate uncertainties in astrophysics as modeled in CAMELS. However, our models are not robust to changes in subgrid physics due to the large intrinsic differences the two considered models imprint on galaxy properties. We find that the stellar mass, stellar metallicity, and maximum circular velocity are among the most important galaxy properties to determine the value of $Ω_{\rm m}$. We believe that our results can be explained by taking into account that changes in the value of $Ω_{\rm m}$, or potentially $Ω_{\rm b}/Ω_{\rm m}$, affect the dark matter content of galaxies. That effect leaves a signature in galaxy properties that is distinct from the one induced by galactic processes. Our results suggest that the low-dimensional manifold hosting galaxy properties provides a tight direct link between cosmology and astrophysics.

preprint | 2022 | arXiv

Rediscovering orbital mechanics with machine learning

We present an approach for using machine learning to automatically discover the governing equations and hidden properties of real physical systems from observations. We train a "graph neural network" to simulate the dynamics of our solar system's Sun, planets, and large moons from 30 years of trajectory data. We then use symbolic regression to discover an analytical expression for the force law implicitly learned by the neural network, which our results showed is equivalent to Newton's law of gravitation. The key assumptions that were required were translational and rotational equivariance, and Newton's second and third laws of motion. Our approach correctly discovered the form of the symbolic force law. Furthermore, our approach did not require any assumptions about the masses of planets and moons or physical constants. They, too, were accurately inferred through our methods. Though, of course, the classical law of gravitation has been known since Isaac Newton, our result serves as a validation that our method can discover unknown laws and hidden properties from observed data. More broadly this work represents a key step toward realizing the potential of machine learning for accelerating scientific discovery.
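As a rough illustration of the force law the abstract says was recovered, the sketch below evaluates Newton's law of gravitation for a set of point masses and lets a reader check that the pairwise forces obey Newton's third law. It is not the paper's graph network or symbolic-regression pipeline; the function name `pairwise_forces` and the sample masses and positions are hypothetical and chosen only for illustration.

```python
import math

G = 6.674e-11  # gravitational constant in SI units (m^3 kg^-1 s^-2)

def pairwise_forces(masses, positions):
    """Return forces[i][j]: the force on body i due to body j (3-vector, newtons)."""
    n = len(masses)
    forces = [[(0.0, 0.0, 0.0) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Separation vector pointing from body i to body j.
            r = tuple(pj - pi for pi, pj in zip(positions[i], positions[j]))
            dist = math.sqrt(sum(c * c for c in r))
            # F = G m_i m_j r / |r|^3, attractive (points from i toward j).
            scale = G * masses[i] * masses[j] / dist**3
            forces[i][j] = tuple(scale * c for c in r)
    return forces

# Hypothetical example: approximate Sun and Earth masses (kg) and separation (m).
masses = [1.989e30, 5.972e24]
positions = [(0.0, 0.0, 0.0), (1.496e11, 0.0, 0.0)]
forces = pairwise_forces(masses, positions)
```

Here `forces[0][1]` and `forces[1][0]` are equal in magnitude and opposite in direction, which is the third-law constraint the abstract lists among its key assumptions.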

preprint | 2022 | arXiv

Split personalities in Bayesian Neural Networks: the case for full marginalisation

The true posterior distribution of a Bayesian neural network is massively multimodal. Whilst most of these modes are functionally equivalent, we demonstrate that there remains a level of real multimodality that manifests in even the simplest neural network setups. It is only by fully marginalising over all posterior modes, using appropriate Bayesian sampling tools, that we can capture the split personalities of the network. The ability of a network trained in this manner to reason between multiple candidate solutions dramatically improves the generalisability of the model, a feature we contend is not consistently captured by alternative approaches to the training of Bayesian neural networks. We provide a concise minimal example of this, which can provide lessons and a future path forward for correctly utilising the explainability and interpretability of Bayesian neural networks.

preprint | 2022 | arXiv

The Cosmic Graph: Optimal Information Extraction from Large-Scale Structure using Catalogues

We present an implicit likelihood approach to quantifying cosmological information over discrete catalogue data, assembled as graphs. To do so, we explore cosmological parameter constraints using mock dark matter halo catalogues. We employ Information Maximising Neural Networks (IMNNs) to quantify Fisher information extraction as a function of graph representation. We a) demonstrate the high sensitivity of modular graph structure to the underlying cosmology in the noise-free limit, b) show that graph neural network summaries automatically combine mass and clustering information through comparisons to traditional statistics, c) demonstrate that networks can still extract information when catalogues are subject to noisy survey cuts, and d) illustrate how nonlinear IMNN summaries can be used as asymptotically optimal compressed statistics for Bayesian simulation-based inference. We reduce the area of joint $Ω_m, σ_8$ parameter constraints with small ($\sim$100 object) halo catalogues by a factor of 42 over the two-point correlation function, and demonstrate that the networks automatically combine mass and clustering information. This work utilises a new IMNN implementation over graph data in Jax, which can take advantage of either numerical or auto-differentiability. We also show that graph IMNNs successfully compress simulations away from the fiducial model at which the network is fitted, indicating a promising alternative to n-point statistics in catalogue simulation-based analyses.

preprint | 2022 | arXiv

Wavelet Moments for Cosmological Parameter Estimation

Extracting non-Gaussian information from the non-linear regime of structure formation is key to fully exploiting the rich data from upcoming cosmological surveys probing the large-scale structure of the universe. However, due to theoretical and computational complexities, this remains one of the main challenges in analyzing observational data. We present a set of summary statistics for cosmological matter fields based on 3D wavelets to tackle this challenge. These statistics are computed as the spatial average of the complex modulus of the 3D wavelet transform raised to a power $q$ and are therefore known as invariant wavelet moments. The 3D wavelets are constructed to be radially band-limited and separable on a spherical polar grid and come in three types: isotropic, oriented, and harmonic. In the Fisher forecast framework, we evaluate the performance of these summary statistics on matter fields from the Quijote suite, where they are shown to reach state-of-the-art parameter constraints on the base $Λ$CDM parameters, as well as the sum of neutrino masses. We show that we can improve constraints by a factor of 5 to 10 in all parameters with respect to the power spectrum baseline.

preprint | 2022 | arXiv

Weak lensing magnification of Type Ia Supernovae from the Pantheon sample

Using data from the Pantheon SN Ia compilation and the Sloan Digital Sky Survey (SDSS), we propose an estimator for weak lensing convergence incorporating positional and photometric data of foreground galaxies. The correlation between this and the Hubble diagram residuals of the supernovae has $3.6σ$ significance, and is consistent with weak lensing magnification due to dark matter halos centered on galaxies. We additionally constrain the properties of the galactic halos, such as the mass-to-light ratio $Γ$ and radial profile of the halo matter density $ρ(r)$. We derive a new relationship for the additional r.m.s. scatter in magnitudes caused by lensing, finding $σ_{\rm lens} = (0.06 \pm 0.017) (d_{\rm C}(z)/ d_{\rm C}(z=1))^{3/2}$ where $d_{\rm C}(z)$ is the comoving distance to redshift $z$. Hence the scatter in apparent magnitudes due to lensing will be of the same size as the intrinsic scatter of SN Ia by $z \sim 1.2$. We propose a modification of the distance modulus estimator for SN Ia to incorporate lensing, which can be easily calculated from observational data. We anticipate this will improve the accuracy of cosmological parameter estimation for high-redshift SN Ia data.
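To make the quoted scaling relation concrete, the following sketch evaluates the central value of $σ_{\rm lens}$ as a function of redshift. It assumes a flat ΛCDM background with a matter density of 0.3, which the abstract does not state, and it ignores the quoted uncertainty on the amplitude. The names `comoving_distance` and `sigma_lens` are hypothetical. Because only the ratio of comoving distances enters, the Hubble constant drops out.

```python
import math

OMEGA_M = 0.3  # assumed matter density for a flat LambdaCDM background

def comoving_distance(z, steps=1000):
    """Dimensionless comoving distance d_C * H0 / c, using the trapezoid rule."""
    def inv_E(zp):
        # 1 / E(z) for flat LambdaCDM: E^2 = Omega_m (1+z)^3 + (1 - Omega_m)
        return 1.0 / math.sqrt(OMEGA_M * (1.0 + zp) ** 3 + 1.0 - OMEGA_M)
    h = z / steps
    total = 0.5 * (inv_E(0.0) + inv_E(z))
    for k in range(1, steps):
        total += inv_E(k * h)
    return total * h

def sigma_lens(z, amplitude=0.06):
    """Central value of the lensing scatter in magnitudes at redshift z."""
    ratio = comoving_distance(z) / comoving_distance(1.0)
    return amplitude * ratio ** 1.5

# The relation is normalised so that the scatter equals the amplitude at z = 1.
scatter_at_one = sigma_lens(1.0)
scatter_at_1p2 = sigma_lens(1.2)
```

The scatter grows monotonically with redshift under this relation, so `scatter_at_1p2` exceeds `scatter_at_one`; the exact value depends on the assumed background cosmology.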

preprint | 2020 | arXiv

Baryon Acoustic Oscillations and the Hubble Constant: Past, Present and Future

We investigate constraints on the Hubble constant ($H_0$) using Baryon Acoustic Oscillations (BAO) and baryon density measurements from Big Bang Nucleosynthesis (BBN). We start by investigating the tension between galaxy BAO measurements and those using the Lyman-$α$ forest, within a Bayesian framework. Using the latest results from eBOSS DR14 we find that the probability of this tension being statistical is $\simeq6.3\%$ assuming flat $Λ$CDM. We measure $H_0 = 67.6\pm1.1$ km s$^{-1}$ Mpc$^{-1}$, with a weak dependence on the BBN prior used, in agreement with Planck Cosmic Microwave Background (CMB) results and in strong tension with distance ladder results. Finally, we forecast the future of BAO $+$ BBN measurements of $H_0$, using the Dark Energy Spectroscopic Instrument (DESI). We find that the choice of BBN prior will have a significant impact when considering future BAO measurements from DESI.

preprint | 2020 | arXiv

The Impact of Peculiar Velocities on the Estimation of the Hubble Constant from Gravitational Wave Standard Sirens

In this work we investigate the systematic uncertainties that arise from the calculation of the peculiar velocity when estimating the Hubble constant ($H_0$) from gravitational wave standard sirens. We study the GW170817 event and the estimation of the peculiar velocity of its host galaxy, NGC 4993, when using Gaussian smoothing over nearby galaxies. NGC 4993, a relatively nearby galaxy at $\sim 40 \ {\rm Mpc}$, is significantly affected by peculiar velocities. We demonstrate a direct dependence of the estimated peculiar velocity value on the choice of smoothing scale. We show that when not accounting for this systematic, a bias of $\sim 200 \ {\rm km \ s ^{-1}}$ in the peculiar velocity induces a bias of $\sim 4 \ {\rm km \ s ^{-1} \ Mpc^{-1}}$ on the Hubble constant. We formulate a Bayesian model that accounts for the dependence of the peculiar velocity on the smoothing scale and by marginalising over this parameter we remove the need for a choice of smoothing scale. The proposed model yields $H_0 = 68.6 ^{+14.0}_{-8.5}~{\rm km\ s^{-1}\ Mpc^{-1}}$. We demonstrate that under this model a more robust unbiased estimate of the Hubble constant from nearby GW sources is obtained.

preprint | 2020 | arXiv

The sum of the masses of the Milky Way and M31: a likelihood-free inference approach

We use Density Estimation Likelihood-Free Inference, $Λ$ Cold Dark Matter simulations of $\sim 2M$ galaxy pairs, and data from Gaia and the Hubble Space Telescope to infer the sum of the masses of the Milky Way and Andromeda (M31) galaxies, the two main components of the Local Group. This method overcomes most of the approximations of the traditional timing argument, makes the writing of a theoretical likelihood unnecessary, and allows the non-linear modelling of observational errors that take into account correlations in the data and non-Gaussian distributions. We obtain an $M_{200}$ mass estimate $M_{\rm MW+M31} = 4.6^{+2.3}_{-1.8} \times 10^{12} M_{\odot}$ ($68 \%$ C.L.), in agreement with previous estimates both for the sum of the two masses and for the individual masses. This result is not only one of the most reliable estimates of the sum of the two masses to date, but is also an illustration of likelihood-free inference in a problem with only one parameter and only three data points.
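For readers unfamiliar with likelihood-free inference, the toy sketch below shows the general idea with one parameter and three data points: draw parameters from a prior, run a forward simulation, and keep the draws whose simulated data resemble the observations. It uses plain rejection sampling (often called rejection ABC), which is a simpler technique than the density-estimation approach named in the abstract, and it has nothing to do with galaxy masses. The forward model, the flat prior, the tolerance and the data values are all made up for illustration.

```python
import random

def simulate(mu, rng, n=3, sigma=1.0):
    """Hypothetical forward model: n noisy draws scattered around the parameter mu."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

def rejection_abc(observed, n_draws=20000, eps=0.2, seed=0):
    """Keep prior draws whose simulated mean lands within eps of the observed mean."""
    rng = random.Random(seed)
    obs_mean = sum(observed) / len(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-10.0, 10.0)  # flat prior, an assumption of this toy
        sim = simulate(mu, rng, n=len(observed))
        if abs(sum(sim) / len(sim) - obs_mean) < eps:
            accepted.append(mu)
    return accepted

observed = [4.1, 5.3, 4.6]  # made-up data: three points, one unknown parameter
samples = rejection_abc(observed)
posterior_mean = sum(samples) / len(samples)
```

The accepted draws in `samples` approximate the posterior without a likelihood ever being written down, which is the property the abstract highlights. Rejection sampling wastes most simulations, which is one reason density-estimation methods are preferred when simulations are expensive.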

preprint | 2019 | arXiv

Quantifying Suspiciousness Within Correlated Data Sets

We propose a principled Bayesian method for quantifying tension between correlated datasets with wide uninformative parameter priors. This is achieved by extending the Suspiciousness statistic, which is insensitive to priors. Our method uses global summary statistics, and as such it can be used as a diagnostic for internal consistency. We show how our approach can be combined with methods that use parameter space and data space to identify the existing internal discrepancies. As an example, we use it to test the internal consistency of the KiDS-450 data in 4 photometric redshift bins, and to recover controlled internal discrepancies in simulated KiDS data. We propose this as a diagnostic of internal consistency for present and future cosmological surveys, and as a tension metric for data sets that have non-negligible correlation, such as LSST and Euclid.

preprint | 2018 | arXiv

Co-orbital resonance with a migrating proto-giant planet

In this work we pose the possibility that, at an early stage, the migration of a proto-giant planet caused by the presence of a gaseous circumstellar disk could explain the continuous feeding of small bodies into its orbit. In particular, we study the probability of capture and permanence in co-orbital resonance of these small bodies, as planets of diverse masses migrate by interaction with the gaseous disk, and the drag induced by this disk dissipates energy from these small objects, making capture more likely. We also study the relevance to the capture process of the circumplanetary disk, a structure formed closely around the planet where the gas density is enhanced. The capture of small bodies in 1:1 resonance is of great interest because it could account for the origin of the Trojan population, which has been proposed \citep{2011Icar..215..669K} as a mechanism for the capture of quasi-satellites and irregular satellites.