Source author record

Aaron Sim

Aaron Sim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Applications Genomics hep-th math.DG Methodology physics.comp-ph physics.soc-ph Social and Information Networks

Catalog footprint

What is connected

5works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness

Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty defined in terms of mixtures of the variational posteriors to enforce this independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, and we prove counterfactual identifiability of CoMP under additional assumptions. We demonstrate state-of-the-art performance on a set of challenging tasks including aligning human tumour samples with cancer cell-lines, predicting transcriptome-level perturbation responses, and batch correction on single-cell RNA sequencing data. We also find parallels to fair representation learning and demonstrate that CoMP is competitive on a common task in the field.

preprint2015arXiv

Great cities look small

Great cities connect people; failed cities isolate people. Despite the fundamental importance of physical, face-to-face social-ties in the functioning of cities, these connectivity networks are not explicitly observed in their entirety. Attempts at estimating them often rely on unrealistic over-simplifications such as the assumption of spatial homogeneity. Here we propose a mathematical model of human interactions in terms of a local strategy of maximising the number of beneficial connections attainable under the constraint of limited individual travelling-time budgets. By incorporating census and openly-available online multi-modal transport data, we are able to characterise the connectivity of geometrically and topologically complex cities. Beyond providing a candidate measure of greatness, this model allows one to quantify and assess the impact of transport developments, population growth, and other infrastructure and demographic changes on a city. Supported by validations of GDP and HIV infection rates across United States metropolitan areas, we illustrate the effect of changes in local and city-wide connectivities by considering the economic impact of two contemporary inter- and intra-city transport developments in the United Kingdom: High Speed Rail 2 and London Crossrail. This derivation of the model suggests that the scaling of different urban indicators with population size has an explicitly mechanistic origin.

preprint2013arXiv

Random Forests on Distance Matrices for Imaging Genetics Studies

We propose a non-parametric regression methodology, Random Forests on Distance Matrices (RFDM), for detecting genetic variants associated to quantitative phenotypes representing the human brain's structure or function, and obtained using neuroimaging techniques. RFDM, which is an extension of decision forests, requires a distance matrix as response that encodes all pair-wise phenotypic distances in the random sample. We discuss ways to learn such distances directly from the data using manifold learning techniques, and how to define such distances when the phenotypes are non-vectorial objects such as brain connectivity networks. We also describe an extension of RFDM to detect espistatic effects while keeping the computational complexity low. Extensive simulation results and an application to an imaging genetics study of Alzheimer's Disease are presented and discussed.

preprint2012arXiv

Information Geometry and Sequential Monte Carlo

This paper explores the application of methods from information geometry to the sequential Monte Carlo (SMC) sampler. In particular the Riemannian manifold Metropolis-adjusted Langevin algorithm (mMALA) is adapted for the transition kernels in SMC. Similar to its function in Markov chain Monte Carlo methods, the mMALA is a fully adaptable kernel which allows for efficient sampling of high-dimensional and highly correlated parameter spaces. We set up the theoretical framework for its use in SMC with a focus on the application to the problem of sequential Bayesian inference for dynamical systems as modelled by sets of ordinary differential equations. In addition, we argue that defining the sequence of distributions on geodesics optimises the effective sample sizes in the SMC run. We illustrate the application of the methodology by inferring the parameters of simulated Lotka-Volterra and Fitzhugh-Nagumo models. In particular we demonstrate that compared to employing a standard adaptive random walk kernel, the SMC sampler with an information geometric kernel design attains a higher level of statistical robustness in the inferred parameters of the dynamical systems.

preprint2009arXiv

E7(7) formulation of N=2 backgrounds

In this paper we reformulate N=2 supergravity backgrounds arising in type II string theory in terms of quantities transforming under the U-duality group E7(7). In particular we combine the Ramond--Ramond scalar degrees of freedom together with the O(6,6) pure spinors which govern the Neveu-Schwarz sector by considering an extended version of generalised geometry. We give E7(7)-invariant expressions for the Kahler and hyperkahler potentials describing the moduli space of vector and hypermultiplets, demonstrating that both correspond to standard E7(7) coset spaces. We also find E7(7) expressions for the Killing prepotentials defining the scalar potential, and discuss the equations governing N=1 vacua in this formalism.