Source author record

Bianca Dumitrascu

Bianca Dumitrascu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Genomics Machine Learning Applications Methodology Quantitative Methods

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Approximate Latent Force Model Inference

Physically-inspired latent force models offer an interpretable alternative to purely data driven tools for inference in dynamical systems. They carry the structure of differential equations and the flexibility of Gaussian processes, yielding interpretable parameters and dynamics-imposed latent functions. However, the existing inference techniques associated with these models rely on the exact computation of posterior kernel terms which are seldom available in analytical form. Most applications relevant to practitioners, such as Hill equations or diffusion equations, are hence intractable. In this paper, we overcome these computational problems by proposing a variational solution to a general class of non-linear and parabolic partial differential equation latent force models. Further, we show that a neural operator approach can scale our model to thousands of instances, enabling fast, distributed computation. We demonstrate the efficacy and flexibility of our framework by achieving competitive performance on several tasks where the kernels are of varying degrees of tractability.

preprint2022arXiv

MarkerMap: nonlinear marker selection for single-cell studies

Single-cell RNA-seq data allow the quantification of cell type differences across a growing set of biological contexts. However, pinpointing a small subset of genomic features explaining this variability can be ill-defined and computationally intractable. Here we introduce MarkerMap, a generative model for selecting minimal gene sets which are maximally informative of cell type origin and enable whole transcriptome reconstruction. MarkerMap provides a scalable framework for both supervised marker selection, aimed at identifying specific cell type populations, and unsupervised marker selection, aimed at gene expression imputation and reconstruction. We benchmark MarkerMap's competitive performance against previously published approaches on real single cell gene expression data sets. MarkerMap is available as a pip installable package, as a community resource aimed at developing explainable machine learning techniques for enhancing interpretability in single-cell studies.

preprint2020arXiv

Nonparametric Bayesian multi-armed bandits for single cell experiment design

The problem of maximizing cell type discovery under budget constraints is a fundamental challenge for the collection and analysis of single-cell RNA-sequencing (scRNA-seq) data. In this paper, we introduce a simple, computationally efficient, and scalable Bayesian nonparametric sequential approach to optimize the budget allocation when designing a large scale experiment for the collection of scRNA-seq data for the purpose of, but not limited to, creating cell atlases. Our approach relies on the following tools: i) a hierarchical Pitman-Yor prior that recapitulates biological assumptions regarding cellular differentiation, and ii) a Thompson sampling multi-armed bandit strategy that balances exploitation and exploration to prioritize experiments across a sequence of trials. Posterior inference is performed by using a sequential Monte Carlo approach, which allows us to fully exploit the sequential nature of our species sampling problem. We empirically show that our approach outperforms state-of-the-art methods and achieves near-Oracle performance on simulated and scRNA-seq data alike. HPY-TS code is available at https://github.com/fedfer/HPYsinglecell.

preprint2015arXiv

A Bayesian test to identify variance effects

Identifying genetic variants that regulate quantitative traits, or QTLs, is the primary focus of the field of statistical genetics. Most current methods are limited to identifying mean effects, or associations between genotype and the mean value of a quantitative trait. It is possible, however, that a genetic variant may affect the variance of the quantitative trait in lieu of, or in addition to, affecting the trait mean. Here, we develop a general methodological approach to identifying covariates with variance effects on a quantitative trait using a Bayesian heteroskedastic linear regression model. We show that our Bayesian test for heteroskedasticity (BTH) outperforms classical tests for differences in variation across a large range of simulations drawn from scenarios common to the analysis of quantitative traits. We apply BTH to methylation QTL study data and expression QTL study data to identify variance QTLs. When compared with three tests for heteroskedasticity used in genomics, we illustrate the benefits of using our approach, including avoiding overfitting by incorporating uncertainty and flexibly identifying heteroskedastic effects.