Source author record

Amnon Amir

Amnon Amir appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Applications; Genomics; Computation; Quantitative Methods; Computational Engineering, Finance, and Science; cs.CY; Human-Computer Interaction; Information Theory; math.IT

Catalog footprint

What is connected

4 works

9 topics

4 close collaborators


preprint, 2020, arXiv

Testing for differential abundance in compositional counts data, with application to microbiome studies

Identifying which taxa in our microbiota are associated with traits of interest is important for advancing science and health. However, the identification is challenging because the measured vector of taxa counts (by amplicon sequencing) is compositional, so a change in the abundance of one taxon in the microbiota induces a change in the number of sequenced counts across all taxa. The data is typically sparse, with zero counts present either due to biological variance or limited sequencing depth (technical zeros). For low abundance taxa, the chance for technical zeros is non-negligible. We show that existing methods designed to identify differential abundance for compositional data may have an inflated number of false positives due to improper handling of the zero counts. We introduce a novel non-parametric approach which provides valid inference even when the fraction of zero counts is substantial. Our approach uses a set of reference taxa that are non-differentially abundant, which can be estimated from the data or from outside information. We show the usefulness of our approach via simulations, as well as on three different data sets: a Crohn's disease study, the Human Microbiome Project, and an experiment with 'spiked-in' bacteria.
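The compositionality problem and the idea of normalizing against reference taxa can be illustrated with a small sketch. This is not the authors' procedure: the function names (`total_sum_proportion`, `reference_ratio`, `permutation_pvalue`), the toy abundances, and the choice of a permutation test on a difference in means are all assumptions made for illustration, and the sketch assumes every sample has a nonzero count in the target or reference taxa.

```python
import random

def total_sum_proportion(sample, taxon):
    """Relative abundance of `taxon` under total-sum normalization."""
    return sample[taxon] / sum(sample)

def reference_ratio(sample, taxon, reference):
    """Counts of `taxon` relative to itself plus a set of reference taxa."""
    ref_total = sum(sample[j] for j in reference)
    return sample[taxon] / (sample[taxon] + ref_total)

def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)

    def mean_diff(lab):
        g1 = [v for v, l in zip(values, lab) if l == 1]
        g0 = [v for v, l in zip(values, lab) if l == 0]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    observed = mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between value and group
        if mean_diff(shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never zero.
    return (hits + 1) / (n_perm + 1)

# Toy absolute abundances (hypothetical): only taxon 0 differs between samples.
healthy = [10, 20, 30, 40]
disease = [100, 20, 30, 40]

# Taxon 1 is unchanged in absolute terms, yet its proportion of the total drops.
prop_healthy = total_sum_proportion(healthy, 1)
prop_disease = total_sum_proportion(disease, 1)

# Measured against reference taxa 2 and 3, taxon 1 looks unchanged.
ratio_healthy = reference_ratio(healthy, 1, reference=[2, 3])
ratio_disease = reference_ratio(disease, 1, reference=[2, 3])
```

In this toy case the total-sum proportion of taxon 1 changes even though its absolute abundance does not, while the reference-based ratio stays the same. Per-sample ratios of this kind could then be passed to `permutation_pvalue` with 0/1 group labels.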

preprint, 2016, arXiv

Integrating citizen science with online learning to ask better questions

Online learners spend millions of hours per year testing their new skills on assignments with known answers. This paper explores whether framing research questions as assignments with unknown answers helps learners generate novel, useful, and difficult-to-find knowledge while increasing their motivation by contributing to a larger goal. Collaborating with the American Gut Project, the world's largest crowdfunded citizen science project, we deploy Gut Instinct to allow novices to generate hypotheses about the constitution of the human gut microbiome. The tool enables online learners to explore learning material about the microbiome and create their own theories around causal variances for the microbiome. Building on crowdsourcing or serious games that use people as replaceable units, this work-in-progress lays out our plans for how people (a) use their personal knowledge (b) towards solving a larger real-world goal (c) that can provide potential benefits to them. We hope to demonstrate that Gut Instinct citizen scientists generate useful hypotheses, perform better on learning tasks than traditional MOOC learners, and are better engaged with the learning material.

preprint, 2013, arXiv

Accurate Profiling of Microbial Communities from Massively Parallel Sequencing using Convex Optimization

We describe the Microbial Community Reconstruction (MCR) Problem, which is fundamental for microbiome analysis. In this problem, the goal is to reconstruct the identity and frequency of species comprising a microbial community, using short sequence reads from Massively Parallel Sequencing (MPS) data obtained for specified genomic regions. We formulate the problem mathematically as a convex optimization problem and provide sufficient conditions for identifiability, namely the ability to reconstruct species identity and frequency correctly when the data size (number of reads) grows to infinity. We discuss different metrics for assessing the quality of the reconstructed solution, including a novel phylogenetically-aware metric based on the Mahalanobis distance, and give upper-bounds on the reconstruction error for a finite number of reads under different metrics. We propose a scalable divide-and-conquer algorithm for the problem using convex optimization, which enables us to handle large problems (with ~10^6 species). We show using numerical simulations that for realistic scenarios, where the microbial communities are sparse, our algorithm gives solutions with high accuracy, both in terms of obtaining accurate frequency, and in terms of species phylogenetic resolution.
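As a rough illustration of posing frequency reconstruction as a convex problem, the sketch below minimizes a least-squares misfit subject to non-negative frequencies using projected gradient descent. It is not the divide-and-conquer algorithm described in the abstract: the matrix `A` (rows are hypothetical read features, columns are species), the toy frequencies, and the names `matvec` and `estimate_frequencies` are all assumptions made for this example.

```python
def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def estimate_frequencies(A, y, steps=5000):
    """Minimize 0.5 * ||A x - y||^2 subject to x >= 0 by projected gradient descent."""
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm bounds the Lipschitz constant of the gradient,
    # so its reciprocal is a safe (conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [1.0 / n_cols] * n_cols  # start from a uniform community
    for _ in range(steps):
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, xj - step * g) for xj, g in zip(x, grad)]
    return x

# Hypothetical feature-by-species matrix: entry (i, j) is 1 if species j
# produces read feature i.
A = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
]
true_freq = [0.7, 0.3, 0.0]   # toy community; species 2 is absent
y = matvec(A, true_freq)      # noiseless feature frequencies
estimate = estimate_frequencies(A, y)
```

Because the toy matrix has full column rank and the data are noiseless, the minimizer is unique and coincides with `true_freq`. With real read data the measurements are noisy and the matrix is far larger, which is where a scalable scheme such as the one the abstract describes would be needed.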

preprint, 2010, arXiv

Bacterial Community Reconstruction Using A Single Sequencing Reaction

Bacteria are the unseen majority on our planet, with millions of species and comprising most of the living protoplasm. While current methods enable in-depth study of a small number of communities, a simple tool for breadth studies of bacterial population composition in a large number of samples is lacking. We propose a novel approach for reconstruction of the composition of an unknown mixture of bacteria using a single Sanger-sequencing reaction of the mixture. This method is based on compressive sensing theory, which deals with reconstruction of a sparse signal using a small number of measurements. Utilizing the fact that in many cases each bacterial community comprises a small subset of the known bacterial species, we show the feasibility of this approach for determining the composition of a bacterial mixture. Using simulations, we show that sequencing a few hundred base-pairs of the 16S rRNA gene sequence may provide enough information for reconstruction of mixtures containing tens of species, out of tens of thousands, even in the presence of realistic measurement noise. Finally, we show initial promising results when applying our method for the reconstruction of a toy experimental mixture with five species. Our approach may offer a practical and efficient way to identify bacterial species compositions in biological samples.
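The compressive sensing idea, recovering a sparse vector from fewer measurements than unknowns, can be sketched with a generic L1-regularized least-squares solver. The code below uses iterative soft thresholding (ISTA), a standard textbook method that is not necessarily the solver used in the paper. The measurement matrix, the penalty `lam`, and the function names are assumptions chosen for illustration and do not model real Sanger-sequencing signals.

```python
def soft_threshold(v, tau):
    """Shrink v toward zero by tau (the proximal operator of the L1 norm)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

def ista(A, y, lam=0.1, steps=5000):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative soft thresholding."""
    n_rows, n_cols = len(A), len(A[0])
    # Reciprocal of the squared Frobenius norm: a conservative, safe step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    for _ in range(steps):
        fitted = [sum(A[i][j] * x[j] for j in range(n_cols)) for i in range(n_rows)]
        residual = [f - t for f, t in zip(fitted, y)]
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        x = [soft_threshold(xj - step * g, step * lam) for xj, g in zip(x, grad)]
    return x

# Hypothetical measurement matrix: 4 measurements, 6 candidate species,
# so the linear system alone is underdetermined.
A = [
    [0.5,  0.5,  0.5,  0.5, 1.0, 0.0],
    [0.5, -0.5,  0.5, -0.5, 0.0, 1.0],
    [0.5,  0.5, -0.5, -0.5, 0.0, 0.0],
    [0.5, -0.5, -0.5,  0.5, 0.0, 0.0],
]
y = [1.0, 0.0, 0.0, 0.0]  # noiseless signal from a mixture containing only species 4
x_hat = ista(A, y)
```

In this toy case the L1 penalty selects the single-species explanation even though many dense combinations fit the measurements equally well. The penalty also shrinks the recovered coefficient slightly below its true value, a known bias of L1-regularized estimates.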