Source author record

Rosemary Braun

Rosemary Braun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation, Genomics, Quantitative Methods, Applications, Molecular Networks, Machine Learning

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

preprint 2014 arXiv

Network Methods for Pathway Analysis of Genomic Data

Rapid advances in high-throughput technologies have led to considerable interest in analyzing genome-scale data in the context of biological pathways, with the goal of identifying functional systems that are involved in a given phenotype. In the most common approaches, biological pathways are modeled as simple sets of genes, neglecting the network of interactions comprising the pathway and treating all genes as equally important to the pathway's function. Recently, a number of new methods have been proposed to integrate pathway topology in the analyses, harnessing existing knowledge and enabling more nuanced models of complex biological systems. However, there is little guidance available to researchers choosing between these methods. In this review, we discuss eight topology-based methods, comparing their methodological approaches and appropriate use cases. In addition, we present the results of the application of these methods to a curated set of ten gene expression profiling studies using a common set of pathway annotations. We report the computational efficiency of the methods and the consistency of the results across methods and studies to help guide users in choosing a method. We also discuss the challenges and future outlook for improved network analysis methodologies.

preprint 2011 arXiv

Partition Decoupling for Multi-gene Analysis of Gene Expression Profiling Data

We present the extension and application of a new unsupervised statistical learning technique, the Partition Decoupling Method (PDM), to gene expression data. Because it has the ability to reveal non-linear and non-convex geometries present in the data, the PDM is an improvement over typical gene expression analysis algorithms, permitting a multi-gene analysis that can reveal phenotypic differences even when the individual genes do not exhibit differential expression. Here, we apply the PDM to publicly available gene expression data sets, and demonstrate that we are able to identify cell types and treatments with higher accuracy than is obtained through other approaches. By applying it in a pathway-by-pathway fashion, we demonstrate how the PDM may be used to find sets of mechanistically related genes that discriminate phenotypes.

preprint 2011 arXiv

Pathways of Distinction Analysis: a new technique for multi-SNP analysis of GWAS data

Genome-wide association studies have become increasingly common due to advances in technology and have permitted the identification of differences in single nucleotide polymorphism (SNP) alleles that are associated with diseases. However, while typical GWAS analysis techniques treat markers individually, complex diseases are unlikely to have a single causative gene. There is thus a pressing need for multi-SNP analysis methods that can reveal system-level differences in cases and controls. Here, we present a novel multi-SNP GWAS analysis method called Pathways of Distinction Analysis (PoDA). The method uses GWAS data and known pathway-gene and gene-SNP associations to identify pathways that permit, ideally, the distinction of cases from controls. The technique is based upon the hypothesis that if a pathway is related to disease risk, cases will appear more similar to other cases than to controls for the SNPs associated with that pathway. By systematically applying the method to all pathways of potential interest, we can identify those for which the hypothesis holds true, i.e., pathways containing SNPs for which the samples exhibit greater within-class similarity than across classes. Importantly, PoDA improves on existing single-SNP and SNP-set enrichment analyses in that it does not require the SNPs in a pathway to exhibit independent main effects. This permits PoDA to reveal pathways in which epistatic interactions drive risk. In this paper, we detail the PoDA method and apply it to two GWA studies: one of breast cancer, and the other of liver cancer. The results obtained strongly suggest that there exist pathway-wide genomic differences that contribute to disease susceptibility. PoDA thus provides an analytical tool that is complementary to existing techniques and has the power to enrich our understanding of disease genomics at the systems level.
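The hypothesis in this abstract (cases resemble other cases more than they resemble controls over a pathway's SNPs) can be illustrated with a small sketch. This is not the published PoDA statistic: the function name `relative_distance_scores`, the 0/1/2 genotype coding, and the Manhattan distance are all assumptions chosen for illustration.

```python
import numpy as np

def relative_distance_scores(genotypes, is_case):
    """Per-sample score: mean distance to controls minus mean distance to cases.

    genotypes: (samples x SNPs) array for one pathway, coded 0/1/2 (assumed coding).
    is_case:   boolean array marking cases.
    Each sample is left out of its own class mean (leave-one-out).
    """
    g = np.asarray(genotypes, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    # Pairwise Manhattan distance between samples over the pathway's SNPs
    # (an illustrative choice of distance, not taken from the abstract).
    dist = np.abs(g[:, None, :] - g[None, :, :]).sum(axis=2)
    n = len(g)
    scores = np.empty(n)
    for i in range(n):
        others = np.arange(n) != i
        to_cases = dist[i, others & is_case].mean()
        to_controls = dist[i, others & ~is_case].mean()
        scores[i] = to_controls - to_cases
    return scores

# Toy data: two cases and two controls over three SNPs (invented for illustration).
toy_genotypes = [[2, 2, 0], [2, 1, 0], [0, 0, 2], [0, 1, 2]]
toy_is_case = [True, True, False, False]
print(relative_distance_scores(toy_genotypes, toy_is_case))
```

Under the stated hypothesis, a disease-related pathway would tend to give positive scores for cases and negative scores for controls. A real analysis would also need a significance test and handling of linkage disequilibrium, which this sketch omits.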

preprint 2009 arXiv

Needles in the Haystack: Identifying Individuals Present in Pooled Genomic Data

Recent publications have described and applied a novel metric that quantifies the genetic distance of an individual with respect to two population samples, and have suggested that the metric makes it possible to infer the presence of an individual of known genotype in a sample for which only the marginal allele frequencies are known. However, the assumptions, limitations, and utility of this metric remained incompletely characterized. Here we present an exploration of the strengths and limitations of that method. In addition to analytical investigations of the underlying assumptions, we use both real and simulated genotypes to test empirically the method's accuracy. The results reveal that, when used as a means by which to identify individuals as members of a population sample, the specificity is low in several circumstances. We find that the misclassifications stem from violations of assumptions that are crucial to the technique yet hard to control in practice, and we explore the feasibility of several methods to improve the sensitivity. Additionally, we find that the specificity may still be lower than expected even in ideal circumstances. However, despite the metric's inadequacies for identifying the presence of an individual in a sample, our results suggest potential avenues for future research on tuning this method to problems of ancestry inference or disease prediction. By revealing both the strengths and limitations of the proposed method, we hope to elucidate situations in which this distance metric may be used in an appropriate manner. We also discuss the implications of our findings in forensics applications and in the protection of GWAS participant privacy.
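The abstract does not give the metric's formula. As a rough sketch of how a distance of this kind could be computed from marginal allele frequencies alone, the code below assumes a per-SNP form, |y - reference| - |y - pool|, summarized by a one-sample t-like statistic. That form, the function name `distance_statistic`, and the toy values are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def distance_statistic(individual, reference_freq, pool_freq):
    """t-like summary of per-SNP distances (assumed form, for illustration).

    individual:     per-SNP allele frequency of one person (0, 0.5, or 1).
    reference_freq: per-SNP allele frequencies of a reference population sample.
    pool_freq:      per-SNP allele frequencies of the pooled sample of interest.
    Positive values mean the individual sits closer to the pool than to the reference.
    """
    y = np.asarray(individual, dtype=float)
    ref = np.asarray(reference_freq, dtype=float)
    pool = np.asarray(pool_freq, dtype=float)
    # Per-SNP distance: how much nearer the individual is to the pool than to the reference.
    d = np.abs(y - ref) - np.abs(y - pool)
    # Mean distance scaled by its standard error across SNPs.
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# Toy values, invented for illustration only.
y = [1.0, 0.0, 0.5, 1.0]
ref = [0.5, 0.5, 0.5, 0.5]
pool = [0.7, 0.3, 0.5, 0.8]
print(distance_statistic(y, ref, pool))
```

A statistic of this shape treats SNPs as independent, which is one of the assumptions the abstract describes as crucial yet hard to control in practice.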