Source author record

Andrea Califano

Andrea Califano appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Applications, Genomics, Methodology, Molecular Networks, Quantitative Methods

Catalog footprint

What is connected

2 works

5 topics

4 close collaborators

Preprint, 2016, arXiv

Searching for Gene Sets with Mutually Exclusive Mutations

Cancer cells evolve through random somatic mutations. "Beneficial" mutations which disrupt key pathways (e.g. cell cycle regulation) are subject to natural selection. Multiple mutations may lead to the same "beneficial" effect, in which case there is no selective advantage to having more than one of these mutations. Hence we are interested in finding sets of genes whose mutations are approximately mutually exclusive (anti-co-occurring) within the TCGA Pancancer dataset. In principle, finding the best set is NP-hard. Nevertheless, we will show how a new Mutation anti-co-OCcurrence Algorithm (MOCA) provides an effective greedy search and testing algorithm with guaranteed control of the familywise error rate or false discovery rate, by combining some under-appreciated ideas from frequentist hypothesis testing. These ideas include: (a) A novel exact conditional test for the tendency of multiple sets to have a large/small union/intersection, which generalises Fisher's exact test of 2×2 tables. (b) Randomised hypothesis tests for discrete distributions. (c) Stouffer's method for combining p-values. (d) Weighted multiple hypothesis testing. A new approach to setting a-priori weights which generates additional implicit hypothesis tests is suggested, and allows us to preserve almost all statistical power when testing pairs despite introducing a combinatorially large number of additional hypotheses.
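Two of the ingredients named in this abstract, Fisher's exact test on a 2×2 table and Stouffer's method for combining p-values, can be illustrated with a minimal Python sketch. This is not the MOCA implementation: the function names `exclusivity_p_value` and `stouffer`, the clipping constant, and the example counts are assumptions made for illustration, and the sketch omits the randomised tests, the multi-set generalisation, and the weighting scheme that the abstract describes.

```python
from math import comb
from statistics import NormalDist


def exclusivity_p_value(n_samples, n_a, n_b, n_both):
    """One-sided Fisher exact p-value for a 2x2 table (hypothetical helper).

    Returns P(overlap <= n_both) when gene A is mutated in n_a samples and
    gene B in n_b samples, placed independently among n_samples samples.
    Small values suggest the two mutations tend to be mutually exclusive.
    """
    lowest = max(0, n_a + n_b - n_samples)  # smallest overlap the margins allow
    if not (lowest <= n_both <= min(n_a, n_b) and max(n_a, n_b) <= n_samples):
        raise ValueError("inconsistent counts")
    # Hypergeometric lower tail, conditioning on both marginal counts.
    tail = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(lowest, n_both + 1)
    )
    return tail / comb(n_samples, n_b)


def stouffer(p_values, weights=None, eps=1e-12):
    """Combine one-sided p-values with (optionally weighted) Stouffer's method."""
    normal = NormalDist()
    if weights is None:
        weights = [1.0] * len(p_values)
    z_sum = 0.0
    for w, p in zip(weights, p_values):
        p = min(max(p, eps), 1.0 - eps)  # keep inv_cdf away from 0 and 1 (assumption)
        z_sum += w * normal.inv_cdf(1.0 - p)
    z = z_sum / sum(w * w for w in weights) ** 0.5
    return 1.0 - normal.cdf(z)


# Made-up counts, purely for illustration.
pair_p_values = [
    exclusivity_p_value(n_samples=10, n_a=5, n_b=5, n_both=0),
    exclusivity_p_value(n_samples=20, n_a=8, n_b=6, n_both=1),
]
combined = stouffer(pair_p_values)
```

Exact tests on count data give discrete p-values that are not uniform under the null, which is presumably why the abstract pairs Stouffer's method with randomised tests; the plain combination above would tend to be conservative.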

Preprint, 2010, arXiv

Multivariate dependence and genetic networks inference

A critical task in systems biology is the identification of genes that interact to control cellular processes by transcriptional activation of a set of target genes. Many methods have been developed to use statistical correlations in high-throughput datasets to infer such interactions. However, cellular pathways are highly cooperative, often requiring the joint effect of many molecules, and few methods have been proposed to explicitly identify such higher-order interactions, partially due to the fact that the notion of multivariate statistical dependency itself remains imprecisely defined. We define the concept of dependence among multiple variables using maximum entropy techniques and introduce computational tests for their identification. Synthetic network results reveal that this procedure uncovers dependencies even in undersampled regimes, when the joint probability distribution cannot be reliably estimated. Analysis of microarray data from human B cells reveals that third-order statistics, but not second-order ones, uncover relationships between genes that interact in a pathway to cooperatively regulate a common set of targets.
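One common way to make "third-order dependence" concrete is to compare an observed joint distribution with the maximum entropy distribution that matches only its pairwise marginals; the gap between them, measured as a Kullback-Leibler divergence, is what pairwise statistics cannot explain. The sketch below does this for three binary variables using iterative proportional fitting. It is an illustration under stated assumptions, not the paper's procedure: the function names, the binary discretisation, the fixed iteration count, and the XOR example are all hypothetical, and the statistical tests the abstract mentions are not included.

```python
from itertools import product
from math import log2

STATES = list(product((0, 1), repeat=3))  # all (x, y, z) configurations
PAIRS = ((0, 1), (0, 2), (1, 2))


def marginal(dist, axes):
    """Sum a distribution over {state: probability} down to the given axes."""
    out = {}
    for state, prob in dist.items():
        key = tuple(state[i] for i in axes)
        out[key] = out.get(key, 0.0) + prob
    return out


def pairwise_maxent(p, n_iter=200):
    """Max-entropy distribution matching the pairwise marginals of p.

    Uses iterative proportional fitting, starting from the uniform distribution.
    """
    q = {state: 1.0 / len(STATES) for state in STATES}
    for _ in range(n_iter):
        for axes in PAIRS:
            target = marginal(p, axes)
            current = marginal(q, axes)
            for state in STATES:
                key = tuple(state[i] for i in axes)
                if current[key] > 0.0:
                    q[state] *= target.get(key, 0.0) / current[key]
    return q


def third_order_information(p):
    """KL divergence (in bits) from p to its pairwise max-entropy model."""
    q = pairwise_maxent(p)
    return sum(
        prob * log2(prob / q[state])
        for state, prob in p.items()
        if prob > 0.0
    )


# Hypothetical example: z = x XOR y with x, y fair coins. Every pair of
# variables looks independent, yet the three are jointly dependent.
xor_dist = {s: (0.25 if s[2] == s[0] ^ s[1] else 0.0) for s in STATES}
xor_gap = third_order_information(xor_dist)
```

The XOR case is the standard example of a relationship that second-order statistics miss entirely, which mirrors the abstract's claim that third-order statistics can uncover cooperative regulation invisible to pairwise measures.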