Source author record

Peter Donnelly

Peter Donnelly appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Methodology, Applications, Genomics, Machine Learning, Populations and Evolution

Catalog footprint

4 works · 5 topics · 4 close collaborators


Preprint · 2014 · arXiv

A Greedy Homotopy Method for Regression with Nonconvex Constraints

Constrained least squares regression is an essential tool for high-dimensional data analysis. Given a partition $\mathcal{G}$ of input variables, this paper considers a particular class of nonconvex constraint functions that encourage the linear model to select a small number of variables from a small number of groups in $\mathcal{G}$. Such constraints are relevant in many practical applications, such as Genome-Wide Association Studies (GWAS). Motivated by the efficiency of the Lasso homotopy method, we present RepLasso, a greedy homotopy algorithm that tries to solve the induced sequence of nonconvex problems by solving a sequence of suitably adapted convex surrogate problems. We prove that in some situations RepLasso recovers the global minima of the nonconvex problem. Moreover, even if it does not recover global minima, we prove that in relevant cases it will still do no worse than the Lasso in terms of support and signed support recovery, while in practice outperforming it. We show empirically that the strategy can also be used to improve over other Lasso-style algorithms. Finally, a GWAS of ankylosing spondylitis highlights our method's practical utility.
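The abstract measures RepLasso against the Lasso on support recovery. As a point of reference, here is a minimal sketch of that Lasso baseline, solved by cyclic coordinate descent with soft-thresholding on the objective (1/(2n))·‖y − Xβ‖² + λ‖β‖₁. This is a swapped-in solver. It is neither the Lasso homotopy (LARS) method nor RepLasso, and it ignores the group partition $\mathcal{G}$ altogether. The names `lasso_cd` and `soft_threshold` are hypothetical.

```python
from typing import List


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|b|: shrink z towards zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_sweeps: int = 200, tol: float = 1e-10) -> List[float]:
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep residual in sync with beta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy orthogonal design: the weaker variable is thresholded out at lam = 0.6.
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [3, 1, 3, 1]
beta = lasso_cd(X, y, lam=0.6)
support = [j for j, b in enumerate(beta) if b != 0.0]
```

The support (the nonzero coefficients) is the quantity that the abstract's "support and signed support recovery" comparisons are based on.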

Preprint · 2013 · arXiv

Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies

Motivated by genome-wide association studies, we consider a standard linear model with one additional random effect in situations where many predictors have been collected on the same subjects and each predictor is analyzed separately. Three novel contributions are (1) a transformation between the linear and log-odds scales which is accurate for the important genetic case of small effect sizes; (2) a likelihood-maximization algorithm that is an order of magnitude faster than the previously published approaches; and (3) efficient methods for computing marginal likelihoods which allow Bayesian model comparison. The methodology has been successfully applied to a large-scale association study of multiple sclerosis including over 20,000 individuals and 500,000 genetic variants.
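Contribution (1) maps effects between the linear and log-odds scales. A common first-order version of such a mapping divides the slope from a linear regression of 0/1 case status by φ(1 − φ), where φ is the case proportion. The sketch below implements that approximation, which may differ from the paper's exact transformation (for example, the paper may add correction terms). It assumes a logistic model for case status and genotypes x ∈ {0, 1, 2} under Hardy–Weinberg proportions. All function names are hypothetical.

```python
import math
from typing import Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear_to_log_odds(beta_lin: float, case_fraction: float) -> float:
    """First-order conversion of a 0/1-scale slope to a log-odds effect."""
    return beta_lin / (case_fraction * (1.0 - case_fraction))


def population_linear_slope(a: float, b: float, maf: float) -> Tuple[float, float]:
    """Exact least-squares slope of case status on genotype x in {0, 1, 2}
    (Hardy-Weinberg frequencies) when logit P(case | x) = a + b*x.
    Returns (slope, case fraction phi)."""
    freqs = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    probs = [sigmoid(a + b * x) for x in range(3)]
    mean_x = sum(f * x for x, f in enumerate(freqs))
    var_x = sum(f * (x - mean_x) ** 2 for x, f in enumerate(freqs))
    phi = sum(f * p for f, p in zip(freqs, probs))
    # Cov(x, y) = Cov(x, E[y | x]) because y is Bernoulli given x.
    cov = sum(f * (x - mean_x) * p for x, (f, p) in enumerate(zip(freqs, probs)))
    return cov / var_x, phi


# Small true log-odds effect b = 0.05, baseline risk 0.3, allele frequency 0.3.
slope, phi = population_linear_slope(a=math.log(0.3 / 0.7), b=0.05, maf=0.3)
approx = linear_to_log_odds(slope, phi)
```

For small effects like this one, `approx` lands close to the true value of 0.05. Rerunning with a large effect such as `b = 1.0` shows how the approximation degrades. That matches the abstract's restriction of its transformation to small effect sizes.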

Preprint · 2010 · arXiv

A Bayesian Method for Detecting and Characterizing Allelic Heterogeneity and Boosting Signals in Genome-Wide Association Studies

The standard paradigm for the analysis of genome-wide association studies involves carrying out association tests at both typed and imputed SNPs. These methods will not be optimal for detecting the signal of association at SNPs that are not currently known or in regions where allelic heterogeneity occurs. We propose a novel association test, complementary to the SNP-based approaches, that attempts to extract further signals of association by explicitly modeling and estimating both unknown SNPs and allelic heterogeneity at a locus. At each site we estimate the genealogy of the case-control sample by taking advantage of the HapMap haplotypes across the genome. Allelic heterogeneity is modeled by allowing more than one mutation on the branches of the genealogy. Our use of Bayesian methods allows us to assess directly the evidence for a causative SNP not well correlated with known SNPs and for allelic heterogeneity at each locus. Using simulated data and real data from the WTCCC project, we show that our method (i) produces a significant boost in signal and accurately identifies the form of the allelic heterogeneity in regions where it is known to exist, (ii) can suggest new signals that are not found by testing typed or imputed SNPs and (iii) can provide more accurate estimates of effect sizes in regions of association.
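Under a Bayesian treatment, weighing "one causal variant" against "allelic heterogeneity" at a locus ultimately means turning marginal likelihoods into posterior model probabilities. The sketch below shows only that final step. It assumes each model's evidence arrives as a log10 Bayes factor against a shared null model, and it leaves out the genealogy-based likelihood computation described above. The model labels, priors and Bayes factors in the usage line are hypothetical.

```python
import math
from typing import Dict


def posterior_model_probs(log10_bf: Dict[str, float],
                          prior: Dict[str, float]) -> Dict[str, float]:
    """Posterior model probabilities from log10 Bayes factors against a common null.

    Include the null itself with log10 BF = 0. Priors need not be normalised.
    """
    scores = {m: math.log10(prior[m]) + bf for m, bf in log10_bf.items()}
    top = max(scores.values())
    # Subtract the largest score before exponentiating to avoid overflow.
    weights = {m: 10 ** (s - top) for m, s in scores.items()}
    z = sum(weights.values())
    return {m: w / z for m, w in weights.items()}


# Hypothetical locus: evidence for one vs. two causal mutations on the genealogy.
post = posterior_model_probs(
    {"null": 0.0, "one_mutation": 5.0, "two_mutations": 6.0},
    {"null": 0.98, "one_mutation": 0.015, "two_mutations": 0.005},
)
```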

Preprint · 2010 · arXiv

The coalescent and its descendants

The coalescent revolutionised theoretical population genetics, simplifying, or making possible for the first time, many analyses, proofs, and derivations, and offering crucial insights about the way in which the structure of data in samples from populations depends on the demographic history of the population. However, statistical inference under the coalescent model is extremely challenging, effectively because no explicit expressions are available for key sampling probabilities. This led initially to approximation of these probabilities by ingenious application of modern computationally intensive statistical methods. A key breakthrough occurred when Li and Stephens introduced a different model, similar in spirit to the coalescent, for which efficient calculations are feasible. In turn, the Li and Stephens model has changed statistical inference for the wealth of data now available that documents molecular genetic variation within populations. We briefly review the coalescent and associated measure-valued diffusions, describe the Li and Stephens model, and introduce and apply a generalisation of it for inference of population structure in the presence of linkage disequilibrium.
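The Li and Stephens model keeps calculations efficient because it treats a new haplotype as an imperfect mosaic of haplotypes already sampled. Its likelihood then comes out of a hidden Markov model forward pass, with cost linear in both the number of sites and the number of reference haplotypes. The sketch below implements a simplified version of that copying HMM, using one constant switch probability per site where the original model uses recombination-distance-dependent switching, plus a single mismatch probability. The parameterisation and the function name are assumptions made for illustration. The generalisation for population structure introduced in the paper is not shown.

```python
import math
from typing import Sequence


def li_stephens_loglik(query: str, refs: Sequence[str],
                       switch: float, eps: float) -> float:
    """Log-likelihood of `query` as an imperfect mosaic of `refs`.

    Hidden state = index of the reference haplotype being copied. Between
    adjacent sites the copied haplotype is re-drawn uniformly with probability
    `switch`; the copied allele is emitted with probability 1 - eps.
    """
    K, L = len(refs), len(query)
    if K == 0 or L == 0 or any(len(h) != L for h in refs):
        raise ValueError("need >= 1 reference, all the same nonzero length as query")

    def emit(k: int, site: int) -> float:
        return 1.0 - eps if refs[k][site] == query[site] else eps

    # First site: each reference is copied with probability 1/K.
    alpha = [emit(k, 0) / K for k in range(K)]
    loglik = 0.0
    for site in range(1, L):
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]  # rescale to avoid underflow
        # Uniform switching makes each step O(K): the switch term is
        # (switch / K) * sum(alpha), and sum(alpha) == 1 after rescaling.
        alpha = [emit(k, site) * ((1.0 - switch) * alpha[k] + switch / K)
                 for k in range(K)]
    return loglik + math.log(sum(alpha))
```

A naive forward pass over all K × K transitions would cost O(LK²). Because switches are uniform, the shared switch term can be computed once per site, which brings the cost down to O(LK).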
