Source author record

Graham Coop

Graham Coop appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Populations and Evolution · Applications · math.PR · math.ST · Quantitative Methods · Statistics Theory

Catalog footprint

What is connected: 12 works · 6 topics · 4 close collaborators


Preprint · 2023 · arXiv

Genetic similarity versus genetic ancestry groups as sample descriptors in human genetics

A common sample descriptor in human genomics studies is that of 'genetic ancestry group', with terms such as 'European genetic ancestry' or 'East Asian genetic ancestry' frequently used in publications to describe the genetics of groups of individuals based on the analysis of their genotypes. In this Perspective, I argue that these terms are imprecise and potentially misleading and that, for most applications, simple statements of genetic similarity represent a more accurate description.

Preprint · 2016 · arXiv

A genomic map of the effects of linked selection in Drosophila

Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of 'linked selection' on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1 Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTRs). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in UTRs. Our inference further suggests a substantial effect of linked selection from non-classic sweeps. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.
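
The composite-likelihood idea can be sketched in a few lines. This is a toy illustration, not the authors' model: we assume, hypothetically, that expected pairwise diversity at site i is pi0 * exp(-(a_bs * bs_i + a_sw * sw_i)), where bs and sw are made-up per-site covariates standing in for background-selection and sweep effects, and we treat each site's pairwise differences as independent binomial draws. Ignoring the dependence between pairs and between linked sites is exactly the simplification a composite likelihood makes.

```python
import numpy as np

def expected_diversity(pi0, a_bs, a_sw, bs, sw):
    """Hypothetical model: neutral diversity pi0 reduced multiplicatively
    by a background-selection covariate (bs) and a sweep covariate (sw)."""
    return pi0 * np.exp(-(a_bs * bs + a_sw * sw))

def composite_loglik(params, bs, sw, diffs, pairs):
    """Sum of per-site binomial log-likelihoods (constant terms dropped).
    Correlations between pairs and between linked sites are ignored,
    which is what makes this a composite rather than a full likelihood."""
    pi = np.clip(expected_diversity(*params, bs, sw), 1e-12, 1 - 1e-12)
    return float(np.sum(diffs * np.log(pi) + (pairs - diffs) * np.log1p(-pi)))

# Simulated toy data (all values hypothetical).
rng = np.random.default_rng(0)
n_sites, n_pairs = 20_000, 45            # e.g. 10 sequences give 45 pairs
bs = rng.uniform(0.0, 2.0, n_sites)
sw = rng.uniform(0.0, 1.0, n_sites)
true_params = (0.02, 0.5, 1.0)           # (pi0, a_bs, a_sw)
diffs = rng.binomial(n_pairs, expected_diversity(*true_params, bs, sw))
pairs = np.full(n_sites, n_pairs)

ll_true = composite_loglik(true_params, bs, sw, diffs, pairs)
ll_neutral = composite_loglik((0.02, 0.0, 0.0), bs, sw, diffs, pairs)
print(ll_true > ll_neutral)
```

Maximizing `composite_loglik` over the three parameters with a numerical optimizer would give point estimates; because the independence assumption is false, uncertainty from a composite likelihood needs a correction such as a block bootstrap.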

Preprint · 2014 · arXiv

The Population Genetic Signature of Polygenic Local Adaptation

Adaptation in response to selection on polygenic phenotypes may occur via subtle allele frequency shifts at many loci. Current population genomic techniques are not well suited to identifying such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that may have been influenced by local adaptation. We exploit the fact that GWAS provide an estimate of the additive effect size of many loci to estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We first describe a general model of neutral genetic value drift for an arbitrary number of populations with an arbitrary relatedness structure. Based on this model we develop methods for detecting unusually strong correlations between genetic values and specific environmental variables, as well as a generalization of $Q_{ST}/F_{ST}$ comparisons to test for overdispersion of genetic values among populations. Finally we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single-locus equivalents because they look for positive covariance between like-effect alleles, and they also significantly outperform methods that do not account for population structure. We apply our tests to the Human Genome Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation, type 2 diabetes, body mass index, and two inflammatory bowel disease datasets. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
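
The core quantities can be sketched as follows. This is a simplified illustration that assumes the ancestral allele frequencies eps_l and the population covariance matrix F are known; in practice both are estimated from genome-wide data, and the published test handles this more carefully. Genetic values are Z_m = 2 * sum_l alpha_l * p_ml. Under a Gaussian model of neutral drift, Z - mu has covariance 2 * V_A * F with V_A = 2 * sum_l alpha_l^2 * eps_l * (1 - eps_l), so the quadratic form below follows a chi-square distribution with M degrees of freedom. The function names and simulated data are illustrative.

```python
import numpy as np

def genetic_values(freqs, effects):
    """Z_m = 2 * sum_l alpha_l * p_ml for each population m (additive model).
    freqs: (M populations, L loci); effects: (L,)."""
    return 2.0 * freqs @ effects

def qx_statistic(freqs, effects, F, eps):
    """Simplified Q_X-style overdispersion statistic, assuming known
    ancestral frequencies eps and a known covariance matrix F."""
    d = genetic_values(freqs, effects) - 2.0 * eps @ effects  # Z - mu
    v_a = 2.0 * np.sum(effects**2 * eps * (1 - eps))           # ancestral V_A
    return float(d @ np.linalg.solve(F, d) / (2.0 * v_a))

# Neutral simulation under a Gaussian drift model (hypothetical values).
rng = np.random.default_rng(1)
M, L = 4, 300
F = 0.05 * np.eye(M) + 0.02          # shared drift raises all covariances
chol = np.linalg.cholesky(F)
eps = rng.uniform(0.1, 0.9, L)
alpha = rng.normal(0.0, 1.0, L)      # stand-in for GWAS effect sizes
qs = []
for _ in range(500):
    # Each locus drifts with covariance eps_l * (1 - eps_l) * F across populations.
    delta = (chol @ rng.standard_normal((M, L))) * np.sqrt(eps * (1 - eps))
    qs.append(qx_statistic(eps + delta, alpha, F, eps))
print(np.mean(qs))  # the chi-square(M) mean is M, so this should be near 4
```

Values of the statistic far in the upper tail of the chi-square distribution would indicate more divergence in genetic values among populations than drift alone predicts.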

Preprint · 2013 · arXiv

Disentangling the effects of geographic and ecological isolation on genetic differentiation

Populations can be genetically isolated both by geographic distance and by differences in their ecology or environment that decrease the rate of successful migration. Empirical studies often seek to investigate the relationship between genetic differentiation and some ecological variable(s) while accounting for geographic distance, but common approaches to this problem (such as the partial Mantel test) have a number of drawbacks. In this article, we present a Bayesian method that enables users to quantify the relative contributions of geographic distance and ecological distance to genetic differentiation between sampled populations or individuals. We model the allele frequencies in a set of populations at a set of unlinked loci as spatially correlated Gaussian processes, in which the covariance structure is a decreasing function of both geographic and ecological distance. Parameters of the model are estimated using a Markov chain Monte Carlo algorithm. We call this method Bayesian Estimation of Differentiation in Alleles by Spatial Structure and Local Ecology (BEDASSLE), and have implemented it in a user-friendly format in the statistical platform R. We demonstrate its utility with a simulation study and empirical applications to human and teosinte datasets.
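
The covariance idea can be sketched as follows. We assume a covariance of the form Omega_ij = (1 / alpha0) * exp(-(alpha_D * D_ij + alpha_E * E_ij) ** alpha2), which decays with both geographic distance D and ecological distance E. Treat this exact parameterization, the small diagonal `nugget` added for numerical stability, and the toy landscape as assumptions rather than a description of the BEDASSLE code. The ratio alpha_E / alpha_D expresses how much one unit of ecological distance contributes relative to one unit of geographic distance. A full analysis would place priors on the parameters and sample them by MCMC, which is omitted here.

```python
import numpy as np

def bedassle_style_cov(D, E, alpha0, alpha_d, alpha_e, alpha2, nugget=1e-6):
    """Assumed covariance: exp(-(alpha_d*D + alpha_e*E)**alpha2) / alpha0,
    plus a small hypothetical nugget on the diagonal for stability."""
    k = np.exp(-((alpha_d * D + alpha_e * E) ** alpha2)) / alpha0
    return k + nugget * np.eye(D.shape[0])

def gaussian_loglik(X, cov):
    """Log-density of independent loci (rows of X), each a zero-mean
    multivariate normal vector across populations with covariance cov."""
    n_loci, m = X.shape
    _, logdet = np.linalg.slogdet(cov)
    quad = np.sum(X * np.linalg.solve(cov, X.T).T)
    return float(-0.5 * (quad + n_loci * (logdet + m * np.log(2 * np.pi))))

# Toy landscape: four populations on a line, two habitats (hypothetical).
pos = np.array([0.0, 1.0, 2.0, 3.0])
habitat = np.array([0.0, 0.0, 1.0, 1.0])
D = np.abs(pos[:, None] - pos[None, :])
E = np.abs(habitat[:, None] - habitat[None, :])

cov = bedassle_style_cov(D, E, alpha0=1.0, alpha_d=0.2, alpha_e=1.0, alpha2=1.0)
geo_only = bedassle_style_cov(D, E, alpha0=1.0, alpha_d=0.2, alpha_e=0.0, alpha2=1.0)

rng = np.random.default_rng(2)
X = rng.multivariate_normal(np.zeros(4), cov, size=5000)  # simulated loci
print(cov[0, 1], cov[1, 2])  # same distance apart, different habitat
print(gaussian_loglik(X, cov) > gaussian_loglik(X, geo_only))
```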

Preprint · 2013 · arXiv

Genomic identification of founding haplotypes reveals the history of the selfing species Capsella rubella

The shift from outcrossing to self-fertilization is among the most common transitions in plants. Until recently, however, a genome-wide view of this transition has been obscured by a dearth of appropriate data and the lack of appropriate population genomic methods to interpret such data. Here, we present novel analyses detailing the origin of the selfing species, Capsella rubella, which recently split from its outcrossing sister, Capsella grandiflora. Due to the recency of the split, most variation within C. rubella is found within C. grandiflora. We can therefore identify genomic regions where two C. rubella individuals have inherited the same or different segments of ancestral diversity (i.e., founding haplotypes) present in C. rubella's founder(s). Based on this analysis, we show that C. rubella was founded by multiple individuals drawn from a diverse ancestral population closely related to extant C. grandiflora, that drift and selection have rapidly homogenized most of this ancestral variation since C. rubella's founding, and that little novel variation has accumulated within this time. Despite the extensive loss of ancestral variation, the approximately 25% of the genome for which two C. rubella individuals have inherited different founding haplotypes makes up roughly 90% of the genetic variation between them. To extend these findings, we develop a coalescent model that utilizes the inferred frequency of founding haplotypes and variation within founding haplotypes to estimate that C. rubella was founded by a potentially large number of individuals 50-100 kya, and has subsequently experienced a 20-fold reduction in its effective population size. As population genomic data from an increasing number of outcrossing/selfing pairs are generated, analyses like the one presented here will facilitate a fine-scaled view of the evolutionary and demographic impact of the transition to self-fertilization.

Preprint · 2013 · arXiv

Patterns of neutral diversity under general models of selective sweeps

Two major sources of stochasticity in the dynamics of neutral alleles result from resampling of finite populations (genetic drift) and the random genetic background of nearby selected alleles on which the neutral alleles are found (linked selection). There is now good evidence that linked selection plays an important role in shaping polymorphism levels in a number of species. One of the best investigated models of linked selection is the recurrent full sweep model, in which newly arisen selected alleles fix rapidly. However, the bulk of selected alleles that sweep into the population may not be destined for rapid fixation. Here we develop a general model of recurrent selective sweeps in a coalescent framework, one that generalizes the recurrent full sweep model to the case where selected alleles do not sweep to fixation. We show that in a large population, only the initial rapid increase of a selected allele affects the genealogy at partially linked sites, which under fairly general assumptions are unaffected by the subsequent fate of the selected allele. We also apply the theory to a simple model to investigate the impact of recurrent partial sweeps on levels of neutral diversity, and find that for a given reduction in diversity, the impact of recurrent partial sweeps on the frequency spectrum at neutral sites is determined primarily by the frequencies achieved by the selected alleles. Consequently, recurrent sweeps of selected alleles to low frequencies can have a profound effect on levels of diversity but can leave the frequency spectrum relatively unperturbed. In fact, the limiting coalescent model under a high rate of sweeps to low frequency is identical to the standard neutral model. The general model of selective sweeps we describe goes some way towards providing a more flexible framework to describe genomic patterns of diversity than is currently available.
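
A back-of-the-envelope calculation illustrates the last two points. It rests on simplifying assumptions of our own rather than on the paper's general model. Suppose sweeps at completely linked sites occur at rate nu per generation and each takes negligible time. During a sweep, each lineage independently lies on the selected background with probability x, and all lineages on that background coalesce at once. Two lineages then coalesce at rate 1/(2N) + nu * x^2, so pairwise diversity is pi = 4 N mu / (1 + 2 N nu x^2). The same reduction can come from rare sweeps to high frequency or from frequent sweeps to low frequency, but among k sampled lineages, sweeps to low x rarely merge more than two at once.

```python
from math import comb

def pairwise_diversity(N, mu, nu, x):
    """pi = 4 N mu / (1 + 2 N nu x^2): drift coalescence rate 1/(2N) plus
    sweep-induced coalescence rate nu * x^2 (both lineages swept)."""
    return 4 * N * mu / (1 + 2 * N * nu * x**2)

def multiple_merger_fraction(k, x):
    """Among sweeps that merge at least 2 of k lineages, the fraction that
    merge 3 or more (the number swept is Binomial(k, x))."""
    p = [comb(k, j) * x**j * (1 - x) ** (k - j) for j in range(k + 1)]
    return sum(p[3:]) / sum(p[2:])

N, mu = 10_000, 1e-8
# Two scenarios tuned so that 2 N nu x^2 = 1, i.e. diversity is halved.
for nu, x in [(5e-5, 1.0), (5e-3, 0.1)]:
    print(x, pairwise_diversity(N, mu, nu, x), multiple_merger_fraction(10, x))
```

Both scenarios give the same pairwise diversity, but with x = 1.0 every sweep collapses all ten lineages together, whereas with x = 0.1 only about a quarter of sweep-induced coalescences involve more than two lineages. The frequency spectrum therefore stays closer to the neutral one.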

Preprint · 2013 · arXiv

Speciation and introgression between Mimulus nasutus and Mimulus guttatus

Mimulus guttatus and M. nasutus are an evolutionary and ecological model sister species pair differentiated by ecology, mating system, and partial reproductive isolation. Despite extensive research on this system, the history of divergence and differentiation in this sister pair is unclear. We present and analyze a novel population genomic data set which shows that M. nasutus "budded" off of a central Californian M. guttatus population within the last 200 to 500 thousand years. In this time, the M. nasutus genome has accrued numerous genomic signatures of the transition to predominant selfing. Despite clear biological differentiation, we document ongoing, bidirectional introgression. We observe a negative relationship between the recombination rate and divergence between M. nasutus and sympatric M. guttatus samples, suggesting that selection acts against M. nasutus ancestry in M. guttatus.

Preprint · 2013 · arXiv

The geography of recent genetic ancestry across Europe

The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of recent genealogical ancestry over the past three thousand years at a continental scale. We detected 1.9 million shared genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 10-50 genetic common ancestors from the last 1500 years, and upwards of 500 genetic ancestors from the previous 1000 years. These numbers drop off exponentially with geographic distance, but since genetic ancestry is rare, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1000 years. There is substantial regional variation in the number of shared genetic ancestors: especially high numbers of common ancestors between many eastern populations likely date to the Slavic and/or Hunnic expansions, while much lower levels of common ancestry in the Italian and Iberian peninsulas may indicate weaker demographic effects of Germanic expansions into these areas and/or more stably structured populations. Recent shared ancestry in modern Europeans is ubiquitous, and clearly shows the impact of both small-scale migration and large historical events. Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world.
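
A standard approximation, not specific to this study, shows why segment length carries information about the age of shared ancestry. A segment shared through a common ancestor g generations ago has passed through 2g meioses. If crossovers are treated as a Poisson process, its length in Morgans is roughly exponential with rate 2g, which gives a mean of 100 / (2g) cM. The function names and parameter values below are illustrative.

```python
import math

def mean_segment_cm(g):
    """Mean shared-segment length (cM) for a common ancestor g generations
    back, assuming lengths ~ Exponential(rate 2g per Morgan)."""
    return 100.0 / (2 * g)

def prob_longer_than(threshold_cm, g):
    """P(segment length > threshold) under the same exponential model."""
    return math.exp(-2 * g * threshold_cm / 100.0)

for g in (10, 30, 100):
    print(g, mean_segment_cm(g), round(prob_longer_than(2.0, g), 4))
```

Because segments of a given length can come from ancestors at many different depths, an analysis of this kind has to work with the whole length distribution rather than dating individual segments.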

Preprint · 2012 · arXiv

Robust identification of local adaptation from allele frequencies

Comparing allele frequencies among populations that differ in environment has long been a tool for detecting loci involved in local adaptation. However, such analyses are complicated by an imperfect knowledge of population allele frequencies and neutral correlations of allele frequencies among populations due to shared population history and gene flow. Here we develop a set of methods to robustly test for unusual allele frequency patterns and for correlations between environmental variables and allele frequencies, accounting for these complications with a Bayesian model previously implemented in the software Bayenv. Using this model, we calculate a set of 'standardized allele frequencies' that allows investigators to apply tests of their choice to multiple populations, while accounting for sampling and covariance due to population history. We illustrate this first by showing that these standardized frequencies can be used to calculate powerful tests to detect non-parametric correlations with environmental variables, which are also less prone to spurious results due to outlier populations. We then demonstrate how these standardized allele frequencies can be used to construct a test to detect SNPs that deviate strongly from neutral population structure. This test is conceptually related to FST but should be more powerful as we account for population history. We also extend the model to next-generation sequencing of population pools, which is a cost-efficient way to estimate population allele frequencies but introduces an additional level of sampling noise. The utility of these methods is demonstrated in simulations and by re-analyzing human SNP data from the HGDP populations. An implementation of our method will be available from http://gcbias.org.
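
The standardization step can be sketched as follows. We assume that the population covariance matrix Omega and each SNP's ancestral frequency eps are known, and we ignore sampling noise, which the Bayesian model in the abstract does account for. A Cholesky factor C of Omega (Omega = C C^T) is used to decorrelate populations. Under these assumptions, standardized frequencies are independent standard normals under neutrality, and their per-SNP sum of squares, an XtX-style statistic, has a mean equal to the number of populations.

```python
import numpy as np

def standardize(theta, eps, omega):
    """x = C^{-1} (theta - eps) / sqrt(eps (1 - eps)), where omega = C C^T.
    theta: (M populations, L SNPs); eps: (L,) ancestral frequencies."""
    C = np.linalg.cholesky(omega)
    return np.linalg.solve(C, theta - eps) / np.sqrt(eps * (1 - eps))

def xtx(x):
    """Sum of squared standardized frequencies for each SNP (column)."""
    return np.sum(x**2, axis=0)

# Neutral toy data generated from the same model (hypothetical values).
rng = np.random.default_rng(3)
M, L = 6, 5000
omega = 0.1 * np.eye(M) + 0.05
eps = rng.uniform(0.2, 0.8, L)
z = rng.standard_normal((M, L))
theta = eps + np.sqrt(eps * (1 - eps)) * (np.linalg.cholesky(omega) @ z)
x = standardize(theta, eps, omega)
print(np.allclose(x, z), xtx(x).mean())
```

SNPs whose XtX falls far in the upper tail of the genome-wide distribution would be candidates for deviating from neutral population structure, and the standardized frequencies themselves can be passed to rank-based tests against environmental variables.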

Preprint · 2011 · arXiv

Is your phylogeny informative? Measuring the power of comparative methods

Phylogenetic comparative methods may fail to produce meaningful results when either the underlying model is inappropriate or the data contain insufficient information to inform the inference. The ability to measure the statistical power of these methods has become crucial to ensure that data quantity keeps pace with growing model complexity. Through simulations, we show that commonly applied model choice methods based on information criteria can have remarkably high error rates; this can be a problem because methods to estimate the uncertainty or power are not widely known or applied. Furthermore, the power of comparative methods can depend significantly on the structure of the data. We describe a Monte Carlo based method which addresses both of these challenges, and show how this approach both quantifies and substantially reduces errors relative to information criteria. The method also produces meaningful confidence intervals for model parameters. We illustrate how the power to distinguish different models, such as varying levels of selection, varies both with number of taxa and structure of the phylogeny. We provide an open-source implementation in the pmc ("Phylogenetic Monte Carlo") package for the R programming language. We hope such power analysis becomes a routine part of model comparison in comparative methods.
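
The Monte Carlo idea can be illustrated on a toy pair of models rather than phylogenetic ones. The pmc package itself fits models on a tree, and nothing here reproduces its API. Model A says the data are N(0, sigma^2), and model B allows a free mean. We fit both models, simulate replicate data sets from each fitted model, and recompute the likelihood-ratio statistic. Its distribution under A gives a critical value, and its distribution under B gives the estimated power.

```python
import numpy as np

def lr_stat(x):
    """2 * (logL_B - logL_A) for A: N(0, s2) vs B: N(m, s2), at the MLEs."""
    n = len(x)
    return n * np.log(np.mean(x**2) / np.var(x))

def monte_carlo_power(x, n_sims=2000, level=0.95, seed=0):
    """Parametric bootstrap: simulate from each fitted model, recompute the
    statistic, and report (observed, critical value, estimated power)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    sd_a = np.sqrt(np.mean(x**2))            # MLE under model A
    m_b, sd_b = np.mean(x), np.std(x)        # MLEs under model B
    null = np.array([lr_stat(rng.normal(0.0, sd_a, n)) for _ in range(n_sims)])
    alt = np.array([lr_stat(rng.normal(m_b, sd_b, n)) for _ in range(n_sims)])
    crit = np.quantile(null, level)
    return float(lr_stat(x)), float(crit), float(np.mean(alt > crit))

rng = np.random.default_rng(4)
for n in (10, 500):
    data = rng.normal(0.5, 1.0, n)           # hypothetical observations
    print(n, monte_carlo_power(data, n_sims=500))
```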

Preprint · 2011 · arXiv

Scrambling eggs: Meiotic drive and the evolution of female recombination rates

Theories to explain the prevalence of sex and recombination have long been a central theme of evolutionary biology. Yet despite decades of attention dedicated to the evolution of sex and recombination, the widespread pattern of sex-differences in the recombination rate is not well understood and has received relatively little theoretical attention. Here, we argue that female meiotic drivers - alleles that increase in frequency by exploiting the asymmetric cell division of oogenesis - present a potent selective pressure favoring the modification of the female recombination rate. Because recombination plays a central role in shaping patterns of variation within and among dyads, modifiers of the female recombination rate can function as potent suppressors or enhancers of female meiotic drive. We show that when female recombination modifiers are unlinked to female drivers, recombination modifiers that suppress harmful female drive can spread. By contrast, a recombination modifier tightly linked to a driver can increase in frequency by enhancing female drive. Our results predict that rapidly evolving female recombination rates, particularly around centromeres, should be a common outcome of meiotic drive. We discuss how selection to modify the efficacy of meiotic drive may contribute to commonly observed patterns of sex-differences in recombination.

Preprint · 2010 · arXiv

Parallel adaptation: One or many waves of advance of an advantageous allele?

Our models for detecting the effect of adaptation on population genomic diversity are often predicated on a single newly arisen mutation sweeping rapidly to fixation. However, a population can also adapt to a new situation by multiple mutations of similar phenotypic effect that arise in parallel. These mutations can each quickly reach intermediate frequency, preventing any single one from rapidly sweeping to fixation globally (a "soft" sweep). Here we study models of parallel mutation in a geographically spread population adapting to a global selection pressure. The slow geographic spread of a selected allele can allow other selected alleles to arise and spread elsewhere in the species range. When these different selected alleles meet, their spread can slow dramatically, forming a geographic patchwork that could be mistaken for a signal of local adaptation. This random spatial tessellation will dissipate over time due to mixing by migration, leaving a set of partial sweeps within the global population. We show that the spatial tessellation initially formed by mutational types is closely connected to Poisson process models of crystallization, which we extend. We find that the probability of parallel mutation and the spatial scale on which parallel mutation occurs are captured by a single characteristic length that reflects the expected distance a spreading allele travels before it encounters a different spreading allele. This characteristic length depends on the mutation rate, the dispersal parameter, the effective local density of individuals, and to a much lesser extent the strength of selection. We argue that even in widely dispersing species, such parallel geographic sweeps may be surprisingly common. We therefore predict that, as more data become available, many more examples of intra-species parallel adaptation will be uncovered.
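
A scaling argument shows why selection strength matters so little. It is a sketch under our own simplifying assumptions, not the paper's exact result. Assume that an established allele spreads as a wave at the Fisher-KPP speed v = sigma * sqrt(2s). Assume also that new successful mutations appear at rate lam = 2 s mu rho per unit area per generation, that is, mutation rate times density times an establishment probability of about 2s. A wave that has run for time t has swept a space-time volume of order (v t)^d * t. Setting lam * (v t)^d * t of order 1 gives t of order (lam * v^d)^(-1/(d+1)), so chi = v t is of order (v / lam)^(1/(d+1)), with constants dropped. In two dimensions, s therefore enters only as s^(-1/6).

```python
import math

def characteristic_length(sigma, mu, rho, s, d=2):
    """Scaling sketch chi ~ (v / lam) ** (1 / (d + 1)), constants omitted.
    v = sigma * sqrt(2 s): wave speed; lam = 2 s mu rho: rate of new
    successful mutations per unit area per generation."""
    v = sigma * math.sqrt(2 * s)
    lam = 2 * s * mu * rho
    return (v / lam) ** (1.0 / (d + 1))

base = characteristic_length(sigma=1.0, mu=1e-8, rho=10.0, s=0.01)
print(characteristic_length(1.0, 1e-8, 10.0, 0.02) / base)  # doubling s: 2 ** (-1/6), about 0.89
print(characteristic_length(1.0, 2e-8, 10.0, 0.01) / base)  # doubling mu: 2 ** (-1/3), about 0.79
```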
