Source author record

Kelley Harris

Kelley Harris appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Populations and Evolution

Catalog footprint

What is connected

2 works

1 topic

4 close collaborators


Preprint, 2014, arXiv

Decoding coalescent hidden Markov models in linear time

In many areas of computational biology, hidden Markov models (HMMs) have been used to model local genomic features. In particular, coalescent HMMs have been used to infer ancient population sizes, migration rates, divergence times, and other parameters such as mutation and recombination rates. As more loci, sequences, and hidden states are added to the model, however, the runtime of coalescent HMMs can quickly become prohibitive. Here we present a new algorithm for reducing the runtime of coalescent HMMs from quadratic in the number of hidden time states to linear, without making any additional approximations. Our algorithm can be incorporated into various coalescent HMMs, including the popular method PSMC for inferring variable effective population sizes. Here we implement this algorithm to speed up our demographic inference method diCal, which is equivalent to PSMC when applied to a sample of two haplotypes. We demonstrate that the linear-time method can reconstruct a population size change history more accurately than the quadratic-time method, given similar computation resources. We also apply the method to data from the 1000 Genomes project, inferring a high-resolution history of size changes in the European population.
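To illustrate what moving from quadratic to linear cost in the number of hidden states can look like, here is a minimal Python sketch of an HMM forward pass. It is not the algorithm from the paper, which the abstract does not spell out. It assumes a toy transition structure: with probability 1 - r the hidden state is kept, and with probability r a new state is drawn from a fixed distribution q. The names forward_dense, forward_linear, r, q and the example numbers are all hypothetical.

```python
def forward_dense(init, trans, emis):
    """Standard forward pass: O(K^2) work per locus for K hidden states."""
    k = len(init)
    f = [init[j] * emis[0][j] for j in range(k)]
    for e in emis[1:]:
        # Every target state j sums over every source state i.
        f = [e[j] * sum(f[i] * trans[i][j] for i in range(k)) for j in range(k)]
    return sum(f)


def forward_linear(init, r, q, emis):
    """Same likelihood in O(K) per locus.

    Valid only under the assumed structure
    trans[i][j] = (1 - r) * (i == j) + r * q[j].
    """
    k = len(init)
    f = [init[j] * emis[0][j] for j in range(k)]
    for e in emis[1:]:
        total = sum(f)  # shared by all target states, computed once
        f = [e[j] * ((1 - r) * f[j] + r * q[j] * total) for j in range(k)]
    return sum(f)


# Toy inputs (hypothetical): 3 hidden states, 4 loci.
q = [0.5, 0.3, 0.2]
r = 0.1
trans = [[(1 - r) * (i == j) + r * q[j] for j in range(3)] for i in range(3)]
emis = [
    [0.9, 0.5, 0.1],
    [0.2, 0.6, 0.7],
    [0.9, 0.4, 0.3],
    [0.1, 0.5, 0.8],
]

lik_dense = forward_dense(q, trans, emis)
lik_linear = forward_linear(q, r, q, emis)
```

Both functions return the same likelihood for these inputs. The linear version avoids the inner sum over source states because the sum of the forward vector is shared by every target state. Coalescent HMMs have a richer transition structure than this toy case, so the sketch only conveys the general idea of exploiting structure in the transition matrix.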

Preprint, 2014, arXiv

Error-prone polymerase activity causes multinucleotide mutations in humans

About 2% of human genetic polymorphisms have been hypothesized to arise via multinucleotide mutations (MNMs), complex events that generate SNPs at multiple sites in a single generation. MNMs have the potential to accelerate the pace at which single genes evolve and to confound studies of demography and selection that assume all SNPs arise independently. In this paper, we examine clustered mutations that are segregating in a set of 1,092 human genomes, demonstrating that MNMs become enriched as large numbers of individuals are sampled. We leverage the size of the dataset to deduce new information about the allelic spectrum of MNMs, estimating the percentage of linked SNP pairs that were generated by simultaneous mutation as a function of the distance between the affected sites and showing that MNMs exhibit a high percentage of transversions relative to transitions. These findings are reproducible in data from multiple sequencing platforms. Among tandem mutations that occur simultaneously at adjacent sites, we find an especially skewed distribution of ancestral and derived dinucleotides, with $\textrm{GC}\to \textrm{AA}$, $\textrm{GA}\to \textrm{TT}$ and their reverse complements making up 36% of the total. These same mutations dominate the spectrum of tandem mutations produced by the upregulation of low-fidelity Polymerase $ζ$ in mutator strains of S. cerevisiae that have impaired DNA excision repair machinery. This suggests that low-fidelity DNA replication by Pol $ζ$ is at least partly responsible for the MNMs that are segregating in the human population, and that useful information about the biochemistry of MNM can be extracted from ordinary population genomic data. We incorporate our findings into a mathematical model of the multinucleotide mutation process that can be used to correct phylogenetic and population genetic methods for the presence of MNMs.
