Researcher profile

Kelley Harris

Kelley Harris contributes to research discovery and scholarly infrastructure.

Researcher; affiliation not imported; open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
1 topic
4 close collaborators

Published work

2 published items

Preprint, 2014, arXiv

Decoding coalescent hidden Markov models in linear time

In many areas of computational biology, hidden Markov models (HMMs) have been used to model local genomic features. In particular, coalescent HMMs have been used to infer ancient population sizes, migration rates, divergence times, and other parameters such as mutation and recombination rates. As more loci, sequences, and hidden states are added to the model, however, the runtime of coalescent HMMs can quickly become prohibitive. Here we present a new algorithm for reducing the runtime of coalescent HMMs from quadratic in the number of hidden time states to linear, without making any additional approximations. Our algorithm can be incorporated into various coalescent HMMs, including the popular method PSMC for inferring variable effective population sizes. Here we implement this algorithm to speed up our demographic inference method diCal, which is equivalent to PSMC when applied to a sample of two haplotypes. We demonstrate that the linear-time method can reconstruct a population size change history more accurately than the quadratic-time method, given similar computation resources. We also apply the method to data from the 1000 Genomes project, inferring a high-resolution history of size changes in the European population.

Preprint, 2014, arXiv

Error-prone polymerase activity causes multinucleotide mutations in humans

About 2% of human genetic polymorphisms have been hypothesized to arise via multinucleotide mutations (MNMs), complex events that generate SNPs at multiple sites in a single generation. MNMs have the potential to accelerate the pace at which single genes evolve and to confound studies of demography and selection that assume all SNPs arise independently. In this paper, we examine clustered mutations that are segregating in a set of 1,092 human genomes, demonstrating that MNMs become enriched as large numbers of individuals are sampled. We leverage the size of the dataset to deduce new information about the allelic spectrum of MNMs, estimating the percentage of linked SNP pairs that were generated by simultaneous mutation as a function of the distance between the affected sites and showing that MNMs exhibit a high percentage of transversions relative to transitions. These findings are reproducible in data from multiple sequencing platforms. Among tandem mutations that occur simultaneously at adjacent sites, we find an especially skewed distribution of ancestral and derived dinucleotides, with $\textrm{GC}\to \textrm{AA}$, $\textrm{GA}\to \textrm{TT}$ and their reverse complements making up 36% of the total. These same mutations dominate the spectrum of tandem mutations produced by the upregulation of low-fidelity Polymerase $ζ$ in mutator strains of S. cerevisiae that have impaired DNA excision repair machinery. This suggests that low-fidelity DNA replication by Pol $ζ$ is at least partly responsible for the MNMs that are segregating in the human population, and that useful information about the biochemistry of MNM can be extracted from ordinary population genomic data. We incorporate our findings into a mathematical model of the multinucleotide mutation process that can be used to correct phylogenetic and population genetic methods for the presence of MNMs.