Researcher profile

Katherine S. Pollard

Katherine S. Pollard contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - Baseline
3works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2014arXiv

motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences

Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing if two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species; but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are Binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package {\tt motifDiverge} that implements our methodology and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.

preprint2013arXiv

A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts ~1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. We also find some evidence that they contribute to the fixation of deleterious alleles, including an enrichment for disease-associated polymorphisms. These tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages; they supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available.

preprint2013arXiv

Integrating diverse datasets improves developmental enhancer prediction

Gene-regulatory enhancers have been identified by many lines of evidence, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a novel method for predicting developmental enhancers and their tissue specificity. EnhancerFinder uses a two-step multiple-kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and thousands of diverse functional genomics datasets from a variety of cell types and developmental stages. We trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser, in contrast to histone mark or sequence-based enhancer definitions commonly used. We comprehensively evaluated EnhancerFinder, and found that our integrative approach improves enhancer prediction accuracy over previous approaches that consider a single type of data. Our evaluation highlights the importance of considering information from many tissues when predicting specific types of enhancers. We find that VISTA enhancers active in embryonic heart are easier to predict than enhancers active in several other tissues due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and hits from genome-wide association studies. We demonstrate the utility of our enhancer predictions by identifying and validating a novel cranial nerve enhancer in the ZEB2 locus. Our genome-wide developmental enhancer predictions will be freely available as a UCSC Genome Browser track.