Source author record

Richard M. Karp

Richard M. Karp appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Discrete Mathematics Genomics Information Theory math.IT

Catalog footprint

What is connected

3works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2011arXiv

Comparing Pedigree Graphs

Pedigree graphs, or family trees, are typically constructed by an expensive process of examining genealogical records to determine which pairs of individuals are parent and child. New methods to automate this process take as input genetic data from a set of extant individuals and reconstruct ancestral individuals. There is a great need to evaluate the quality of these methods by comparing the estimated pedigree to the true pedigree. In this paper, we consider two main pedigree comparison problems. The first is the pedigree isomorphism problem, for which we present a linear-time algorithm for leaf-labeled pedigrees. The second is the pedigree edit distance problem, for which we present 1) several algorithms that are fast and exact in various special cases, and 2) a general, randomized heuristic algorithm. In the negative direction, we first prove that the pedigree isomorphism problem is as hard as the general graph isomorphism problem, and that the sub-pedigree isomorphism problem is NP-hard. We then show that the pedigree edit distance problem is APX-hard in general and NP-hard on leaf-labeled pedigrees. We use simulated pedigrees to compare our edit-distance algorithms to each other as well as to a branch-and-bound algorithm that always finds an optimal solution.

preprint2011arXiv

Faster and More Accurate Sequence Alignment with SNAP

We present the Scalable Nucleotide Alignment Program (SNAP), a new short and long read aligner that is both more accurate (i.e., aligns more reads with fewer errors) and 10-100x faster than state-of-the-art tools such as BWA. Unlike recent aligners based on the Burrows-Wheeler transform, SNAP uses a simple hash index of short seed sequences from the genome, similar to BLAST's. However, SNAP greatly reduces the number and cost of local alignment checks performed through several measures: it uses longer seeds to reduce the false positive locations considered, leverages larger memory capacities to speed index lookup, and excludes most candidate locations without fully computing their edit distance to the read. The result is an algorithm that scales well for reads from one hundred to thousands of bases long and provides a rich error model that can match classes of mutations (e.g., longer indels) that today's fast aligners ignore. We calculate that SNAP can align a dataset with 30x coverage of a human genome in less than an hour for a cost of $2 on Amazon EC2, with higher accuracy than BWA. Finally, we describe ongoing work to further improve SNAP.

preprint2008arXiv

Probabilistic Analysis of Linear Programming Decoding

We initiate the probabilistic analysis of linear programming (LP) decoding of low-density parity-check (LDPC) codes. Specifically, we show that for a random LDPC code ensemble, the linear programming decoder of Feldman et al. succeeds in correcting a constant fraction of errors with high probability. The fraction of correctable errors guaranteed by our analysis surpasses previous non-asymptotic results for LDPC codes, and in particular exceeds the best previous finite-length result on LP decoding by a factor greater than ten. This improvement stems in part from our analysis of probabilistic bit-flipping channels, as opposed to adversarial channels. At the core of our analysis is a novel combinatorial characterization of LP decoding success, based on the notion of a generalized matching. An interesting by-product of our analysis is to establish the existence of ``probabilistic expansion'' in random bipartite graphs, in which one requires only that almost every (as opposed to every) set of a certain size expands, for sets much larger than in the classical worst-case setting.