Source author record

Sven Rahmann

Sven Rahmann appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; unclaimed source record

Data Structures and Algorithms; Formal Languages and Automata Theory; Quantitative Methods; Distributed, Parallel, and Cluster Computing; Molecular Networks; Other Computer Science

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators


Preprint, 2014, arXiv

Massively parallel read mapping on GPUs with PEANUT

We present PEANUT (ParallEl AligNment UTility), a highly parallel GPU-based read mapper with several distinguishing features, including a novel q-gram index (called the q-group index) with a small memory footprint that is built on-the-fly over the reads, and the option to output either the best hits or all hits of a read. By designing the algorithm specifically for the GPU architecture, we were able to reach maximum core occupancy for several key steps. Our benchmarks show that PEANUT outperforms other state-of-the-art mappers in terms of speed and sensitivity. The software is available at http://peanut.readthedocs.org.
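To make the idea of a q-gram index built over the reads concrete, here is a minimal CPU-only sketch in Python. It is a plain hash-based q-gram index with diagonal voting, not PEANUT's q-group index or its GPU implementation; the function names `build_qgram_index` and `candidate_hits` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_qgram_index(reads, q):
    """Map every q-gram to the (read_id, position) pairs where it occurs.

    Illustrative only: a plain dictionary index, not PEANUT's q-group index.
    """
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for pos in range(len(read) - q + 1):
            index[read[pos:pos + q]].append((read_id, pos))
    return index


def candidate_hits(index, reference, q):
    """Count shared q-grams per (read_id, diagonal).

    The diagonal is the reference position at which the read would start
    if the matching q-gram were part of an ungapped alignment.
    """
    hits = defaultdict(int)
    for ref_pos in range(len(reference) - q + 1):
        qgram = reference[ref_pos:ref_pos + q]
        for read_id, read_pos in index.get(qgram, ()):
            hits[(read_id, ref_pos - read_pos)] += 1
    return hits


reads = ["ACGT", "GTAC"]
index = build_qgram_index(reads, q=2)
hits = candidate_hits(index, "TTACGTAC", q=2)
# Diagonals with many shared q-grams are candidates for full alignment.
best = max(hits.values())
print(sorted(key for key, count in hits.items() if count == best))
```

In this toy input, each read has three 2-grams, and both reads reach that maximum on exactly one diagonal, which is where they occur verbatim in the reference.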

Preprint, 2014, arXiv

Using the Expectation Maximization Algorithm with Heterogeneous Mixture Components for the Analysis of Spectrometry Data

Coupling a multi-capillary column (MCC) with an ion mobility (IM) spectrometer (IMS) opened a multitude of new application areas for gas analysis, especially in a medical context, as volatile organic compounds (VOCs) in exhaled breath can hint at a person's state of health. To obtain a potential diagnosis from a raw MCC/IMS measurement, several computational steps are necessary, which so far have required manual interaction, e.g., human evaluation of discovered peaks. We have recently proposed an automated pipeline for this task that does not require human intervention during the analysis. Nevertheless, there is a need for improved methods for each computational step. In comparison to gas chromatography / mass spectrometry (GC/MS) data, MCC/IMS data is easier and less expensive to obtain, but peaks are more diffuse and there is a higher noise level. MCC/IMS measurements can be described as samples of mixture models (i.e., of convex combinations) of two-dimensional probability distributions. So we use the expectation-maximization (EM) algorithm to deconvolute mixtures in order to develop methods that improve data processing in three computational steps: denoising, baseline correction and peak clustering. A common theme of these methods is that mixture components within one model are not homogeneous (e.g., all Gaussian), but of different types. Evaluation shows that the novel methods outperform the existing ones. We provide Python software implementing all three methods and make our evaluation data available at http://www.rahmannlab.de/research/ims.
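The notion of heterogeneous mixture components can be illustrated with a small one-dimensional sketch: a Gaussian "signal" component mixed with a uniform "background" component, fitted by EM. This is an assumption-laden toy, not one of the three methods from the paper; the component types, the function name `em_gauss_uniform`, and the synthetic data are all chosen here for illustration.

```python
import math
import random


def em_gauss_uniform(data, lo, hi, iters=200):
    """Fit w * Normal(mu, sigma^2) + (1 - w) * Uniform(lo, hi) by EM.

    Toy example of a mixture whose components are of different types.
    """
    w = 0.5
    mu = sum(data) / len(data)
    sigma = (hi - lo) / 4.0
    uniform_density = 1.0 / (hi - lo)
    for _ in range(iters):
        # E-step: responsibility of the Gaussian component for each point.
        resp = []
        for x in data:
            g = w * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (
                sigma * math.sqrt(2.0 * math.pi))
            u = (1.0 - w) * uniform_density
            resp.append(g / (g + u))
        # M-step: weighted maximum-likelihood updates. The uniform
        # component has fixed bounds, so only its weight changes.
        total = sum(resp)
        w = total / len(data)
        mu = sum(r * x for r, x in zip(resp, data)) / total
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, data)) / total
        sigma = max(math.sqrt(var), 1e-6)
    return w, mu, sigma


random.seed(0)
data = [random.gauss(5.0, 0.5) for _ in range(300)]
data += [random.uniform(0.0, 10.0) for _ in range(100)]
w, mu, sigma = em_gauss_uniform(data, 0.0, 10.0)
print(round(w, 2), round(mu, 2), round(sigma, 2))
```

Points with a low Gaussian responsibility can be treated as background, which is the basic mechanism behind mixture-based denoising.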

Preprint, 2011, arXiv

Protein Hypernetworks: a Logic Framework for Interaction Dependencies and Perturbation Effects in Protein Networks

Motivation: Protein interactions are fundamental building blocks of biochemical reaction systems underlying cellular functions. The complexity and functionality of such systems emerge not from the protein interactions themselves but from the dependencies between these interactions. Therefore, a comprehensive approach for integrating and using information about such dependencies is required. Results: We present an approach for endowing protein networks with interaction dependencies using propositional logic, thereby obtaining protein hypernetworks. First we demonstrate how this framework straightforwardly improves the prediction of protein complexes. Next we show that modeling protein perturbations in hypernetworks, rather than in networks, allows better inference of the functional necessity of proteins in yeast. Furthermore, hypernetworks improve the prediction of synthetic lethal interactions in yeast, indicating their capability to capture high-order functional relations between proteins. Conclusion: Protein hypernetworks are a consistent formal framework for modeling dependencies between protein interactions within protein networks. First applications of protein hypernetworks to the yeast interactome indicate their value for inferring functional features of complex biochemical systems.
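As a rough illustration of how interaction dependencies change the effect of a perturbation, the sketch below restricts the logic to simple implications ("interaction X requires interaction Y") and propagates the removal of a protein to a fixed point. This is a deliberate simplification and not the authors' algorithm: the paper uses general propositional formulas, and the function name `surviving_interactions` and the example network are hypothetical.

```python
def surviving_interactions(interactions, requires, removed_proteins):
    """Return the interactions that remain possible after a perturbation.

    interactions: iterable of (protein, protein) tuples
    requires: dict mapping an interaction to the interactions it depends on
    removed_proteins: set of proteins deleted from the network
    """
    # Direct effect: interactions involving a removed protein are lost.
    alive = {i for i in interactions if not (set(i) & removed_proteins)}
    # Indirect effect: repeatedly drop interactions whose prerequisites
    # are gone, until nothing changes.
    changed = True
    while changed:
        changed = False
        for interaction in list(alive):
            deps = requires.get(interaction, ())
            if any(dep not in alive for dep in deps):
                alive.discard(interaction)
                changed = True
    return alive


interactions = [("A", "B"), ("B", "C"), ("C", "D")]
requires = {
    ("B", "C"): [("A", "B")],  # B-C only forms if A-B is present
    ("C", "D"): [("B", "C")],  # C-D only forms if B-C is present
}
print(surviving_interactions(interactions, requires, {"A"}))
print(surviving_interactions(interactions, requires, {"D"}))
```

In a plain network, removing A would only delete the A-B edge; with the dependencies, the loss cascades to B-C and C-D, which is the kind of effect that a network without dependency information cannot express.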

Preprint, 2010, arXiv

Exact Analysis of Pattern Matching Algorithms with Probabilistic Arithmetic Automata

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer-Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we show how to efficiently obtain the distribution of such an algorithm's running time cost for any given pattern in a random text model, which can be quite general, from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide a technique to compute the exact distribution of differences in running time cost of two algorithms. In contrast to previous work, our approach is neither limited to simple text models, nor to asymptotic statements, nor to moment computations such as expectation and variance. Methodically, we use extensions of finite automata which we call deterministic arithmetic automata (DAAs) and probabilistic arithmetic automata (PAAs) [13]. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly. Experimentally, we compare Horspool's algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations.
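The quantity being analyzed can be made concrete with a small sketch: Horspool's algorithm instrumented to count character comparisons, plus the exact cost distribution under a uniform text model obtained by enumerating every text of a given length. The enumeration is exponential in the text length and is only a stand-in for checking tiny cases; it is not the automaton-based method of the paper. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def horspool_cost(pattern, text):
    """Run Horspool's algorithm and return the number of character comparisons."""
    m = len(pattern)
    # Shift table: distance from the rightmost occurrence of each character
    # among the first m - 1 pattern positions to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    cost = 0
    pos = 0
    while pos + m <= len(text):
        j = m - 1
        while j >= 0:  # compare the window right to left
            cost += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        # Shift according to the last character of the current window.
        pos += shift.get(text[pos + m - 1], m)
    return cost


def cost_distribution(pattern, alphabet, n):
    """Exact cost distribution over all texts of length n (uniform model)."""
    counts = Counter(
        horspool_cost(pattern, "".join(chars))
        for chars in product(alphabet, repeat=n)
    )
    total = len(alphabet) ** n
    return {cost: Fraction(k, total) for cost, k in counts.items()}


dist = cost_distribution("ab", "ab", 6)
print(sorted(dist.items()))
```

Using `Fraction` keeps the probabilities exact, which mirrors the emphasis on exact rather than asymptotic or moment-based results.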

Preprint, 2010, arXiv

Probabilistic Arithmetic Automata and their Applications

We present probabilistic arithmetic automata (PAAs), a general model to describe chains of operations whose operands depend on chance, along with two different algorithms to exactly calculate the distribution of the results obtained by such probabilistic calculations. PAAs provide a unifying framework to approach many problems arising in computational biology and elsewhere. Here, we present five different applications, namely (1) pattern matching statistics on random texts, including the computation of the distribution of occurrence counts, waiting time and clump size under HMM background models; (2) exact analysis of window-based pattern matching algorithms; (3) sensitivity of filtration seeds used to detect candidate sequence alignments; (4) length and mass statistics of peptide fragments resulting from enzymatic cleavage reactions; and (5) read length statistics of 454 sequencing reads. The diversity of these applications indicates the flexibility and unifying character of the presented framework. While the construction of a PAA depends on the particular application, we single out a frequently applicable construction method for pattern statistics: We introduce deterministic arithmetic automata (DAAs) to model deterministic calculations on sequences, and demonstrate how to construct a PAA from a given DAA and a finite-memory random text model. We show how to transform a finite automaton into a DAA and then into the corresponding PAA.
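The construction described in the last sentences (finite automaton, then a DAA that adds arithmetic to it, then a PAA once a random text model is attached) can be sketched for the simplest application, the distribution of occurrence counts. The code below assumes an i.i.d. text model and a naively built pattern-matching automaton; it follows the general idea only, and the names `build_dfa` and `occurrence_count_distribution` are hypothetical rather than taken from the paper.

```python
from fractions import Fraction


def build_dfa(pattern, alphabet):
    """Pattern-matching automaton: state k means the last k characters read
    equal the first k characters of the pattern (k as large as possible)."""
    m = len(pattern)
    delta = []
    for state in range(m + 1):
        row = {}
        for c in alphabet:
            s = pattern[:state] + c
            k = min(m, len(s))
            while k > 0 and pattern[:k] != s[-k:]:
                k -= 1
            row[c] = k
        delta.append(row)
    return delta


def occurrence_count_distribution(pattern, char_probs, n):
    """Exact distribution of the number of (overlapping) pattern occurrences
    in a random text of length n with independent characters."""
    m = len(pattern)
    delta = build_dfa(pattern, list(char_probs))
    # Joint distribution over (automaton state, accumulated count).
    dist = {(0, 0): Fraction(1)}
    for _ in range(n):
        nxt = {}
        for (state, count), p in dist.items():
            for c, pc in char_probs.items():
                new_state = delta[state][c]
                # Arithmetic part: add 1 whenever the accepting state is entered.
                key = (new_state, count + (new_state == m))
                nxt[key] = nxt.get(key, 0) + p * pc
        dist = nxt
    # Marginalize out the automaton state.
    result = {}
    for (_, count), p in dist.items():
        result[count] = result.get(count, 0) + p
    return result


uniform = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
print(sorted(occurrence_count_distribution("aa", uniform, 3).items()))
```

The dynamic program runs in time linear in the text length, times the number of reachable (state, count) pairs, in contrast to enumerating all texts.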
