Paper detail

Portability of Scientific Workflows in NGS Data Analysis: A Case Study

The analysis of next-generation sequencing (NGS) data requires complex computational workflows consisting of dozens of autonomously developed yet interdependent processing steps. Whenever large amounts of data need to be processed, these workflows must be executed on a parallel and/or distributed systems to ensure reasonable runtime. Porting a workflow developed for a particular system on a particular hardware infrastructure to another system or to another infrastructure is non-trivial, which poses a major impediment to the scientific necessities of workflow reproducibility and workflow reusability. In this work, we describe our efforts to port a state-of-the-art workflow for the detection of specific variants in whole-exome sequencing of mice. The workflow originally was developed in the scientific workflow system snakemake for execution on a high-performance cluster controlled by Sun Grid Engine. In the project, we ported it to the scientific workflow system SaasFee that can execute workflows on (multi-core) stand-alone servers or on clusters of arbitrary sizes using the Hadoop. The purpose of this port was that also owners of low-cost hardware infrastructures, for which Hadoop was made for, become able to use the workflow. Although both the source and the target system are called scientific workflow systems, they differ in numerous aspects, ranging from the workflow languages to the scheduling mechanisms and the file access interfaces. These differences resulted in various problems, some expected and more unexpected, that had to be resolved before the workflow could be run with equal semantics. As a side-effect, we also report cost/runtime ratios for a state-of-the-art NGS workflow on very different hardware platforms: A comparably cheap stand-alone server (80 threads), a mid-cost, mid-sized cluster (552 threads), and a high-end HPC system (3784 threads).

preprint2020arXivOpen access
0citations
0reviews
0saves
Nocode
Nodataset
0institutions

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.