Source author record

Tak-Wah Lam

Tak-Wah Lam appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Genomics Data Structures and Algorithms Distributed, Parallel, and Cluster Computing Computational Engineering, Finance, and Science Quantitative Methods

Catalog footprint

What is connected

9works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Scheduling with Gaps: New Models and Algorithms

We consider scheduling problems for unit jobs with release times, where the number or size of the gaps in the schedule is taken into consideration, either in the objective function or as a constraint. Except for a few papers on energy minimization, there is no work in the scheduling literature that uses performance metrics depending on the gap structure of a schedule. One of our objectives is to initiate the study of such scheduling problems with gaps. We show that such problems often lead to interesting algorithmic problems, with connections to other areas of algorithmics. We focus on the model with unit jobs. First we examine scheduling problems with deadlines, where we consider variants of minimum-gap scheduling, including maximizing throughput with a budget for gaps or minimizing the number of gaps with a throughput requirement. We then turn to other objective functions. For example, in some scenarios, gaps in a schedule may be actually desirable, leading to the problem of maximizing the number of gaps. Other versions we study include minimizing maximum gap or maximizing minimum gap. The second part of the paper examines the model without deadlines, where we focus on the tradeoff between the number of gaps and the total or maximum flow time. For all these problems we provide polynomial time algorithms, with running times ranging from $O(n \log n)$ for some problems, to $O(n^7)$ for other. The solutions involve a spectrum of algorithmic techniques, including different dynamic programming formulations, speed-up techniques based on searching Monge arrays, searching X + Y matrices, or implicit binary search.

preprint2014arXiv

GPU-Accelerated BWT Construction for Large Collection of Short Reads

Advances in DNA sequencing technology have stimulated the development of algorithms and tools for processing very large collections of short strings (reads). Short-read alignment and assembly are among the most well-studied problems. Many state-of-the-art aligners, at their core, have used the Burrows-Wheeler transform (BWT) as a main-memory index of a reference genome (typical example, NCBI human genome). Recently, BWT has also found its use in string-graph assembly, for indexing the reads (i.e., raw data from DNA sequencers). In a typical data set, the volume of reads is tens of times of the sequenced genome and can be up to 100 Gigabases. Note that a reference genome is relatively stable and computing the index is not a frequent task. For reads, the index has to computed from scratch for each given input. The ability of efficient BWT construction becomes a much bigger concern than before. In this paper, we present a practical method called CX1 for constructing the BWT of very large string collections. CX1 is the first tool that can take advantage of the parallelism given by a graphics processing unit (GPU, a relative cheap device providing a thousand or more primitive cores), as well as simultaneously the parallelism from a multi-core CPU and more interestingly, from a cluster of GPU-enabled nodes. Using CX1, the BWT of a short-read collection of up to 100 Gigabases can be constructed in less than 2 hours using a machine equipped with a quad-core CPU and a GPU, or in about 43 minutes using a cluster with 4 such machines (the speedup is almost linear after excluding the first 16 minutes for loading the reads from the hard disk). The previously fastest tool BRC is measured to take 12 hours to process 100 Gigabases on one machine; it is non-trivial how BRC can be parallelized to take advantage a cluster of machines, let alone GPUs.

preprint2014arXiv

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

MEGAHIT is a NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset with 252Gbps in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., it avoids pre-processing like partitioning and normalization, which might compromise on result integrity. MEGAHIT generates 3 times larger assembly, with longer contig N50 and average contig length than the previous assembly. 55.8% of the reads were aligned to the assembly, which is 4 times higher than the previous. The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under GPLv3 license.

preprint2014arXiv

MICA: A fast short-read aligner that takes full advantage of Intel Many Integrated Core Architecture (MIC)

Background: Short-read aligners have recently gained a lot of speed by exploiting the massive parallelism of GPU. An uprising alternative to GPU is Intel MIC; supercomputers like Tianhe-2, currently top of TOP500, is built with 48,000 MIC boards to offer ~55 PFLOPS. The CPU-like architecture of MIC allows CPU-based software to be parallelized easily; however, the performance is often inferior to GPU counterparts as an MIC board contains only ~60 cores (while a GPU board typically has over a thousand cores). Results: To better utilize MIC-enabled computers for NGS data analysis, we developed a new short-read aligner MICA that is optimized in view of MICs limitation and the extra parallelism inside each MIC core. Experiments on aligning 150bp paired-end reads show that MICA using one MIC board is 4.9 times faster than the BWA-MEM (using 6-core of a top-end CPU), and slightly faster than SOAP3-dp (using a GPU). Furthermore, MICAs simplicity allows very efficient scale-up when multiple MIC boards are used in a node (3 cards give a 14.1-fold speedup over BWA-MEM). Summary: MICA can be readily used by MIC-enabled supercomputers for production purpose. We have tested MICA on Tianhe-2 with 90 WGS samples (17.47 Tera-bases), which can be aligned in an hour less than 400 nodes. MICA has impressive performance even though the current MIC is at its initial stage of development (the next generation of MIC has been announced to release in late 2014).

preprint2013arXiv

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Background - The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results - In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions - Many current genome assemblers produced useful assemblies, containing a significant representation of their genes, regulatory sequences, and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

preprint2013arXiv

SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed in trade of accuracy and sensitivity. SOAP3-dp, through leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, GEM and GPU-based aligners including BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. Transcending its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Real data evaluation using human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports BAM file format and provides a scoring scheme same as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.

preprint2013arXiv

SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads

Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining the sequences for a large number of genes from an organism with no reference genome. With the rapidly increasing throughputs and decreasing costs of next generation sequencing, RNA-Seq has gained in popularity; but given the typically short reads (e.g. 2 x 90 bp paired ends) of this technol- ogy, de novo assembly to recover complete or full-length transcript sequences remains an algorithmic challenge. Results: We present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. Its performance was evaluated on transcriptome datasets from rice and mouse. Using the known transcripts from these well-annotated genomes (sequenced a decade ago) as our benchmark, we assessed how SOAPdenovo- Trans and two other popular software handle the practical issues of alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy, and faster execution. Availability and Implementation: Source code and user manual are at http://sourceforge.net/projects/soapdenovotrans/ Contact: xieyl@genomics.cn or bgi-soap@googlegroups.com

preprint2011arXiv

Edit Distance to Monotonicity in Sliding Windows

Given a stream of items each associated with a numerical value, its edit distance to monotonicity is the minimum number of items to remove so that the remaining items are non-decreasing with respect to the numerical value. The space complexity of estimating the edit distance to monotonicity of a data stream is becoming well-understood over the past few years. Motivated by applications on network quality monitoring, we extend the study to estimating the edit distance to monotonicity of a sliding window covering the $w$ most recent items in the stream for any $w \ge 1$. We give a deterministic algorithm which can return an estimate within a factor of $(4+\eps)$ using $O(\frac{1}{\eps^2} \log^2(\eps w))$ space. We also extend the study in two directions. First, we consider a stream where each item is associated with a value from a partial ordered set. We give a randomized $(4+ε)$-approximate algorithm using $O(\frac{1}{ε^2} \log ε^2 w \log w)$ space. Second, we consider an out-of-order stream where each item is associated with a creation time and a numerical value, and items may be out of order with respect to their creation times. The goal is to estimate the edit distance to monotonicity with respect to the numerical value of items arranged in the order of creation times. We show that any randomized constant-approximate algorithm requires linear space.

preprint2010arXiv

Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window

The past decade has witnessed many interesting algorithms for maintaining statistics over a data stream. This paper initiates a theoretical study of algorithms for monitoring distributed data streams over a time-based sliding window (which contains a variable number of items and possibly out-of-order items). The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to be able to report the global statistics of all streams within a given error bound. This paper presents communication-efficient algorithms for three classical statistics, namely, basic counting, frequent items and quantiles. The worst-case communication cost over a window is $O(\frac{k} ε \log \frac{εN}{k})$ bits for basic counting and $O(\frac{k}ε \log \frac{N}{k})$ words for the remainings, where $k$ is the number of distributed data streams, $N$ is the total number of items in the streams that arrive or expire in the window, and $ε< 1$ is the desired error bound. Matching and nearly matching lower bounds are also obtained.

Tak-Wah Lam

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Scheduling with Gaps: New Models and Algorithms

GPU-Accelerated BWT Construction for Large Collection of Short Reads

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

MICA: A fast short-read aligner that takes full advantage of Intel Many Integrated Core Architecture (MIC)

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads

Edit Distance to Monotonicity in Sliding Windows

Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window