Researcher profile

Fangqiu Han

Fangqiu Han contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
4 topics
4 close collaborators

Published work

2 published items

Preprint, 2015, arXiv

Observability of Lattice Graphs

We consider a graph observability problem: how many edge colors are needed for an unlabeled graph so that an agent, walking from node to node, can uniquely determine its location from just the observed color sequence of the walk? Specifically, let G(n,d) be an edge-colored subgraph of the d-dimensional (directed or undirected) lattice of size n^d = n * n * ... * n. We say that G(n,d) is t-observable if an agent can uniquely determine its current position in the graph from the color sequence of any t-dimensional walk, where the dimension is the number of different directions spanned by the edges of the walk. A walk in an undirected lattice G(n,d) has dimension between 1 and d, but a directed walk can have dimension between 1 and 2d because of the two different orientations for each axis. We derive bounds on the number of colors needed for t-observability. Our main result is that Theta(n^(d/t)) colors are both necessary and sufficient for t-observability of G(n,d), where d is considered a constant. This shows an interesting dependence of graph observability on the ratio between the dimension of the lattice and that of the walk. In particular, the number of colors for full-dimensional walks is Theta(n^(1/2)) in the directed case, and Theta(n) in the undirected case, independent of the lattice dimension. All of our results extend easily to non-square lattices: given a lattice graph of size N = n_1 * n_2 * ... * n_d, the number of colors for t-observability is Theta(N^(1/t)).
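
To make the Theta(n^(d/t)) bound concrete, the following is a minimal Python sketch that evaluates only the growth term n^(d/t) and checks that the walk dimension t is in the range the abstract allows (1 to d undirected, 1 to 2d directed). The function name `color_growth_term` and its parameters are illustrative and do not come from the paper, and the constant factors hidden by the Theta notation are not modeled, so the result is a growth rate and not an actual color count.

```python
def color_growth_term(n, d, t, directed=False):
    """Return n**(d/t), the growth term inside Theta(n^(d/t)).

    n: side length of the lattice, d: lattice dimension,
    t: dimension of the walk. Hidden constants are ignored (assumption).
    """
    # Per the abstract, a walk has dimension 1..d in the undirected case
    # and 1..2d in the directed case (two orientations per axis).
    max_t = 2 * d if directed else d
    if not 1 <= t <= max_t:
        raise ValueError(f"walk dimension t must be between 1 and {max_t}")
    return n ** (d / t)


if __name__ == "__main__":
    n, d = 100, 2
    # Full-dimensional walks: t = d (undirected) and t = 2d (directed).
    print("undirected, t = d :", color_growth_term(n, d, d))
    print("directed,   t = 2d:", color_growth_term(n, d, 2 * d, directed=True))
    # One-dimensional walks need the most colors.
    print("undirected, t = 1 :", color_growth_term(n, d, 1))
```

For full-dimensional walks the exponent d/t reduces to 1 in the undirected case and 1/2 in the directed case, which is why the abstract describes those bounds as independent of the lattice dimension.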

Preprint, 2012, arXiv

Memory Efficient De Bruijn Graph Construction

Massively parallel DNA sequencing technologies are revolutionizing genomics research. Billions of short reads generated at low cost can be assembled to reconstruct whole genomes. Unfortunately, the large memory footprint of existing de novo assembly algorithms makes it challenging to complete the assembly for higher eukaryotes such as mammals. In this work, we investigate the memory issue of constructing the de Bruijn graph, a core task in leading assembly algorithms, which often consumes several hundred gigabytes of memory for large genomes. We propose a disk-based partition method, called Minimum Substring Partitioning (MSP), to complete the task using less than 10 gigabytes of memory, without runtime slowdown. MSP breaks the short reads into multiple small disjoint partitions so that each partition can be loaded into memory, processed individually and later merged with others to form a de Bruijn graph. By leveraging the overlaps among the k-mers (substrings of length k), MSP achieves an astonishing compression ratio: the total size of the partitions is reduced from Θ(kn) to Θ(n), where n is the size of the short read database and k is the length of a k-mer. Experimental results show that our method can build de Bruijn graphs using a commodity computer for any large-volume sequence dataset.
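
The abstract does not spell out the partitioning rule, so the following Python sketch rests on an assumption: each k-mer is assigned to a partition keyed by its lexicographically smallest substring of length p, and consecutive k-mers in a read that share the same key are stored together as one longer stretch of the read. The parameter `p` and all function names are hypothetical, and an in-memory dictionary stands in for the disk files a real implementation would use.

```python
from collections import defaultdict


def min_substring(kmer, p):
    """Return the lexicographically smallest length-p substring of kmer."""
    return min(kmer[i:i + p] for i in range(len(kmer) - p + 1))


def super_kmers(read, k, p):
    """Yield (minimum p-substring, merged stretch) pairs for one read.

    Consecutive k-mers sharing the same minimum p-substring are merged
    into a single stretch of the read instead of being stored one by one.
    """
    if len(read) < k:
        return
    start = 0
    current = min_substring(read[:k], p)
    for i in range(1, len(read) - k + 1):
        key = min_substring(read[i:i + k], p)
        if key != current:
            # k-mers start..i-1 form one run; it ends at index i - 1 + k.
            yield current, read[start:i - 1 + k]
            start, current = i, key
    yield current, read[start:]


def build_partitions(reads, k, p):
    """Group merged stretches by key (kept in memory here; a disk-based
    version would append each group to its own file)."""
    partitions = defaultdict(list)
    for read in reads:
        for key, chunk in super_kmers(read, k, p):
            partitions[key].append(chunk)
    return partitions


def kmers_of(partitions, k):
    """Expand every partition back into its set of k-mers."""
    return {
        chunk[i:i + k]
        for chunks in partitions.values()
        for chunk in chunks
        for i in range(len(chunk) - k + 1)
    }


if __name__ == "__main__":
    reads = ["ACGTACGTGACCA", "TTGACCAGTA"]
    parts = build_partitions(reads, k=5, p=2)
    for key, chunks in sorted(parts.items()):
        print(key, chunks)
```

Because every k-mer has exactly one minimum p-substring, identical k-mers from different reads always land in the same partition, so each partition can be processed on its own and the results merged afterwards. Storing merged stretches instead of individual k-mers is what lets the overlaps between adjacent k-mers reduce the stored volume.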