Source author record

Silu Huang

Silu Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Databases Data Structures and Algorithms

Catalog footprint

What is connected

5works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2016arXiv

Finding Multiple New Optimal Locations in a Road Network

We study the problem of optimal location querying for location based services in road networks, which aims to find locations for new servers or facilities. The existing optimal solutions on this problem consider only the cases with one new server. When two or more new servers are to be set up, the problem with minmax cost criteria, MinMax, becomes NP-hard. In this work we identify some useful properties about the potential locations for the new servers, from which we derive a novel algorithm for MinMax, and show that it is efficient when the number of new servers is small. When the number of new servers is large, we propose an efficient 3-approximate algorithm. We verify with experiments on real road networks that our solutions are effective and attains significantly better result quality compared to the existing greedy algorithms.

preprint2015arXiv

Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff

The relative ease of collaborative data science and analysis has led to a proliferation of many thousands or millions of $versions$ of the same datasets in many scientific and commercial domains, acquired or constructed at various stages of data analysis across many users, and often over long periods of time. Managing, storing, and recreating these dataset versions is a non-trivial task. The fundamental challenge here is the $storage-recreation\;trade-off$: the more storage we use, the faster it is to recreate or retrieve versions, while the less storage we use, the slower it is to recreate or retrieve versions. Despite the fundamental nature of this problem, there has been a surprisingly little amount of work on it. In this paper, we study this trade-off in a principled manner: we formulate six problems under various settings, trading off these quantities in various ways, demonstrate that most of the problems are intractable, and propose a suite of inexpensive heuristics drawing from techniques in delay-constrained scheduling, and spanning tree literature, to solve these problems. We have built a prototype version management system, that aims to serve as a foundation to our DATAHUB system for facilitating collaborative data science. We demonstrate, via extensive experiments, that our proposed heuristics provide efficient solutions in practical dataset versioning scenarios.

preprint2015arXiv

Towards a unified query language for provenance and versioning

Organizations and teams collect and acquire data from various sources, such as social interactions, financial transactions, sensor data, and genome sequencers. Different teams in an organization as well as different data scientists within a team are interested in extracting a variety of insights which require combining and collaboratively analyzing datasets in diverse ways. DataHub is a system that aims to provide robust version control and provenance management for such a scenario. To be truly useful for collaborative data science, one also needs the ability to specify queries and analysis tasks over the versioning and the provenance information in a unified manner. In this paper, we present an initial design of our query language, called VQuel, that aims to support such unified querying over both types of information, as well as the intermediate and final results of analyses. We also discuss some of the key language design and implementation challenges moving forward.

preprint2014arXiv

(α, k)-Minimal Sorting and Skew Join in MPI and MapReduce

As computer clusters are found to be highly effective for handling massive datasets, the design of efficient parallel algorithms for such a computing model is of great interest. We consider (α, k)-minimal algorithms for such a purpose, where α is the number of rounds in the algorithm, and k is a bound on the deviation from perfect workload balance. We focus on new (α, k)-minimal algorithms for sorting and skew equijoin operations for computer clusters. To the best of our knowledge the proposed sorting and skew join algorithms achieve the best workload balancing guarantee when compared to previous works. Our empirical study shows that they are close to optimal in workload balancing. In particular, our proposed sorting algorithm is around 25% more efficient than the state-of-the-art Terasort algorithm and achieves significantly more even workload distribution by over 50%.

preprint2014arXiv

Temporal Graph Traversals: Definitions, Algorithms, and Applications

A temporal graph is a graph in which connections between vertices are active at specific times, and such temporal information leads to completely new patterns and knowledge that are not present in a non-temporal graph. In this paper, we study traversal problems in a temporal graph. Graph traversals, such as DFS and BFS, are basic operations for processing and studying a graph. While both DFS and BFS are well-known simple concepts, it is non-trivial to adopt the same notions from a non-temporal graph to a temporal graph. We analyze the difficulties of defining temporal graph traversals and propose new definitions of DFS and BFS for a temporal graph. We investigate the properties of temporal DFS and BFS, and propose efficient algorithms with optimal complexity. In particular, we also study important applications of temporal DFS and BFS. We verify the efficiency and importance of our graph traversal algorithms in real world temporal graphs.

Silu Huang

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Finding Multiple New Optimal Locations in a Road Network

Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff

Towards a unified query language for provenance and versioning

(α, k)-Minimal Sorting and Skew Join in MPI and MapReduce

Temporal Graph Traversals: Definitions, Algorithms, and Applications