Researcher profile

Silu Huang

Silu Huang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Trust: 19 (Unverified)
Verification: L1
Unclaimed author
5 works
0 followers
2 topics
4 close collaborators

Published work

5 published items

Preprint, 2016, arXiv

Finding Multiple New Optimal Locations in a Road Network

We study the problem of optimal location querying for location-based services in road networks, which aims to find locations for new servers or facilities. Existing optimal solutions to this problem consider only the case of a single new server. When two or more new servers are to be set up, the problem under the minmax cost criterion, MinMax, becomes NP-hard. In this work we identify some useful properties of the potential locations for the new servers, from which we derive a novel algorithm for MinMax, and show that it is efficient when the number of new servers is small. When the number of new servers is large, we propose an efficient 3-approximate algorithm. We verify with experiments on real road networks that our solutions are effective and attain significantly better result quality than existing greedy algorithms.
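To make the MinMax criterion concrete, here is a minimal brute-force sketch, not the algorithm from the paper. It assumes that each client's cost is its shortest-path distance to the nearest server (existing or new), that candidate locations are restricted to graph vertices, and that the road network is an adjacency-list dict; the function names are hypothetical. Enumerating every k-subset of candidates makes clear why exact search only stays practical for a small number of new servers.

```python
import heapq
from itertools import combinations

def nearest_server_dist(graph, servers):
    """Multi-source Dijkstra: distance from every vertex to its nearest server."""
    dist = {v: float("inf") for v in graph}
    heap = [(0.0, s) for s in servers]
    for s in servers:
        dist[s] = 0.0
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def minmax_cost(graph, clients, servers):
    """MinMax objective: the largest client-to-nearest-server distance."""
    dist = nearest_server_dist(graph, servers)
    return max(dist[c] for c in clients)

def brute_force_minmax(graph, clients, existing, k):
    """Try every k-subset of non-server vertices; the work grows as C(n, k)."""
    candidates = [v for v in graph if v not in existing]
    best_cost, best_new = float("inf"), None
    for new in combinations(candidates, k):
        cost = minmax_cost(graph, clients, list(existing) + list(new))
        if cost < best_cost:
            best_cost, best_new = cost, new
    return best_cost, best_new
```

On a unit-weight path a-b-c-d-e with an existing server at `a` and every vertex as a client, the current MinMax cost is 4, and `brute_force_minmax(graph, list(graph), ["a"], 1)` returns `(1.0, ('d',))`.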

Preprint, 2015, arXiv

Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff

The relative ease of collaborative data science and analysis has led to a proliferation of many thousands or even millions of versions of the same datasets in many scientific and commercial domains, acquired or constructed at various stages of data analysis across many users, and often over long periods of time. Managing, storing, and recreating these dataset versions is a non-trivial task. The fundamental challenge here is the storage-recreation trade-off: the more storage we use, the faster it is to recreate or retrieve versions, while the less storage we use, the slower it is to recreate or retrieve versions. Despite the fundamental nature of this problem, surprisingly little work has addressed it. In this paper, we study this trade-off in a principled manner: we formulate six problems under various settings, trading off these quantities in various ways, demonstrate that most of the problems are intractable, and propose a suite of inexpensive heuristics, drawing on techniques from the delay-constrained scheduling and spanning tree literature, to solve these problems. We have built a prototype version management system that aims to serve as a foundation for our DATAHUB system for facilitating collaborative data science. We demonstrate, via extensive experiments, that our proposed heuristics provide efficient solutions in practical dataset versioning scenarios.
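The two extremes of the storage-recreation trade-off can be sketched on a version graph. This is an illustration under simplifying assumptions, not the paper's heuristics: a dummy root 0 stands for "nothing stored", an edge (0, v) costs the full size of version v, an edge (u, v) costs the size of a delta between versions, deltas are symmetric, and a delta's storage cost equals its recreation cost. Under these assumptions a minimum spanning tree (Prim's algorithm) minimizes total storage, and a shortest-path tree (Dijkstra) minimizes each version's recreation cost. All function names are hypothetical.

```python
import heapq

def edge_costs(edges):
    """Symmetric cost lookup for (u, v, w) edges."""
    cost = {}
    for u, v, w in edges:
        cost[(u, v)] = cost[(v, u)] = w
    return cost

def adjacency(n, edges):
    adj = {v: [] for v in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj

def min_storage_tree(n, edges, root=0):
    """Prim's MST: minimizes the total cost of what is stored."""
    adj = adjacency(n, edges)
    parent = {root: None}
    heap = [(w, root, v) for v, w in adj[root]]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in parent:
            continue  # already attached to the tree
        parent[v] = u
        for x, w2 in adj[v]:
            if x not in parent:
                heapq.heappush(heap, (w2, v, x))
    return parent

def min_recreation_tree(n, edges, root=0):
    """Dijkstra shortest-path tree: minimizes every version's recreation cost."""
    adj = adjacency(n, edges)
    dist, parent = {root: 0}, {root: None}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return parent

def evaluate(parent, edges):
    """Return (total storage, sum of recreation costs over all versions)."""
    cost = edge_costs(edges)
    storage = sum(cost[(p, v)] for v, p in parent.items() if p is not None)

    def recreation(v):
        total = 0
        while parent[v] is not None:  # walk delta chain back to the root
            total += cost[(parent[v], v)]
            v = parent[v]
        return total

    return storage, sum(recreation(v) for v in parent)
```

With three versions of size 100 and small deltas, `edges = [(0, 1, 100), (0, 2, 100), (0, 3, 100), (1, 2, 10), (2, 3, 10), (1, 3, 15)]`, the MST stores 120 units but needs 330 units of total recreation work, while the shortest-path tree stores 300 units and needs 300. The intermediate points between these two trees are where the harder problems lie.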

Preprint, 2015, arXiv

Towards a unified query language for provenance and versioning

Organizations and teams collect and acquire data from various sources, such as social interactions, financial transactions, sensor data, and genome sequencers. Different teams in an organization, as well as different data scientists within a team, are interested in extracting a variety of insights that require combining and collaboratively analyzing datasets in diverse ways. DataHub is a system that aims to provide robust version control and provenance management for such a scenario. To be truly useful for collaborative data science, one also needs the ability to specify queries and analysis tasks over the versioning and the provenance information in a unified manner. In this paper, we present an initial design of our query language, called VQuel, that aims to support such unified querying over both types of information, as well as the intermediate and final results of analyses. We also discuss some of the key language design and implementation challenges moving forward.

Preprint, 2014, arXiv

(α, k)-Minimal Sorting and Skew Join in MPI and MapReduce

Because computer clusters have proven highly effective for handling massive datasets, the design of efficient parallel algorithms for this computing model is of great interest. We consider (α, k)-minimal algorithms for this purpose, where α is the number of rounds in the algorithm and k is a bound on the deviation from perfect workload balance. We focus on new (α, k)-minimal algorithms for sorting and skew equijoin operations on computer clusters. To the best of our knowledge, the proposed sorting and skew join algorithms achieve the best workload-balancing guarantee compared with previous work. Our empirical study shows that they are close to optimal in workload balancing. In particular, our proposed sorting algorithm is around 25% more efficient than the state-of-the-art Terasort algorithm and distributes the workload significantly more evenly, with an improvement of over 50%.
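For context on the workload-balance measure, here is a minimal single-process sketch of generic sample-based range partitioning, the one-round scheme that Terasort-style sorting builds on. It is not the (α, k)-minimal algorithm from the paper. The sample size, the seeded sampler, and the names `sample_sort` and `imbalance` are assumptions for illustration; `imbalance` reports the largest partition relative to a perfectly even split of n/p records, which is the kind of deviation that k bounds.

```python
import bisect
import random

def sample_sort(data, p, sample_size, seed=0):
    """Simulated one-round sample sort over p workers (range partitioning)."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    # Pick p - 1 splitters at evenly spaced ranks of the sorted sample.
    splitters = [sample[(i * len(sample)) // p] for i in range(1, p)]
    partitions = [[] for _ in range(p)]
    for x in data:
        # bisect_right sends keys equal to a splitter to the higher partition.
        partitions[bisect.bisect_right(splitters, x)].append(x)
    # Each "worker" sorts its range locally; concatenation is globally sorted.
    return [sorted(part) for part in partitions]

def imbalance(partitions):
    """Largest partition divided by the perfectly balanced size n / p."""
    n = sum(len(part) for part in partitions)
    return max(len(part) for part in partitions) / (n / len(partitions))
```

A plain splitter scheme like this cannot split a run of equal keys: with `[7] * 100 + list(range(10))`, all hundred 7s land on one worker. That duplicate-key skew is exactly what makes skew joins hard and what a bounded-deviation guarantee has to address.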

Preprint, 2014, arXiv

Temporal Graph Traversals: Definitions, Algorithms, and Applications

A temporal graph is a graph in which connections between vertices are active at specific times, and such temporal information leads to completely new patterns and knowledge that are not present in a non-temporal graph. In this paper, we study traversal problems in a temporal graph. Graph traversals, such as DFS and BFS, are basic operations for processing and studying a graph. While both DFS and BFS are well-known, simple concepts, it is non-trivial to adapt them from a non-temporal graph to a temporal graph. We analyze the difficulties of defining temporal graph traversals and propose new definitions of DFS and BFS for a temporal graph. We investigate the properties of temporal DFS and BFS and propose efficient algorithms with optimal complexity. We also study important applications of temporal DFS and BFS, and we verify the efficiency and importance of our graph traversal algorithms on real-world temporal graphs.
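One reason static traversal does not carry over directly is that temporal reachability must respect time: an edge can be used only after you have arrived at its source. The sketch below computes earliest arrival times with a single pass over edges sorted by departure time. It illustrates time-respecting traversal in general and does not reproduce the paper's definitions of temporal DFS or BFS. It assumes directed edges `(u, v, t, duration)` that depart u at time t and arrive at v at t + duration, with strictly positive durations, which is what makes one sorted pass sufficient. The name `earliest_arrival` is hypothetical.

```python
import math

def earliest_arrival(edges, source, t_start=0):
    """Earliest time each vertex can be reached from source via time-respecting paths.

    edges: iterable of (u, v, t, duration), duration > 0.
    """
    arrival = {source: t_start}
    for u, v, t, dur in sorted(edges, key=lambda e: e[2]):
        # Usable only if we are already at u by the departure time t.
        if arrival.get(u, math.inf) <= t and t + dur < arrival.get(v, math.inf):
            arrival[v] = t + dur
    return arrival
```

With `edges = [("a", "b", 1, 1), ("b", "c", 3, 1), ("b", "d", 1, 1), ("a", "d", 6, 1)]`, the result from `"a"` is `{"a": 0, "b": 2, "c": 4, "d": 7}`: the edge b→d departs at time 1, before anyone can reach b at time 2, so d is only reached later via a→d. A static BFS would place d one hop from b and miss this.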