Source author record

Xiaolong Wan

Xiaolong Wan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Databases, Artificial Intelligence

Catalog footprint

What is connected

2 works

2 topics

2 close collaborators

Preprint, 2026, arXiv

Redundancy-Driven Top-$k$ Functional Dependency Discovery

Functional dependencies (FDs) are basic constraints in relational databases and are used for many data management tasks. Most FD discovery algorithms find all valid dependencies, but this causes two problems. First, the computational cost is prohibitive: computational complexity grows quadratically with the number of tuples and exponentially with the number of attributes, making discovery slow on large-scale and high-dimensional data. Second, the result set can be huge, making it hard to identify useful dependencies. We propose SDP (Selective-Discovery-and-Prune), which discovers the top-$k$ FDs ranked by redundancy count. Redundancy count measures how much duplicated information an FD explains and connects directly to storage overhead and update anomalies. SDP uses an upper bound on redundancy to prune the search space. It is proved that this upper bound is monotone: adding attributes refines partitions and thus decreases the bound. Once the bound falls below the top-$k$ threshold, the entire branch can be skipped. We improve SDP with three optimizations: ordering attributes by partition cardinality, using pairwise statistics in a Partition Cardinality Matrix to tighten bounds, and a global scheduler to explore promising branches first. Experiments on over 40 datasets show that SDP is much faster and uses less memory than exhaustive methods.
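
The bound-based pruning described in this abstract can be illustrated with a small sketch. This is not the authors' SDP implementation. It assumes that the redundancy count of a valid FD X -> A is the number of tuples minus the number of equivalence classes under X, and it uses that same quantity as the upper bound for every left-hand side that extends X. The names `partition_size`, `redundancy_bound` and `top_k_fds` are hypothetical, the sketch does not restrict results to minimal FDs, and it omits the three optimizations listed in the abstract.

```python
def partition_size(rows, attrs):
    """Number of equivalence classes when rows are grouped by attrs."""
    return len({tuple(row[a] for a in attrs) for row in rows})


def fd_holds(rows, lhs, rhs):
    """X -> A holds iff grouping by X and by X plus A yields the same classes."""
    return partition_size(rows, lhs) == partition_size(rows, lhs + (rhs,))


def redundancy_bound(rows, lhs):
    """ASSUMED bound: an X-class of size s can repeat at most s - 1 values.

    Adding attributes to lhs refines the partition, so the number of
    classes cannot shrink and this bound cannot grow.
    """
    return len(rows) - partition_size(rows, lhs)


def top_k_fds(rows, attributes, k):
    """Depth-first search over left-hand sides, pruned by the assumed bound."""
    best = []  # (redundancy, lhs, rhs), kept sorted by redundancy, descending

    def visit(lhs, start):
        for i in range(start, len(attributes)):
            new_lhs = lhs + (attributes[i],)
            bound = redundancy_bound(rows, new_lhs)
            threshold = best[k - 1][0] if len(best) >= k else -1
            if bound < threshold:
                continue  # no superset of new_lhs can beat the current top-k
            for rhs in attributes:
                if rhs not in new_lhs and fd_holds(rows, new_lhs, rhs):
                    # Under the assumption above, the bound is exact here.
                    best.append((bound, new_lhs, rhs))
            best.sort(key=lambda entry: -entry[0])
            del best[k:]
            visit(new_lhs, i + 1)

    visit((), 0)
    return best


# Made-up example relation.
rows = [
    {"city": "Paris", "country": "FR", "id": 1},
    {"city": "Paris", "country": "FR", "id": 2},
    {"city": "Lyon", "country": "FR", "id": 3},
    {"city": "Rome", "country": "IT", "id": 4},
]
attributes = ("city", "country", "id")
print(top_k_fds(rows, attributes, 1))
```

In this toy relation, city -> country explains one repeated value, so once it is recorded every branch whose bound is 0 (for example any left-hand side containing `id`) is skipped without testing its dependencies.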

Preprint, 2020, arXiv

Reachability Queries with Label and Substructure Constraints on Knowledge Graphs

Since knowledge graphs (KGs) describe and model the relationships between entities and concepts in the real world, reasoning on KGs often corresponds to reachability queries with label and substructure constraints (LSCR). Specifically, for a search path p, an LSCR query requires not only that the labels of the edges traversed by p belong to a given label set, but also that some vertex on p satisfies a given substructure constraint. LSCR queries are much more complex than label-constraint reachability (LCR) queries, and to the best of our knowledge there is no efficient solution for LSCR queries on KGs. Motivated by this, we introduce two solutions for such queries on KGs, UIS and INS. The former also applies to general edge-labeled graphs and is relatively easy to implement in practice. The latter is an efficient local-index-based informed search strategy. An extensive experimental evaluation on both synthetic and real KGs shows that our solutions can efficiently process LSCR queries on KGs.
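
To make the LSCR query semantics concrete, here is a minimal breadth-first sketch. It is neither UIS nor INS from the paper, only a plain uninformed search under stated assumptions: the graph is a list of `(source, label, target)` triples, the substructure constraint is modeled as a caller-supplied predicate on a vertex, and the endpoints of the path count as vertices on it. The function name `lscr_reachable` and its parameters are hypothetical.

```python
from collections import deque


def lscr_reachable(edges, source, target, allowed_labels, satisfies):
    """Return True if target is reachable from source over edges whose
    labels are in allowed_labels, via some vertex v with satisfies(v) true."""
    adjacency = {}
    for u, label, v in edges:
        if label in allowed_labels:  # label constraint
            adjacency.setdefault(u, []).append(v)

    # A search state is (vertex, whether the constraint was already met).
    start = (source, bool(satisfies(source)))
    seen = {start}
    queue = deque([start])
    while queue:
        vertex, met = queue.popleft()
        if vertex == target and met:
            return True
        for nxt in adjacency.get(vertex, []):
            state = (nxt, met or bool(satisfies(nxt)))
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


# Made-up example graph.
edges = [
    ("a", "knows", "b"),
    ("b", "worksAt", "c"),
    ("a", "likes", "d"),
    ("d", "knows", "c"),
]
edge_set = set(edges)


def knows_c(vertex):
    """Stand-in substructure constraint: the vertex has a 'knows' edge to 'c'."""
    return (vertex, "knows", "c") in edge_set


print(lscr_reachable(edges, "a", "c", {"knows", "likes"}, knows_c))
print(lscr_reachable(edges, "a", "c", {"knows", "worksAt"}, knows_c))
```

Tracking the pair (vertex, constraint met) rather than the vertex alone lets the search revisit a vertex once the constraint has been satisfied, which a plain label-constrained reachability search would not need to do.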