Source author record

Zhaoqiang Chen

Zhaoqiang Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Databases Artificial Intelligence Machine Learning

Catalog footprint

What is connected

3works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

All-in-one Graph-based Indexing for Hybrid Search on GPUs

Hybrid search has emerged as a promising paradigm that combines lexical and semantic retrieval, enhancing accuracy for applications such as recommendations, information retrieval, and Retrieval-Augmented Generation. However, existing methods are constrained by a trilemma: they sacrifice flexibility for efficiency, suffer from accuracy degradation, or incur prohibitive storage overhead for flexible combinations of retrieval paths. This paper introduces Allan-Poe, a novel all-in-one graph index accelerated by GPUs for efficient hybrid search. We first analyze the limitations of existing retrieval paradigms and extract key design principles for an effective hybrid index. Guided by the principles, we architect a unified graph-based index that flexibly integrates three retrieval paths (dense vector, sparse vector, and full-text) within a single, cohesive structure. To enable efficient construction, we design a GPU-accelerated pipeline featuring a warp-level hybrid distance kernel, RNG-IP joint pruning, and keyword-aware neighbor recycling. For query processing, we introduce a dynamic fusion framework that supports any combination of retrieval paths and weights without index reconstruction, flexibly leveraging logical structures from the knowledge graph to resolve complex multi-hop queries. Extensive experiments on 6 real-world datasets demonstrate that Allan-Poe achieves superior end-to-end query accuracy and outperforms state-of-the-art methods by 1.5x-186.4x in throughput, while significantly reducing storage overhead.

preprint2026arXiv

Frequency-Aware Graph Construction and Search for Dynamic Vector Databases

Approximate Nearest Neighbor Search (ANNS) is a crucial operation in databases and artificial intelligence. While graph-based ANNS methods like HNSW and NSG excel in performance, they assume uniform query distribution. However, in real-world scenarios, user preferences and temporal dynamics often result in certain data points being queried more frequently than others, and these query patterns can change over time. To better leverage such characteristics, we propose DQF, a novel Dual-Index Query Framework. This framework features a dual-layer index structure and a dynamic search strategy based on a decision tree. The dual-layer index includes a hot index for high-frequency nodes and a full index covering the entire dataset, allowing for the separate management of hot and cold queries. Furthermore, we propose a dynamic search strategy that employs a decision tree to determine whether a query is of the high-frequency type, avoiding unnecessary searches in the full index through early termination. Additionally, to address fluctuations in query frequency, we design an update mechanism to manage the hot index. New high-frequency nodes will be inserted into the hot index, which is periodically rebuilt when its size exceeds a predefined threshold, removing outdated low-frequency nodes. Experiments on four real-world datasets demonstrate that the Dual-Index Query Framework achieves a significant speedup of 2.0-5.7x over state-of-the-art algorithms while maintaining a 95% recall rate. Importantly, it avoids full index reconstruction even as query distributions change, underscoring its efficiency and practicality in dynamic query distribution scenarios.

preprint2022arXiv

Adaptive Deep Learning for Entity Resolution by Risk Analysis

The state-of-the-art performance on entity resolution (ER) has been achieved by deep learning. However, deep models are usually trained on large quantities of accurately labeled training data, and can not be easily tuned towards a target workload. Unfortunately, in real scenarios, there may not be sufficient labeled training data, and even worse, their distribution is usually more or less different from the target workload even when they come from the same domain. To alleviate the said limitations, this paper proposes a novel risk-based approach to tune a deep model towards a target workload by its particular characteristics. Built on the recent advances on risk analysis for ER, the proposed approach first trains a deep model on labeled training data, and then fine-tunes it by minimizing its estimated misprediction risk on unlabeled target data. Our theoretical analysis shows that risk-based adaptive training can correct the label status of a mispredicted instance with a fairly good chance. We have also empirically validated the efficacy of the proposed approach on real benchmark data by a comparative study. Our extensive experiments show that it can considerably improve the performance of deep models. Furthermore, in the scenario of distribution misalignment, it can similarly outperform the state-of-the-art alternative of transfer learning by considerable margins. Using ER as a test case, we demonstrate that risk-based adaptive training is a promising approach potentially applicable to various challenging classification tasks.