Researcher profile

Congcong Ge

Congcong Ge contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 15 (Unverified) · Verification L1 · Unclaimed author
3 works
0 followers
1 topic
4 close collaborators

Published work

3 published items

Preprint · 2026 · arXiv

All-in-one Graph-based Indexing for Hybrid Search on GPUs

Hybrid search has emerged as a promising paradigm that combines lexical and semantic retrieval, enhancing accuracy for applications such as recommendations, information retrieval, and Retrieval-Augmented Generation. However, existing methods are constrained by a trilemma: they sacrifice flexibility for efficiency, suffer from accuracy degradation, or incur prohibitive storage overhead for flexible combinations of retrieval paths. This paper introduces Allan-Poe, a novel all-in-one graph index accelerated by GPUs for efficient hybrid search. We first analyze the limitations of existing retrieval paradigms and extract key design principles for an effective hybrid index. Guided by the principles, we architect a unified graph-based index that flexibly integrates three retrieval paths (dense vector, sparse vector, and full-text) within a single, cohesive structure. To enable efficient construction, we design a GPU-accelerated pipeline featuring a warp-level hybrid distance kernel, RNG-IP joint pruning, and keyword-aware neighbor recycling. For query processing, we introduce a dynamic fusion framework that supports any combination of retrieval paths and weights without index reconstruction, flexibly leveraging logical structures from the knowledge graph to resolve complex multi-hop queries. Extensive experiments on 6 real-world datasets demonstrate that Allan-Poe achieves superior end-to-end query accuracy and outperforms state-of-the-art methods by 1.5x-186.4x in throughput, while significantly reducing storage overhead.
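The dynamic fusion idea, where each query can weight the dense, sparse, and full-text paths differently without rebuilding anything, can be illustrated with a minimal Python sketch. It is not Allan-Poe's implementation: it scans a plain list of documents instead of traversing a GPU graph index, and the names `Doc`, `hybrid_score`, and `top_k`, the linear weighting, and the keyword-overlap stand-in for a real full-text score such as BM25 are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Doc:
    """One indexed item carrying all three representations (hypothetical layout)."""
    doc_id: int
    dense: List[float]                                # dense embedding
    sparse: Dict[str, float]                          # sparse vector: term -> weight
    keywords: Set[str] = field(default_factory=set)   # terms for the full-text path

def dense_ip(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def sparse_ip(a: Dict[str, float], b: Dict[str, float]) -> float:
    if len(a) > len(b):  # iterate over the shorter sparse vector
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())

def keyword_score(query_terms: Set[str], doc_terms: Set[str]) -> float:
    # Stand-in for a real full-text score such as BM25: fraction of query terms matched.
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)

def hybrid_score(q_dense, q_sparse, q_terms, doc: Doc,
                 weights: Tuple[float, float, float]) -> float:
    """Linear fusion of the three retrieval paths; weights are supplied per query."""
    w_dense, w_sparse, w_text = weights
    return (w_dense * dense_ip(q_dense, doc.dense)
            + w_sparse * sparse_ip(q_sparse, doc.sparse)
            + w_text * keyword_score(q_terms, doc.keywords))

def top_k(docs: List[Doc], q_dense, q_sparse, q_terms, weights, k: int) -> List[int]:
    scored = [(hybrid_score(q_dense, q_sparse, q_terms, d, weights), d.doc_id) for d in docs]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties broken by id
    return [doc_id for _, doc_id in scored[:k]]

docs = [
    Doc(0, [1.0, 0.0], {"gpu": 1.0}, {"gpu", "index"}),
    Doc(1, [0.0, 1.0], {"graph": 1.0}, {"graph"}),
]
query = ([0.2, 0.9], {"gpu": 1.0}, {"gpu"})
print(top_k(docs, *query, weights=(1.0, 0.0, 0.0), k=2))  # dense only: [1, 0]
print(top_k(docs, *query, weights=(0.0, 0.0, 1.0), k=2))  # full text only: [0, 1]
```

Because the weights are arguments to the scoring function rather than properties of the stored data, switching the same query from dense-only to full-text-only flips the ranking of the two documents without any re-indexing.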

Preprint · 2026 · arXiv

Frequency-Aware Graph Construction and Search for Dynamic Vector Databases

Approximate Nearest Neighbor Search (ANNS) is a crucial operation in databases and artificial intelligence. While graph-based ANNS methods like HNSW and NSG excel in performance, they assume a uniform query distribution. However, in real-world scenarios, user preferences and temporal dynamics often result in certain data points being queried more frequently than others, and these query patterns can change over time. To better leverage such characteristics, we propose DQF, a novel Dual-Index Query Framework. This framework features a dual-layer index structure and a dynamic search strategy based on a decision tree. The dual-layer index includes a hot index for high-frequency nodes and a full index covering the entire dataset, allowing hot and cold queries to be managed separately. Furthermore, we propose a dynamic search strategy that employs a decision tree to determine whether a query is of the high-frequency type, avoiding unnecessary searches in the full index through early termination. Additionally, to address fluctuations in query frequency, we design an update mechanism to manage the hot index. New high-frequency nodes are inserted into the hot index, which is periodically rebuilt when its size exceeds a predefined threshold, removing outdated low-frequency nodes. Experiments on four real-world datasets demonstrate that the Dual-Index Query Framework achieves a speedup of 2.0-5.7x over state-of-the-art algorithms while maintaining a 95% recall rate. Importantly, it avoids full index reconstruction even as query distributions change, underscoring its efficiency and practicality in dynamic query distribution scenarios.
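The interplay of the hot index, the full index, early termination, and threshold-triggered rebuilds can be sketched in Python under heavy simplifying assumptions. Both indexes here are exhaustive scans rather than graph indexes such as HNSW or NSG, and the decision tree is replaced by a fixed distance threshold `tau`. The class name `DualIndexSketch`, the parameters `hot_min_count`, `max_hot_size`, and `tau`, and the halve-then-truncate rebuild policy are hypothetical choices, not DQF's actual design.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Vector = Tuple[float, ...]

def l2(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class DualIndexSketch:
    """Hot index of frequently returned nodes in front of a full index (hypothetical)."""

    def __init__(self, data: Dict[int, Vector], hot_min_count: int = 2,
                 max_hot_size: int = 4, tau: float = 0.5):
        self.data = data                  # full index: every vector, scanned exhaustively
        self.hot: Set[int] = set()        # hot index: ids of high-frequency nodes
        self.freq: Counter = Counter()    # how often each node has been returned
        self.hot_min_count = hot_min_count
        self.max_hot_size = max_hot_size
        self.tau = tau                    # stand-in for the decision tree's "hot query" test

    def _nearest(self, query: Vector, ids: Iterable[int]) -> int:
        return min(ids, key=lambda i: (l2(query, self.data[i]), i))

    def search(self, query: Vector) -> Tuple[int, bool]:
        """Return (nearest id, whether the hot index alone answered the query)."""
        if self.hot:
            best = self._nearest(query, self.hot)
            if l2(query, self.data[best]) <= self.tau:  # judged a hot query: stop early
                self._record(best)
                return best, True
        best = self._nearest(query, self.data.keys())   # cold query: search the full index
        self._record(best)
        return best, False

    def _record(self, node_id: int) -> None:
        self.freq[node_id] += 1
        if node_id not in self.hot and self.freq[node_id] >= self.hot_min_count:
            self.hot.add(node_id)                        # insert a newly frequent node
            if len(self.hot) > self.max_hot_size:
                self._rebuild()

    def _rebuild(self) -> None:
        # Decay all counts so old popularity fades, then keep only the most frequent nodes.
        for i in list(self.freq):
            self.freq[i] //= 2
        ranked = sorted(self.hot, key=lambda i: (-self.freq[i], i))
        self.hot = set(ranked[: self.max_hot_size])

idx = DualIndexSketch({0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 5.0)},
                      hot_min_count=1, max_hot_size=2)
for q in [(0.0, 0.0)] * 3 + [(1.0, 0.0)] * 2 + [(5.0, 5.0)]:
    print(q, idx.search(q))
print(sorted(idx.hot))  # node 2 is evicted by the rebuild: [0, 1]
```

Repeated queries near nodes 0 and 1 end up answered from the hot index alone, while the final query overflows the hot index and triggers a rebuild that keeps the two most frequently returned nodes.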

preprint2020arXiv

KGClean: An Embedding Powered Knowledge Graph Cleaning Framework

Quality assurance of knowledge graphs is a prerequisite for many knowledge-driven applications. We propose KGClean, a novel cleaning framework powered by knowledge graph embedding, to detect and repair heterogeneous dirty data. In contrast to previous approaches that either focus on filling in missing data or only clean errors that violate a limited set of rules, KGClean enables (i) cleaning both missing data and other erroneous values, and (ii) mining potential rules automatically, which expands the coverage of error detection. KGClean first learns data representations with TransGAT, an effective knowledge graph embedding model, which gathers the neighborhood information of each data item and incorporates the interactions among data items to embed the data into continuous vector spaces with rich semantics. KGClean integrates an active learning-based classification model that identifies errors from a small seed set of labels. KGClean then applies an efficient PRO-repair strategy, based on a novel concept of propagation power, to repair errors. Extensive experiments on four typical knowledge graphs demonstrate the effectiveness of KGClean in practice.
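To show why embedding triples into a vector space helps with error detection, the sketch below scores triples with the simpler TransE distance ||h + r - t|| (L1 norm) as a stand-in for TransGAT. It is a minimal illustration under stated assumptions, not KGClean: the embeddings are hand-picked toy values rather than learned ones, a fixed threshold replaces the active-learning classifier, and the hypothetical `suggest_tail` helper is a naive nearest-candidate repair, not PRO-repair.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
Embeddings = Dict[str, Sequence[float]]

def transe_distance(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """L1 norm of h + r - t; a small value means the triple fits the embedding well."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def flag_suspicious(triples: List[Triple], ent: Embeddings, rel: Embeddings,
                    threshold: float) -> List[Triple]:
    # Fixed threshold: a stand-in for KGClean's active-learning classifier.
    return [(h, r, t) for h, r, t in triples
            if transe_distance(ent[h], rel[r], ent[t]) > threshold]

def suggest_tail(h: str, r: str, ent: Embeddings, rel: Embeddings) -> str:
    # Naive repair: the entity that best completes (h, r, ?); not PRO-repair.
    return min(ent, key=lambda e: (transe_distance(ent[h], rel[r], ent[e]), e))

# Hand-picked toy embeddings (a real system would learn them from the graph).
ent = {"Paris": (0.0, 0.0), "France": (1.0, 0.0),
       "Berlin": (0.0, 1.0), "Germany": (1.0, 1.0)}
rel = {"capital_of": (1.0, 0.0)}
kg = [("Paris", "capital_of", "France"),
      ("Berlin", "capital_of", "Germany"),
      ("Berlin", "capital_of", "France")]  # injected error

for h, r, t in flag_suspicious(kg, ent, rel, threshold=0.5):
    print((h, r, t), "->", suggest_tail(h, r, ent, rel))
# prints: ('Berlin', 'capital_of', 'France') -> Germany
```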