Source author record

Chao Zhan

Chao Zhan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

astro-ph.IM gr-qc Information Retrieval

Catalog footprint

What is connected

2works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Ascend-RaBitQ: Heterogeneous NPU-CPU Acceleration of Billion-Scale Similarity Search with 1-bit Quantization

Vector similarity search is a critical component of modern AI systems, but traditional CPU-based implementations face fundamental scalability bottlenecks for billion-scale corpora due to prohibitive computational overhead and memory bandwidth limitations. While Neural Processing Units (NPUs) offer orders-of-magnitude higher compute density, existing CPU/GPU-optimized 1-bit RaBitQ quantization implementations cannot be directly ported to NPU architectures due to fundamental hardware mismatches, and homogeneous design paradigms struggle to simultaneously balance accuracy, memory footprint, and performance. This paper presents Ascend-RaBitQ, the first heterogeneous NPU-CPU optimized IVF-RaBitQ system for billion-scale vector search, built on the core insight that decoupling coarse ranking (NPU) from fine ranking (CPU) allows each stage to leverage its optimal hardware, breaking the long-standing accuracy-memory-performance trade-off. We propose a three-stage heterogeneous pipeline comprising AI Core-accelerated coarse ranking on 1-bit quantized vectors, on-device AI CPU Top-k processing, and host CPU fine re-ranking on full-precision vectors. We introduce four NPU architecture-native optimizations: fused AIC-AIV operators for parallel distance computation, computation flow restructuring to exploit rotation orthogonality, fine-grained index block-level load balancing that breaks query boundaries, and intra-NPU pipeline parallelism between AI Core and AI CPU to mask Top-k latency. Evaluation on standard datasets shows that Ascend-RaBitQ achieves 3.0* to 62.8* faster index construction than the CPU baseline, up to 4.6* throughput improvement over the fastest CPU IVF-RaBitQ implementation, and over 100* over the mathematically equivalent CPU baseline, while demonstrating encouraging scalability on distributed multi-NPU systems.

preprint2021arXiv

The response of the Convolutional Neural Network to the transient noise in Gravitational wave detection

In recent years, much work have studied the use of convolutional neural networks for gravitational-wave detection. However little work pay attention to whether the transient noise can trigger the CNN model or not. In this paper, we study the responses of the sine-Gaussian glitches, the Gaussian glitches and the ring-down glitches in the trained convolutional neural network classifier. We find that the network is robust to the sine-Gaussian and Gaussian glitches, whose false alarm probabilities are close to that of the LIGO-like noises, in contrast to the case of the ring-down glitches, in which the false alarm probability is far larger than that of the LIGO-like noises. We also investigate the responses of the glitches with different frequency. We find that when the frequency of the glitches falls in that of the trained GW signals, the false alarm probability of the glitches will be much larger than that of the LIGO-like noises, and the probability of the glitches being misjudged as the GW signals may even exceed 30%.