Source author record

Pei He

Pei He appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · eess.IV · Machine Learning · physics.optics

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint · 2026 · arXiv

BGG: Bridging the Geometric Gap between Cross-View images by Vision Foundation Model Adaptation for Geo-Localization

Geometric differences between cross-view images, such as drone and satellite views, significantly increase the difficulty of Cross-View Geo-Localization (CVGL), which aims to determine the geolocation of an image through image retrieval. To further improve CVGL performance, this paper proposes BGG, a parameter-efficient adaptation framework that bridges the geometric gap across views on top of a vision foundation model (VFM) such as DINOv3. BGG not only leverages the general visual representations of the VFM to capture robust and consistent features from cross-view images, but also exploits the generalization capability of the VFM, significantly improving CVGL performance. It consists mainly of a Multi-granularity Feature Enhancement Adapter (MFEA) and a Frequency-Aware Structural Aggregation (FASA) module. Specifically, MFEA uses multi-level dilated convolutions to enhance the scale adaptability and viewpoint robustness of features, bridging the cross-view geometric gap at a small training cost. Additionally, because the [CLS] token lacks the spatial detail needed for precise image retrieval and localization, the FASA module modulates patch tokens in the frequency domain and aggregates them adaptively to enhance local structural features. Finally, BGG fuses the enhanced local features with the [CLS] token for more accurate CVGL. Extensive experiments on the University-1652 and SUES-200 datasets demonstrate that BGG holds significant advantages over other methods and achieves state-of-the-art localization performance with low training costs.
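The retrieval step that CVGL relies on can be sketched in Python. This is a minimal illustration under stated assumptions, not BGG's implementation: the abstract does not say how the enhanced local features and the [CLS] token are fused, so the hypothetical `fuse` helper (a weighted concatenation with weight `alpha`), the `tile_A`/`tile_B` geotags and the toy two-dimensional descriptors are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(cls_token: Sequence[float], local_feature: Sequence[float],
         alpha: float = 0.5) -> List[float]:
    """Hypothetical fusion (assumption, not from the paper): concatenate the
    normalized global ([CLS]) and local descriptors, weighted by sqrt(alpha)
    and sqrt(1 - alpha) so the result is again unit length."""
    glob = [math.sqrt(alpha) * x for x in l2_normalize(cls_token)]
    loc = [math.sqrt(1.0 - alpha) * x for x in l2_normalize(local_feature)]
    return l2_normalize(glob + loc)


def geolocate(query: Sequence[float],
              gallery: Sequence[Tuple[str, Sequence[float]]]) -> Tuple[str, float]:
    """Return the geotag and similarity score of the closest gallery descriptor."""
    best_tag, best_score = "", -math.inf
    for tag, desc in gallery:
        score = sum(q * d for q, d in zip(query, desc))  # cosine for unit-length vectors
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag, best_score


# Toy satellite gallery: each entry pairs a hypothetical geotag with a fused descriptor.
gallery = [
    ("tile_A", fuse([1.0, 0.0], [0.0, 1.0])),
    ("tile_B", fuse([0.0, 1.0], [1.0, 0.0])),
]
drone_query = fuse([0.9, 0.1], [0.2, 0.8])
print(geolocate(drone_query, gallery))
```

With this weighting, the dot product of two fused descriptors equals `alpha` times the global cosine similarity plus `1 - alpha` times the local one, so neither cue dominates merely because of its dimensionality.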

Preprint · 2024 · arXiv

Shrinking Your TimeStep: Towards Low-Latency Neuromorphic Object Recognition with Spiking Neural Network

Neuromorphic object recognition with spiking neural networks (SNNs) is the cornerstone of low-power neuromorphic computing. However, existing SNNs suffer from significant latency, requiring 10 to 40 or more timesteps to recognize neuromorphic objects, and their performance degrades drastically at low latencies. In this work, we propose the Shrinking SNN (SSNN) to achieve low-latency neuromorphic object recognition without sacrificing performance. Concretely, we alleviate temporal redundancy in SNNs by dividing them into multiple stages with progressively shrinking timesteps, which significantly reduces inference latency. During timestep shrinkage, the temporal transformer smoothly transforms the temporal scale while preserving as much information as possible. Moreover, we add multiple early classifiers to the SNN during training to mitigate the mismatch between the surrogate gradient and the true gradient, as well as gradient vanishing/exploding, thus eliminating the performance degradation at low latency. Extensive experiments on the neuromorphic datasets CIFAR10-DVS, N-Caltech101, and DVS-Gesture show that SSNN improves baseline accuracy by 6.55% to 21.41%. With only 5 average timesteps and without any data augmentation, SSNN achieves an accuracy of 73.63% on CIFAR10-DVS. This work presents a heterogeneous-temporal-scale SNN and provides valuable insights into the development of high-performance, low-latency SNNs.
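The idea of stages with progressively shrinking timesteps can be illustrated with a toy, single-neuron Python sketch. It is not SSNN's architecture: the leaky integrate-and-fire (LIF) parameters, the `(8, 4, 2)` schedule and the scalar `gain` standing in for synaptic weights are assumptions, and the hypothetical `shrink_timesteps` helper simply averages groups of adjacent timesteps in place of the paper's temporal transformer.

```python
from typing import List, Sequence, Tuple


def lif(currents: Sequence[float], tau: float = 2.0, v_th: float = 1.0) -> List[float]:
    """Leaky integrate-and-fire neuron: returns a 0/1 spike per timestep."""
    v = 0.0
    spikes = []
    for x in currents:
        v += (x - v) / tau          # leaky integration toward the input current
        if v >= v_th:
            spikes.append(1.0)
            v = 0.0                 # hard reset after firing
        else:
            spikes.append(0.0)
    return spikes


def shrink_timesteps(seq: Sequence[float], new_t: int) -> List[float]:
    """Hypothetical temporal transform: average equal-size groups of timesteps."""
    if new_t <= 0 or len(seq) % new_t != 0:
        raise ValueError("len(seq) must be a positive multiple of new_t")
    k = len(seq) // new_t
    return [sum(seq[i:i + k]) / k for i in range(0, len(seq), k)]


def shrinking_forward(currents: Sequence[float], schedule: Tuple[int, ...] = (8, 4, 2),
                      gain: float = 2.0) -> Tuple[List[List[float]], List[float]]:
    """Run one neuron through stages whose timestep count follows `schedule`.

    Returns each stage's spike train and firing rate; the rates stand in for
    what an early classifier attached to that stage could read out."""
    if len(currents) != schedule[0]:
        raise ValueError("input length must equal the first stage's timesteps")
    trains, rates = [], []
    x: List[float] = list(currents)
    for stage, t in enumerate(schedule):
        if stage > 0:
            x = shrink_timesteps(x, t)      # compress the temporal scale between stages
        spikes = lif([gain * v for v in x])
        trains.append(spikes)
        rates.append(sum(spikes) / t)
        x = spikes
    return trains, rates


trains, rates = shrinking_forward([1.5] * 8)
print([len(t) for t in trains], rates)  # → [8, 4, 2] [1.0, 1.0, 1.0]
```

Later stages simulate fewer timesteps, which is where a latency saving would come from; the per-stage `rates` only loosely mimic the early classifiers that the paper attaches during training.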

Preprint · 2022 · arXiv

Super-resolution multicolor fluorescence microscopy enabled by an apochromatic super-oscillatory lens with extended depth-of-focus

Multicolor super-resolution imaging remains an intractable challenge for both far-field and near-field super-resolution techniques. The planar super-oscillatory lens (SOL), a far-field diffractive device capable of subwavelength focusing, holds great potential for sub-diffraction-limit imaging at multiple wavelengths. However, conventional SOL devices suffer from an intrinsic, numerical aperture (NA)-related tradeoff among depth of focus (DoF), chromatic dispersion and focal spot size, a characteristic shared by common diffractive optical elements. Typically, although increasing NA improves resolution, the limited DoF and significant chromatism associated with high NA degrade image quality. Here, we apply a multi-objective genetic algorithm (GA) optimization approach to design an apochromatic binary-phase SOL that generates axially joined multifoci with prolonged DoF, customized working distance (WD), suppressed side-lobes and minimized main-lobe size, thereby optimizing the aforementioned NA-dependent tradeoff. Experimental implementation of this GA-optimized SOL demonstrates simultaneous focusing of blue, green and red light beams into an optical needle with a diameter of half the incident wavelength at a 428 μm WD, resulting in an ultimate lateral resolution better than one third of the incident wavelength. By integrating this apochromatic SOL device with a commercial fluorescence microscope, we use the optical needle to perform, for the first time, three-dimensional super-resolution multicolor fluorescence imaging of the unseen fine structure of neurons in one go. The present study provides not only a practical route to far-field multicolor super-resolution imaging but also a viable approach to constructing imaging systems that avoid complex sample positioning and unfavorable photobleaching.
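A GA-driven design loop for a binary-phase ring mask can be sketched in Python for a much simpler setting. This is an illustration under stated assumptions, not the authors' optimizer: it uses the paraxial Fresnel on-axis field of a mask whose rings have phase 0 or π; the wavelengths, lens radius, ring count and axial window are hypothetical; and it collapses the objectives into one scalar score (worst-wavelength brightness times flatness over the window) instead of running a true multi-objective GA. Side-lobe suppression and main-lobe size, which need the transverse field, are not modeled.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple

# All design values below are illustrative assumptions, not the paper's parameters.
WAVELENGTHS_UM = (0.45, 0.53, 0.65)                    # blue, green, red
RADIUS_UM = 60.0                                       # lens radius
N_RINGS = 40                                           # equal-width binary-phase rings
TARGET_Z_UM = [410.0 + 4.0 * i for i in range(10)]     # axial window to keep bright


def on_axis_intensity(mask: Sequence[int], wl: float, z: float,
                      radius: float = RADIUS_UM) -> float:
    """Fresnel on-axis intensity (unit-amplitude plane-wave illumination) of a ring
    mask: bit 0 -> phase 0, bit 1 -> phase pi. A ring between radii a and b with
    transmission t contributes t * (exp(i k b^2 / 2z) - exp(i k a^2 / 2z))."""
    k = 2 * math.pi / wl
    edges = [radius * n / len(mask) for n in range(len(mask) + 1)]
    field = 0j
    for bit, a, b in zip(mask, edges, edges[1:]):
        t = -1.0 if bit else 1.0                        # exp(i*pi) = -1
        field += t * (cmath.exp(1j * k * b * b / (2 * z)) - cmath.exp(1j * k * a * a / (2 * z)))
    return abs(field) ** 2


def fitness(mask: Sequence[int]) -> float:
    """Scalarized objective: mean on-axis intensity times flatness, worst wavelength wins."""
    scores = []
    for wl in WAVELENGTHS_UM:
        profile = [on_axis_intensity(mask, wl, z) for z in TARGET_Z_UM]
        peak = max(profile)
        if peak == 0.0:
            return 0.0
        mean = sum(profile) / len(profile)
        scores.append(mean * min(profile) / peak)        # bright AND flat across the window
    return min(scores)                                   # penalize the weakest color channel


def evolve(pop_size: int = 24, generations: int = 30, p_mut: float = 0.02,
           seed: int = 0) -> Tuple[List[int], float]:
    """Elitist GA with truncation selection, one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_RINGS)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = [list(ind) for ind in ranked[: pop_size // 5]]   # elite kept unchanged
        parents = ranked[: pop_size // 2]
        while len(next_pop) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_RINGS)
            child = a[:cut] + b[cut:]                                # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)


best_mask, best_score = evolve()
print(best_score, "".join(map(str, best_mask)))
```

Taking the minimum over wavelengths means a mask scores well only if all three colors stay bright and flat over the same axial window, a crude scalar proxy for the apochromatic, extended-DoF requirement; elitism guarantees the best score never decreases from one generation to the next.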