Source author record

Yufei Guo

Yufei Guo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · cond-mat.mtrl-sci · Information Retrieval

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

preprint · 2026 · arXiv

High-Performance KV$_3$Sb$_5$/WSe$_2$ van der Waals Photodetectors

Kagome metals AV$_3$Sb$_5$ (A = K, Rb, Cs) have recently emerged as a promising platform for exploring correlated and topological quantum states, yet their potential for optoelectronic applications remains largely unexplored. Here, we report high-performance photodetectors based on van der Waals KV$_3$Sb$_5$/WSe$_2$ heterojunctions. A high-quality Schottky interface readily forms between KV$_3$Sb$_5$ and WSe$_2$, enabling efficient separation and transport of photoinduced carriers. Under 520 nm illumination, the device achieves an open-circuit voltage up to 0.6 V, a responsivity of 809 mA/W, and a fast response time of 18.3 $\mu$s. This work demonstrates the promising optoelectronic applications of Kagome metals and highlights the potential of KV$_3$Sb$_5$-based van der Waals heterostructures for high-performance photodetection.

preprint · 2026 · arXiv

Text-Guided Visual Representation Learning for Robust Multimodal E-Commerce Recommendation

Multimodal item embeddings are crucial for e-commerce item-to-item (I2I) retrieval, yet real-world product images often contain promotional overlays and background clutter that inject spurious visual cues and degrade retrieval robustness. This issue is particularly pronounced in MLRM-style pipelines, where a frozen vision encoder is connected to an LLM through a lightweight connector that must selectively aggregate visual tokens. We propose Text-Guided Q-Former (TGQ-Former), a text-guided visual representation learning framework that leverages structured metadata as semantic guidance for visual token extraction while preserving complementary visual evidence. Concretely, TGQ-Former employs a hybrid-query connector to disentangle metadata-anchored and exploratory visual streams, and introduces a lightweight reliability-aware dual-gated vector modulation module to adaptively calibrate their contributions under noisy inputs. Experiments on large-scale, real-world e-commerce datasets with full-pool retrieval show that TGQ-Former consistently outperforms strong connector baselines and end-to-end MLLMs. On average, it improves Hit Rate@100 (H@100) by 6.04%, demonstrating the effectiveness of text-guided visual encoding for robust multimodal retrieval.
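
The abstract reports results as Hit Rate@100 under full-pool retrieval. As a rough illustration of that metric only, the sketch below computes Hit Rate@K as the fraction of queries whose top-K ranked list contains at least one relevant item. The function name `hit_rate_at_k` and the toy item IDs are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described on this page.

```python
from typing import Sequence, Set


def hit_rate_at_k(
    ranked_items: Sequence[Sequence[str]],
    relevant_items: Sequence[Set[str]],
    k: int = 100,
) -> float:
    """Fraction of queries with at least one relevant item in the top k.

    ranked_items[i] is the retrieved list for query i, best match first.
    relevant_items[i] is the set of ground-truth items for query i.
    (Hypothetical helper; an assumed, common definition of Hit Rate@K.)
    """
    if len(ranked_items) != len(relevant_items):
        raise ValueError("ranked_items and relevant_items must align")
    if not ranked_items:
        return 0.0
    hits = 0
    for ranked, relevant in zip(ranked_items, relevant_items):
        # A query counts as a hit if any of its top-k items is relevant.
        if any(item in relevant for item in ranked[:k]):
            hits += 1
    return hits / len(ranked_items)


# Toy usage with made-up item IDs.
ranked = [["a", "b", "c"], ["d", "e", "f"]]
relevant = [{"b"}, {"z"}]
print(hit_rate_at_k(ranked, relevant, k=2))
```

In this toy case the first query has a relevant item within its top 2 and the second does not, so the function returns 0.5.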

preprint · 2023 · arXiv

NeuroCLIP: Neuromorphic Data Understanding by CLIP and SNN

Neuromorphic vision sensors have recently attracted growing interest. However, neuromorphic data consists of asynchronous event spikes, which makes it difficult to construct a large benchmark for training a powerful general neural network model, and this limits deep-learning-based understanding of neuromorphic data for "unseen" objects. For frame images, by contrast, training data is easy to obtain, and zero-shot and few-shot learning on "unseen" tasks with the large Contrastive Vision-Language Pre-training (CLIP) model, which is pre-trained on large-scale 2D image-text pairs, has shown inspiring performance. We ask whether CLIP can be transferred to neuromorphic data recognition to handle the "unseen" problem. To this end, we realize this idea with NeuroCLIP. NeuroCLIP consists of 2D CLIP and two modules specially designed for neuromorphic data understanding. The first is an event-frame module that converts the event spikes into sequential frame images with a simple discrimination strategy. The second is an inter-timestep adapter, a simple fine-tuned adapter based on a spiking neural network (SNN) that processes the sequential features coming from the visual encoder of CLIP to improve few-shot performance. Experiments on neuromorphic datasets including N-MNIST, CIFAR10-DVS, and ES-ImageNet demonstrate the effectiveness of NeuroCLIP. Our code is open-sourced at https://github.com/yfguo91/NeuroCLIP.git.
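
The abstract does not spell out how the event-frame module works, so the following is only a generic sketch of one common way to turn an event stream into a frame sequence: split the time span into equal bins and count events per pixel and polarity in each bin. The `(x, y, timestamp, polarity)` layout, the function name `events_to_frames`, and the equal-width binning are all assumptions for illustration, not the NeuroCLIP implementation.

```python
from typing import List, Sequence, Tuple

# Assumed event layout: (x, y, timestamp, polarity), with polarity in {0, 1}.
Event = Tuple[int, int, float, int]


def events_to_frames(
    events: Sequence[Event], height: int, width: int, num_frames: int
) -> List[List[List[List[int]]]]:
    """Accumulate events into frames indexed as [frame][polarity][y][x].

    Hypothetical helper: equal-width time bins with per-pixel event counts.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    frames = [
        [[[0] * width for _ in range(height)] for _ in range(2)]
        for _ in range(num_frames)
    ]
    if not events:
        return frames
    t_start = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    span = t_end - t_start
    for x, y, t, polarity in events:
        if span == 0:
            index = 0  # all events share one timestamp
        else:
            # Clamp so the latest event falls into the final bin.
            index = min(int((t - t_start) / span * num_frames), num_frames - 1)
        frames[index][polarity][y][x] += 1
    return frames


# Toy usage: three events on a 2x2 sensor, split into two frames.
toy_events = [(0, 0, 0.0, 1), (1, 0, 0.4, 0), (1, 1, 1.0, 1)]
toy_frames = events_to_frames(toy_events, height=2, width=2, num_frames=2)
print(len(toy_frames))
```

Each resulting frame could then be passed to an image encoder one timestep at a time, which is the kind of sequential input the abstract's inter-timestep adapter is said to consume.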