Source author record

Xuelin Zhu

Xuelin Zhu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision

Catalog footprint

What is connected

2works

1topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Query-Based Knowledge Sharing for Open-Vocabulary Multi-Label Classification

Identifying labels that did not appear during training, known as multi-label zero-shot learning, is a non-trivial task in computer vision. To this end, recent studies have attempted to explore the multi-modal knowledge of vision-language pre-training (VLP) models by knowledge distillation, allowing to recognize unseen labels in an open-vocabulary manner. However, experimental evidence shows that knowledge distillation is suboptimal and provides limited performance gain in unseen label prediction. In this paper, a novel query-based knowledge sharing paradigm is proposed to explore the multi-modal knowledge from the pretrained VLP model for open-vocabulary multi-label classification. Specifically, a set of learnable label-agnostic query tokens is trained to extract critical vision knowledge from the input image, and further shared across all labels, allowing them to select tokens of interest as visual clues for recognition. Besides, we propose an effective prompt pool for robust label embedding, and reformulate the standard ranking learning into a form of classification to allow the magnitude of feature vectors for matching, which both significantly benefit label recognition. Experimental results show that our framework significantly outperforms state-of-the-art methods on zero-shot task by 5.9% and 4.5% in mAP on the NUS-WIDE and Open Images, respectively.

preprint2020arXiv

Balanced Symmetric Cross Entropy for Large Scale Imbalanced and Noisy Data

Deep convolution neural network has attracted many attentions in large-scale visual classification task, and achieves significant performance improvement compared to traditional visual analysis methods. In this paper, we explore many kinds of deep convolution neural network architectures for large-scale product recognition task, which is heavily class-imbalanced and noisy labeled data, making it more challenged. Extensive experiments show that PNASNet achieves best performance among a variety of convolutional architectures. Together with ensemble technology and negative learning loss for noisy labeled data, we further improve the model performance on online test data. Finally, our proposed method achieves 0.1515 mean top-1 error on online test data.