Source author record

Fan Hu

Fan Hu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Multimedia Molecular Networks

Catalog footprint

What is connected

4works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Learn to Understand Negation in Video Retrieval

Negation is a common linguistic skill that allows human to express what we do NOT want. Naturally, one might expect video retrieval to support natural-language queries with negation, e.g., finding shots of kids sitting on the floor and not playing with a dog. However, the state-of-the-art deep learning based video retrieval models lack such ability, as they are typically trained on video description datasets such as MSR-VTT and VATEX that lack negated descriptions. Their retrieved results basically ignore the negator in the sample query, incorrectly returning videos showing kids playing with dog. This paper presents the first study on learning to understand negation in video retrieval and make contributions as follows. By re-purposing two existing datasets (MSR-VTT and VATEX), we propose a new evaluation protocol for video retrieval with negation. We propose a learning based method for training a negation-aware video retrieval model. The key idea is to first construct a soft negative caption for a specific training video by partially negating its original caption, and then compute a bidirectionally constrained loss on the triplet. This auxiliary loss is weightedly added to a standard retrieval loss. Experiments on the re-purposed benchmarks show that re-training the CLIP (Contrastive Language-Image Pre-Training) model by the proposed method clearly improves its ability to handle queries with negation. In addition, the model performance on the original benchmarks is also improved.

preprint2022arXiv

Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval. Different from previous research that considers feature fusion only at one end, let it be video or text, we aim for feature fusion for both ends within a unified framework. We hypothesize that optimizing the convex combination of the features is preferred to modeling their correlations by computationally heavy multi-head self attention. We propose Lightweight Attentional Feature Fusion (LAFF). LAFF performs feature fusion at both early and late stages and at both video and text ends, making it a powerful method for exploiting diverse (off-the-shelf) features. The interpretability of LAFF can be used for feature selection. Extensive experiments on five public benchmark sets (MSR-VTT, MSVD, TGIF, VATEX and TRECVID AVS 2016-2020) justify LAFF as a new baseline for text-to-video retrieval.

preprint2022arXiv

Prediction of Potential Commercially Available Inhibitors against SARS-CoV-2 by Multi-Task Deep Learning Model

The outbreak of COVID-19 caused millions of deaths worldwide, and the number of total infections is still rising. It is necessary to identify some potentially effective drugs that can be used to prevent the development of severe symptoms or even death for those infected. Fortunately, many efforts have been made, and several effective drugs have been identified. The rapidly increasing amount of data is of great help for training an effective and specific deep learning model. In this study, we propose a multi-task deep learning model for the purpose of screening commercially available and effective inhibitors against SARS-CoV-2. First, we pretrained a model on several heterogenous protein-ligand interaction datasets. The model achieved competitive results on some benchmark datasets. Next, a coronavirus-specific dataset was collected and used to fine-tune the model. Then, the fine-tuned model was used to select commercially available drugs against SARS-CoV-2 protein targets. Overall, twenty compounds were listed as potential inhibitors. We further explored the model interpretability and observed the predicted important binding sites. Based on this prediction, molecular docking was also performed to visualize the binding modes of the selected inhibitors.

preprint2015arXiv

Dense v.s. Sparse: A Comparative Study of Sampling Analysis in Scene Classification of High-Resolution Remote Sensing Imagery

Scene classification is a key problem in the interpretation of high-resolution remote sensing imagery. Many state-of-the-art methods, e.g. bag-of-visual-words model and its variants, the topic models as well as deep learning-based approaches, share similar procedures: patch sampling, feature description/learning and classification. Patch sampling is the first and a key procedure which has a great influence on the results. In the literature, many different sampling strategies have been used, {e.g. dense sampling, random sampling, keypoint-based sampling and saliency-based sampling, etc. However, it is still not clear which sampling strategy is suitable for the scene classification of high-resolution remote sensing images. In this paper, we comparatively study the effects of different sampling strategies under the scenario of scene classification of high-resolution remote sensing images. We divide the existing sampling methods into two types: dense sampling and sparse sampling, the later of which includes random sampling, keypoint-based sampling and various saliency-based sampling proposed recently. In order to compare their performances, we rely on a standard bag-of-visual-words model to construct our testing scheme, owing to their simplicity, robustness and efficiency. The experimental results on two commonly used datasets show that dense sampling has the best performance among all the strategies but with high spatial and computational complexity, random sampling gives better or comparable results than other sparse sampling methods, like the sophisticated multi-scale key-point operators and the saliency-based methods which are intensively studied and commonly used recently.

Fan Hu

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

Learn to Understand Negation in Video Retrieval

Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

Prediction of Potential Commercially Available Inhibitors against SARS-CoV-2 by Multi-Task Deep Learning Model

Dense v.s. Sparse: A Comparative Study of Sampling Analysis in Scene Classification of High-Resolution Remote Sensing Imagery