Source author record

Cong Bai

Cong Bai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Multimedia

Catalog footprint

What is connected

3 works

2 topics

4 close collaborators

Preprint, 2022, arXiv

Intra-Modal Constraint Loss For Image-Text Retrieval

Cross-modal retrieval has drawn much attention in both the computer vision and natural language processing domains. With the development of convolutional and recurrent neural networks, the bottleneck of retrieval across image-text modalities is no longer the extraction of image and text features but the learning of an efficient loss function in the embedding space. Many loss functions try to pull pairwise features from heterogeneous modalities closer together. This paper proposes a method for learning a joint embedding of images and texts using an intra-modal constraint loss function that reduces the violation of negative pairs from the same homogeneous modality. Experimental results show that our approach outperforms state-of-the-art bi-directional image-text retrieval methods on the Flickr30K and Microsoft COCO datasets. Our code is publicly available: https://github.com/CanonChen/IMC.
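The abstract does not spell out the loss, so the following is only a rough sketch of the general idea, assuming a hinge-based triplet formulation over cosine similarity. The function names, the `margin` and `intra_weight` parameters, and the exact form of the intra-modal term are illustrative assumptions, not the authors' definition; the linked repository is the authoritative source.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hinge(x: float) -> float:
    return max(0.0, x)


def retrieval_loss(
    images: Sequence[Vector],
    texts: Sequence[Vector],
    margin: float = 0.2,        # assumed value, for illustration only
    intra_weight: float = 1.0,  # hypothetical weight on the intra-modal term
) -> float:
    """Hinge loss where images[i] and texts[i] form the matched pair.

    The cross-modal term penalizes negatives from the other modality.
    The intra-modal term (an assumed form) penalizes negatives from the
    same modality that score too close to the matched pair.
    """
    cross = 0.0
    intra = 0.0
    n = len(images)
    for i in range(n):
        pos = cosine(images[i], texts[i])
        for j in range(n):
            if j == i:
                continue
            # Cross-modal negatives: image i vs. text j, text i vs. image j.
            cross += hinge(margin - pos + cosine(images[i], texts[j]))
            cross += hinge(margin - pos + cosine(texts[i], images[j]))
            # Intra-modal negatives: image i vs. image j, text i vs. text j.
            intra += hinge(margin - pos + cosine(images[i], images[j]))
            intra += hinge(margin - pos + cosine(texts[i], texts[j]))
    return cross + intra_weight * intra
```

With well-separated embeddings such as `images = texts = [[1, 0], [0, 1]]`, every hinge term is inactive and the loss is zero; when two images collapse onto the same vector, the intra-modal term adds a penalty on top of the cross-modal one.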

Preprint, 2022, arXiv

ISDA: Position-Aware Instance Segmentation with Deformable Attention

Most instance segmentation models are not end-to-end trainable due to either the incorporation of proposal estimation (RPN) as a pre-processing step or non-maximum suppression (NMS) as a post-processing step. Here we propose a novel end-to-end instance segmentation method termed ISDA. It reshapes the task into predicting a set of object masks, which are generated via a traditional convolution operation with learned position-aware kernels and features of objects. Such kernels and features are learned by leveraging a deformable attention network with multi-scale representation. Thanks to the introduced set-prediction mechanism, the proposed method is NMS-free. Empirically, ISDA outperforms Mask R-CNN (the strong baseline) by 2.6 points on MS-COCO, and achieves leading performance compared with recent models. Code will be available soon.

Preprint, 2022, arXiv

MMINR: Multi-frame-to-Multi-frame Inference with Noise Resistance for Precipitation Nowcasting with Radar

Precipitation nowcasting based on radar echo maps is essential in meteorological research. Recently, convolutional RNN-based methods have dominated this field, but they cannot be computed in parallel, which results in longer inference time. FCN-based methods adopt a multi-frame-to-single-frame inference (MSI) strategy to avoid this problem. In the prediction phase, they feed each prediction back into the model to predict the next time step and so obtain multi-frame nowcasting results, which leads to the accumulation of prediction errors. In addition, precipitation noise is a crucial factor contributing to high prediction errors because of its unpredictability. To address these problems, we propose a novel Multi-frame-to-Multi-frame Inference (MMI) model with Noise Resistance (NR) named MMINR. It avoids error accumulation and resists the negative effect of precipitation noise while computing in parallel. NR contains a Noise Dropout Module (NDM) and a Semantic Restore Module (SRM). NDM deliberately drops out noise in a simple yet efficient way, and SRM supplements the semantic information of features to alleviate the problem of semantic information mistakenly lost by NDM. Experimental results demonstrate that MMINR attains competitive scores compared with other state-of-the-art (SOTA) methods. The ablation experiments show that the proposed NDM and SRM can solve the aforementioned problems.
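The contrast between MSI and MMI inference can be illustrated with a toy sketch. Everything here is an assumption made for illustration: frames are reduced to single numbers, and `toy_step_model` and `toy_multi_model` are hypothetical stand-ins with a fixed bias, not the networks from the paper. The sketch only shows why feeding predictions back in lets errors compound, while predicting all frames in one pass does not.

```python
from typing import Callable, List

Frame = float  # toy stand-in for a radar echo map


def msi_rollout(
    step_model: Callable[[List[Frame]], Frame],
    history: List[Frame],
    horizon: int,
) -> List[Frame]:
    """Multi-frame-to-single-frame inference: predict one frame at a time
    and feed each prediction back into the input window."""
    window = list(history)
    outputs: List[Frame] = []
    for _ in range(horizon):
        nxt = step_model(window)
        outputs.append(nxt)
        window = window[1:] + [nxt]  # the prediction becomes model input
    return outputs


def mmi_predict(
    multi_model: Callable[[List[Frame]], List[Frame]],
    history: List[Frame],
) -> List[Frame]:
    """Multi-frame-to-multi-frame inference: all future frames come from
    one forward pass over the observed history."""
    return multi_model(history)


BIAS = 0.1  # hypothetical per-prediction error of both toy models


def toy_step_model(window: List[Frame]) -> Frame:
    # True dynamics in this toy: each frame is the previous one plus 1.0.
    return window[-1] + 1.0 + BIAS


def toy_multi_model(history: List[Frame]) -> List[Frame]:
    # Predicts three future frames directly from the last observed frame.
    return [history[-1] + k * 1.0 + BIAS for k in (1, 2, 3)]


history = [0.0, 1.0, 2.0]
truth = [3.0, 4.0, 5.0]
msi_errors = [abs(p - t) for p, t in zip(msi_rollout(toy_step_model, history, 3), truth)]
mmi_errors = [abs(p - t) for p, t in zip(mmi_predict(toy_multi_model, history), truth)]
```

In this toy setup the MSI errors grow with each step because every biased prediction is reused as input, whereas the MMI errors stay at the size of the single-pass bias.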
