Source author record

Mengjiao Wang

Mengjiao Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Computation and Language Computer Vision Machine Learning cond-mat.mtrl-sci Cryptography and Security Multimedia physics.app-ph

Catalog footprint

What is connected

4works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval

Dual encoders and cross encoders have been widely used for image-text retrieval. Between the two, the dual encoder encodes the image and text independently followed by a dot product, while the cross encoder jointly feeds image and text as the input and performs dense multi-modal fusion. These two architectures are typically modeled separately without interaction. In this work, we propose LoopITR, which combines them in the same network for joint learning. Specifically, we let the dual encoder provide hard negatives to the cross encoder, and use the more discriminative cross encoder to distill its predictions back to the dual encoder. Both steps are efficiently performed together in the same model. Our work centers on empirical analyses of this combined architecture, putting the main focus on the design of the distillation objective. Our experimental results highlight the benefits of training the two encoders in the same network, and demonstrate that distillation can be quite effective with just a few hard negative examples. Experiments on two standard datasets (Flickr30K and COCO) show our approach achieves state-of-the-art dual encoder performance when compared with approaches using a similar amount of data.

preprint2022arXiv

Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Vision-and-Language (V+L) pre-training models have achieved tremendous success in recent years on various multi-modal benchmarks. However, the majority of existing models require pre-training on a large set of parallel image-text data, which is costly to collect, compared to image-only or text-only data. In this paper, we explore unsupervised Vision-and-Language pre-training (UVLP) to learn the cross-modal representation from non-parallel image and text datasets. We found two key factors that lead to good unsupervised V+L pre-training without parallel data: (i) joint image-and-text input (ii) overall image-text alignment (even for non-parallel data). Accordingly, we propose a novel unsupervised V+L pre-training curriculum for non-parallel texts and images. We first construct a weakly aligned image-text corpus via a retrieval-based approach, then apply a set of multi-granular alignment pre-training tasks, including region-to-tag, region-to-phrase, and image-to-sentence alignment, to bridge the gap between the two modalities. A comprehensive ablation study shows each granularity is helpful to learn a stronger pre-trained model. We adapt our pre-trained model to a set of V+L downstream tasks, including VQA, NLVR2, Visual Entailment, and RefCOCO+. Our model achieves the state-of-art performance in all these tasks under the unsupervised setting.

preprint2021arXiv

Control of electronic band profiles through depletion layer engineering in core-shell nanocrystals

The understanding of depletion layers is of major importance to control the optical and electronic properties of metal oxide (MO) nanocrystals (NCs). Here, we show that depletion layer engineering is the main mechanism of photodoping of MO NCs. We show that the introduction of different electronic interfaces induces a double-bending of the electronic bands and a distinct carrier density profile. We found that the light-induced depletion layer modulation and bending of the bands close to the surface of the nanocrystal is the main mechanism responsible for the storage of extra electrons after photodoping in MO NCs. We support our results by a combined experimental and theoretical approach in the case of Sn:In2O3/In2O3 core-shell NCs, in which we compare numerical simulations with empirical modeling and experiments. This allows not only to extract the main mechanism of photodoping in MO NCs but also to engineer the charge storage capability of MO NCs after photodoping. Our results are transferable to other core-multishell systems, opening up a novel direction to control the optoelectronic properties of nanoscale MOs by designing their energetic band profiles through depletion layer engineering.

preprint2020arXiv

Cryptocurrency Address Clustering and Labeling

Anonymity is one of the most important qualities of blockchain technology. For example, one can simply create a bitcoin address to send and receive funds without providing KYC to any authority. In general, the real identity behind cryptocurrency addresses is not known, however, some addresses can be clustered according to their ownership by analyzing behavioral patterns, allowing those with known attribution to be assigned labels. These labels may be further used for legal and compliance purposes to assist in law enforcement investigations. In this document, we discuss our methodology behind assigning attribution labels to cryptocurrency addresses.