Researcher profile

Mengjiao Wang

Mengjiao Wang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval

Dual encoders and cross encoders have been widely used for image-text retrieval. Between the two, the dual encoder encodes the image and text independently followed by a dot product, while the cross encoder jointly feeds image and text as the input and performs dense multi-modal fusion. These two architectures are typically modeled separately without interaction. In this work, we propose LoopITR, which combines them in the same network for joint learning. Specifically, we let the dual encoder provide hard negatives to the cross encoder, and use the more discriminative cross encoder to distill its predictions back to the dual encoder. Both steps are efficiently performed together in the same model. Our work centers on empirical analyses of this combined architecture, putting the main focus on the design of the distillation objective. Our experimental results highlight the benefits of training the two encoders in the same network, and demonstrate that distillation can be quite effective with just a few hard negative examples. Experiments on two standard datasets (Flickr30K and COCO) show our approach achieves state-of-the-art dual encoder performance when compared with approaches using a similar amount of data.

preprint2022arXiv

Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Vision-and-Language (V+L) pre-training models have achieved tremendous success in recent years on various multi-modal benchmarks. However, the majority of existing models require pre-training on a large set of parallel image-text data, which is costly to collect, compared to image-only or text-only data. In this paper, we explore unsupervised Vision-and-Language pre-training (UVLP) to learn the cross-modal representation from non-parallel image and text datasets. We found two key factors that lead to good unsupervised V+L pre-training without parallel data: (i) joint image-and-text input (ii) overall image-text alignment (even for non-parallel data). Accordingly, we propose a novel unsupervised V+L pre-training curriculum for non-parallel texts and images. We first construct a weakly aligned image-text corpus via a retrieval-based approach, then apply a set of multi-granular alignment pre-training tasks, including region-to-tag, region-to-phrase, and image-to-sentence alignment, to bridge the gap between the two modalities. A comprehensive ablation study shows each granularity is helpful to learn a stronger pre-trained model. We adapt our pre-trained model to a set of V+L downstream tasks, including VQA, NLVR2, Visual Entailment, and RefCOCO+. Our model achieves the state-of-art performance in all these tasks under the unsupervised setting.

preprint2021arXiv

Control of electronic band profiles through depletion layer engineering in core-shell nanocrystals

The understanding of depletion layers is of major importance to control the optical and electronic properties of metal oxide (MO) nanocrystals (NCs). Here, we show that depletion layer engineering is the main mechanism of photodoping of MO NCs. We show that the introduction of different electronic interfaces induces a double-bending of the electronic bands and a distinct carrier density profile. We found that the light-induced depletion layer modulation and bending of the bands close to the surface of the nanocrystal is the main mechanism responsible for the storage of extra electrons after photodoping in MO NCs. We support our results by a combined experimental and theoretical approach in the case of Sn:In2O3/In2O3 core-shell NCs, in which we compare numerical simulations with empirical modeling and experiments. This allows not only to extract the main mechanism of photodoping in MO NCs but also to engineer the charge storage capability of MO NCs after photodoping. Our results are transferable to other core-multishell systems, opening up a novel direction to control the optoelectronic properties of nanoscale MOs by designing their energetic band profiles through depletion layer engineering.

preprint2020arXiv

Cryptocurrency Address Clustering and Labeling

Anonymity is one of the most important qualities of blockchain technology. For example, one can simply create a bitcoin address to send and receive funds without providing KYC to any authority. In general, the real identity behind cryptocurrency addresses is not known, however, some addresses can be clustered according to their ownership by analyzing behavioral patterns, allowing those with known attribution to be assigned labels. These labels may be further used for legal and compliance purposes to assist in law enforcement investigations. In this document, we discuss our methodology behind assigning attribution labels to cryptocurrency addresses.