Source author record

Dian Li

Dian Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision Software Engineering cond-mat.mes-hall cond-mat.other

Catalog footprint

What is connected

6 works

4 topics

4 close collaborators


Preprint · 2022 · arXiv

A Two-phase Recommendation Framework for Consistent Java Method Names

In software engineering (SE) tasks, method naming is so important that it has attracted many researchers worldwide to study how to improve the quality of method names. To recommend method names accurately, we propose a novel two-phase framework. In our experiments, nearly 8 million Java methods are collected from open-source organizations as our evaluation dataset. In the first-phase recommendation, we introduce a fast and simple classifier based on the fastText neural network to recommend a potential method category. In the second-phase recommendation, we employ two Long Short-Term Memory (LSTM) networks to recommend consistent method names within each category. Evaluation results show that the proposed approach significantly outperforms the state-of-the-art approach.
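The two-phase idea (first pick a category, then let a category-specific model produce the name) can be sketched as a simple dispatch. This is a minimal illustration under loud assumptions: `classify_category`, the per-category models and the category names are all hypothetical placeholders. The abstract describes a fastText classifier and LSTM generators, which are replaced here by toy keyword rules so the control flow is visible.

```python
from typing import Callable, Dict

NameModel = Callable[[str], str]


def classify_category(body: str) -> str:
    """Phase 1 stand-in: route a method body to a coarse category.

    Hypothetical keyword rules; a trained text classifier would go here.
    """
    stripped = body.strip()
    if stripped.startswith("return this."):
        return "getter"
    if stripped.startswith("this.") and "=" in stripped:
        return "setter"
    return "other"


def _capitalize(word: str) -> str:
    return word[:1].upper() + word[1:]


def getter_model(body: str) -> str:
    # "return this.count;" -> "getCount"
    field = body.strip()[len("return this."):].rstrip(";").strip()
    return "get" + _capitalize(field)


def setter_model(body: str) -> str:
    # "this.count = count;" -> "setCount"
    field = body.strip()[len("this."):].split("=")[0].strip()
    return "set" + _capitalize(field)


def fallback_model(body: str) -> str:
    # Hypothetical default name for anything the toy rules do not cover.
    return "process"


# Phase 2: one name model per category (placeholders for learned generators).
MODELS: Dict[str, NameModel] = {
    "getter": getter_model,
    "setter": setter_model,
    "other": fallback_model,
}


def recommend_name(body: str) -> str:
    category = classify_category(body)  # phase 1: choose a category
    return MODELS[category](body)       # phase 2: category-specific naming
```

For example, `recommend_name("return this.count;")` yields `"getCount"`. The design point the sketch tries to show is that each second-phase model only sees methods of one category, so it can specialize its naming conventions.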

Preprint · 2022 · arXiv

An empirical study on Java method name suggestion: are we there yet?

A large-scale evaluation of current naming approaches substantiates that such approaches are accurate. However, less is known about which categories of method names these approaches handle well and how well they actually perform. To highlight the strengths of current naming approaches, in this paper we conduct an empirical study of such approaches on a new dataset. Moreover, we analyze the successful recommendations and find that: (1) around 60% of the accepted recommended names begin with the prefixes get, set, is, or test; (2) a large portion (19.3%) of the successfully recommended method names could be derived from the given method bodies. The comparisons also demonstrate the superior performance of the approaches examined in the empirical study.
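Finding (1) is a prefix-share statistic over a set of names. A minimal sketch of how such a share could be computed follows, assuming camelCase Java names and treating a prefix as the first camelCase token. The tokenization rule and function names are assumptions for illustration, not the study's procedure.

```python
from typing import Iterable

PREFIXES = ("get", "set", "is", "test")


def leading_token(name: str) -> str:
    """Return the first camelCase/underscore token, lower-cased ("getUserId" -> "get")."""
    for i, ch in enumerate(name[1:], start=1):
        if ch.isupper() or ch == "_":
            return name[:i].lower()
    return name.lower()


def prefix_share(names: Iterable[str]) -> float:
    """Fraction of names whose first token is one of PREFIXES."""
    names = list(names)
    if not names:
        return 0.0
    hits = sum(1 for n in names if leading_token(n) in PREFIXES)
    return hits / len(names)
```

Matching on the first token rather than a raw `startswith` keeps names such as `issueTicket` or `settle` from being counted as `is`/`set` methods; for the list `["getName", "setAge", "isEmpty", "testParse", "issueTicket", "settle", "run", "compute"]` the share is 0.5.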

Preprint · 2022 · arXiv

Bridging Video-text Retrieval with Multiple Choice Questions

Pre-training a model to learn transferable video-text representations for retrieval has attracted a lot of attention in recent years. Previous dominant works mainly adopt two separate encoders for efficient retrieval but ignore local associations between videos and texts. Another line of research uses a joint encoder to let videos interact with texts, but this results in low efficiency, since each text-video pair needs to be fed into the model. In this work, we enable fine-grained video-text interactions while maintaining high retrieval efficiency via a novel pretext task, dubbed Multiple Choice Questions (MCQ), where a parametric module, BridgeFormer, is trained to answer "questions" constructed from the text features by resorting to the video features. Specifically, we exploit the rich semantics of text (i.e., nouns and verbs) to build questions, with which the video encoder can be trained to capture more regional content and temporal dynamics. In the form of questions and answers, the semantic associations between local video-text features can be properly established. BridgeFormer can be removed for downstream retrieval, yielding an efficient and flexible model with only two encoders. Our method outperforms state-of-the-art methods on the popular text-to-video retrieval task on five datasets under different experimental setups (i.e., zero-shot and fine-tuning), including HowTo100M (one million videos). We further conduct zero-shot action recognition, which can be cast as video-to-text retrieval, and our approach also significantly surpasses its counterparts. As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e.g., action recognition with linear evaluation.
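The text side of the MCQ pretext task (turning nouns and verbs into questions) can be illustrated with a small sketch. It assumes POS tags are already supplied as `(word, pos)` pairs and that each noun or verb is erased as a single word; the real erasure granularity, the `[?]` mask token and `build_questions` itself are hypothetical. Answering the questions with video features (BridgeFormer) is not modeled here.

```python
from typing import List, Tuple

MASK = "[?]"  # hypothetical placeholder token for the erased word


def build_questions(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (kind, question, answer) triples.

    `tagged` holds (word, pos) pairs; each word tagged NOUN or VERB is erased
    in turn to form one question whose answer is the erased word.
    """
    words = [w for w, _ in tagged]
    questions = []
    for i, (word, pos) in enumerate(tagged):
        if pos not in ("NOUN", "VERB"):
            continue
        masked = words[:i] + [MASK] + words[i + 1:]
        questions.append((pos, " ".join(masked), word))
    return questions
```

For the caption "a man rides a horse", this produces a noun question "a [?] rides a horse" (answer "man"), a verb question "a man [?] a horse" (answer "rides") and a noun question "a man rides a [?]" (answer "horse"). Noun questions probe regional content and verb questions probe motion, which matches the split the abstract describes.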

Preprint · 2022 · arXiv

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data. We show that these two approaches, at the two extreme ends of the task-specificity spectrum, are suboptimal for task performance. Using too few task-specific training signals causes underfitting to the ground-truth labels of downstream tasks, while using too many causes overfitting to them. To this end, we propose a novel Class-Agnostic Semi-Supervised Learning (CA-SSL) framework to achieve a more favorable task-specificity balance when extracting training signals from unlabeled data. CA-SSL has three training stages that act on either ground-truth labels (labeled data) or pseudo labels (unlabeled data). This decoupling strategy avoids the complicated scheme in traditional SSL methods that balances the contributions from both data types. In particular, we introduce a warmup training stage that achieves a better balance in task specificity by ignoring class information in the pseudo labels while preserving localization training signals. As a result, our warmup model can better avoid underfitting/overfitting when fine-tuned on the ground-truth labels in detection and segmentation tasks. Using 3.6M unlabeled data, we achieve a significant performance gain of 4.7% over the ImageNet-pretrained baseline on FCOS object detection. In addition, our warmup model demonstrates excellent transferability to other detection and segmentation frameworks.
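The warmup stage's key move, keeping the localization part of a pseudo label while discarding its class, can be sketched as a label transformation. This is a minimal illustration under assumptions: the `PseudoBox` structure, the single `FOREGROUND` class id and the confidence threshold of 0.5 are hypothetical and not taken from the abstract.

```python
from dataclasses import dataclass, replace
from typing import List

FOREGROUND = 0  # hypothetical single class id used for every pseudo box


@dataclass(frozen=True)
class PseudoBox:
    x1: float
    y1: float
    x2: float
    y2: float
    label: int    # predicted class, discarded by the warmup stage
    score: float  # teacher confidence


def to_class_agnostic(boxes: List[PseudoBox], score_thresh: float = 0.5) -> List[PseudoBox]:
    """Keep confident boxes with their coordinates, but collapse every class to FOREGROUND."""
    return [replace(b, label=FOREGROUND) for b in boxes if b.score >= score_thresh]
```

Because the box coordinates survive unchanged, a detector warmed up on these labels still receives localization supervision. It receives no class supervision, which is the "less task-specific" end of the balance the abstract argues for.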

Preprint · 2022 · arXiv

Controllable Augmentations for Video Representation Learning

This paper focuses on self-supervised video representation learning. Most existing approaches follow the contrastive learning pipeline, constructing positive and negative pairs by sampling different clips. However, this formulation tends to be biased toward static backgrounds and has difficulty establishing global temporal structure. The major reason is that the positive pairs, i.e., different clips sampled from the same video, have a limited temporal receptive field and usually share a similar background but differ in motion. To address these problems, we propose a framework that jointly uses local clips and global videos to learn detailed region-level correspondence as well as general long-term temporal relations. Based on a set of controllable augmentations, we achieve accurate appearance and motion pattern alignment through soft spatio-temporal region contrast. Our formulation avoids the low-level redundancy shortcut through mutual information minimization, improving generalization. We also introduce a local-global temporal order dependency to further bridge the gap between clip-level and video-level representations for robust temporal modeling. Extensive experiments demonstrate that our framework is superior on three video benchmarks for action recognition and video retrieval and captures more accurate temporal dynamics.

Preprint · 2016 · arXiv

Bound exciton and free exciton states in GaSe thin slab

Photoluminescence (PL) and absorption experiments have been performed on a GaSe slab at 10 K, with the incident light polarized perpendicular to the c-axis of the sample. An obvious energy difference of about 34 meV is observed between the exciton absorption peak and the PL peak (the highest-energy peak). By studying the temperature dependence of the PL spectra, we attribute this difference to the energy separation between free and bound exciton states: the main exciton absorption peak comes from free exciton absorption, while the PL peak is attributed to recombination of bound excitons at 10 K. This strong bound exciton effect is stable up to 50 K. Moreover, the temperature dependence of the integrated PL intensity and PL lifetime reveals that a non-radiative process, with an activation energy extracted as 0.5 meV, dominates the PL emission.
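The abstract does not state which model was used to extract the 0.5 meV activation energy. A commonly used choice, assumed here only for illustration, is the single-channel Arrhenius quenching law below, where $I_0$ (low-temperature intensity) and $C$ (fitted constant) are hypothetical symbols not taken from the source. The remaining lines compare $E_a$ with the thermal energy $k_B T$ at the two temperatures quoted in the abstract.

```latex
\begin{aligned}
I(T) &= \frac{I_0}{1 + C\,\exp\left(-E_a / k_B T\right)}, \qquad E_a \approx 0.5\ \text{meV},\\
k_B T\big|_{10\,\mathrm{K}} &\approx 0.0862\ \text{meV/K} \times 10\ \text{K} \approx 0.86\ \text{meV},\\
k_B T\big|_{50\,\mathrm{K}} &\approx 0.0862\ \text{meV/K} \times 50\ \text{K} \approx 4.3\ \text{meV}.
\end{aligned}
```

Under this assumed model, $E_a$ is already below $k_B T$ at 10 K, so the non-radiative channel would be thermally accessible over the whole measured range.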
