Source author record

Zhengxiang Wang

Zhengxiang Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

2 works
2 topics
3 close collaborators

Published work

2 published items

Preprint, 2022, arXiv

Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching

To investigate the role of linguistic knowledge in data augmentation (DA) for Natural Language Processing (NLP), we designed two adapted DA programs and applied them to LCQMC (a Large-scale Chinese Question Matching Corpus) for a binary Chinese question matching classification task. The two DA programs produce augmented texts by five simple text editing operations (or DA techniques), largely irrespective of language generation rules, but one is enhanced with a pre-trained n-gram language model to fuse it with prior linguistic knowledge. We then trained four neural network models (BOW, CNN, LSTM, and GRU) and a pre-trained model (ERNIE-Gram) on LCQMC's train sets of varying size as well as the related augmented train sets produced by the two DA programs. The results show no significant performance differences between the models trained on the two types of augmented train sets, whether the five DA techniques are applied together or separately. Moreover, because the five DA techniques cannot produce strictly paraphrastic augmented texts, the results indicate that classification models trained on them need sufficient amounts of training examples to mitigate the negative impact of false matching augmented text pairs and improve performance, a limitation of random text editing perturbations used as a DA approach. Similar results were also obtained for English.
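
As a rough illustration of what "random text editing perturbations" can look like, the following is a minimal sketch and not the authors' DA programs. The abstract does not name the five operations, so the two shown here (random swap and random deletion), the function names, and the parameters `p`, `n_aug` and `seed` are all assumptions chosen for illustration.

```python
import random


def random_swap(tokens, rng):
    """Return a copy of tokens with two randomly chosen positions swapped."""
    if len(tokens) < 2:
        return list(tokens)
    out = list(tokens)
    i, j = rng.sample(range(len(out)), 2)  # two distinct positions
    out[i], out[j] = out[j], out[i]
    return out


def random_delete(tokens, rng, p=0.2):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]


def augment(tokens, n_aug=3, seed=0):
    """Produce n_aug edited copies, each made by one randomly chosen operation."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    ops = [random_swap, random_delete]  # hypothetical subset of edit operations
    return [rng.choice(ops)(tokens, rng) for _ in range(n_aug)]
```

Calling `augment(["how", "do", "I", "reset", "my", "password"])` returns three edited token lists. Edits of this kind ignore meaning, so an augmented question may no longer match its original partner, which is the source of the false matching pairs the abstract describes.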

Preprint, 2022, arXiv

Thirty-Two Years of IEEE VIS: Authors, Fields of Study and Citations

The IEEE VIS Conference (VIS) recently rebranded itself as a unified conference and officially positioned itself within the discipline of Data Science. Driven by this movement, we investigated (1) who contributed to VIS, and (2) where VIS stands in the scientific world. We examined the authors and fields of study of 3,240 VIS publications in the past 32 years based on data collected from OpenAlex and IEEE Xplore, among other sources. We also examined the citation flows from referenced papers (i.e., those referenced in VIS) to VIS, and from VIS to citing papers (i.e., those citing VIS). We found that VIS has become increasingly popular and collaborative. The numbers of publications, unique authors, and participating countries have been steadily growing. Both cross-country collaborations and collaborations between educational and non-educational affiliations, namely "cross-type collaborations", are increasing. The dominance of the US is decreasing, and authors from China are now an important part of VIS. In terms of author affiliation types, VIS is increasingly dominated by authors from universities. We found that the topics, inspirations, and influences of VIS research are limited such that (1) VIS papers and their referenced and citing papers largely fall into the Computer Science domain, and (2) citations flow mostly between the same set of subfields within Computer Science. Our citation analyses showed that award-winning VIS papers received more citations. Interactive visualizations, replication data, source code and supplementary material are available at https://32vis.hongtaoh.com and https://osf.io/zkvjm.