Researcher profile

Zhou Yang

Zhou Yang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
17works
0followers
16topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

17 published item(s)

preprint2024arXiv

Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education

Educators are increasingly concerned about the usage of Large Language Models (LLMs) such as ChatGPT in programming education, particularly regarding the potential exploitation of imperfections in Artificial Intelligence Generated Content (AIGC) Detectors for academic misconduct. In this paper, we present an empirical study where the LLM is examined for its attempts to bypass detection by AIGC Detectors. This is achieved by generating code in response to a given question using different variants. We collected a dataset comprising 5,069 samples, with each sample consisting of a textual description of a coding problem and its corresponding human-written Python solution codes. These samples were obtained from various sources, including 80 from Quescol, 3,264 from Kaggle, and 1,725 from LeetCode. From the dataset, we created 13 sets of code problem variant prompts, which were used to instruct ChatGPT to generate the outputs. Subsequently, we assessed the performance of five AIGC detectors. Our results demonstrate that existing AIGC Detectors perform poorly in distinguishing between human-written code and AI-generated code.

preprint2022arXiv

Aspect-Based API Review Classification: How Far Can Pre-Trained Transformer Model Go?

APIs (Application Programming Interfaces) are reusable software libraries and are building blocks for modern rapid software development. Previous research shows that programmers frequently share and search for reviews of APIs on the mainstream software question and answer (Q&A) platforms like Stack Overflow, which motivates researchers to design tasks and approaches related to process API reviews automatically. Among these tasks, classifying API reviews into different aspects (e.g., performance or security), which is called the aspect-based API review classification, is of great importance. The current state-of-the-art (SOTA) solution to this task is based on the traditional machine learning algorithm. Inspired by the great success achieved by pre-trained models on many software engineering tasks, this study fine-tunes six pre-trained models for the aspect-based API review classification task and compares them with the current SOTA solution on an API review benchmark collected by Uddin et al. The investigated models include four models (BERT, RoBERTa, ALBERT and XLNet) that are pre-trained on natural languages, BERTOverflow that is pre-trained on text corpus extracted from posts on Stack Overflow, and CosSensBERT that is designed for handling imbalanced data. The results show that all the six fine-tuned models outperform the traditional machine learning-based tool. More specifically, the improvement on the F1-score ranges from 21.0% to 30.2%. We also find that BERTOverflow, a model pre-trained on the corpus from Stack Overflow, does not show better performance than BERT. The result also suggests that CosSensBERT also does not exhibit better performance than BERT in terms of F1, but it is still worthy of being considered as it achieves better performance on MCC and AUC.

preprint2022arXiv

Can Identifier Splitting Improve Open-Vocabulary Language Model of Code?

Statistical language models on source code have successfully assisted software engineering tasks. However, developers can create or pick arbitrary identifiers when writing source code. Freely chosen identifiers lead to the notorious out-of-vocabulary (OOV) problem that negatively affects model performance. Recently, Karampatsis et al. showed that using the Byte Pair Encoding (BPE) algorithm to address the OOV problem can improve the language models' predictive performance on source code. However, a drawback of BPE is that it cannot split the identifiers in a way that preserves the meaningful semantics. Prior researchers also show that splitting compound identifiers into sub-words that reflect the semantics can benefit software development tools. These two facts motivate us to explore whether identifier splitting techniques can be utilized to augment the BPE algorithm and boost the performance of open-vocabulary language models considered in Karampatsis et al.'s work. This paper proposes to split identifiers in both constructing vocabulary and processing model inputs procedures, thus exploiting three different settings of applying identifier splitting to language models for the code completion task. We contrast models' performance under these settings and find that simply inserting identifier splitting into the pipeline hurts the model performance, while a hybrid strategy combining identifier splitting and the BPE algorithm can outperform the original open-vocabulary models on predicting identifiers by 3.68% of recall and 6.32% of Mean Reciprocal Rank. The results also show that the hybrid strategy can improve the entropy of language models by 2.02%.

preprint2022arXiv

Detecting Offensive Language on Social Networks: An End-to-end Detection Method based on Graph Attention Networks

The pervasiveness of offensive language on the social network has caused adverse effects on society, such as abusive behavior online. It is urgent to detect offensive language and curb its spread. Existing research shows that methods with community structure features effectively improve the performance of offensive language detection. However, the existing models deal with community structure independently, which seriously affects the effectiveness of detection models. In this paper, we propose an end-to-end method based on community structure and text features for offensive language detection (CT-OLD). Specifically, the community structure features are directly captured by the graph attention network layer, and the text embeddings are taken from the last hidden layer of BERT. Attention mechanisms and position encoding are used to fuse these features. Meanwhile, we add user opinion to the community structure for representing user features. The user opinion is represented by user historical behavior information, which outperforms that represented by text information. Besides the above point, the distribution of users and tweets is unbalanced in the popular datasets, which limits the generalization ability of the model. To address this issue, we construct and release a dataset with reasonable user distribution. Our method outperforms baselines with the F1 score of 89.94%. The results show that the end-to-end model effectively learns the potential information of community structure and text, and user historical behavior information is more suitable for user opinion representation.

preprint2022arXiv

Minimal Quantile Functions Subject to Stochastic Dominance Constraints

We consider a problem of finding an SSD (second-order stochastic dominance)-minimal quantile function subject to the mixture of FSD (first-order stochastic dominance) and SSD constraints. The SSD-minimal solution is explicitly worked out and has a close relation to the Skorokhod problem. This result is then applied to explicitly solve a risk minimizing problem in financial economics.

preprint2022arXiv

Natural Attack for Pre-trained Models of Code

Pre-trained models of code have achieved success in many important software engineering tasks. However, these powerful models are vulnerable to adversarial attacks that slightly perturb model inputs to make a victim model produce wrong outputs. Current works mainly attack models of code with examples that preserve operational program semantics but ignore a fundamental requirement for adversarial example generation: perturbations should be natural to human judges, which we refer to as naturalness requirement. In this paper, we propose ALERT (nAturaLnEss AwaRe ATtack), a black-box attack that adversarially transforms inputs to make victim models produce wrong outputs. Different from prior works, this paper considers the natural semantic of generated examples at the same time as preserving the operational semantic of original inputs. Our user study demonstrates that human developers consistently consider that adversarial examples generated by ALERT are more natural than those generated by the state-of-the-art work by Zhang et al. that ignores the naturalness requirement. On attacking CodeBERT, our approach can achieve attack success rates of 53.62%, 27.79%, and 35.78% across three downstream tasks: vulnerability prediction, clone detection and code authorship attribution. On GraphCodeBERT, our approach can achieve average success rates of 76.95%, 7.96% and 61.47% on the three tasks. The above outperforms the baseline by 14.07% and 18.56% on the two pre-trained models on average. Finally, we investigated the value of the generated adversarial examples to harden victim models through an adversarial fine-tuning procedure and demonstrated the accuracy of CodeBERT and GraphCodeBERT against ALERT-generated adversarial examples increased by 87.59% and 92.32%, respectively.

preprint2022arXiv

PTM4Tag: Sharpening Tag Recommendation of Stack Overflow Posts with Pre-trained Models

Stack Overflow is often viewed as the most influential Software Question Answer (SQA) website with millions of programming-related questions and answers. Tags play a critical role in efficiently structuring the contents in Stack Overflow and are vital to support a range of site operations, e.g., querying relevant contents. Poorly selected tags often introduce extra noise and redundancy, which leads to tag synonym and tag explosion problems. Thus, an automated tag recommendation technique that can accurately recommend high-quality tags is desired to alleviate the problems mentioned above. Inspired by the recent success of pre-trained language models (PTMs) in natural language processing (NLP), we present PTM4Tag, a tag recommendation framework for Stack Overflow posts that utilize PTMs with a triplet architecture, which models the components of a post, i.e., Title, Description, and Code with independent language models. To the best of our knowledge, this is the first work that leverages PTMs in the tag recommendation task of SQA sites. We comparatively evaluate the performance of PTM4Tag based on five popular pre-trained models: BERT, RoBERTa, ALBERT, CodeBERT, and BERTOverflow. Our results show that leveraging the software engineering (SE) domain-specific PTM CodeBERT in PTM4Tag achieves the best performance among the five considered PTMs and outperforms the state-of-the-art deep learning (Convolutional Neural Network-based) approach by a large margin in terms of average $Precision@k$, $Recall@k$, and $F1$-$score@k$. We conduct an ablation study to quantify the contribution of a post's constituent components (Title, Description, and Code Snippets) to the performance of PTM4Tag. Our results show that Title is the most important in predicting the most relevant tags, and utilizing all the components achieves the best performance.

preprint2022arXiv

Revisiting Neuron Coverage Metrics and Quality of Deep Neural Networks

Deep neural networks (DNN) have been widely applied in modern life, including critical domains like autonomous driving, making it essential to ensure the reliability and robustness of DNN-powered systems. As an analogy to code coverage metrics for testing conventional software, researchers have proposed neuron coverage metrics and coverage-driven methods to generate DNN test cases. However, Yan et al. doubt the usefulness of existing coverage criteria in DNN testing. They show that a coverage-driven method is less effective than a gradient-based method in terms of both uncovering defects and improving model robustness. In this paper, we conduct a replication study of the work by Yan et al. and extend the experiments for deeper analysis. A larger model and a dataset of higher resolution images are included to examine the generalizability of the results. We also extend the experiments with more test case generation techniques and adjust the process of improving model robustness to be closer to the practical life cycle of DNN development. Our experiment results confirm the conclusion from Yan et al. that coverage-driven methods are less effective than gradient-based methods. Yan et al. find that using gradient-based methods to retrain cannot repair defects uncovered by coverage-driven methods. They attribute this to the fact that the two types of methods use different perturbation strategies: gradient-based methods perform differentiable transformations while coverage-driven methods can perform additional non-differentiable transformations. We test several hypotheses and further show that even coverage-driven methods are constrained only to perform differentiable transformations, the uncovered defects still cannot be repaired by adversarial training with gradient-based methods. Thus, defensive strategies for coverage-driven methods should be further studied.

preprint2022arXiv

Robust control problems of BSDEs coupled with value functions

A robust control problem is considered in this paper, where the controlled stochastic differential equations (SDEs) include ambiguity parameters and their coefficients satisfy non-Lipschitz continuous and non-linear growth conditions, the objective function is expressed as a backward stochastic differential equation (BSDE) with the generator depending on the value function. We establish the existence and uniqueness of the value function in a proper space and provide a verification theorem. Moreover, we apply the results to solve two typical optimal investment problems in the market with ambiguity, one of which is with Heston stochastic volatility model. In particular, we establish some sharp estimations for Heston model with ambiguity parameters.

preprint2021arXiv

BiasRV: Uncovering Biased Sentiment Predictions at Runtime

Sentiment analysis (SA) systems, though widely applied in many domains, have been demonstrated to produce biased results. Some research works have been done in automatically generating test cases to reveal unfairness in SA systems, but the community still lacks tools that can monitor and uncover biased predictions at runtime. This paper fills this gap by proposing BiasRV, the first tool to raise an alarm when a deployed SA system makes a biased prediction on a given input text. To implement this feature, BiasRV dynamically extracts a template from an input text and from the template generates gender-discriminatory mutants (semantically-equivalent texts that only differ in gender information). Based on popular metrics used to evaluate the overall fairness of an SA system, we define distributional fairness property for an individual prediction of an SA system. This property specifies a requirement that for one piece of text, mutants from different gender classes should be treated similarly as a whole. Verifying the distributional fairness property causes much overhead to the running system. To run more efficiently, BiasRV adopts a two-step heuristic: (1) sampling several mutants from each gender and checking if the system predicts them as of the same sentiment, (2) checking distributional fairness only when sampled mutants have conflicting results. Experiments show that compared to directly checking the distributional fairness property for each input text, our two-step heuristic can decrease overhead used for analyzing mutants by 73.81% while only resulting in 6.7% of biased predictions being missed. Besides, BiasRV can be used conveniently without knowing the implementation of SA systems. Future researchers can easily extend BiasRV to detect more types of bias, e.g. race and occupation.

preprint2021arXiv

CrossASR++: A Modular Differential Testing Framework for Automatic Speech Recognition

Developers need to perform adequate testing to ensure the quality of Automatic Speech Recognition (ASR) systems. However, manually collecting required test cases is tedious and time-consuming. Our recent work proposes CrossASR, a differential testing method for ASR systems. This method first utilizes Text-to-Speech (TTS) to generate audios from texts automatically and then feed these audios into different ASR systems for cross-referencing to uncover failed test cases. It also leverages a failure estimator to find failing test cases more efficiently. Such a method is inherently self-improvable: the performance can increase by leveraging more advanced TTS and ASR systems. So in this accompanying tool demo paper, we devote more engineering and propose CrossASR++, an easy-to-use ASR testing tool that can be conveniently extended to incorporate different TTS and ASR systems, and failure estimators. We also make CrossASR++ chunk texts from a given corpus dynamically and enable the estimator to work in a more effective and flexible way. We demonstrate that the new features can help CrossASR++ discover more failed test cases. Using the same TTS and ASR systems, CrossASR++ can uncover 26.2% more failed test cases for 4 ASRs than the original tool. Moreover, by simply adding one more ASR for cross-referencing, we can increase the number of failed test cases uncovered for each of the 4 ASR systems by 25.07%, 39.63%, 20.9\% and 8.17% respectively. We also extend CrossASR++ with 5 additional failure estimators. Compared to worst estimator, the best one can discover 10.41% more failed test cases within the same amount of time.

preprint2020arXiv

Analysis of the optimal exercise boundary of American put options with delivery lags

A make-your-mind-up option is an American derivative with delivery lags. We show that its put option can be decomposed as a European put and a new type of American-style derivative. The latter is an option for which the investor receives the Greek Theta of the corresponding European option as the running payoff, and decides an optimal stopping time to terminate the contract. Based on this decomposition and using free boundary techniques, we show that the associated optimal exercise boundary exists and is a strictly increasing and smooth curve, and analyze the asymptotic behavior of the value function and the optimal exercise boundary for both large maturity and small time lag.

preprint2020arXiv

COVID19 Tracking: An Interactive Tracking, Visualizing and Analyzing Platform

The Coronavirus Disease 2019 (COVID-19) has now become a pandemic, inflicting millions of people and causing tens of thousands of deaths. To better understand the dynamics of COVID-19, we present a comprehensive COVID-19 tracking and visualization platform that pinpoints the dynamics of the COVID-19 worldwide. Four essential components are implemented: 1) presenting the visualization map of COVID-19 confirmed cases and total counts all over the world; 2) showing the worldwide trends of COVID-19 at multi-grained levels; 3) provide multi-view comparisons, including confirmed cases per million people, mortality rate and accumulative cure rate; 4) integrating a multi-grained view of the disease spreading dynamics in China and showing how the epidemic is taken under control in China.

preprint2020arXiv

Data Centers Job Scheduling with Deep Reinforcement Learning

Efficient job scheduling on data centers under heterogeneous complexity is crucial but challenging since it involves the allocation of multi-dimensional resources over time and space. To adapt the complex computing environment in data centers, we proposed an innovative Advantage Actor-Critic (A2C) deep reinforcement learning based approach called A2cScheduler for job scheduling. A2cScheduler consists of two agents, one of which, dubbed the actor, is responsible for learning the scheduling policy automatically and the other one, the critic, reduces the estimation error. Unlike previous policy gradient approaches, A2cScheduler is designed to reduce the gradient estimation variance and to update parameters efficiently. We show that the A2cScheduler can achieve competitive scheduling performance using both simulated workloads and real data collected from an academic data center.

preprint2020arXiv

Forecasting People's Needs in Hurricane Events from Social Network

Social networks can serve as a valuable communication channel for calls for help, offering assistance, and coordinating rescue activities in disaster. Social networks such as Twitter allow users to continuously update relevant information, which is especially useful during a crisis, where the rapidly changing conditions make it crucial to be able to access accurate information promptly. Social media helps those directly affected to inform others of conditions on the ground in real time and thus enables rescue workers to coordinate their efforts more effectively, better meeting the survivors' need. This paper presents a new sequence to sequence based framework for forecasting people's needs during disasters using social media and weather data. It consists of two Long Short-Term Memory (LSTM) models, one of which encodes input sequences of weather information and the other plays as a conditional decoder that decodes the encoded vector and forecasts the survivors' needs. Case studies utilizing data collected during Hurricane Sandy in 2012, Hurricane Harvey and Hurricane Irma in 2017 were analyzed and the results compared with those obtained using a statistical language model n-gram and an LSTM generative model. Our proposed sequence to sequence method forecast people's needs more successfully than either of the other models. This new approach shows great promise for enhancing disaster management activities such as evacuation planning and commodity flow management.

preprint2020arXiv

Fusing Motion Patterns and Key Visual Information for Semantic Event Recognition in Basketball Videos

Many semantic events in team sport activities e.g. basketball often involve both group activities and the outcome (score or not). Motion patterns can be an effective means to identify different activities. Global and local motions have their respective emphasis on different activities, which are difficult to capture from the optical flow due to the mixture of global and local motions. Hence it calls for a more effective way to separate the global and local motions. When it comes to the specific case for basketball game analysis, the successful score for each round can be reliably detected by the appearance variation around the basket. Based on the observations, we propose a scheme to fuse global and local motion patterns (MPs) and key visual information (KVI) for semantic event recognition in basketball videos. Firstly, an algorithm is proposed to estimate the global motions from the mixed motions based on the intrinsic property of camera adjustments. And the local motions could be obtained from the mixed and global motions. Secondly, a two-stream 3D CNN framework is utilized for group activity recognition over the separated global and local motion patterns. Thirdly, the basket is detected and its appearance features are extracted through a CNN structure. The features are utilized to predict the success or failure. Finally, the group activity recognition and success/failure prediction results are integrated using the kronecker product for event recognition. Experiments on NCAA dataset demonstrate that the proposed method obtains state-of-the-art performance.

preprint2019arXiv

$T\bar{T}$ deformations and integrable spin chains

We consider current-current deformations that generalise $T\bar{T}$ ones, and show that they may be also introduced for integrable spin chains. In analogy with the integrable QFT setup, we define the deformation as a modification of the S matrix in the Bethe equations. Using results by Bargheer, Beisert and Loebbert we show that the deforming operator is composite and constructed out of two currents on the lattice; its expectation value factorises like for $T\bar{T}$. Such a deformation may be considered for any combination of charges that preserve the model's integrable structure.