Researcher profile

Junaed younus Khan

Junaed younus Khan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
1topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2022arXiv

Aspect-Based API Review Classification: How Far Can Pre-Trained Transformer Model Go?

APIs (Application Programming Interfaces) are reusable software libraries and are building blocks for modern rapid software development. Previous research shows that programmers frequently share and search for reviews of APIs on the mainstream software question and answer (Q&A) platforms like Stack Overflow, which motivates researchers to design tasks and approaches related to process API reviews automatically. Among these tasks, classifying API reviews into different aspects (e.g., performance or security), which is called the aspect-based API review classification, is of great importance. The current state-of-the-art (SOTA) solution to this task is based on the traditional machine learning algorithm. Inspired by the great success achieved by pre-trained models on many software engineering tasks, this study fine-tunes six pre-trained models for the aspect-based API review classification task and compares them with the current SOTA solution on an API review benchmark collected by Uddin et al. The investigated models include four models (BERT, RoBERTa, ALBERT and XLNet) that are pre-trained on natural languages, BERTOverflow that is pre-trained on text corpus extracted from posts on Stack Overflow, and CosSensBERT that is designed for handling imbalanced data. The results show that all the six fine-tuned models outperform the traditional machine learning-based tool. More specifically, the improvement on the F1-score ranges from 21.0% to 30.2%. We also find that BERTOverflow, a model pre-trained on the corpus from Stack Overflow, does not show better performance than BERT. The result also suggests that CosSensBERT also does not exhibit better performance than BERT in terms of F1, but it is still worthy of being considered as it achieves better performance on MCC and AUC.

preprint2022arXiv

Automatic Detection and Analysis of Technical Debts in Peer-Review Documentation of R Packages

Technical debt (TD) is a metaphor for code-related problems that arise as a result of prioritizing speedy delivery over perfect code. Given that the reduction of TDs can have long-term positive impact in the software engineering life-cycle (SDLC), TDs are studied extensively in the literature. However, very few of the existing research focused on the technical debts of R programming language despite its popularity and usage. Recent research by Codabux et al. [21] finds that R packages can have 10 diverse TD types analyzing peer-review documentation. However, the findings are based on the manual analysis of a small sample of R package review comments. In this paper, we develop a suite of Machine Learning (ML) classifiers to detect the 10 TDs automatically. The best performing classifier is based on the deep ML model BERT, which achieves F1-scores of 0.71 - 0.91. We then apply the trained BERT models on all available peer-review issue comments from two platforms, rOpenSci and BioConductor (13.5K review comments coming from a total of 1297 R packages). We conduct an empirical study on the prevalence and evolution of 10 TDs in the two R platforms. We discovered documentation debt is the most prevalent among all types of TD, and it is also expanding rapidly. We also find that R packages of generic platform (i.e. rOpenSci) are more prone to TD compared to domain-specific platform (i.e. BioConductor). Our empirical study findings can guide future improvements opportunities in R package documentation. Our ML models can be used to automatically monitor the prevalence and evolution of TDs in R package documentation.

preprint2021arXiv

Automatic Detection of Five API Documentation Smells: Practitioners' Perspectives

The learning and usage of an API is supported by official documentation. Like source code, API documentation is itself a software product. Several research results show that bad design in API documentation can make the reuse of API features difficult. Indeed, similar to code smells or code antipatterns, poorly designed API documentation can also exhibit 'smells'. Such documentation smells can be described as bad documentation styles that do not necessarily produce an incorrect documentation but nevertheless make the documentation difficult to properly understand and to use. Recent research on API documentation has focused on finding content inaccuracies in API documentation and to complement API documentation with external resources (e.g., crowd-shared code examples). We are aware of no research that focused on the automatic detection of API documentation smells. This paper makes two contributions. First, we produce a catalog of five API documentation smells by consulting literature on API documentation presentation problems. We create a benchmark dataset of 1,000 API documentation units by exhaustively and manually validating the presence of the five smells in Java official API reference and instruction documentation. Second, we conduct a survey of 21 professional software developers to validate the catalog. The developers agreed that they frequently encounter all five smells in API official documentation and 95.2% of them reported that the presence of the documentation smells negatively affects their productivity. The participants wished for tool support to automatically detect and fix the smells in API official documentation. We develop a suite of rule-based, deep and shallow machine learning classifiers to automatically detect the smells. The best performing classifier BERT, a deep learning model, achieves F1-scores of 0.75 - 0.97.