Source author record

Supatsara Wattanakriengkrai

Supatsara Wattanakriengkrai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Digital Libraries

Catalog footprint

What is connected

6works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Does Coding in Pythonic Zen Peak Performance? Preliminary Experiments of Nine Pythonic Idioms at Scale

In the field of data science, and for academics in general, the Python programming language is a popular choice, mainly because of its libraries for storing, manipulating, and gaining insight from data. Evidence includes the versatile set of machine learning, data visualization, and manipulation packages used for the ever-growing size of available data. The Zen of Python is a set of guiding design principles that developers use to write acceptable and elegant Python code. Most principles revolve around simplicity. However, as the need to compute large amounts of data, performance has become a necessity for the Python programmer. The new idea in this paper is to confirm whether writing the Pythonic way peaks performance at scale. As a starting point, we conduct a set of preliminary experiments to evaluate nine Pythonic code examples by comparing the performance of both Pythonic and Non-Pythonic code snippets. Our results reveal that writing in Pythonic idioms may save memory and time. We show that incorporating list comprehension, generator expression, zip, and itertools.zip_longest idioms can save up to 7,000 MB and up to 32.25 seconds. The results open more questions on how they could be utilized in a real-world setting. The replication package includes all scripts, and the results are available at https://doi.org/10.5281/zenodo.5712349

preprint2022arXiv

Giving Back: Contributions Congruent to Library Dependency Changes in a Software Ecosystem

Popular adoption of third-party libraries for contemporary software development has led to the creation of large inter-dependency networks, where sustainability issues of a single library can have widespread network effects. Maintainers of these libraries are often overworked, relying on the contributions of volunteers to sustain these libraries. In this work, we measure contributions that are aligned with dependency changes, to understand where they come from (i.e., non-maintainer, client maintainer, library maintainer, and library and client maintainer), analyze whether they contribute to library dormancy (i.e., a lack of activity), and investigate the similarities between these contributions and developers' typical contributions. Hence, we leverage socio-technical techniques to measure the dependency-contribution congruence (DC congruence), i.e., the degree to which contributions align with dependencies. We conduct a large-scale empirical study to measure the DC congruence for the NPM ecosystem using 1.7 million issues, 970 thousand pull requests (PR), and over 5.3 million commits belonging to 107,242 NPM packages. At the ecosystem level, we pinpoint in time peaks of congruence with dependency changes (i.e., 16% DC congruence score). Surprisingly, these contributions came from the ecosystem itself (i.e., non-maintainers of either client and library). At the project level, we find that DC congruence shares a statistically significant relationship with the likelihood of a package becoming dormant. Finally, by comparing source code of contributions, we find that congruent contributions are statistically different to typical contributions. Our work has implications to encourage and sustain contributions, especially to support library maintainers that require dependency changes.

preprint2022arXiv

Intertwining Ecosystems: A Large Scale Empirical Study of Libraries that Cross Software Ecosystems

An increase in diverse technology stacks and third-party library usage has led developers to inevitably switch technologies. To assist these developers, maintainers have started to release their libraries to multiple technologies, i.e., a cross-ecosystem library. Our goal is to explore the extent to which these cross-ecosystem libraries are intertwined between ecosystems. We perform a large-scale empirical study of 1.1 million libraries from five different software ecosystems, i.e., PyPI for Python, CRAN for R, Maven for Java, RubyGems for Ruby, and NPM for JavaScript to identify 4,146 GitHub projects that release libraries to these five ecosystems. Analyzing their contributions, we first find that a significant majority (median of 37.5%) of contributors of these cross-ecosystem libraries come from a single ecosystem, while also receiving a significant portion of contributions (median of 24.06%) from outside their target ecosystems. We also find that a cross-ecosystem library is written using multiple programming languages. Specifically, three (i.e., PyPI, CRAN, RubyGems) out of the five ecosystems has the majority of source code is written using languages not specific to that ecosystem. As ecosystems become intertwined, this opens up new avenues for research, such as whether or not cross-ecosystem libraries will solve the search for replacement libraries, or how these libraries fit within each ecosystem just to name a few.

preprint2022arXiv

Understanding the Role of External Pull Requests in the NPM Ecosystem

The risk to using third-party libraries in a software application is that much needed maintenance is solely carried out by library maintainers. These libraries may rely on a core team of maintainers (who might be a single maintainer that is unpaid and overworked) to serve a massive client user-base. On the other hand, being open source has the benefit of receiving contributions (in the form of External PRs) to help fix bugs and add new features. In this paper, we investigate the role by which External PRs (contributions from outside the core team of maintainers) contribute to a library. Through a preliminary analysis, we find that External PRs are prevalent, and just as likely to be accepted as maintainer PRs. We find that 26.75% of External PRs submitted fix existing issues. Moreover, fixes also belong to labels such as breaking changes, urgent, and on-hold. Differently from Internal PRs, External PRs cover documentation changes (44 out of 384 PRs), while not having as much refactoring (34 out of 384 PRs). On the other hand, External PRs also cover new features (380 out of 384 PRs) and bugs (120 out of 384). Our results lay the groundwork for understanding how maintainers decide which external contributions they select to evolve their libraries and what role they play in reducing the workload.

preprint2020arXiv

From Academia to Software Development: Publication Citations in Source Code Comments

Academic publications have been evaluated in terms of their impact on research communities based on many metrics, such as the number of citations. On the other hand, the impact of academic publications on industry has been rarely studied. This paper investigates how academic publications contribute to software development by analyzing publication citations in source code comments in open source software repositories. We propose an automated approach for detecting academic publications based on Named Entity Recognition, and achieve 0.90 in $F_1$ as detection accuracy. We conduct a large-scale study of publication citations with 319,438,977 comments collected from 25,925 active repositories written in seven programming languages. Our findings indicate that academic publications can be knowledge sources for software development. These referenced publications are particularly from journals. In terms of knowledge transfer, algorithm is the most prevalent type of knowledge transferred from the publications, with proposed formulas or equations typically implemented in methods or functions in source code files. In a closer look at GitHub repositories referencing academic publications, we find that science-related repositories are the most frequent among GitHub repositories with publication citations, and that the vast majority of these publications are referenced by repository owners who are different from the publication authors. We also find that referencing older publications can lead to potential issues related to obsolete knowledge.

preprint2020arXiv

Predicting Defective Lines Using a Model-Agnostic Technique

Defect prediction models are proposed to help a team prioritize source code areas files that need Software QualityAssurance (SQA) based on the likelihood of having defects. However, developers may waste their unnecessary effort on the whole filewhile only a small fraction of its source code lines are defective. Indeed, we find that as little as 1%-3% of lines of a file are defective. Hence, in this work, we propose a novel framework (called LINE-DP) to identify defective lines using a model-agnostic technique, i.e., an Explainable AI technique that provides information why the model makes such a prediction. Broadly speaking, our LINE-DP first builds a file-level defect model using code token features. Then, our LINE-DP uses a state-of-the-art model-agnostic technique (i.e.,LIME) to identify risky tokens, i.e., code tokens that lead the file-level defect model to predict that the file will be defective. Then, the lines that contain risky tokens are predicted as defective lines. Through a case study of 32 releases of nine Java open source systems, our evaluation results show that our LINE-DP achieves an average recall of 0.61, a false alarm rate of 0.47, a top 20%LOC recall of0.27, and an initial false alarm of 16, which are statistically better than six baseline approaches. Our evaluation shows that our LINE-DP requires an average computation time of 10 seconds including model construction and defective line identification time. In addition, we find that 63% of defective lines that can be identified by our LINE-DP are related to common defects (e.g., argument change, condition change). These results suggest that our LINE-DP can effectively identify defective lines that contain common defectswhile requiring a smaller amount of inspection effort and a manageable computation cost.

Supatsara Wattanakriengkrai

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Does Coding in Pythonic Zen Peak Performance? Preliminary Experiments of Nine Pythonic Idioms at Scale

Giving Back: Contributions Congruent to Library Dependency Changes in a Software Ecosystem

Intertwining Ecosystems: A Large Scale Empirical Study of Libraries that Cross Software Ecosystems

Understanding the Role of External Pull Requests in the NPM Ecosystem

From Academia to Software Development: Publication Citations in Source Code Comments

Predicting Defective Lines Using a Model-Agnostic Technique