Source author record

Jürgen Börstler

Jürgen Börstler appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Software Engineering

Catalog footprint

What is connected

2 works

1 topic

4 close collaborators


Preprint, 2022, arXiv

An Empirical Study on the Effectiveness of Data Resampling Approaches for Cross-Project Software Defect Prediction

Cross-project defect prediction (CPDP), where data from different software projects are used to predict defects, has been proposed as a way to provide data for software projects that lack historical data. Evaluations of CPDP models using the Nearest Neighbour (NN) Filter approach have shown promising results in recent studies. A key challenge with defect-prediction datasets is class imbalance, that is, highly skewed datasets in which non-buggy modules far outnumber buggy modules. In the past, data resampling approaches have been applied to within-project defect prediction models to help alleviate the negative effects of class imbalance in the datasets. To address the class imbalance issue in CPDP, the authors assess the impact of data resampling approaches on CPDP models after the NN Filter is applied. The impact on prediction performance of five oversampling approaches (MAHAKIL, SMOTE, Borderline-SMOTE, Random Oversampling, and ADASYN) and three undersampling approaches (Random Undersampling, Tomek Links, and One-Sided Selection) is investigated, and the results are compared to approaches without data resampling. The authors examined six defect prediction models on 34 datasets extracted from the PROMISE repository. The authors' results show that data resampling has a significant positive effect on CPDP performance, suggesting that software quality teams and researchers should consider applying data resampling approaches to improve recall (pd) and g-measure prediction performance. However, if the goal is to improve precision and reduce false alarms (pf), then data resampling approaches should be avoided.
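As a rough illustration of the ideas named in this abstract, the sketch below shows Random Oversampling of the minority class and the computation of recall (pd), false alarm rate (pf), and g-measure from binary predictions. It is not the authors' code. The function names and the label convention (1 = buggy, 0 = non-buggy) are assumptions, and the g-measure is taken here to be the harmonic mean of pd and (1 - pf), a definition common in the defect-prediction literature but not stated in the abstract.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes
    have the same size. Assumes labels are 1 (buggy) or 0 (non-buggy)."""
    rng = random.Random(seed)
    buggy = [i for i, y in enumerate(labels) if y == 1]
    clean = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (buggy, clean) if len(buggy) < len(clean) else (clean, buggy)
    if not minority:
        # Nothing to duplicate when one class is absent.
        return list(rows), list(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


def pd_pf_g(y_true, y_pred):
    """Return (pd, pf, g_measure) for binary labels.

    pd = TP / (TP + FN)   recall on buggy modules
    pf = FP / (FP + TN)   false alarm rate on non-buggy modules
    g  = harmonic mean of pd and (1 - pf)  (assumed definition)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall_pd = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_pf = fp / (fp + tn) if (fp + tn) else 0.0
    denom = recall_pd + (1.0 - false_alarm_pf)
    g = 2.0 * recall_pd * (1.0 - false_alarm_pf) / denom if denom else 0.0
    return recall_pd, false_alarm_pf, g
```

In a setup like the one described, resampling would be applied only to the training data selected from other projects, never to the target project's test data, so that pd, pf, and g-measure are measured on the original class distribution.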

Preprint, 2022, arXiv

Lessons learned from replicating a study on information-retrieval based test case prioritization

Objective: In this study, we aim to replicate an artefact-based study on software testing to address the gap. We focus on (a) providing a step-by-step guide to the replication, reflecting on the challenges of replicating artefact-based testing research, and (b) evaluating the replicated study with respect to the validity and robustness of its findings. Method: We replicate a test case prioritization technique by Kwon et al. We replicated the original study using four programs, two from the original study and two new programs. The replication was implemented in Python to support future replications. Results: Various general factors facilitating replications are identified, such as: (1) the importance of documentation; (2) the need for assistance from the original authors; (3) issues in the maintenance of open source repositories (e.g., concerning needed software dependencies); (4) the availability of scripts. We also report several observations specific to the study and its context, such as insights from using different mutation tools and strategies for mutant generation. Conclusion: We conclude that the study by Kwon et al. is replicable for small and medium programs and could be automated to help software practitioners, provided the required information is available.
