Source author record

Fabian Trautsch

Fabian Trautsch appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering

Catalog footprint

What is connected

3works

1topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

Problems with SZZ and Features: An empirical study of the state of practice of defect prediction data collection

Context: The SZZ algorithm is the de facto standard for labeling bug fixing commits and finding inducing changes for defect prediction data. Recent research uncovered potential problems in different parts of the SZZ algorithm. Most defect prediction data sets provide only static code metrics as features, while research indicates that other features are also important. Objective: We provide an empirical analysis of the defect labels created with the SZZ algorithm and the impact of commonly used features on results. Method: We used a combination of manual validation and adopted or improved heuristics for the collection of defect data. We conducted an empirical study on 398 releases of 38 Apache projects. Results: We found that only half of the bug fixing commits determined by SZZ are actually bug fixing. If a six-month time frame is used in combination with SZZ to determine which bugs affect a release, one file is incorrectly labeled as defective for every file that is correctly labeled as defective. In addition, two defective files are missed. We also explored the impact of the relatively small set of features that are available in most defect prediction data sets, as there are multiple publications that indicate that, e.g., churn related features are important for defect prediction. We found that the difference of using more features is not significant. Conclusion: Problems with inaccurate defect labels are a severe threat to the validity of the state of the art of defect prediction. Small feature sets seem to be a less severe threat.

preprint2020arXiv

A systematic mapping study of developer social network research

Developer social networks (DSNs) are a tool for the analysis of community structures and collaborations between developers in software projects and software ecosystems. Within this paper, we present the results of a systematic mapping study on the use of DSNs in software engineering research. We identified 255 primary studies on DSNs. We mapped the primary studies to research directions, collected information about the data sources and the size of the studies, and conducted a bibliometric assessment. We found that nearly half of the research investigates the structure of developer communities. Other frequent topics are prediction systems build using DSNs, collaboration behavior between developers, and the roles of developers. Moreover, we determined that many publications use a small sample size regarding the number of projects, which could be problematic for the external validity of the research. Our study uncovered several open issues in the state of the art, e.g., studying inter-company collaborations, using multiple information sources for DSN research, as well as general lack of reporting guidelines or replication studies.

preprint2020arXiv

The SmartSHARK Ecosystem for Software Repository Mining

Software repository mining is the foundation for many empirical software engineering studies. The collection and analysis of detailed data can be challenging, especially if data shall be shared to enable replicable research and open science practices. SmartSHARK is an ecosystem that supports replicable and reproducible research based on software repository mining.