Researcher profile

Christoph Treude

Christoph Treude contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
29works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

29 published item(s)

preprint2026arXiv

Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap

AI coding assistants and autonomous agents are becoming integral to software development workflows, reshaping how code is produced, reviewed, and maintained. While recent research has focused mainly on the capabilities and impacts of productivity of these systems, much less attention has been paid to accountability: who is responsible when agents generate, modify, or recommend code? In practice, accountability is defined through the Terms of Service (ToS) and related policy documents that govern the use of AI-powered development tools. In this vision paper, we present a comparative analysis of the Terms of Service for widely used AI coding assistants and agent-enabled development tools. We examine how these documents allocate ownership, responsibility, liability, and disclosure obligations between tool providers and software developers, and we identify common patterns and divergences between providers. Our analysis reveals a consistent tendency to shift responsibility for correctness, safety, and legal compliance onto users, as well as substantial variation in how providers address issues such as indemnification, data reuse, and acceptable use. Based on these findings, we argue that existing policy frameworks are poorly aligned with increasingly agent-mediated and autonomous software development workflows. We outline a research roadmap for accountable agents in software engineering, identifying challenges and opportunities for modeling responsibility, designing governance artifacts, developing tooling that supports accountability, and conducting empirical studies of developers' perceptions and practices.

preprint2024arXiv

"My GitHub Sponsors profile is live!" Investigating the Impact of Twitter/X Mentions on GitHub Sponsors

GitHub Sponsors was launched in 2019, enabling donations to open-source software developers to provide financial support, as per GitHub's slogan: "Invest in the projects you depend on". However, a 2022 study on GitHub Sponsors found that only two-fifths of developers who were seeking sponsorship received a donation. The study found that, other than internal actions (such as offering perks to sponsors), developers had advertised their GitHub Sponsors profiles on social media, such as Twitter (also known as X). Therefore, in this work, we investigate the impact of tweets that contain links to GitHub Sponsors profiles on sponsorship, as well as their reception on Twitter/X. We further characterize these tweets to understand their context and find that (1) such tweets have the impact of increasing the number of sponsors acquired, (2) compared to other donation platforms such as Open Collective and Patreon, GitHub Sponsors has significantly fewer interactions but is more visible on Twitter/X, and (3) developers tend to contribute more to open-source software during the week of posting such tweets. Our findings are the first step toward investigating the impact of social media on obtaining funding to sustain open-source software.

preprint2024arXiv

Ethical Considerations Towards Protestware

A key drawback to using a Open Source third-party library is the risk of introducing malicious attacks. In recently times, these threats have taken a new form, when maintainers turn their Open Source libraries into protestware. This is defined as software containing political messages delivered through these libraries, which can either be malicious or benign. Since developers are willing to freely open-up their software to these libraries, much trust and responsibility are placed on the maintainers to ensure that the library does what it promises to do. Using different frameworks commonly used in AI ethics, we illustrate how an open-source maintainer's decision to protest is influenced by different stakeholders (viz., their membership in the OSS community, their personal views, financial motivations, social status, and moral viewpoints), making protestware a multifaceted and intricate matter.

preprint2023arXiv

NCQ: Code reuse support for node.js developers

Code reuse is an important part of software development. The adoption of code reuse practices is especially common among Node.js developers. The Node.js package manager, NPM, indexes over 1 Million packages and developers often seek out packages to solve programming tasks. Due to the vast number of packages, selecting the right package is difficult and time consuming. With the goal of improving productivity of developers that heavily reuse code through third-party packages, we present Node Code Query (NCQ), a Read-Eval-Print-Loop environment that allows developers to 1) search for NPM packages using natural language queries, 2) search for code snippets related to those packages, 3) automatically correct errors in these code snippets, 4) quickly setup new environments for testing those snippets, and 5) transition between search and editing modes. In two user studies with a total of 20 participants, we find that participants begin programming faster and conclude tasks faster with NCQ than with baseline approaches, and that they like, among other features, the search for code snippets and packages. Our results suggest that NCQ makes Node.js developers more efficient in reusing code.

preprint2022arXiv

Adding Context to Source Code Representations for Deep Learning

Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarisation, and bug and vulnerability detection. In order to apply deep learning to these tasks, source code needs to be represented in a format that is suitable for input into the deep learning model. Most approaches to representing source code, such as tokens, abstract syntax trees (ASTs), data flow graphs (DFGs), and control flow graphs (CFGs) only focus on the code itself and do not take into account additional context that could be useful for deep learning models. In this paper, we argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed. We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model for two software engineering tasks. We outline our research agenda for adding further contextual information to source code representations for deep learning.

preprint2022arXiv

Automatically Categorising GitHub Repositories by Application Domain

GitHub is the largest host of open source software on the Internet. This large, freely accessible database has attracted the attention of practitioners and researchers alike. But as GitHub's growth continues, it is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains. Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository and reasoning about project quality. In this work, we build on a previously annotated dataset of 5,000 GitHub repositories to design an automated classifier for categorising repositories by their application domain. The classifier uses state-of-the-art natural language processing techniques and machine learning to learn from multiple data sources and catalogue repositories according to five application domains. We contribute with (1) an automated classifier that can assign popular repositories to each application domain with at least 70% precision, (2) an investigation of the approach's performance on less popular repositories, and (3) a practical application of this approach to answer how the adoption of software engineering practices differs across application domains. Our work aims to help the GitHub community identify repositories of interest and opens promising avenues for future work investigating differences between repositories from different application domains.

preprint2022arXiv

DevOps Education: An Interview Study of Challenges and Recommendations

Over the last years, the software industry has adopted several DevOps technologies related to practices such as continuous integration and continuous delivery. The high demand for DevOps practitioners requires non-trivial adjustments in traditional software engineering courses and educational methodologies. This work presents an interview study with 14 DevOps educators from different universities and countries, aiming to identify the main challenges and recommendations for DevOps teaching. Our study identified 83 challenges, 185 recommendations, and several association links and conflicts between them. Our findings can help educators plan, execute and evaluate DevOps courses. They also highlight several opportunities for researchers to propose new methods and tools for teaching DevOps.

preprint2022arXiv

Does This Apply to Me? An Empirical Study of Technical Context in Stack Overflow

Stack Overflow has become an essential technical resource for developers. However, given the vast amount of knowledge available on Stack Overflow, finding the right information that is relevant for a given task is still challenging, especially when a developer is looking for a solution that applies to their specific requirements or technology stack. Clearly marking answers with their technical context, i.e., the information that characterizes the technologies and assumptions needed for this answer, is potentially one way to improve navigation. However, there is no information about how often such context is mentioned, and what kind of information it might offer. In this paper, we conduct an empirical study to understand the occurrence of technical context in Stack Overflow answers and comments, using tags as a proxy for technical context. We specifically focus on additional context, where answers/comments mention information that is not already discussed in the question. Our results show that nearly half of our studied threads contain at least one additional context. We find that almost 50% of the additional context are either a library/framework, a programming language, a tool/application, an API, or a database. Overall, our findings show the promise of using additional context as navigational cues.

preprint2022arXiv

GitHub Sponsors: Exploring a New Way to Contribute to Open Source

GitHub Sponsors, launched in 2019, enables donations to individual open source software (OSS) developers. Financial support for OSS maintainers and developers is a major issue in terms of sustaining OSS projects, and the ability to donate to individuals is expected to support the sustainability of developers, projects, and community. In this work, we conducted a mixed-methods study of GitHub Sponsors, including quantitative and qualitative analyses, to understand the characteristics of developers who are likely to receive donations and what developers think about donations to individuals. We found that: (1) sponsored developers are more active than non-sponsored developers, (2) the possibility to receive donations is related to whether there is someone in their community who is donating, and (3) developers are sponsoring as a new way to contribute to OSS. Our findings are the first step towards data-informed guidance for using GitHub Sponsors, opening up avenues for future work on this new way of financially sustaining the OSS community.

preprint2022arXiv

Giving Back: Contributions Congruent to Library Dependency Changes in a Software Ecosystem

Popular adoption of third-party libraries for contemporary software development has led to the creation of large inter-dependency networks, where sustainability issues of a single library can have widespread network effects. Maintainers of these libraries are often overworked, relying on the contributions of volunteers to sustain these libraries. In this work, we measure contributions that are aligned with dependency changes, to understand where they come from (i.e., non-maintainer, client maintainer, library maintainer, and library and client maintainer), analyze whether they contribute to library dormancy (i.e., a lack of activity), and investigate the similarities between these contributions and developers' typical contributions. Hence, we leverage socio-technical techniques to measure the dependency-contribution congruence (DC congruence), i.e., the degree to which contributions align with dependencies. We conduct a large-scale empirical study to measure the DC congruence for the NPM ecosystem using 1.7 million issues, 970 thousand pull requests (PR), and over 5.3 million commits belonging to 107,242 NPM packages. At the ecosystem level, we pinpoint in time peaks of congruence with dependency changes (i.e., 16% DC congruence score). Surprisingly, these contributions came from the ecosystem itself (i.e., non-maintainers of either client and library). At the project level, we find that DC congruence shares a statistically significant relationship with the likelihood of a package becoming dormant. Finally, by comparing source code of contributions, we find that congruent contributions are statistically different to typical contributions. Our work has implications to encourage and sustain contributions, especially to support library maintainers that require dependency changes.

preprint2022arXiv

In War and Peace: The Impact of World Politics on Software Ecosystems

Reliance on third-party libraries is now commonplace in contemporary software engineering. Being open source in nature, these libraries should advocate for a world where the freedoms and opportunities of open source software can be enjoyed by all. Yet, there is a growing concern related to maintainers using their influence to make political stances (i.e., referred to as protestware). In this paper, we reflect on the impact of world politics on software ecosystems, especially in the context of the ongoing War in Ukraine. We show three cases where world politics has had an impact on a software ecosystem, and how these incidents may result in either benign or malignant consequences. We further point to specific opportunities for research, and conclude with a research agenda with ten research questions to guide future research directions.

preprint2022arXiv

Is Surprisal in Issue Trackers Actionable?

Background. From information theory, surprisal is a measurement of how unexpected an event is. Statistical language models provide a probabilistic approximation of natural languages, and because surprisal is constructed with the probability of an event occuring, it is therefore possible to determine the surprisal associated with English sentences. The issues and pull requests of software repository issue trackers give insight into the development process and likely contain the surprising events of this process. Objective. Prior works have identified that unusual events in software repositories are of interest to developers, and use simple code metrics-based methods for detecting them. In this study we will propose a new method for unusual event detection in software repositories using surprisal. With the ability to find surprising issues and pull requests, we intend to further analyse them to determine if they actually hold importance in a repository, or if they pose a significant challenge to address. If it is possible to find bad surprises early, or before they cause additional troubles, it is plausible that effort, cost and time will be saved as a result. Method. After extracting the issues and pull requests from 5000 of the most popular software repositories on GitHub, we will train a language model to represent these issues. We will measure their perceived importance in the repository, measure their resolution difficulty using several analogues, measure the surprisal of each, and finally generate inferential statistics to describe any correlations.

preprint2022arXiv

Querying Spatial-Temporal-Spectral Data Using a Graphical Query Builder

Constructing complex queries on data which combines spatial, temporal, and spectral aspects is a challenging and error-prone process. Query interfaces of general-purpose database management systems fall short in providing intuitive support for users to effectively and efficiently construct queries. To address this situation, we developed GraphicalQueryBuilder, a tool which provides interactive and immediate feedback during query construction and visually represents search space and queries. A user study with ten professionals showed that users were able to complete queries on average 40% faster and 40 percentage points more accurately compared to a Microsoft Access baseline.

preprint2022arXiv

Self-Adaptive Systems: A Systematic Literature Review Across Categories and Domains

Context: Championed by IBM's vision of autonomic computing paper in 2003, the autonomic computing research field has seen increased research activity over the last 20 years. Several conferences and workshops have been established and have contributed to the autonomic computing knowledge base in search of a new kind of system -- a self-adaptive system (SAS). These systems are characterized by being context-aware and can act on that awareness. The actions carried out could be on the system or on the context (or environment). The underlying goal of a SAS is the sustained achievement of its goals despite changes in its environment. Objective: Despite a number of literature reviews on specific aspects of SASs ranging from their requirements to quality attributes, we lack a systematic understanding of the current state of the art. Method: This paper contributes a systematic literature review into self-adaptive systems using the dblp computer science bibliography as a database. We filtered the records systematically in successive steps to arrive at 293 relevant papers. Each paper was critically analyzed and categorized into an attribute matrix. This matrix consisted of five categories, with each category having multiple attributes. The attributes of each paper, along with the summary of its contents formed the basis of the literature review that spanned 30 years. Results: We characterize the maturation process of the research area from theoretical papers over practical implementations to more holistic and generic approaches, frameworks, and exemplars, applied to areas such as networking, web services, and robotics, with much of the recent work focusing on IoT and IaaS. Conclusion: While there is an ebb and flow of application domains, domains like bio-inspired approaches, security, and cyber physical systems showed promise to grow heading into the 2020s.

preprint2022arXiv

Software Engineering User Study Recruitment on Prolific: An Experience Report

Online participant recruitment platforms such as Prolific have been gaining popularity in research, as they enable researchers to easily access large pools of participants. However, participant quality can be an issue; participants may give incorrect information to gain access to more studies, adding unwanted noise to results. This paper details our experience recruiting participants from Prolific for a user study requiring programming skills in Node.js, with the aim of helping other researchers conduct similar studies. We explore a method of recruiting programmer participants using prescreening validation, attention checks and a series of programming knowledge questions. We received 680 responses, and determined that 55 met the criteria to be invited to our user study. We ultimately conducted user study sessions via video calls with 10 participants. We conclude this paper with a series of recommendations for researchers.

preprint2022arXiv

Taming Multi-Output Recommenders for Software Engineering

Recommender systems are a valuable tool for software engineers. For example, they can provide developers with a ranked list of files likely to contain a bug, or multiple auto-complete suggestions for a given method stub. However, the way these recommender systems interact with developers is often rudimentary -- a long list of recommendations only ranked by the model's confidence. In this vision paper, we lay out our research agenda for re-imagining how recommender systems for software engineering communicate their insights to developers. When issuing recommendations, our aim is to recommend diverse rather than redundant solutions and present them in ways that highlight their differences. We also want to allow for seamless and interactive navigation of suggestions while striving for holistic end-to-end evaluations. By doing so, we believe that recommender systems can play an even more important role in helping developers write better software.

preprint2022arXiv

Understanding the Role of External Pull Requests in the NPM Ecosystem

The risk to using third-party libraries in a software application is that much needed maintenance is solely carried out by library maintainers. These libraries may rely on a core team of maintainers (who might be a single maintainer that is unpaid and overworked) to serve a massive client user-base. On the other hand, being open source has the benefit of receiving contributions (in the form of External PRs) to help fix bugs and add new features. In this paper, we investigate the role by which External PRs (contributions from outside the core team of maintainers) contribute to a library. Through a preliminary analysis, we find that External PRs are prevalent, and just as likely to be accepted as maintainer PRs. We find that 26.75% of External PRs submitted fix existing issues. Moreover, fixes also belong to labels such as breaking changes, urgent, and on-hold. Differently from Internal PRs, External PRs cover documentation changes (44 out of 384 PRs), while not having as much refactoring (34 out of 384 PRs). On the other hand, External PRs also cover new features (380 out of 384 PRs) and bugs (120 out of 384). Our results lay the groundwork for understanding how maintainers decide which external contributions they select to evolve their libraries and what role they play in reducing the workload.

preprint2021arXiv

Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow

As a popular Q&A site for programming, Stack Overflow is a treasure for developers. However, the amount of questions and answers on Stack Overflow make it difficult for developers to efficiently locate the information they are looking for. There are two gaps leading to poor search results: the gap between the user's intention and the textual query, and the semantic gap between the query and the post content. Therefore, developers have to constantly reformulate their queries by correcting misspelled words, adding limitations to certain programming languages or platforms, etc. As query reformulation is tedious for developers, especially for novices, we propose an automated software-specific query reformulation approach based on deep learning. With query logs provided by Stack Overflow, we construct a large-scale query reformulation corpus, including the original queries and corresponding reformulated ones. Our approach trains a Transformer model that can automatically generate candidate reformulated queries when given the user's original query. The evaluation results show that our approach outperforms five state-of-the-art baselines, and achieves a 5.6% to 33.5% boost in terms of $\mathit{ExactMatch}$ and a 4.8% to 14.4% boost in terms of $\mathit{GLEU}$.

preprint2021arXiv

Empirical Standards for Software Engineering Research

Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.

preprint2021arXiv

Same File, Different Changes: The Potential of Meta-Maintenance on GitHub

Online collaboration platforms such as GitHub have provided software developers with the ability to easily reuse and share code between repositories. With clone-and-own and forking becoming prevalent, maintaining these shared files is important, especially for keeping the most up-to-date version of reused code. Different to related work, we propose the concept of meta-maintenance -- i.e., tracking how the same files evolve in different repositories with the aim to provide useful maintenance opportunities to those files. We conduct an exploratory study by analyzing repositories from seven different programming languages to explore the potential of meta-maintenance. Our results indicate that a majority of active repositories on GitHub contains at least one file which is also present in another repository, and that a significant minority of these files are maintained differently in the different repositories which contain them. We manually analyzed a representative sample of shared files and their variants to understand which changes might be useful for meta-maintenance. Our findings support the potential of meta-maintenance and open up avenues for future work to capitalize on this potential.

preprint2020arXiv

Beyond Accuracy: Assessing Software Documentation Quality

Good software documentation encourages good software engineering, but the meaning of "good" documentation is vaguely defined in the software engineering literature. To clarify this ambiguity, we draw on work from the data and information quality community to propose a framework that decomposes documentation quality into ten dimensions of structure, content, and style. To demonstrate its application, we recruited technical editors to apply the framework when evaluating examples from several genres of software documentation. We summarise their assessments -- for example, reference documentation and README files excel in quality whereas blog articles have more problems -- and we describe our vision for reasoning about software documentation quality and for the expansion and potential of a unified quality framework.

preprint2020arXiv

Code Duplication on Stack Overflow

Despite the unarguable importance of Stack Overflow (SO) for the daily work of many software developers and despite existing knowledge about the impact of code duplication on software maintainability, the prevalence and implications of code clones on SO have not yet received the attention they deserve. In this paper, we motivate why studies on code duplication within SO are needed and how existing studies on code reuse differ from this new research direction. We present similarities and differences between code clones in general and code clones on SO and point to open questions that need to be addressed to be able to make data-informed decisions about how to properly handle clones on this important platform. We present results from a first preliminary investigation, indicating that clones on SO are common and diverse. We further point to specific challenges, including incentives for users to clone successful answers and difficulties with bulk edits on the platform, and conclude with possible directions for future work.

preprint2020arXiv

Contextual Documentation Referencing on Stack Overflow

Software engineering is knowledge-intensive and requires software developers to continually search for knowledge, often on community question answering platforms such as Stack Overflow. Such information sharing platforms do not exist in isolation, and part of the evidence that they exist in a broader software documentation ecosystem is the common presence of hyperlinks to other documentation resources found in forum posts. With the goal of helping to improve the information diffusion between Stack Overflow and other documentation resources, we conducted a study to answer the question of how and why documentation is referenced in Stack Overflow threads. We sampled and classified 759 links from two different domains, regular expressions and Android development, to qualitatively and quantitatively analyze the links' context and purpose, including attribution, awareness, and recommendations. We found that links on Stack Overflow serve a wide range of distinct purposes, ranging from citation links attributing content copied into Stack Overflow, over links clarifying concepts using Wikipedia pages, to recommendations of software components and resources for background reading. This purpose spectrum has major corollaries, including our observation that links to documentation resources are a reflection of the information needs typical to a technology domain. We contribute a framework and method to analyze the context and purpose of Stack Overflow links, a public dataset of annotated links, and a description of five major observations about linking practices on Stack Overflow. We further point to potential tool support to enhance the information diffusion between Stack Overflow and other documentation resources.

preprint2020arXiv

From Academia to Software Development: Publication Citations in Source Code Comments

Academic publications have been evaluated in terms of their impact on research communities based on many metrics, such as the number of citations. On the other hand, the impact of academic publications on industry has been rarely studied. This paper investigates how academic publications contribute to software development by analyzing publication citations in source code comments in open source software repositories. We propose an automated approach for detecting academic publications based on Named Entity Recognition, and achieve 0.90 in $F_1$ as detection accuracy. We conduct a large-scale study of publication citations with 319,438,977 comments collected from 25,925 active repositories written in seven programming languages. Our findings indicate that academic publications can be knowledge sources for software development. These referenced publications are particularly from journals. In terms of knowledge transfer, algorithm is the most prevalent type of knowledge transferred from the publications, with proposed formulas or equations typically implemented in methods or functions in source code files. In a closer look at GitHub repositories referencing academic publications, we find that science-related repositories are the most frequent among GitHub repositories with publication citations, and that the vast majority of these publications are referenced by repository owners who are different from the publication authors. We also find that referencing older publications can lead to potential issues related to obsolete knowledge.

preprint2020arXiv

Human-Like Summaries from Heterogeneous and Time-Windowed Software Development Artefacts

Automatic text summarisation has drawn considerable interest in the area of software engineering. It is challenging to summarise the activities related to a software project, (1) because of the volume and heterogeneity of involved software artefacts, and (2) because it is unclear what information a developer seeks in such a multi-document summary. We present the first framework for summarising multi-document software artefacts containing heterogeneous data within a given time frame. To produce human-like summaries, we employ a range of iterative heuristics to minimise the cosine-similarity between texts and high-dimensional feature vectors. A first study shows that users find the automatically generated summaries the most useful when they are generated using word similarity and based on the eight most relevant software artefacts.

preprint2020arXiv

Optimising the Fit of Stack Overflow Code Snippets into Existing Code

Software developers often reuse code from online sources such as Stack Overflow within their projects. However, the process of searching for code snippets and integrating them within existing source code can be tedious. In order to improve efficiency and reduce time spent on code reuse, we present an automated code reuse tool for the Eclipse IDE (Integrated Developer Environment), NLP2TestableCode. NLP2TestableCode can not only search for Java code snippets using natural language tasks, but also evaluate code snippets based on a user's existing code, modify snippets to improve fit and correct errors, before presenting the user with the best snippet, all without leaving the editor. NLP2TestableCode also includes functionality to automatically generate customisable test cases and suggest argument and return types, in order to further evaluate code snippets. In evaluation, NLP2TestableCode was capable of finding compilable code snippets for 82.9% of tasks, and testable code snippets for 42.9%.

preprint2020arXiv

Selecting third-party libraries: The practitioners' perspective

The selection of third-party libraries is an essential element of virtually any software development project. However, deciding which libraries to choose is a challenging practical problem. Selecting the wrong library can severely impact a software project in terms of cost, time, and development effort, with the severity of the impact depending on the role of the library in the software architecture, among others. Despite the importance of following a careful library selection process, in practice, the selection of third-party libraries is still conducted in an ad-hoc manner, where dozens of factors play an influential role in the decision. In this paper, we study the factors that influence the selection process of libraries, as perceived by industry developers. To that aim, we perform a cross-sectional interview study with 16 developers from 11 different businesses and survey 115 developers that are involved in the selection of libraries. We systematically devised a comprehensive set of 26 technical, human, and economic factors that developers take into consideration when selecting a software library. Eight of these factors are new to the literature. We explain each of these factors and how they play a role in the decision. Finally, we discuss the implications of our work to library maintainers, potential library users, package manager developers, and empirical software engineering researchers.

preprint2020arXiv

The Impact of Dynamics of Collaborative Software Engineering on Introverts: A Study Protocol

Background: Collaboration among software engineers through face-to-face discussions in teams has been promoted since the adoption of agile methods. However, these discussions might demote the contribution of software engineers who are introverts, possibly leading to sub-optimal solutions and creating work environments that benefit extroverts. Objective: We aim to evaluate whether providing software engineers with time to work individually and reason about a collective problem is a setting that makes introverts more comfortable to interact and contribute more, ultimately leading to better solutions. Method: We plan to conduct a between-subjects study, with teams in a control group that design a software architecture in a team discussion meeting and teams in a treatment group in which subjects work individually before engaging in a meeting. We will assess and compare the amount of contribution of introverts, their subjective experiences, and the designed solutions. Limitations: As extroverts will be present in both groups, we will not be able to conclude that better solutions are solely due to the increased participation of introverts. The analyses of their subjective experience and amount of contributions might provide evidence to suggest the reasons for observed differences.

preprint2019arXiv

Essential Sentences for Navigating Stack Overflow Answers

Stack Overflow (SO) has become an essential resource for software development. Despite its success and prevalence, navigating SO remains a challenge. Ideally, SO users could benefit from highlighted navigational cues that help them decide if an answer is relevant to their task and context. Such navigational cues could be in the form of essential sentences that help the searcher decide whether they want to read the answer or skip over it. In this paper, we compare four potential approaches for identifying essential sentences. We adopt two existing approaches and develop two new approaches based on the idea that contextual information in a sentence (e.g., "if using windows") could help identify essential sentences. We compare the four techniques using a survey of 43 participants. Our participants indicate that it is not always easy to figure out what the best solution for their specific problem is, given the options, and that they would indeed like to easily spot contextual information that may narrow down the search. Our quantitative comparison of the techniques shows that there is no single technique sufficient for identifying essential sentences that can serve as navigational cues, while our qualitative analysis shows that participants valued explanations and specific conditions, and did not value filler sentences or speculations. Our work sheds light on the importance of navigational cues, and our findings can be used to guide future research to find the best combination of techniques to identify such cues.