Source author record

Filippo Lanubile

Filippo Lanubile appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Machine Learning Artificial Intelligence Computation and Language Human-Computer Interaction

Catalog footprint

What is connected

9works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A large-scale, in-depth analysis of developers' personalities in the Apache ecosystem

Context: Large-scale distributed projects are typically the results of collective efforts performed by multiple developers with heterogeneous personalities. Objective: We aim to find evidence that personalities can explain developers' behavior in large scale-distributed projects. For example, the propensity to trust others - a critical factor for the success of global software engineering - has been found to influence positively the result of code reviews in distributed projects. Method: In this paper, we perform a quantitative analysis of ecosystem-level data from the code commits and email messages contributed by the developers working on the Apache Software Foundation (ASF) projects, as representative of large scale-distributed projects. Results: We find that there are three common types of personality profiles among Apache developers, characterized in particular by their level of Agreeableness and Neuroticism. We also confirm that developers' personality is stable over time. Moreover, personality traits do not vary with their role, membership, and extent of contribution to the projects. We also find evidence that more open developers are more likely to make contributors to Apache projects. Conclusion: Overall, our findings reinforce the need for future studies on human factors in software engineering to use psychometric tools to control for differences in developers' personalities.

preprint2022arXiv

Eliciting Best Practices for Collaboration with Computational Notebooks

Despite the widespread adoption of computational notebooks, little is known about best practices for their usage in collaborative contexts. In this paper, we fill this gap by eliciting a catalog of best practices for collaborative data science with computational notebooks. With this aim, we first look for best practices through a multivocal literature review. Then, we conduct interviews with professional data scientists to assess their awareness of these best practices. Finally, we assess the adoption of best practices through the analysis of 1,380 Jupyter notebooks retrieved from the Kaggle platform. Findings reveal that experts are mostly aware of the best practices and tend to adopt them in their daily work. Nonetheless, they do not consistently follow all the recommendations as, depending on specific contexts, some are deemed unfeasible or counterproductive due to the lack of proper tool support. As such, we envision the design of notebook solutions that allow data scientists not to have to prioritize exploration and rapid prototyping over writing code of quality.

preprint2022arXiv

Pynblint: a Static Analyzer for Python Jupyter Notebooks

Jupyter Notebook is the tool of choice of many data scientists in the early stages of ML workflows. The notebook format, however, has been criticized for inducing bad programming practices; indeed, researchers have already shown that open-source repositories are inundated by poor-quality notebooks. Low-quality output from the prototypical stages of ML workflows constitutes a clear bottleneck towards the productization of ML models. To foster the creation of better notebooks, we developed Pynblint, a static analyzer for Jupyter notebooks written in Python. The tool checks the compliance of notebooks (and surrounding repositories) with a set of empirically validated best practices and provides targeted recommendations when violations are detected.

preprint2021arXiv

An in-depth Analysis of Occasional and Recurring Collaborations in Online Music Co-creation

The success of online creative communities depends on the will of participants to create and derive content in a collaborative environment. Despite their growing popularity, the factors that lead to remixing existing content in online creative communities are not entirely understood. In this paper, we focus on overdubbing, that is, a dyadic collaboration where one author mixes one new track with an audio recording previously uploaded by another. We study musicians who collaborate regularly, that is, frequently overdub each other's songs. Building on frequent pattern mining techniques, we develop an approach to seek instances of such recurring collaborations in the Songtree community. We identify 43 instances involving two or three members with a similar reputation in the community. Our findings highlight common and different remix factors in occasional and recurring collaborations. Specifically, fresh and less mature songs are generally overdubbed more; instead, exchanging messages and invitations to collaborate are significant factors only for songs generated through recurring collaborations whereas author reputation (ranking) and applying metadata tags to songs have a positive effect only in occasional collaborations.

preprint2021arXiv

Using Personality Detection Tools for Software Engineering Research: How Far Can We Go?

Assessing the personality of software engineers may help to match individual traits with the characteristics of development activities such as code review and testing, as well as support managers in team composition. However, self-assessment questionnaires are not a practical solution for collecting multiple observations on a large scale. Instead, automatic personality detection, while overcoming these limitations, is based on off-the-shelf solutions trained on non-technical corpora, which might not be readily applicable to technical domains like Software Engineering (SE). In this paper, we first assess the performance of general-purpose personality detection tools when applied to a technical corpus of developers' emails retrieved from the public archives of the Apache Software Foundation. We observe a general low accuracy of predictions and an overall disagreement among the tools. Second, we replicate two previous research studies in SE by replacing the personality detection tool used to infer developers' personalities from pull-request discussions and emails. We observe that the original results are not confirmed, i.e., changing the tool used in the original study leads to diverging conclusions. Our results suggest a need for personality detection tools specially targeted for the software engineering domain.

preprint2021arXiv

Will You Come Back to Contribute? Investigating the Inactivity of OSS Core Developers in GitHub

Several Open Source Software (OSS) projects depend on the continuity of their development communities to remain sustainable. Understanding how developers become inactive or why they take breaks can help communities prevent abandonment and incentivize developers to come back. In this paper, we propose a novel method to identify developers' inactive periods by analyzing the individual rhythm of contributions to the projects. Using this method, we quantitatively analyze the inactivity of core developers in 18 OSS organizations hosted on GitHub. We also survey core developers to receive their feedback about the identified breaks and transitions. Our results show that our method was effective for identifying developers' breaks. About 94% of the surveyed core developers agreed with our state model of inactivity; 71% and 79% of them acknowledged their breaks and state transition, respectively. We also show that all core developers take breaks (at least once) and about a half of them (~45%}) have completely disengaged from a project for at least one year. We also analyzed the probability of transitions to/from inactivity and found that developers who pause their activity have a ~35-55\% chance to return to an active state; yet, if the break lasts for a year or longer, then the probability of resuming activities drops to ~21-26%, with a ~54% chance of complete disengagement. These results may support the creation of policies and mechanisms to make OSS community managers aware of breaks and potential project abandonment.

preprint2020arXiv

A Case Study on Tool Support for Collaboration in Agile Development

We report on a longitudinal case study conducted at the Italian site of a large software company to further our understanding of how development and communication tools can be improved to better support agile practices and collaboration. After observing inconsistencies in the way communication tools (i.e., email, Skype, and Slack) were used, we first reinforced the use of Slack as the central hub for internal communication, while setting clear rules regarding tools usage. As a second main change, we refactored the Jira Scrum board into two separate boards, a detailed one for developers and a high-level one for managers, while also introducing automation rules and the integration with Slack. The first change revealed that the teams of developers used and appreciated Slack differently with the QA team being the most favorable and that the use of channels is hindered by automatic notifications from development tools (e.g., Jenkins). The findings from the second change show that 85\% of the interviewees reported perceived improvements in their workflow. Despite the limitations due to the single nature of the reported case, we highlight the importance for companies to reflect on how to properly set up their agile work environment to improve communication and facilitate collaboration.

preprint2020arXiv

Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting?

In this paper, we address the problem of using sentiment analysis tools 'off-the-shelf,' that is when a gold standard is not available for retraining. We evaluate the performance of four SE-specific tools in a cross-platform setting, i.e., on a test set collected from data sources different from the one used for training. We find that (i) the lexicon-based tools outperform the supervised approaches retrained in a cross-platform setting and (ii) retraining can be beneficial in within-platform settings in the presence of robust gold standard datasets, even using a minimal training set. Based on our empirical findings, we derive guidelines for reliable use of sentiment analysis tools in software engineering.

preprint2020arXiv

Love, Joy, Anger, Sadness, Fear, and Surprise: SE Needs Special Kinds of AI: A Case Study on Text Mining and SE

Do you like your code? What kind of code makes developers happiest? What makes them angriest? Is it possible to monitor the mood of a large team of coders to determine when and where a codebase needs additional help?

Filippo Lanubile

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

A large-scale, in-depth analysis of developers' personalities in the Apache ecosystem

Eliciting Best Practices for Collaboration with Computational Notebooks

Pynblint: a Static Analyzer for Python Jupyter Notebooks

An in-depth Analysis of Occasional and Recurring Collaborations in Online Music Co-creation

Using Personality Detection Tools for Software Engineering Research: How Far Can We Go?

Will You Come Back to Contribute? Investigating the Inactivity of OSS Core Developers in GitHub

A Case Study on Tool Support for Collaboration in Agile Development

Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting?

Love, Joy, Anger, Sadness, Fear, and Surprise: SE Needs Special Kinds of AI: A Case Study on Text Mining and SE