Source author record

Gustavo Pinto

Gustavo Pinto appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Software Engineering, Human-Computer Interaction, Machine Learning

Catalog footprint

What is connected

9 works

3 topics

4 close collaborators


preprint, 2024, arXiv

Lessons from Building StackSpot AI: A Contextualized AI Coding Assistant

With their exceptional natural language processing capabilities, tools based on Large Language Models (LLMs) like ChatGPT and Co-Pilot have swiftly become indispensable resources in the software developer's toolkit. While recent studies suggest the potential productivity gains these tools can unlock, users still encounter drawbacks, such as generic or incorrect answers. Additionally, the pursuit of improved responses often leads to extensive prompt engineering efforts, diverting valuable time from writing code that delivers actual value. To address these challenges, a new breed of tools, built atop LLMs, is emerging. These tools aim to mitigate drawbacks by employing techniques like fine-tuning or enriching user prompts with contextualized information. In this paper, we delve into the lessons learned by a software development team venturing into the creation of such a contextualized LLM-based application, using retrieval-based techniques, called CodeBuddy. Over a four-month period, the team, despite lacking prior professional experience in LLM-based applications, built the product from scratch. Following the initial product release, we engaged with the development team responsible for the code generative components. Through interviews and analysis of the application's issue tracker, we uncover various intriguing challenges that teams working on LLM-based applications might encounter. For instance, we found three main groups of lessons: LLM-based lessons, User-based lessons, and Technical lessons. By understanding these lessons, software development teams could become better prepared to build LLM-based applications.

preprint, 2022, arXiv

Automatically Categorising GitHub Repositories by Application Domain

GitHub is the largest host of open source software on the Internet. This large, freely accessible database has attracted the attention of practitioners and researchers alike. But as GitHub's growth continues, it is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains. Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository and reasoning about project quality. In this work, we build on a previously annotated dataset of 5,000 GitHub repositories to design an automated classifier for categorising repositories by their application domain. The classifier uses state-of-the-art natural language processing techniques and machine learning to learn from multiple data sources and catalogue repositories according to five application domains. We contribute with (1) an automated classifier that can assign popular repositories to each application domain with at least 70% precision, (2) an investigation of the approach's performance on less popular repositories, and (3) a practical application of this approach to answer how the adoption of software engineering practices differs across application domains. Our work aims to help the GitHub community identify repositories of interest and opens promising avenues for future work investigating differences between repositories from different application domains.

preprint, 2022, arXiv

Socio-technical constraints and affordances of virtual collaboration -- A study of four online hackathons

Hackathons and similar time-bounded events have become a popular form of collaboration. They are commonly organized as in-person events during which teams engage in intense collaboration over a short period of time to complete a project that is of interest to them. Most research to date has focused on studying how teams collaborate in a co-located setting, pointing towards the advantages of radical co-location. The global pandemic of 2020, however, has led to many hackathons moving online, which challenges our current understanding of how they function. In this paper, we address this gap by presenting findings from a multiple-case study of 10 hackathon teams that participated in 4 hackathons across two continents. By analyzing the collected data, we found that teams merged synchronous and asynchronous means of communication to maintain a common understanding of work progress as well as to maintain awareness of each other's tasks. Task division was self-assigned based on individual skills or interests, while leaders emerged from different strategies (e.g., participant experience, the responsibility of registering the team in an event). Some of the affordances of in-person hackathons, such as the radical co-location of team members, could be partially reproduced in teams that kept synchronous communication channels while working (i.e., shared audio territories), in a sort of "radical virtual co-location". However, others, such as interactions with other teams, easy access to mentors, and networking with other participants, decreased. In addition, the technical constraints of the different communication tools and platforms brought technical problems and were overwhelming to participants. Our work contributes to understanding the virtual collaboration of small teams in the context of online hackathons and how technologies and event structures proposed by organizers imply this collaboration.

preprint, 2022, arXiv

To What Extent Cognitive-Driven Development Improves Code Readability?

Cognitive-Driven Development (CDD) is a coding design technique that aims to reduce the cognitive effort that developers place in understanding a given code unit (e.g., a class). By following CDD design practices, the coding units are expected to be smaller, and, thus, easier to maintain and evolve. However, it is so far unknown whether these smaller code units coded using CDD standards are, indeed, easier to understand. In this work we aim to assess to what extent CDD improves code readability. To achieve this goal, we conducted a two-phase study. We start by inviting professional software developers to vote (and justify their rationale) on the most readable snippet in each pair of code snippets (from a set of 10 pairs); one snippet in each pair was coded using CDD practices. We received 133 answers. In the second phase, we applied the state-of-the-art readability model on the 10 pairs of CDD-guided refactorings. We observed some conflicting results. On the one hand, developers perceived that seven (out of 10) CDD-guided refactorings were more readable than their counterparts; for two other CDD-guided refactorings, developers were undecided, while only in one of the CDD-guided refactorings, developers preferred the original code snippet. On the other hand, we noticed that only one CDD-guided refactoring has better readability performance, as assessed by state-of-the-art readability models. Our results provide initial evidence that CDD could be an interesting approach for software design.

preprint, 2020, arXiv

Analyzing the evolution and diversity of SBES Program Committee

The Brazilian Symposium on Software Engineering (SBES) is one of the most important Latin American Software Engineering conferences. It was first held in 1987, and 2019 marks its 33rd edition. Over these years, many researchers have participated in SBES, attending the conference, submitting, and reviewing papers. The researchers who participate in the Program Committee (PC) and perform the reviewers' role are fundamentally important to SBES, since their evaluations (e.g., deciding whether a paper is accepted or not) have the potential of shaping what SBES is now. Knowing that diversity is an important aspect of any group work, we wanted to understand diversity in the SBES PC community. We investigated a number of characteristics of SBES PC members, including their gender and geographic location. We also analyzed the turnover and renovation of the committee. Among the findings, we observed that although the number of participants in the SBES PC has increased over the years, most of them are men (~80%) and from the Southeast and Northeast of Brazil, with very few members from the North region. We also observed that there is a small turnover: during the 2010 decade, only 11% of new members were added to the PC. Finally, we investigated the participation of the PC members publishing papers at SBES. We observed that only 24% of the papers accepted to SBES were authored by researchers who were not committee members of the respective year. Moreover, committee members usually do not collaborate among themselves: a significant number of the papers are authored by the PC members and students. This paper may contribute to the SBES community, in particular, its special interest group, in understanding the needs and challenges of the PC's participants.

preprint, 2020, arXiv

Characterizing the Roles of Contributors in Open-source Scientific Software Projects

The development of scientific software is, more than ever, critical to the practice of science, and this is accompanied by a trend towards more open and collaborative efforts. Unfortunately, there has been little investigation into who is driving the evolution of such scientific software or how the collaboration happens. In this paper, we address this problem. We present an extensive analysis of seven open-source scientific software projects in order to develop an empirically-informed model of the development process. This analysis was complemented by a survey of 72 scientific software developers. In the majority of the projects, we found senior research staff (e.g., professors) to be responsible for half or more of commits (an average commit share of 72%) and heavily involved in architectural concerns (seniors were more likely to interact with files related to the build system, project meta-data, and developer documentation). Juniors (e.g., graduate students) also contribute substantially -- in one studied project, juniors made almost 100% of its commits. Still, graduate students had the longest contribution periods among juniors (with 1.72 years of commit activity compared to 0.98 years for postdocs and 4 months for undergraduates). Moreover, we also found that third-party contributors are scarce, contributing to the project for just one day. The results from this study aim to help scientists to better understand their own projects, communities, and the contributors' behavior, while paving the road for future software engineering research.

preprint, 2020, arXiv

On the Use of Grey Literature: A Survey with the Brazilian Software Engineering Research Community

Background: The use of Grey Literature (GL) has been investigated in diverse research areas. In Software Engineering (SE), interest in this topic has increased in recent years. Problem: Even with the increase of GL published in diverse sources, the understanding of its use in the SE research community is still controversial. Objective: To understand how Brazilian SE researchers use GL, we aimed to become aware of the criteria to assess the credibility of their use, as well as the benefits and challenges. Method: We surveyed 76 active SE researchers who participated in a flagship SE conference in Brazil, using a questionnaire with 11 questions to share their views on the use of GL in the context of SE research. We followed a qualitative approach to analyze open questions. Results: We found that most surveyed researchers use GL mainly to understand new topics. Our work identified new findings, including: 1) GL sources used by SE researchers (e.g., blogs, community websites); 2) motivations to use (e.g., to understand problems and to complement research findings) or reasons to avoid GL (e.g., lack of reliability, lack of scientific value); 3) the benefit that GL is easy to access and read, and the challenge of having its scientific value recognized; and 4) criteria to assess GL credibility, showing the importance of the content owner being renowned (e.g., renowned authors and institutions). Conclusions: Our findings contribute to form a body of knowledge on the use of GL by SE researchers, by discussing novel (some contradictory) results and providing a set of lessons learned to both SE researchers and practitioners.

preprint, 2020, arXiv

Rapid Reviews in Software Engineering

Integrating research evidence into practice is one of the main goals of Evidence-Based Software Engineering (EBSE). Secondary studies, one of the main EBSE products, are intended to summarize the best research evidence and make it easily consumable by practitioners. However, recent studies show that some secondary studies lack connections with software engineering practice. In this chapter, we present the concept of Rapid Reviews, which are lightweight secondary studies focused on delivering evidence to practitioners in a timely manner. Rapid Reviews support practitioners in their decision-making, and should be conducted bounded to a practical problem, inserted into a practical context. Thus, Rapid Reviews can be easily integrated in a knowledge/technology transfer initiative. After describing the basic concepts, we present the results and experiences of conducting two Rapid Reviews. We also provide guidelines to help researchers and practitioners who want to conduct Rapid Reviews, and we finally discuss topics that may concern the research community about the feasibility of Rapid Reviews as an Evidence-Based method. In conclusion, we believe Rapid Reviews might interest researchers and practitioners working in the intersection between software engineering research and practice.

preprint, 2020, arXiv

Work Practices and Perceptions from Women Core Developers in OSS Communities

The effect of gender diversity in open source communities has gained increasing attention from practitioners and researchers. For instance, organizations such as the Python Software Foundation and the OpenStack Foundation started actions to increase gender diversity and promote women to top positions in the communities. Although the general underrepresentation of women (a.k.a. horizontal segregation) in open source communities has been explored in a number of research studies, little is known about the vertical segregation in open source communities -- which occurs when there are fewer women in high-level positions. To address this research gap, in this paper we present the results of a mixed-methods study on gender diversity and work practices of core developers contributing to open-source communities. In the first study, we used mining software repositories procedures to identify the core developers of 711 open source projects, in order to understand how common women core developers are in open source communities and characterize their work practices. In the second study, we surveyed the women core developers we identified in the first study to collect their perceptions of gender diversity and gender bias they might have observed while contributing to open source systems. Our findings show that open source communities present both horizontal and vertical segregation (only 2.3% of the core developers are women). Nevertheless, differently from previous studies, most of the women core developers (65.7%) report never having experienced gender discrimination when contributing to an open source project. Finally, we did not note substantial differences between the work practices of women and men core developers. We reflect on these findings and present some ideas that might increase the participation of women in open source communities.
