Researcher profile

Giuliano Antoniol

Giuliano Antoniol contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
8 works
0 followers
4 topics
4 close collaborators

Published work

8 published items

Preprint · 2023 · arXiv

AmbieGen: A Search-based Framework for Autonomous Systems Testing

Thorough testing of safety-critical autonomous systems, such as self-driving cars, autonomous robots, and drones, is essential for detecting potential failures before deployment. One crucial testing stage is model-in-the-loop testing, where the system model is evaluated by executing various scenarios in a simulator. However, the search space of possible parameters defining these test scenarios is vast, and simulating all combinations is computationally infeasible. To address this challenge, we introduce AmbieGen, a search-based test case generation framework for autonomous systems. AmbieGen uses evolutionary search to identify the most critical scenarios for a given system, and has a modular architecture that allows for the addition of new systems under test, algorithms, and search operators. Currently, AmbieGen supports test case generation for autonomous robots and for the lane-keeping assist systems of autonomous cars. In this paper, we provide a high-level overview of the framework's architecture and demonstrate its practical use cases.

Preprint · 2022 · arXiv

A Probabilistic Framework for Mutation Testing in Deep Neural Networks

Context: Mutation Testing (MT) is an important tool in traditional Software Engineering (SE) white-box testing. It aims to artificially inject faults into a system to evaluate a test suite's capability to detect them, assuming that the test suite's defect-finding capability will then translate to real faults. Although MT has long been used in SE, it has only recently started gaining the attention of the Deep Learning (DL) community, with researchers adapting it to improve the testability of DL models and the trustworthiness of DL systems. Objective: Although several techniques have been proposed for MT, most of them neglect the stochasticity inherent to DL that results from the training phase. Even the latest MT approaches in DL, which propose to tackle MT through a statistical approach, might give inconsistent results. Indeed, because their statistic is based on a fixed set of sampled training instances, it can lead to different results across instance sets, whereas results should be consistent for any instance. Methods: In this work, we propose a Probabilistic Mutation Testing (PMT) approach that alleviates the inconsistency problem and allows for a more consistent decision on whether a mutant is killed or not. Results: Through an evaluation using three models and eight mutation operators used in previously proposed MT methods, we show that PMT effectively allows more consistent and informed decisions on mutations. We also analyze the trade-off between the approximation error and the cost of our method, showing that a relatively small error can be achieved for a manageable cost. Conclusion: Our results show the limitations of current MT practices for DNNs and the need to rethink them. We believe PMT is the first step in that direction, as it effectively removes the inconsistency across test executions that previous methods exhibit due to the stochasticity of DNN training.

Preprint · 2022 · arXiv

Data-access performance anti-patterns in data-intensive systems

Data-intensive systems handle variable, high-volume, and high-velocity data generated by humans and digital devices. Like traditional software, data-intensive systems are prone to technical debt introduced to cope with the time and resource constraints placed on developers. Data access is a critical component of data-intensive systems, as it determines the overall performance and functionality of such systems. While data-access technical debt is receiving attention from the research community, technical debt affecting performance is not well investigated. Objective: Identify, categorize, and validate data-access performance issues in the context of NoSQL-based and polyglot-persistence data-intensive systems using a qualitative study. Method: We collect issues from NoSQL-based and polyglot-persistence open-source data-intensive systems, identify data-access performance issues using inductive coding, and build a taxonomy of their root causes. Then, we validate the perceived relevance of the newly identified performance issues using a developer survey.

Preprint · 2022 · arXiv

Do Developers Refactor Data Access Code? An Empirical Study

Developers often refactor code to improve the maintainability and comprehensibility of software. There are many studies on refactoring activities in traditional software systems. However, refactoring in data-intensive systems is not well explored. Understanding developers' refactoring practices is important for developing efficient tool support. We conducted a longitudinal study of refactoring activities in data access classes using 12 data-intensive subject systems. We investigated the prevalence and evolution of refactorings and the association of refactorings with data access smells. We also manually analyzed over 378 samples of data access refactoring instances to identify the code functionalities targeted by such refactorings. Our results show that (1) data access refactorings are prevalent and varied in type, with "Rename variable" being the most prevalent data access refactoring; (2) the prevalence and type of refactorings vary as systems evolve over time; (3) most data access refactorings target code that implements data fetching and insertion; and (4) data access refactorings do not generally touch SQL queries. Overall, the results show that data access refactorings focus on improving code quality but not the underlying data access operations. Hence, more work is needed from the research community to make practitioners aware of the benefits of addressing data access smells with refactorings and to support them in doing so.

Preprint · 2022 · arXiv

FIXME: Synchronize with Database! An Empirical Study of Data Access Self-Admitted Technical Debt

Developers sometimes choose design and implementation shortcuts due to pressure from tight release schedules. However, shortcuts introduce technical debt that increases as the software evolves. The debt needs to be repaid as fast as possible to minimize its impact on software development and software quality. Sometimes, developers admit technical debt in comments and commit messages. Such debt is known as self-admitted technical debt (SATD). In data-intensive systems, where data manipulation is a critical functionality, the presence of SATD in the data access logic could seriously harm performance and maintainability. Understanding the composition and distribution of SATDs across software systems and their evolution could provide insights into managing technical debt efficiently. We present a large-scale empirical study on the prevalence, composition, and evolution of SATD in data-intensive systems. We analyzed 83 open-source systems relying on relational databases as well as 19 systems relying on NoSQL databases. We detected SATD in source code comments obtained from different snapshots of the subject systems. To understand the evolution dynamics of SATDs, we conducted a survival analysis. Next, we manually analyzed a sample of 361 data-access SATDs, investigating their composition and the reasons behind their introduction and removal. We identified 15 new SATD categories, 11 of which are specific to database access operations. We found that most data-access SATDs are introduced in the later stages of the change history rather than at the beginning. We also observed that bug fixing and refactoring are the main reasons behind the introduction of data-access SATDs.

Preprint · 2022 · arXiv

On the Prevalence, Impact, and Evolution of SQL Code Smells in Data-Intensive Systems

Code smells indicate software design problems that harm software quality. Data-intensive systems that frequently access databases often suffer from SQL code smells in addition to traditional smells. While traditional code smells have been studied extensively, interest in SQL code smells has grown only recently. In this paper, we conduct an empirical study to investigate the prevalence and evolution of SQL code smells in open-source, data-intensive systems. We collected 150 projects and examined both traditional and SQL code smells in these projects. Our investigation delivers several important findings. First, SQL code smells are indeed prevalent in data-intensive software systems. Second, SQL code smells have a weak co-occurrence with traditional code smells. Third, SQL code smells have a weaker association with bugs than traditional code smells do. Fourth, compared to traditional code smells, SQL code smells are more likely to be introduced at the beginning of the project lifetime and to be left in the code without a fix. Overall, our results show that SQL code smells are indeed prevalent and persistent in the studied data-intensive software systems. Developers should be aware of these smells and consider detecting and refactoring SQL code smells and traditional code smells separately, using dedicated tools.

Preprint · 2021 · arXiv

Machine Learning Application Development: Practitioners' Insights

Nowadays, intelligent systems and services are increasingly popular, as they provide data-driven solutions to diverse real-world problems thanks to recent breakthroughs in Artificial Intelligence (AI) and Machine Learning (ML). However, bringing machine learning and software engineering together offers not only promising potential but also some inherent challenges. Despite some recent research efforts, we still lack a clear understanding of the challenges of developing ML-based applications and of current industry practices. Moreover, it is unclear where software engineering researchers should focus their efforts to better support ML application developers. In this paper, we report on a survey that aimed to understand the challenges and best practices of ML application development. We synthesize the results obtained from 80 practitioners (with diverse skills, experience, and application domains) into 17 findings, outlining challenges and best practices for ML application development. Practitioners involved in the development of ML-based software systems can leverage the summarized best practices to improve the quality of their systems. We hope that the reported challenges will inform the research community about topics that need to be investigated to improve the engineering process and the quality of ML-based applications.

Preprint · 2020 · arXiv

Documentation of Machine Learning Software

Machine Learning software documentation differs from most of the documentation studied in software engineering research. Often, the users of this documentation are not software experts. The growing interest in using data science, and machine learning in particular, in different fields has attracted scientists and engineers with various levels of knowledge about programming and software engineering. Our ultimate goal is the automated generation and adaptation of machine learning software documentation for users with different levels of expertise. We are interested in understanding the nature and triggers of documentation problems and the impact of users' levels of expertise on the process of documentation evolution. We will investigate Stack Overflow Q/As and classify the documentation-related Q/As within the machine learning domain to understand the types and triggers of the problems as well as the potential change requests to the documentation. We intend to use the results to build on state-of-the-art techniques for automatic documentation generation and to extend them to the adaptation, summarization, and explanation of software functionalities.