Source author record

Mattia Fazzini

Mattia Fazzini appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Machine Learning Programming Languages

Catalog footprint

What is connected

6works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging

Debugging CUDA programs has long been challenging because failures often arise from subtle interactions among hardware behavior, compiler decisions, memory hierarchy, and asynchronous execution. More importantly, with the rapid expansion of GPU usage across scientific computing, machine learning, graphics, and systems workloads, CUDA debugging has become more challenging than ever. Current evaluations of LLM-based CUDA programming largely miss this setting: a model can pass correctness tests with repair by degeneration, simplifying the CUDA code into a safer but slower program that abandons the original optimization structure. We introduce CUDABEAVER, a benchmark for CUDA debugging from real failing workspaces produced during LLM-based CUDA generation. Each task provides the broken candidate, native build/test commands, raw error evidence, and a single editable file. CUDABEAVER evaluates whether a fixer truly repairs the failing CUDA code or merely finds a slower test-passing replacement, reporting results by failure category, debugging trajectory, stagnation mode, and performance preservation. We further propose pass@k(M,C,A), a protocol-conditional CUDA debugging metric by making the fixer M, corpus C, and protocol axes Aexplicit. Using this metric across 213 tasks and seven frontier LLMs, we show that protocol-aware evaluation gives a more faithful view of CUDA debugging ability: when performance-loss tolerance is high, fixers appear much stronger, but even a minor stricter performance requirement can sharply reduce measured success, shifting scores by up to 40 percentage points.

preprint2023arXiv

An Empirical Investigation into the Reproduction of Bug Reports for Android Apps

One of the key tasks related to ensuring mobile app quality is the reporting, management, and resolution of bug reports. As such, researchers have committed considerable resources toward automating various tasks of the bug management process for mobile apps, such as reproduction and triaging. However, the success of these automated approaches is largely dictated by the characteristics and properties of the bug reports they operate upon. As such, understanding mobile app bug reports is imperative to drive the continued advancement of report management techniques. While prior studies have examined high-level statistics of large sets of reports, we currently lack an in-depth investigation of how the information typically reported in mobile app issue trackers relates to the specific details generally required to reproduce the underlying failures. In this paper, we perform an in-depth analysis of 180 reproducible bug reports systematically mined from Android apps on GitHub and investigate how the information contained in the reports relates to the task of reproducing the described bugs. In our analysis, we focus on three pieces of information: the environment needed to reproduce the bug report, the steps to reproduce (S2Rs), and the observed behavior. Focusing on this information, we characterize failure types, identify the modality used to report the information, and characterize the quality of the information within the reports. We find that bugs are reported in a multi-modal fashion, the environment is not always provided, and S2Rs often contain missing or non-specific enough information. These findings carry with them important implications on automated bug reproduction techniques as well as automated bug report management approaches more generally.

preprint2022arXiv

Automatically Detecting API-induced Compatibility Issues in Android Apps: A Comparative Analysis (Replicability Study)

Fragmentation is a serious problem in the Android ecosystem. This problem is mainly caused by the fast evolution of the system itself and the various customizations independently maintained by different smartphone manufacturers. Many efforts have attempted to mitigate its impact via approaches to automatically pinpoint compatibility issues in Android apps. Unfortunately, at this stage, it is still unknown if this objective has been fulfilled, and the existing approaches can indeed be replicated and reliably leveraged to pinpoint compatibility issues in the wild. We, therefore, propose to fill this gap by first conducting a literature review within this topic to identify all the available approaches. Among the nine identified approaches, we then try our best to reproduce them based on their original datasets. After that, we go one step further to empirically compare those approaches against common datasets with real-world apps containing compatibility issues. Experimental results show that existing tools can indeed be reproduced, but their capabilities are quite distinct, as confirmed by the fact that there is only a small overlap of the results reported by the selected tools. This evidence suggests that more efforts should be spent by our community to achieve sound compatibility issues detection.

preprint2022arXiv

Do Customized Android Frameworks Keep Pace with Android?

To satisfy varying customer needs, device vendors and OS providers often rely on the open-source nature of the Android OS and offer customized versions of the Android OS. When a new version of the Android OS is released, device vendors and OS providers need to merge the changes from the Android OS into their customizations to account for its bug fixes, security patches, and new features. Because developers of customized OSs might have made changes to code locations that were also modified by the developers of the Android OS, the merge task can be characterized by conflicts, which can be time-consuming and error-prone to resolve. To provide more insight into this critical aspect of the Android ecosystem, we present an empirical study that investigates how eight open-source customizations of the Android OS merge the changes from the Android OS into their projects. The study analyzes how often the developers from the customized OSs merge changes from the Android OS, how often the developers experience textual merge conflicts, and the characteristics of these conflicts. Furthermore, to analyze the effect of the conflicts, the study also analyzes how the conflicts can affect a randomly selected sample of 1,000 apps. After analyzing 1,148 merge operations, we identified that developers perform these operations for 9.7\% of the released versions of the Android OS, developers will encounter at least one conflict in 41.3\% of the merge operations, 58.1\% of the conflicts required developers to change the customized OSs, and 64.4\% of the apps considered use at least one method affected by a conflict. In addition to detailing our results, the paper also discusses the implications of our findings and provides insights for researchers and practitioners working with Android and its customizations.

preprint2022arXiv

Identifying and Characterizing Silently-Evolved Methods in the Android API

With over 500,000 commits and more than 700 contributors, the Android platform is undoubtedly one of the largest industrial-scale software projects. This project provides the Android API, and developers heavily rely on this API to develop their Android apps. Unfortunately, because the Android platform and its API evolve at an extremely rapid pace, app developers need to continually monitor API changes to avoid compatibility issues in their apps (\ie issues that prevent apps from working as expected when running on newer versions of the API). Despite a large number of studies on compatibility issues in the Android API, the research community has not yet investigated issues related to silently-evolved methods (SEMs). These methods are functions whose behavior might have changed but the corresponding documentation did not change accordingly. Because app developers rely on the provided documentation to evolve their apps, changes to methods that are not suitably documented may lead to unexpected failures in the apps using these methods. To shed light on this type of issue, we conducted a large-scale empirical study in which we identified and characterized SEMs across ten versions of the Android API. In the study, we identified SEMs, characterized the nature of the changes, and analyzed the impact of SEMs on a set of 1,000 real-world Android apps. Our experimental results show that SEMs do exist in the Android API, and that 957 of the apps we considered use at least one SEM. Based on these results, we argue that the Android platform developers should take actions to avoid introducing SEMs, especially those involving semantic changes. This situation highlights the need for automated techniques and tools to help Android practitioners in this task.

preprint2016arXiv

From Manual Android Tests to Automated and Platform Independent Test Scripts

Because Mobile apps are extremely popular and often mission critical nowadays, companies invest a great deal of resources in testing the apps they provide to their customers. Testing is particularly important for Android apps, which must run on a multitude of devices and operating system versions. Unfortunately, as we confirmed in many interviews with quality assurance professionals, app testing is today a very human intensive, and therefore tedious and error prone, activity. To address this problem, and better support testing of Android apps, we propose a new technique that allows testers to easily create platform independent test scripts for an app and automatically run the generated test scripts on multiple devices and operating system versions. The technique does so without modifying the app under test or the runtime system, by (1) intercepting the interactions of the tester with the app and (2) providing the tester with an intuitive way to specify expected results that it then encode as test oracles. We implemented our technique in a tool named Barista and used the tool to evaluate the practical usefulness and applicability of our approach. Our results show that Barista can faithfully encode user defined test cases as test scripts with built-in oracles, generates test scripts that can run on multiple platforms, and can outperform a state-of-the-art tool with similar functionality. Barista and our experimental infrastructure are publicly available.

Mattia Fazzini

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging

An Empirical Investigation into the Reproduction of Bug Reports for Android Apps

Automatically Detecting API-induced Compatibility Issues in Android Apps: A Comparative Analysis (Replicability Study)

Do Customized Android Frameworks Keep Pace with Android?

Identifying and Characterizing Silently-Evolved Methods in the Android API

From Manual Android Tests to Automated and Platform Independent Test Scripts