Source author record

Andrea Stocco

Andrea Stocco appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Artificial Intelligence Robotics

Catalog footprint

What is connected

3works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Coverage-Guided Road Selection and Prioritization for Efficient Testing in Autonomous Driving Systems

Autonomous Driving Assistance Systems (ADAS) rely on extensive testing to ensure safety and reliability, yet road scenario datasets often contain redundant cases that slow down the testing process without improving fault detection. To address this issue, we present a novel test prioritization framework that reduces redundancy while preserving geometric and behavioral diversity. Road scenarios are clustered based on geometric and dynamic features of the ADAS driving behavior, from which representative cases are selected to guarantee coverage. Roads are finally prioritized based on geometric complexity, driving difficulty, and historical failures, ensuring that the most critical and challenging tests are executed first. We evaluate our framework on the OPENCAT dataset and the Udacity self-driving car simulator using two ADAS models. On average, our approach achieves an 89% reduction in test suite size while retaining an average of 79% of failed road scenarios. The prioritization strategy improves early failure detection by up to 95x compared to random baselines.

preprint2026arXiv

STELLAR: A Search-Based Testing Framework for Large Language Model Applications

Large Language Model (LLM)-based applications are increasingly deployed across various domains, including customer service, education, and mobility. However, these systems are prone to inaccurate, fictitious, or harmful responses, and their vast, high-dimensional input space makes systematic testing particularly challenging. To address this, we present STELLAR, an automated search-based testing framework for LLM-based applications that systematically uncovers text inputs leading to inappropriate system responses. Our framework models test generation as an optimization problem and discretizes the input space into stylistic, content-related, and perturbation features. Unlike prior work that focuses on prompt optimization or coverage heuristics, our work employs evolutionary optimization to dynamically explore feature combinations that are more likely to expose failures. We evaluate STELLAR on three LLM-based conversational question-answering systems. The first focuses on safety, benchmarking both public and proprietary LLMs against malicious or unsafe prompts. The second and third target navigation, using an open-source and an industrial retrieval-augmented system for in-vehicle venue recommendations. Overall, STELLAR exposes up to 4.3 times (average 2.5 times) more failures than the existing baseline approaches.

preprint2022arXiv

Mind the Gap! A Study on the Transferability of Virtual vs Physical-world Testing of Autonomous Driving Systems

Safe deployment of self-driving cars (SDC) necessitates thorough simulated and in-field testing. Most testing techniques consider virtualized SDCs within a simulation environment, whereas less effort has been directed towards assessing whether such techniques transfer to and are effective with a physical real-world vehicle. In this paper, we shed light on the problem of generalizing testing results obtained in a driving simulator to a physical platform and provide a characterization and quantification of the sim2real gap affecting SDC testing. In our empirical study, we compare SDC testing when deployed on a physical small-scale vehicle vs its digital twin. Due to the unavailability of driving quality indicators from the physical platform, we use neural rendering to estimate them through visual odometry, hence allowing full comparability with the digital twin. Then, we investigate the transferability of behavior and failure exposure between virtual and real-world environments, targeting both unintended abnormal test data and intended adversarial examples. Our study shows that, despite the usage of a faithful digital twin, there are still critical shortcomings that contribute to the reality gap between the virtual and physical world, threatening existing testing solutions that only consider virtual SDCs. On the positive side, our results present the test configurations for which physical testing can be avoided, either because their outcome does transfer between virtual and physical environments, or because the uncertainty profiles in the simulator can help predict their outcome in the real world.