Source author record

Sven Apel

Sven Apel appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Programming Languages Artificial Intelligence Computation and Language Logic in Computer Science

Catalog footprint

What is connected

13works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Pragmatic Reasoning improves LLM Code Generation

Pragmatic reasoning is pervasive in human-human communication - it allows us to leverage shared knowledge and counterfactual reasoning in order to infer the intention of a conversational partner given their ambiguous or underspecified message. In human-computer communication, underspecified messages often represent a major challenge: for instance, translating natural language instructions into code is difficult when user instructions contain inherent ambiguities. In the present paper, we aim to scale up the pragmatic "Rational Speech Act" framework to naturalistic language-to-code problems, and propose a way of dealing with multiple meaning-equivalent instruction alternatives, an issue that does not arise in previous toy-scale problems. We evaluate our method, CodeRSA, with two recent LLMs (Llama-3-8B-Instruct and Qwen-2.5-7B-Instruct) on two widely used code generation benchmarks (HumanEval and MBPP). Our experimental results show that CodeRSA consistently outperforms common baselines, surpasses the state-of-the-art approach in most cases, and demonstrates robust overall performance. Qualitative analyses demonstrate that it exhibits the desired behavior for the right reasons. These findings underscore the effectiveness of integrating pragmatic reasoning into a naturalistic complex communication task, language-to-code generation, offering a promising direction for enhancing code generation quality in LLMs and emphasizing the importance of pragmatic reasoning in complex communication settings.

preprint2022arXiv

Causality in Configurable Software Systems

Detecting and understanding reasons for defects and inadvertent behavior in software is challenging due to their increasing complexity. In configurable software systems, the combinatorics that arises from the multitude of features a user might select from adds a further layer of complexity. We introduce the notion of feature causality, which is based on counterfactual reasoning and inspired by the seminal definition of actual causality by Halpern and Pearl. Feature causality operates at the level of system configurations and is capable of identifying features and their interactions that are the reason for emerging functional and non-functional properties. We present various methods to explicate these reasons, in particular well-established notions of responsibility and blame that we extend to the feature-oriented setting. Establishing a close connection of feature causality to prime implicants, we provide algorithms to effectively compute feature causes and causal explications. By means of an evaluation on a wide range of configurable software systems, including community benchmarks and real-world systems, we demonstrate the feasibility of our approach: We illustrate how our notion of causality facilitates to identify root causes, estimate the effects of features, and detect feature interactions.

preprint2022arXiv

On Debugging the Performance of Configurable Software Systems: Developer Needs and Tailored Tool Support

Determining whether a configurable software system has a performance bug or it was misconfigured is often challenging. While there are numerous debugging techniques that can support developers in this task, there is limited empirical evidence of how useful the techniques are to address the actual needs that developers have when debugging the performance of configurable software systems; most techniques are often evaluated in terms of technical accuracy instead of their usability. In this paper, we take a human-centered approach to identify, design, implement, and evaluate a solution to support developers in the process of debugging the performance of configurable software systems. We first conduct an exploratory study with 19 developers to identify the information needs that developers have during this process. Subsequently, we design and implement a tailored tool, adapting techniques from prior work, to support those needs. Two user studies, with a total of 20 developers, validate and confirm that the information that we provide helps developers debug the performance of configurable software systems.

preprint2021arXiv

White-Box Analysis over Machine Learning: Modeling Performance of Configurable Systems

Performance-influence models can help stakeholders understand how and where configuration options and their interactions influence the performance of a system. With this understanding, stakeholders can debug performance behavior and make deliberate configuration decisions. Current black-box techniques to build such models combine various sampling and learning strategies, resulting in tradeoffs between measurement effort, accuracy, and interpretability. We present Comprex, a white-box approach to build performance-influence models for configurable systems, combining insights of local measurements, dynamic taint analysis to track options in the implementation, compositionality, and compression of the configuration space, without relying on machine learning to extrapolate incomplete samples. Our evaluation on 4 widely-used, open-source projects demonstrates that Comprex builds similarly accurate performance-influence models to the most accurate and expensive black-box approach, but at a reduced cost and with additional benefits from interpretable and local models.

preprint2021arXiv

White-Box Performance-Influence Models: A Profiling and Learning Approach

Many modern software systems are highly configurable, allowing the user to tune them for performance and more. Current performance modeling approaches aim at finding performance-optimal configurations by building performance models in a black-box manner. While these models provide accurate estimates, they cannot pinpoint causes of observed performance behavior to specific code regions. This does not only hinder system understanding, but it also complicates tracing the influence of configuration options to individual methods. We propose a white-box approach that models configuration-dependent performance behavior at the method level. This allows us to predict the influence of configuration decisions on individual methods, supporting system understanding and performance debugging. The approach consists of two steps: First, we use a coarse-grained profiler and learn performance-influence models for all methods, potentially identifying some methods that are highly configuration- and performance-sensitive, causing inaccurate predictions. Second, we re-measure these methods with a fine-grained profiler and learn more accurate models, at higher cost, though. By means of 9 real-world Java software systems, we demonstrate that our approach can efficiently identify configuration-relevant methods and learn accurate performance-influence models.

preprint2020arXiv

ConfigCrusher: Towards White-Box Performance Analysis for Configurable Systems

Stakeholders of configurable systems are often interested in knowing how configuration options influence the performance of a system to facilitate, for example, the debugging and optimization processes of these systems. Several black-box approaches can be used to obtain this information, but they either sample a large number of configurations to make accurate predictions or miss important performance-influencing interactions when sampling few configurations. Furthermore, black-box approaches cannot pinpoint the parts of a system that are responsible for performance differences among configurations. This article proposes ConfigCrusher, a white-box performance analysis that inspects the implementation of a system to guide the performance analysis, exploiting several insights of configurable systems in the process. ConfigCrusher employs a static data-flow analysis to identify how configuration options may influence control-flow statements and instruments code regions, corresponding to these statements, to dynamically analyze the influence of configuration options on the regions' performance. Our evaluation on 10 configurable systems shows the feasibility of our white-box approach to more efficiently build performance-influence models that are similar to or more accurate than current state of the art approaches. Overall, we showcase the benefits of white-box performance analyses and their potential to outperform black-box approaches and provide additional information for analyzing configurable systems.

preprint2016arXiv

A Comparison of 10 Sampling Algorithms for Configurable Systems

Almost every software system provides configuration options to tailor the system to the target platform and application scenario. Often, this configurability renders the analysis of every individual system configuration infeasible. To address this problem, researchers have proposed a diverse set of sampling algorithms. We present a comparative study of 10 state-of-the-art sampling algorithms regarding their fault-detection capability and size of sample sets. The former is important to improve software quality and the latter to reduce the time of analysis. In a nutshell, we found that sampling algorithms with larger sample sets are able to detect higher numbers of faults, but simple algorithms with small sample sets, such as most-enabled-disabled, are the most efficient in most contexts. Furthermore, we observed that the limiting assumptions made in previous work influence the number of detected faults, the size of sample sets, and the ranking of algorithms. Finally, we have identified a number of technical challenges when trying to avoid the limiting assumptions, which questions the practicality of certain sampling algorithms.

preprint2016arXiv

Classifying Developers into Core and Peripheral: An Empirical Study on Count and Network Metrics

Knowledge about the roles developers play in a software project is crucial to understanding the project's collaborative dynamics. Developers are often classified according to the dichotomy of core and peripheral roles. Typically, operationalizations based on simple counts of developer activities (e.g., number of commits) are used for this purpose, but there is concern regarding their validity and ability to elicit meaningful insights. To shed light on this issue, we investigate whether commonly used operationalizations of core--peripheral roles produce consistent results, and we validate them with respect to developers' perceptions by surveying 166 developers. Improving over the state of the art, we propose a relational perspective on developer roles, using developer networks to model the organizational structure, and by examining core--peripheral roles in terms of developers' positions and stability within the organizational structure. In a study of 10 substantial open-source projects, we found that the existing and our proposed core--peripheral operationalizations are largely consistent and valid. Furthermore, we demonstrate that a relational perspective can reveal further meaningful insights, such as that core developers exhibit high positional stability, upper positions in the hierarchy, and high levels of coordination with other core developers.

preprint2016arXiv

Do #ifdefs Influence the Occurrence of Vulnerabilities? An Empirical Study of the Linux Kernel

Preprocessors support the diversification of software products with #ifdefs, but also require additional effort from developers to maintain and understand variable code. We conjecture that #ifdefs cause developers to produce more vulnerable code because they are required to reason about multiple features simultaneously and maintain complex mental models of dependencies of configurable code. We extracted a variational call graph across all configurations of the Linux kernel, and used configuration complexity metrics to compare vulnerable and non-vulnerable functions considering their vulnerability history. Our goal was to learn about whether we can observe a measurable influence of configuration complexity on the occurrence of vulnerabilities. Our results suggest, among others, that vulnerable functions have higher variability than non-vulnerable ones and are also constrained by fewer configuration options. This suggests that developers are inclined to notice functions appear in frequently-compiled product variants. We aim to raise developers' awareness to address variability more systematically, since configuration complexity is an important, but often ignored aspect of software product lines.

preprint2016arXiv

Evolutionary Trends of Developer Coordination: A Network Approach

Software evolution is a fundamental process that transcends the realm of technical artifacts and permeates the entire organizational structure of a software project. By means of a longitudinal empirical study of 18 large open-source projects, we examine and discuss the evolutionary principles that govern the coordination of developers. By applying a network-analytic approach, we found that the implicit and self-organizing structure of developer coordination is ubiquitously described by non-random organizational principles that defy conventional software-engineering wisdom. In particular, we found that: (a) developers form scale-free networks, in which the majority of coordination requirements arise among an extremely small number of developers, (b) developers tend to accumulate coordination requirements with more and more developers over time, presumably limited by an upper bound, and (c) initially developers are hierarchically arranged, but over time, form a hybrid structure, in which core developers are hierarchically arranged and peripheral developers are not. Our results suggest that the organizational structure of large projects is constrained to evolve towards a state that balances the costs and benefits of developer coordination, and the mechanisms used to achieve this state depend on the project's scale.

preprint2013arXiv

Domain Types: Selecting Abstractions Based on Variable Usage

The success of software model checking depends on finding an appropriate abstraction of the subject program. The choice of the abstract domain and the analysis configuration is currently left to the user, who may not be familiar with the tradeoffs and performance details of the available abstract domains. We introduce the concept of domain types, which classify the program variables into types that are more fine-grained than standard declared types, such as int or long, in order to guide the selection of an appropriate abstract domain for a model checker. Our implementation determines the domain type for each variable in a pre-processing step, based on the variable usage in the program, and then assigns each variable to an abstract domain. The model-checking framework that we use supports to specify a separate analysis precision for each abstract domain, such that we can freely configure the analysis. We experimentally demonstrate a significant impact of the choice of the abstract domain per variable. We consider one explicit (hash tables for integer values) and one symbolic (binary decision diagrams) domain. The experiments are based on standard verification tasks that are taken from recent competitions on software verification. Each abstract domain has unique advantages in representing the state space of variables of a certain domain type. Our experiments show that software model checkers can be improved with a domain-type guided combination of abstract domains.

preprint2011arXiv

Feature-Aware Verification

A software product line is a set of software products that are distinguished in terms of features (i.e., end-user--visible units of behavior). Feature interactions ---situations in which the combination of features leads to emergent and possibly critical behavior--- are a major source of failures in software product lines. We explore how feature-aware verification can improve the automatic detection of feature interactions in software product lines. Feature-aware verification uses product-line verification techniques and supports the specification of feature properties along with the features in separate and composable units. It integrates the technique of variability encoding to verify a product line without generating and checking a possibly exponential number of feature combinations. We developed the tool suite SPLverifier for feature-aware verification, which is based on standard model-checking technology. We applied it to an e-mail system that incorporates domain knowledge of AT&T. We found that feature interactions can be detected automatically based on specifications that have only feature-local knowledge, and that variability encoding significantly improves the verification performance when proving the absence of interactions.

preprint2010arXiv

Type-Safe Feature-Oriented Product Lines

A feature-oriented product line is a family of programs that share a common set of features. A feature implements a stakeholder's requirement, represents a design decision and configuration option and, when added to a program, involves the introduction of new structures, such as classes and methods, and the refinement of existing ones, such as extending methods. With feature-oriented decomposition, programs can be generated, solely on the basis of a user's selection of features, by the composition of the corresponding feature code. A key challenge of feature-oriented product line engineering is how to guarantee the correctness of an entire feature-oriented product line, i.e., of all of the member programs generated from different combinations of features. As the number of valid feature combinations grows progressively with the number of features, it is not feasible to check all individual programs. The only feasible approach is to have a type system check the entire code base of the feature-oriented product line. We have developed such a type system on the basis of a formal model of a feature-oriented Java-like language. We demonstrate that the type system ensures that every valid program of a feature-oriented product line is well-typed and that the type system is complete.

Sven Apel

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

Pragmatic Reasoning improves LLM Code Generation

Causality in Configurable Software Systems

On Debugging the Performance of Configurable Software Systems: Developer Needs and Tailored Tool Support

White-Box Analysis over Machine Learning: Modeling Performance of Configurable Systems

White-Box Performance-Influence Models: A Profiling and Learning Approach

ConfigCrusher: Towards White-Box Performance Analysis for Configurable Systems

A Comparison of 10 Sampling Algorithms for Configurable Systems

Classifying Developers into Core and Peripheral: An Empirical Study on Count and Network Metrics

Do #ifdefs Influence the Occurrence of Vulnerabilities? An Empirical Study of the Linux Kernel

Evolutionary Trends of Developer Coordination: A Network Approach

Domain Types: Selecting Abstractions Based on Variable Usage

Feature-Aware Verification

Type-Safe Feature-Oriented Product Lines