Source author record

Yves Le Traon

Yves Le Traon appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Software Engineering Machine Learning Artificial Intelligence Cryptography and Security Computation and Language Computer Vision cs.CY Neural and Evolutionary Computing physics.soc-ph Populations and Evolution

Catalog footprint

What is connected

35works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Centralized vs Decentralized Federated Learning: A trade-off performance analysis

Federated Learning (FL) has emerged as a promising paradigm for collaborative model training across distributed edge devices while preserving data privacy especially with the huge increase amount of data due to the adoption of technologies which contributes to the growing number of IoT devices. Storing this amount of data centrally is challenging due to issues like limited communication, privacy, and regulations. FL can be Centralized (CFL), Decentralized (DFL), and Semi-decentralized (SDFL). Choosing the right FL architecture depends on the application's needs. However, very few research studies have experimentally compared these three types of architectures to not only understand the respective strengths and limitations, but also trade-offs between different performance indicators. This paper overcome this lack of analysis, conducting experimental analyses using the Fedstellar simulator, MNIST dataset, and MLP classifier.

preprint2026arXiv

Federated Imputation under Heterogeneous Feature Spaces

Federated Learning (FL) enables collaborative training across decentralized clients, but most methods assume aligned feature schemas, an assumption that rarely holds in tabular settings where clients observe only partially overlapping feature subsets. In these heterogeneous feature spaces, parameter-averaging methods (e.g., FedAvg) transfer little information across weakly overlapping or disjoint feature groups, limiting their effectiveness for federated imputation. To overcome this, we propose \textbf{FedHF-Impute}, a federated imputation framework that separates structural feature unavailability from conventional missingness and uses a shared global feature graph to propagate information across statistically related features through message passing. This enables indirect cross-client knowledge transfer, even when features are never jointly observed locally, while preserving standard federated communication. Under simulated partial schema overlap on the SECOM and AirQuality datasets, FedHF-Impute improves imputation accuracy (RMSE) over FL baselines by 26.9\%, and 8.4\% respectively, while achieving comparable performance on PhysioNET, with only a 0.3\% difference relative to the best baseline.

preprint2026arXiv

From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation

Compositional speech-to-speech translation (S2ST) systems built upon speech large language models (SpeechLLMs) have recently shown promising performance. However, existing S2ST systems often either neglect source-language information or encode it through a language-as-label paradigm, representing each source language as an independent flat embedding. Such a design overlooks systematic linguistic structure shared across languages, which may limit data-efficient multilingual adaptation when supervised S2ST data are scarce. To address this issue, we propose S2ST-Omni 2, a many-to-one compositional S2ST framework that systematically reformulates multilingual language conditioning from flat language labels to structured typological priors. Specifically, S2ST-Omni 2 revisits language conditioning at three levels: typology-informed hierarchical language encoding for structured source-language representation, dynamically-gated language-aware Dual-CTC for content-adaptive acoustic modulation, and typology-aware LLM prompting for decoder-side linguistic guidance. Experiments on CVSS-C show that S2ST-Omni 2 achieves superior average performance among representative S2ST approaches across BLEU, COMET, ASR-BLEU, and BLASER 2.0 under the adopted evaluation protocol. Ablation studies indicate that the proposed representation-level, acoustic-level, and decoding-level strategies provide complementary benefits. Moreover, controlled data-budget analyses and a Japanese-to-English evaluation using only approximately 3 hours of supervised training data suggest that explicit typological priors provide useful inductive biases for data-efficient multilingual S2ST.

preprint2023arXiv

Efficient Mutation Testing via Pre-Trained Language Models

Mutation testing is an established fault-based testing technique. It operates by seeding faults into the programs under test and asking developers to write tests that reveal these faults. These tests have the potential to reveal a large number of faults -- those that couple with the seeded ones -- and thus are deemed important. To this end, mutation testing should seed faults that are both "natural" in a sense easily understood by developers and strong (have high chances to reveal faults). To achieve this we propose using pre-trained generative language models (i.e. CodeBERT) that have the ability to produce developer-like code that operates similarly, but not exactly, as the target code. This means that the models have the ability to seed natural faults, thereby offering opportunities to perform mutation testing. We realise this idea by implementing $μ$BERT, a mutation testing technique that performs mutation testing using CodeBert and empirically evaluated it using 689 faulty program versions. Our results show that the fault revelation ability of $μ$BERT is higher than that of a state-of-the-art mutation testing (PiTest), yielding tests that have up to 17% higher fault detection potential than that of PiTest. Moreover, we observe that $μ$BERT can complement PiTest, being able to detect 47 bugs missed by PiTest, while at the same time, PiTest can find 13 bugs missed by $μ$BERT.

preprint2023arXiv

LaF: Labeling-Free Model Selection for Automated Deep Neural Network Reusing

Applying deep learning to science is a new trend in recent years which leads DL engineering to become an important problem. Although training data preparation, model architecture design, and model training are the normal processes to build DL models, all of them are complex and costly. Therefore, reusing the open-sourced pre-trained model is a practical way to bypass this hurdle for developers. Given a specific task, developers can collect massive pre-trained deep neural networks from public sources for re-using. However, testing the performance (e.g., accuracy and robustness) of multiple DNNs and recommending which model should be used is challenging regarding the scarcity of labeled data and the demand for domain expertise. In this paper, we propose a labeling-free (LaF) model selection approach to overcome the limitations of labeling efforts for automated model reusing. The main idea is to statistically learn a Bayesian model to infer the models' specialty only based on predicted labels. We evaluate LaF using 9 benchmark datasets including image, text, and source code, and 165 DNNs, considering both the accuracy and robustness of models. The experimental results demonstrate that LaF outperforms the baseline methods by up to 0.74 and 0.53 on Spearman's correlation and Kendall's $τ$, respectively.

preprint2023arXiv

MIXCODE: Enhancing Code Classification by Mixup-Based Data Augmentation

Inspired by the great success of Deep Neural Networks (DNNs) in natural language processing (NLP), DNNs have been increasingly applied in source code analysis and attracted significant attention from the software engineering community. Due to its data-driven nature, a DNN model requires massive and high-quality labeled training data to achieve expert-level performance. Collecting such data is often not hard, but the labeling process is notoriously laborious. The task of DNN-based code analysis even worsens the situation because source code labeling also demands sophisticated expertise. Data augmentation has been a popular approach to supplement training data in domains such as computer vision and NLP. However, existing data augmentation approaches in code analysis adopt simple methods, such as data transformation and adversarial example generation, thus bringing limited performance superiority. In this paper, we propose a data augmentation approach MIXCODE that aims to effectively supplement valid training data, inspired by the recent advance named Mixup in computer vision. Specifically, we first utilize multiple code refactoring methods to generate transformed code that holds consistent labels with the original data. Then, we adapt the Mixup technique to mix the original code with the transformed code to augment the training data. We evaluate MIXCODE on two programming languages (Java and Python), two code tasks (problem classification and bug detection), four benchmark datasets (JAVA250, Python800, CodRep1, and Refactory), and seven model architectures (including two pre-trained models CodeBERT and GraphCodeBERT). Experimental results demonstrate that MIXCODE outperforms the baseline data augmentation approach by up to 6.24% in accuracy and 26.06% in robustness.

preprint2022arXiv

A Unified Framework for Adversarial Attack and Defense in Constrained Feature Space

The generation of feasible adversarial examples is necessary for properly assessing models that work in constrained feature space. However, it remains a challenging task to enforce constraints into attacks that were designed for computer vision. We propose a unified framework to generate feasible adversarial examples that satisfy given domain constraints. Our framework can handle both linear and non-linear constraints. We instantiate our framework into two algorithms: a gradient-based attack that introduces constraints in the loss function to maximize, and a multi-objective search algorithm that aims for misclassification, perturbation minimization, and constraint satisfaction. We show that our approach is effective in four different domains, with a success rate of up to 100%, where state-of-the-art attacks fail to generate a single feasible example. In addition to adversarial retraining, we propose to introduce engineered non-convex constraints to improve model adversarial robustness. We demonstrate that this new defense is as effective as adversarial retraining. Our framework forms the starting point for research on constrained adversarial attacks and provides relevant baselines and datasets that future research can exploit.

preprint2022arXiv

An In-depth Study of Java Deserialization Remote-Code Execution Exploits and Vulnerabilities

Nowadays, an increasing number of applications uses deserialization. This technique, based on rebuilding the instance of objects from serialized byte streams, can be dangerous since it can open the application to attacks such as remote code execution (RCE) if the data to deserialize is originating from an untrusted source. Deserialization vulnerabilities are so critical that they are in OWASP's list of top 10 security risks for web applications. This is mainly caused by faults in the development process of applications and by flaws in their dependencies, i.e., flaws in the libraries used by these applications. No previous work has studied deserialization attacks in-depth: How are they performed? How are weaknesses introduced and patched? And for how long are vulnerabilities present in the codebase? To yield a deeper understanding of this important kind of vulnerability, we perform two main analyses: one on attack gadgets, i.e., exploitable pieces of code, present in Java libraries, and one on vulnerabilities present in Java applications. For the first analysis, we conduct an exploratory large-scale study by running 256515 experiments in which we vary the versions of libraries for each of the 19 publicly available exploits. Such attacks rely on a combination of gadgets present in one or multiple Java libraries. A gadget is a method which is using objects or fields that can be attacker-controlled. Our goal is to precisely identify library versions containing gadgets and to understand how gadgets have been introduced and how they have been patched. We observe that the modification of one innocent-looking detail in a class -- such as making it public -- can already introduce a gadget. Furthermore, we noticed that among the studied libraries, 37.5% are not patched, leaving gadgets available for future attacks. For the second analysis, we manually analyze 104 deserialization vulnerabilities CVEs to understand how vulnerabilities are introduced and patched in real-life Java applications. Results indicate that the vulnerabilities are not always completely patched or that a workaround solution is proposed. With a workaround solution, applications are still vulnerable since the code itself is unchanged.

preprint2022arXiv

Cerebro: Static Subsuming Mutant Selection

Mutation testing research has indicated that a major part of its application cost is due to the large number of low utility mutants that it introduces. Although previous research has identified this issue, no previous study has proposed any effective solution to the problem. Thus, it remains unclear how to mutate and test a given piece of code in a best effort way, i.e., achieving a good trade-off between invested effort and test effectiveness. To achieve this, we propose Cerebro, a machine learning approach that statically selects subsuming mutants, i.e., the set of mutants that resides on the top of the subsumption hierarchy, based on the mutants' surrounding code context. We evaluate Cerebro using 48 and 10 programs written in C and Java, respectively, and demonstrate that it preserves the mutation testing benefits while limiting application cost, i.e., reduces all cost application factors such as equivalent mutants, mutant executions, and the mutants requiring analysis. We demonstrate that Cerebro has strong inter-project prediction ability, which is significantly higher than two baseline methods, i.e., supervised learning on features proposed by state-of-the-art, and random mutant selection. More importantly, our results show that Cerebro's selected mutants lead to strong tests that are respectively capable of killing 2 times higher than the number of subsuming mutants killed by the baselines when selecting the same number of mutants. At the same time, Cerebro reduces the cost-related factors, as it selects, on average, 68% fewer equivalent mutants, while requiring 90% fewer test executions than the baselines.

preprint2022arXiv

Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment

Deep Neural Networks (DNNs) have gained considerable attention in the past decades due to their astounding performance in different applications, such as natural language modeling, self-driving assistance, and source code understanding. With rapid exploration, more and more complex DNN architectures have been proposed along with huge pre-trained model parameters. The common way to use such DNN models in user-friendly devices (e.g., mobile phones) is to perform model compression before deployment. However, recent research has demonstrated that model compression, e.g., model quantization, yields accuracy degradation as well as outputs disagreements when tested on unseen data. Since the unseen data always include distribution shifts and often appear in the wild, the quality and reliability of quantized models are not ensured. In this paper, we conduct a comprehensive study to characterize and help users understand the behaviors of quantized models. Our study considers 4 datasets spanning from image to text, 8 DNN architectures including feed-forward neural networks and recurrent neural networks, and 42 shifted sets with both synthetic and natural distribution shifts. The results reveal that 1) data with distribution shifts happen more disagreements than without. 2) Quantization-aware training can produce more stable models than standard, adversarial, and Mixup training. 3) Disagreements often have closer top-1 and top-2 output probabilities, and $Margin$ is a better indicator than the other uncertainty metrics to distinguish disagreements. 4) Retraining with disagreements has limited efficiency in removing disagreements. We opensource our code and models as a new benchmark for further studying the quantized models.

preprint2022arXiv

CodeBERT-nt: code naturalness via CodeBERT

Much of software-engineering research relies on the naturalness of code, the fact that code, in small code snippets, is repetitive and can be predicted using statistical language models like n-gram. Although powerful, training such models on large code corpus is tedious, time-consuming and sensitive to code patterns (and practices) encountered during training. Consequently, these models are often trained on a small corpora and estimate the language naturalness that is relative to a specific style of programming or type of project. To overcome these issues, we propose using pre-trained language models to infer code naturalness. Pre-trained models are often built on big data, are easy to use in an out-of-the-box way and include powerful learning associations mechanisms. Our key idea is to quantify code naturalness through its predictability, by using state-of-the-art generative pre-trained language models. To this end, we infer naturalness by masking (omitting) code tokens, one at a time, of code-sequences, and checking the models' ability to predict them. To this end, we evaluate three different predictability metrics; a) measuring the number of exact matches of the predictions, b) computing the embedding similarity between the original and predicted code, i.e., similarity at the vector space, and c) computing the confidence of the model when doing the token completion task irrespective of the outcome. We implement this workflow, named CodeBERT-nt, and evaluate its capability to prioritize buggy lines over non-buggy ones when ranking code based on its naturalness. Our results, on 2510 buggy versions of 40 projects from the SmartShark dataset, show that CodeBERT-nt outperforms both, random-uniform and complexity-based ranking techniques, and yields comparable results (slightly better) than the n-gram models.

preprint2022arXiv

Efficient and Transferable Adversarial Examples from Bayesian Neural Networks

An established way to improve the transferability of black-box evasion attacks is to craft the adversarial examples on an ensemble-based surrogate to increase diversity. We argue that transferability is fundamentally related to uncertainty. Based on a state-of-the-art Bayesian Deep Learning technique, we propose a new method to efficiently build a surrogate by sampling approximately from the posterior distribution of neural network weights, which represents the belief about the value of each parameter. Our extensive experiments on ImageNet, CIFAR-10 and MNIST show that our approach improves the success rates of four state-of-the-art attacks significantly (up to 83.2 percentage points), in both intra-architecture and inter-architecture transferability. On ImageNet, our approach can reach 94% of success rate while reducing training computations from 11.6 to 2.4 exaflops, compared to an ensemble of independently trained DNNs. Our vanilla surrogate achieves 87.5% of the time higher transferability than three test-time techniques designed for this purpose. Our work demonstrates that the way to train a surrogate has been overlooked, although it is an important element of transfer-based attacks. We are, therefore, the first to review the effectiveness of several training methods in increasing transferability. We provide new directions to better understand the transferability phenomenon and offer a simple but strong baseline for future work.

preprint2022arXiv

GraphCode2Vec: Generic Code Embedding via Lexical and Program Dependence Analyses

Code embedding is a keystone in the application of machine learning on several Software Engineering (SE) tasks. To effectively support a plethora of SE tasks, the embedding needs to capture program syntax and semantics in a way that is generic. To this end, we propose the first self-supervised pre-training approach (called GraphCode2Vec) which produces task-agnostic embedding of lexical and program dependence features. GraphCode2Vec achieves this via a synergistic combination of code analysis and Graph Neural Networks. GraphCode2Vec is generic, it allows pre-training, and it is applicable to several SE downstream tasks. We evaluate the effectiveness of GraphCode2Vec on four (4) tasks (method name prediction, solution classification, mutation testing and overfitted patch classification), and compare it with four (4) similarly generic code embedding baselines (Code2Seq, Code2Vec, CodeBERT, GraphCodeBERT) and 7 task-specific, learning-based methods. In particular, GraphCode2Vec is more effective than both generic and task-specific learning-based baselines. It is also complementary and comparable to GraphCodeBERT (a larger and more complex model). We also demonstrate through a probing and ablation study that GraphCode2Vec learns lexical and program dependence features and that self-supervised pre-training improves effectiveness.

preprint2022arXiv

Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers

Graph-based Semi-Supervised Learning (GSSL) is a practical solution to learn from a limited amount of labelled data together with a vast amount of unlabelled data. However, due to their reliance on the known labels to infer the unknown labels, these algorithms are sensitive to data quality. It is therefore essential to study the potential threats related to the labelled data, more specifically, label poisoning. In this paper, we propose a novel data poisoning method which efficiently approximates the result of label inference to identify the inputs which, if poisoned, would produce the highest number of incorrectly inferred labels. We extensively evaluate our approach on three classification problems under 24 different experimental settings each. Compared to the state of the art, our influence-driven attack produces an average increase of error rate 50\% higher, while being faster by multiple orders of magnitude. Moreover, our method can inform engineers of inputs that deserve investigation (relabelling them) before training the learning model. We show that relabelling one-third of the poisoned inputs (selected based on their influence) reduces the poisoning effect by 50\%.

preprint2022arXiv

LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity

We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. LGV starts from a pretrained surrogate model and collects multiple weight sets from a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability. First, models that belong to a wider weight optimum are better surrogates. Second, we identify a subspace able to generate an effective surrogate ensemble among this wider optimum. Through extensive experiments, we show that LGV alone outperforms all (combinations of) four established test-time transformations by 1.8 to 59.9 percentage points. Our findings shed new light on the importance of the geometry of the weight space to explain the transferability of adversarial examples.

preprint2022arXiv

Predicting Flaky Tests Categories using Few-Shot Learning

Flaky tests are tests that yield different outcomes when run on the same version of a program. This non-deterministic behaviour plagues continuous integration with false signals, wasting developers' time and reducing their trust in test suites. Studies highlighted the importance of keeping tests flakiness-free. Recently, the research community has been pushing forward the detection of flaky tests by suggesting many static and dynamic approaches. While promising, those approaches mainly focus on classifying tests as flaky or not and, even when high performances are reported, it remains challenging to understand the cause of flakiness. This part is crucial for researchers and developers that aim to fix it. To help with the comprehension of a given flaky test, we propose FlakyCat, the first approach for classifying flaky tests based on their root cause category. FlakyCat relies on CodeBERT for code representation and leverages a Siamese network-based Few-Shot learning method to train a multi-class classifier with few data. We train and evaluate FlakyCat on a set of 343 flaky tests collected from open-source Java projects. Our evaluation shows that FlakyCat categorises flaky tests accurately, with a weighted F1 score of 70%. Furthermore, we investigate the performance of our approach for each category, revealing that Async waits, Unordered collections and Time-related flaky tests are accurately classified, while Concurrency-related flaky tests are more challenging to predict. Finally, to facilitate the comprehension of FlakyCat's predictions, we present a new technique for CodeBERT-based model interpretability that highlights code statements influencing the categorization.

preprint2022arXiv

What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness

Flaky tests are defined as tests that manifest non-deterministic behaviour by passing and failing intermittently for the same version of the code. These tests cripple continuous integration with false alerts that waste developers' time and break their trust in regression testing. To mitigate the effects of flakiness, both researchers and industrial experts proposed strategies and tools to detect and isolate flaky tests. However, flaky tests are rarely fixed as developers struggle to localise and understand their causes. Additionally, developers working with large codebases often need to know the sources of non-determinism to preserve code quality, i.e., avoid introducing technical debt linked with non-deterministic behaviour, and to avoid introducing new flaky tests. To aid with these tasks, we propose re-targeting Fault Localisation techniques to the flaky component localisation problem, i.e., pinpointing program classes that cause the non-deterministic behaviour of flaky tests. In particular, we employ Spectrum-Based Fault Localisation (SBFL), a coverage-based fault localisation technique commonly adopted for its simplicity and effectiveness. We also utilise other data sources, such as change history and static code metrics, to further improve the localisation. Our results show that augmenting SBFL with change and code metrics ranks flaky classes in the top-1 and top-5 suggestions, in 26% and 47% of the cases. Overall, we successfully reduced the average number of classes inspected to locate the first flaky class to 19% of the total number of classes covered by flaky tests. Our results also show that localisation methods are effective in major flakiness categories, such as concurrency and asynchronous waits, indicating their general ability to identify flaky components.

preprint2020arXiv

Data-driven Simulation and Optimization for Covid-19 Exit Strategies

The rapid spread of the Coronavirus SARS-2 is a major challenge that led almost all governments worldwide to take drastic measures to respond to the tragedy. Chief among those measures is the massive lockdown of entire countries and cities, which beyond its global economic impact has created some deep social and psychological tensions within populations. While the adopted mitigation measures (including the lockdown) have generally proven useful, policymakers are now facing a critical question: how and when to lift the mitigation measures? A carefully-planned exit strategy is indeed necessary to recover from the pandemic without risking a new outbreak. Classically, exit strategies rely on mathematical modeling to predict the effect of public health interventions. Such models are unfortunately known to be sensitive to some key parameters, which are usually set based on rules-of-thumb.In this paper, we propose to augment epidemiological forecasting with actual data-driven models that will learn to fine-tune predictions for different contexts (e.g., per country). We have therefore built a pandemic simulation and forecasting toolkit that combines a deep learning estimation of the epidemiological parameters of the disease in order to predict the cases and deaths, and a genetic algorithm component searching for optimal trade-offs/policies between constraints and objectives set by decision-makers. Replaying pandemic evolution in various countries, we experimentally show that our approach yields predictions with much lower error rates than pure epidemiological models in 75% of the cases and achieves a 95% R2 score when the learning is transferred and tested on unseen countries. When used for forecasting, this approach provides actionable insights into the impact of individual measures and strategies.

preprint2020arXiv

IBIR: Bug Report driven Fault Injection

Much research on software engineering and software testing relies on experimental studies based on fault injection. Fault injection, however, is not often relevant to emulate real-world software faults since it "blindly" injects large numbers of faults. It remains indeed challenging to inject few but realistic faults that target a particular functionality in a program. In this work, we introduce IBIR, a fault injection tool that addresses this challenge by exploring change patterns associated to user-reported faults. To inject realistic faults, we create mutants by retargeting a bug report driven automated program repair system, i.e., reversing its code transformation templates. IBIR is further appealing in practice since it requires deep knowledge of neither of the code nor the tests, but just of the program's relevant bug reports. Thus, our approach focuses the fault injection on the feature targeted by the bug report. We assess IBIR by considering the Defects4J dataset. Experimental results show that our approach outperforms the fault injection performed by traditional mutation testing in terms of semantic similarity with the original bug, when applied at either system or class levels of granularity, and provides better, statistically significant, estimations of test effectiveness (fault detection). Additionally, when injecting 100 faults, IBIR injects faults that couple with the real ones in 36% of the cases, while mutants from mutation testing inject less than 1%. Overall, IBIR targets real functionality and injects realistic and diverse faults.

preprint2020arXiv

Killing Stubborn Mutants with Symbolic Execution

We introduce SeMu, a Dynamic Symbolic Execution technique that generates test inputs capable of killing stubborn mutants (killable mutants that remain undetected after a reasonable amount of testing). SeMu aims at mutant propagation (triggering erroneous states to the program output) by incrementally searching for divergent program behaviours between the original and the mutant versions. We model the mutant killing problem as a symbolic execution search within a specific area in the programs' symbolic tree. In this framework, the search area is defined and controlled by parameters that allow scalable and cost-effective mutant killing. We integrate SeMu in KLEE and experimented with Coreutils (a benchmark frequently used in symbolic execution studies). Our results show that our modelling plays an important role in mutant killing. Perhaps more importantly, our results also show that, within a two-hour time limit, SeMu kills 37% of the stubborn mutants, where KLEE kills none and where the mutant infection strategy (strategy suggested by previous research) kills 17%.

preprint2020arXiv

Learning to Catch Security Patches

Timely patching is paramount to safeguard users and maintainers against dire consequences of malicious attacks. In practice, patching is prioritized following the nature of the code change that is committed in the code repository. When such a change is labeled as being security-relevant, i.e., as fixing a vulnerability, maintainers rapidly spread the change and users are notified about the need to update to a new version of the library or of the application. Unfortunately, oftentimes, some security-relevant changes go unnoticed as they represent silent fixes of vulnerabilities. In this paper, we propose a Co-Training-based approach to catch security patches as part of an automatic monitoring service of code repositories. Leveraging different classes of features, we empirically show that such automation is feasible and can yield a precision of over 90% in identifying security patches, with an unprecedented recall of over 80%. Beyond such a benchmarking with ground truth data which demonstrates an improvement over the state-of-the-art, we confirmed that our approach can help catch security patches that were not reported as such.

preprint2020arXiv

On the Efficiency of Test Suite based Program Repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs

Test-based automated program repair has been a prolific field of research in software engineering in the last decade. Many approaches have indeed been proposed, which leverage test suites as a weak, but affordable, approximation to program specifications. Although the literature regularly sets new records on the number of benchmark bugs that can be fixed, several studies increasingly raise concerns about the limitations and biases of state-of-the-art approaches. For example, the correctness of generated patches has been questioned in a number of studies, while other researchers pointed out that evaluation schemes may be misleading with respect to the processing of fault localization results. Nevertheless, there is little work addressing the efficiency of patch generation, with regard to the practicality of program repair. In this paper, we fill this gap in the literature, by providing an extensive review on the efficiency of test suite based program repair. Our objective is to assess the number of generated patch candidates, since this information is correlated to (1) the strategy to traverse the search space efficiently in order to select sensical repair attempts, (2) the strategy to minimize the test effort for identifying a plausible patch, (3) as well as the strategy to prioritize the generation of a correct patch. To that end, we perform a large-scale empirical study on the efficiency, in terms of quantity of generated patch candidates of the 16 open-source repair tools for Java programs. The experiments are carefully conducted under the same fault localization configurations to limit biases.

preprint2020arXiv

What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning

Recent successes in training word embeddings for NLP tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that capture the maximum of program semantics. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax trees, or intermediate representation tokens) to generate embeddings, which are criticized in the literature as non-robust or non-generalizable. In this work, we investigate a novel embedding approach based on the intuition that source code has visual patterns of semantics. We further use these patterns to address the outstanding challenge of identifying semantic code clones. We propose the WYSIWIM ("What You See Is What It Means") approach where visual representations of source code are fed into powerful pre-trained image classification neural networks from the field of computer vision to benefit from the practical advantages of transfer learning. We evaluate the proposed embedding approach on two variations of the task of semantic code clone identification: code clone detection (a binary classification problem), and code classification (a multi-classification problem). We show with experiments on the BigCloneBench (Java) and Open Judge (C) datasets that although simple, our WYSIWIM approach performs as effectively as state of the art approaches such as ASTNN or TBCNN. We further explore the influence of different steps in our approach, such as the choice of visual representations or the classification algorithm, to eventually discuss the promises and limitations of this research direction.

preprint2016arXiv

Assessing and Comparing Mutation-based Fault Localization Techniques

Recent research demonstrated that mutation-based fault localization techniques are relatively accurate and practical. However, these methods have never been compared and have only been assessed with simple hand-seeded faults. Therefore, their actual practicality is questionable when it comes to real-wold faults. To deal with this limitation we asses and compare the two main mutation-based fault localization methods, named Metallaxis and MUSE, on a set of real-world programs and faults. Our results based on three typical evaluation metrics indicate that mutation-based fault localization methods are relatively accurate and provide relevant information to developers. Overall, our result indicate that Metallaxis and MUSE require 18% and 37% of the program statements to find the sought faults. Additionally, both methods locate 50% and 80% of the studied faults when developers inspect 10 and 25 statements.

preprint2016arXiv

Assessing and Improving the Mutation Testing Practice of PIT

Mutation testing is used extensively to support the experimentation of software engineering studies. Its application to real-world projects is possible thanks to modern tools that automate the whole mutation analysis process. However, popular mutation testing tools use a restrictive set of mutants which do not conform to the community standards as supported by the mutation testing literature. This can be problematic since the effectiveness of mutation depends on its mutants. We therefore examine how effective are the mutants of a popular mutation testing tool, named PIT, compared to comprehensive ones, as drawn from the literature and personal experience. We show that comprehensive mutants are harder to kill and encode faults not captured by the mutants of PIT for a range of 11% to 62% of the Java classes of the considered projects.

preprint2015arXiv

An Extensive Systematic Review on Model-Driven Development of Secure Systems

Context: Model-Driven Security (MDS) is as a specialised Model-Driven Engineering research area for supporting the development of secure systems. Over a decade of research on MDS has resulted in a large number of publications. Objective: To provide a detailed analysis of the state of the art in MDS, a systematic literature review (SLR) is essential. Method: We conducted an extensive SLR on MDS. Derived from our research questions, we designed a rigorous, extensive search and selection process to identify a set of primary MDS studies that is as complete as possible. Our three-pronged search process consists of automatic searching, manual searching, and snowballing. After discovering and considering more than thousand relevant papers, we identified, strictly selected, and reviewed 108 MDS publications. Results: The results of our SLR show the overall status of the key artefacts of MDS, and the identified primary MDS studies. E.g. regarding security modelling artefact, we found that developing domain-specific languages plays a key role in many MDS approaches. The current limitations in each MDS artefact are pointed out and corresponding potential research directions are suggested. Moreover, we categorise the identified primary MDS studies into 5 principal MDS studies, and other emerging or less common MDS studies. Finally, some trend analyses of MDS research are given. Conclusion: Our results suggest the need for addressing multiple security concerns more systematically and simultaneously, for tool chains supporting the MDS development cycle, and for more empirical studies on the application of MDS methodologies. To the best of our knowledge, this SLR is the first in the field of Software Engineering that combines a snowballing strategy with database searching. This combination has delivered an extensive literature study on MDS.

preprint2015arXiv

An Investigation into the Use of Common Libraries in Android Apps

The packaging model of Android apps requires the entire code necessary for the execution of an app to be shipped into one single apk file. Thus, an analysis of Android apps often visits code which is not part of the functionality delivered by the app. Such code is often contributed by the common libraries which are used pervasively by all apps. Unfortunately, Android analyses, e.g., for piggybacking detection and malware detection, can produce inaccurate results if they do not take into account the case of library code, which constitute noise in app features. Despite some efforts on investigating Android libraries, the momentum of Android research has not yet produced a complete set of common libraries to further support in-depth analysis of Android apps. In this paper, we leverage a dataset of about 1.5 million apps from Google Play to harvest potential common libraries, including advertisement libraries. With several steps of refinements, we finally collect by far the largest set of 1,113 libraries supporting common functionalities and 240 libraries for advertisement. We use the dataset to investigates several aspects of Android libraries, including their popularity and their proportion in Android app code. Based on these datasets, we have further performed several empirical investigations to confirm the motivations behind our work.

preprint2014arXiv

Artificial Mutation inspired Hyper-heuristic for Runtime Usage of Multi-objective Algorithms

In the last years, multi-objective evolutionary algorithms (MOEA) have been applied to different software engineering problems where many conflicting objectives have to be optimized simultaneously. In theory, evolutionary algorithms feature a nice property for runtime optimization as they can provide a solution in any execution time. In practice, based on a Darwinian inspired natural selection, these evolutionary algorithms produce many deadborn solutions whose computation results in a computational resources wastage: natural selection is naturally slow. In this paper, we reconsider this founding analogy to accelerate convergence of MOEA, by looking at modern biology studies: artificial selection has been used to achieve an anticipated specific purpose instead of only relying on crossover and natural selection (i.e., Muller et al [18] research on artificial mutation of fruits with X-Ray). Putting aside the analogy with natural selection , the present paper proposes an hyper-heuristic for MOEA algorithms named Sputnik 1 that uses artificial selective mutation to improve the convergence speed of MOEA. Sputnik leverages the past history of mutation efficiency to select the most relevant mutations to perform. We evaluate Sputnik on a cloud-reasoning engine, which drives on-demand provisioning while considering conflicting performance and cost objectives. We have conducted experiments to highlight the significant performance improvement of Sputnik in terms of resolution time.

preprint2014arXiv

I know what leaked in your pocket: uncovering privacy leaks on Android Apps with Static Taint Analysis

Android applications may leak privacy data carelessly or maliciously. In this work we perform inter-component data-flow analysis to detect privacy leaks between components of Android applications. Unlike all current approaches, our tool, called IccTA, propagates the context between the components, which improves the precision of the analysis. IccTA outperforms all other available tools by reaching a precision of 95.0% and a recall of 82.6% on DroidBench. Our approach detects 147 inter-component based privacy leaks in 14 applications in a set of 3000 real-world applications with a precision of 88.4%. With the help of ApkCombiner, our approach is able to detect inter-app based privacy leaks.

preprint2014arXiv

Leveraging Time Distortion for seamless Navigation into Data Space-Time Continuum

Intelligent software systems continuously analyze their surrounding environment and accordingly adapt their internal state. Depending on the criticality index of the situation, the system should dynamically focus or widen its analysis and reasoning scope. A naive -why have less when you can have more- approach would consist in systematically sampling the context at a very high rate and triggering the reasoning process regularly. This reasoning process would then need to mine a huge amount of data, extract a relevant view, and finally analyze this adequate view. This overall process would require some heavy resources and/or be time-consuming, conflicting with the (near) real-time response time requirements of intelligent systems. We claim that a continuous and more flexible navigation into context models, in space and in time, can significantly improve reasoning processes. This paper thus introduces a novel modeling approach together with a navigation concept, which consider time and locality as first-class properties crosscutting any model element, and enable the seamless navigation of models in this space-time continuum. In particular, we leverage a time-relative navigation (inspired by the space-time and distortion theory [7]) able to efficiently empower continuous reasoning processes. We integrate our approach into an open-source modeling framework and evaluate it on a smart grid reasoning engine for electric load prediction. We demonstrate that reasoners leveraging this distorted space-time continuum outperform the full sampling approach, and is compatible with most of (near) real-time requirements.

preprint2013arXiv

Automatically Securing Permission-Based Software by Reducing the Attack Surface: An Application to Android

A common security architecture, called the permission-based security model (used e.g. in Android and Blackberry), entails intrinsic risks. For instance, applications can be granted more permissions than they actually need, what we call a "permission gap". Malware can leverage the unused permissions for achieving their malicious goals, for instance using code injection. In this paper, we present an approach to detecting permission gaps using static analysis. Our prototype implementation in the context of Android shows that the static analysis must take into account a significant amount of platform-specific knowledge. Using our tool on two datasets of Android applications, we found out that a non negligible part of applications suffers from permission gaps, i.e. does not use all the permissions they declare.

preprint2013arXiv

In-Vivo Bytecode Instrumentation for Improving Privacy on Android Smartphones in Uncertain Environments

In this paper we claim that an efficient and readily applicable means to improve privacy of Android applications is: 1) to perform runtime monitoring by instrumenting the application bytecode and 2) in-vivo, i.e. directly on the smartphone. We present a tool chain to do this and present experimental results showing that this tool chain can run on smartphones in a reasonable amount of time and with a realistic effort. Our findings also identify challenges to be addressed before running powerful runtime monitoring and instrumentations directly on smartphones. We implemented two use-cases leveraging the tool chain: BetterPermissions, a fine-grained user centric permission policy system and AdRemover an advertisement remover. Both prototypes improve the privacy of Android systems thanks to in-vivo bytecode instrumentation.

preprint2012arXiv

Bypassing the Combinatorial Explosion: Using Similarity to Generate and Prioritize T-wise Test Suites for Large Software Product Lines

Software Product Lines (SPLs) are families of products whose commonalities and variability can be captured by Feature Models (FMs). T-wise testing aims at finding errors triggered by all interactions amongst t features, thus reducing drastically the number of products to test. T-wise testing approaches for SPLs are limited to small values of t -- which miss faulty interactions -- or limited by the size of the FM. Furthermore, they neither prioritize the products to test nor provide means to finely control the generation process. This paper offers (a) a search-based approach capable of generating products for large SPLs, forming a scalable and flexible alternative to current techniques and (b) prioritization algorithms for any set of products. Experiments conducted on 124 FMs (including large FMs such as the Linux kernel) demonstrate the feasibility and the practicality of our approach.

preprint2012arXiv

Model Driven Mutation Applied to Adaptative Systems Testing

Dynamically Adaptive Systems modify their behav- ior and structure in response to changes in their surrounding environment and according to an adaptation logic. Critical sys- tems increasingly incorporate dynamic adaptation capabilities; examples include disaster relief and space exploration systems. In this paper, we focus on mutation testing of the adaptation logic. We propose a fault model for adaptation logics that classifies faults into environmental completeness and adaptation correct- ness. Since there are several adaptation logic languages relying on the same underlying concepts, the fault model is expressed independently from specific adaptation languages. Taking benefit from model-driven engineering technology, we express these common concepts in a metamodel and define the operational semantics of mutation operators at this level. Mutation is applied on model elements and model transformations are used to propagate these changes to a given adaptation policy in the chosen formalism. Preliminary results on an adaptive web server highlight the difficulty of killing mutants for adaptive systems, and thus the difficulty of generating efficient tests.

preprint2012arXiv

XSS-FP: Browser Fingerprinting using HTML Parser Quirks

There are many scenarios in which inferring the type of a client browser is desirable, for instance to fight against session stealing. This is known as browser fingerprinting. This paper presents and evaluates a novel fingerprinting technique to determine the exact nature (browser type and version, eg Firefox 15) of a web-browser, exploiting HTML parser quirks exercised through XSS. Our experiments show that the exact version of a web browser can be determined with 71% of accuracy, and that only 6 tests are sufficient to quickly determine the exact family a web browser belongs to.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Source provenance

Where this author record came from

arxivconfidence 95%

external id: arxiv:2605.16089:author:4:yves-le-traon

Imported May 20, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2605.16099:author:5:yves-le-traon

Imported May 20, 2026Synced May 21, 2026

arxivconfidence 95%

external id: arxiv:2605.16026:author:5:yves-le-traon

Imported May 20, 2026Synced May 20, 2026

18 works

Mike Papadakis

Researcher

Mike Papadakis contributes to research discovery and scholarly infrastructure.

Open to collaborate

12 works

Jacques Klein

Researcher

Jacques Klein contributes to research discovery and scholarly infrastructure.

Open to collaborate

12 works

Maxime Cordy

Researcher

Maxime Cordy contributes to research discovery and scholarly infrastructure.

Open to collaborate

5 works

Alexandre Bartel

Researcher

Alexandre Bartel contributes to research discovery and scholarly infrastructure.

Open to collaborate

Yves Le Traon

What is connected

Connect this record

See the researcher in context

Building this map preview

35 published item(s)

Centralized vs Decentralized Federated Learning: A trade-off performance analysis

Federated Imputation under Heterogeneous Feature Spaces

From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation

Efficient Mutation Testing via Pre-Trained Language Models

LaF: Labeling-Free Model Selection for Automated Deep Neural Network Reusing

MIXCODE: Enhancing Code Classification by Mixup-Based Data Augmentation

A Unified Framework for Adversarial Attack and Defense in Constrained Feature Space

An In-depth Study of Java Deserialization Remote-Code Execution Exploits and Vulnerabilities

Cerebro: Static Subsuming Mutant Selection

Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment

CodeBERT-nt: code naturalness via CodeBERT

Efficient and Transferable Adversarial Examples from Bayesian Neural Networks

GraphCode2Vec: Generic Code Embedding via Lexical and Program Dependence Analyses

Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers

LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity

Predicting Flaky Tests Categories using Few-Shot Learning

What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness

Data-driven Simulation and Optimization for Covid-19 Exit Strategies

IBIR: Bug Report driven Fault Injection

Killing Stubborn Mutants with Symbolic Execution

Learning to Catch Security Patches

On the Efficiency of Test Suite based Program Repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs

What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning

Assessing and Comparing Mutation-based Fault Localization Techniques

Assessing and Improving the Mutation Testing Practice of PIT

An Extensive Systematic Review on Model-Driven Development of Secure Systems

An Investigation into the Use of Common Libraries in Android Apps

Artificial Mutation inspired Hyper-heuristic for Runtime Usage of Multi-objective Algorithms

I know what leaked in your pocket: uncovering privacy leaks on Android Apps with Static Taint Analysis

Leveraging Time Distortion for seamless Navigation into Data Space-Time Continuum

Automatically Securing Permission-Based Software by Reducing the Attack Surface: An Application to Android

In-Vivo Bytecode Instrumentation for Improving Privacy on Android Smartphones in Uncertain Environments

Bypassing the Combinatorial Explosion: Using Similarity to Generate and Prioritize T-wise Test Suites for Large Software Product Lines

Model Driven Mutation Applied to Adaptative Systems Testing

XSS-FP: Browser Fingerprinting using HTML Parser Quirks