Source author record

Lorenzo Cavallaro

Lorenzo Cavallaro appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Artificial Intelligence Machine Learning

Catalog footprint

What is connected

7works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2024arXiv

Transcending Transcend: Revisiting Malware Classification in the Presence of Concept Drift

Machine learning for malware classification shows encouraging results, but real deployments suffer from performance degradation as malware authors adapt their techniques to evade detection. This phenomenon, known as concept drift, occurs as new malware examples evolve and become less and less like the original training examples. One promising method to cope with concept drift is classification with rejection in which examples that are likely to be misclassified are instead quarantined until they can be expertly analyzed. We propose TRANSCENDENT, a rejection framework built on Transcend, a recently proposed strategy based on conformal prediction theory. In particular, we provide a formal treatment of Transcend, enabling us to refine conformal evaluation theory -- its underlying statistical engine -- and gain a better understanding of the theoretical reasons for its effectiveness. In the process, we develop two additional conformal evaluators that match or surpass the performance of the original while significantly decreasing the computational overhead. We evaluate TRANSCENDENT on a malware dataset spanning 5 years that removes sources of experimental bias present in the original evaluation. TRANSCENDENT outperforms state-of-the-art approaches while generalizing across different malware domains and classifiers. To further assist practitioners, we determine the optimal operational settings for a TRANSCENDENT deployment and show how it can be applied to many popular learning algorithms. These insights support both old and new empirical findings, making Transcend a sound and practical solution for the first time. To this end, we release TRANSCENDENT as open source, to aid the adoption of rejection strategies by the security community.

preprint2022arXiv

Designing a Provenance Analysis for SGX Enclaves

Intel SGX enables memory isolation and static integrity verification of code and data stored in user-space memory regions called enclaves. SGX effectively shields the execution of enclaves from the underlying untrusted OS. Attackers cannot tamper nor examine enclaves' content. However, these properties equally challenge defenders as they are precluded from any provenance analysis to infer intrusions inside SGX enclaves. In this work, we propose SgxMonitor, a novel provenance analysis to monitor and identify anomalous executions of enclave code. To this end, we design a technique to extract contextual runtime information from an enclave and propose a novel model to represent enclaves' intrusions. Our experiments show that not only SgxMonitor incurs an overhead comparable to traditional provenance tools, but it also exhibits macro-benchmarks' overheads and slowdowns that marginally affect real use cases deployment. Our evaluation shows SgxMonitor successfully identifies enclave intrusions carried out by the state of the art attacks while reporting no false positives and negatives during normal enclaves executions, thus supporting the use of SgxMonitor in realistic scenarios.

preprint2022arXiv

Jigsaw Puzzle: Selective Backdoor Attack to Subvert Malware Classifiers

Malware classifiers are subject to training-time exploitation due to the need to regularly retrain using samples collected from the wild. Recent work has demonstrated the feasibility of backdoor attacks against malware classifiers, and yet the stealthiness of such attacks is not well understood. In this paper, we investigate this phenomenon under the clean-label setting (i.e., attackers do not have complete control over the training or labeling process). Empirically, we show that existing backdoor attacks in malware classifiers are still detectable by recent defenses such as MNTD. To improve stealthiness, we propose a new attack, Jigsaw Puzzle (JP), based on the key observation that malware authors have little to no incentive to protect any other authors' malware but their own. As such, Jigsaw Puzzle learns a trigger to complement the latent patterns of the malware author's samples, and activates the backdoor only when the trigger and the latent pattern are pieced together in a sample. We further focus on realizable triggers in the problem space (e.g., software code) using bytecode gadgets broadly harvested from benign software. Our evaluation confirms that Jigsaw Puzzle is effective as a backdoor, remains stealthy against state-of-the-art defenses, and is a threat in realistic settings that depart from reasoning about feature-space only attacks. We conclude by exploring promising approaches to improve backdoor defenses.

preprint2022arXiv

Realizable Universal Adversarial Perturbations for Malware

Machine learning classifiers are vulnerable to adversarial examples -- input-specific perturbations that manipulate models' output. Universal Adversarial Perturbations (UAPs), which identify noisy patterns that generalize across the input space, allow the attacker to greatly scale up the generation of such examples. Although UAPs have been explored in application domains beyond computer vision, little is known about their properties and implications in the specific context of realizable attacks, such as malware, where attackers must satisfy challenging problem-space constraints. In this paper we explore the challenges and strengths of UAPs in the context of malware classification. We generate sequences of problem-space transformations that induce UAPs in the corresponding feature-space embedding and evaluate their effectiveness across different malware domains. Additionally, we propose adversarial training-based mitigations using knowledge derived from the problem-space transformations, and compare against alternative feature-space defenses. Our experiments limit the effectiveness of a white box Android evasion attack to ~20% at the cost of ~3% TPR at 1% FPR. We additionally show how our method can be adapted to more restrictive domains such as Windows malware. We observe that while adversarial training in the feature space must deal with large and often unconstrained regions, UAPs in the problem space identify specific vulnerabilities that allow us to harden a classifier more effectively, shifting the challenges and associated cost of identifying new universal adversarial transformations back to the attacker.

preprint2021arXiv

Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets

Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution increases the level of difficulty as it mostly relies upon the ability to disassemble binaries to identify authorship style. Our survey explores malicious author style and the adversarial techniques used by them to remain anonymous. We examine the adversarial impact on the state-of-the-art methods. We identify key findings and explore the open research challenges. To mitigate the lack of ground truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset of 15,660 malware labeled to 164 threat actor groups.

preprint2014arXiv

PuppetDroid: A User-Centric UI Exerciser for Automatic Dynamic Analysis of Similar Android Applications

Popularity and complexity of malicious mobile applications are rising, making their analysis difficult and labor intensive. Mobile application analysis is indeed inherently different from desktop application analysis: In the latter, the interaction of the user (i.e., victim) is crucial for the malware to correctly expose all its malicious behaviors. We propose a novel approach to analyze (malicious) mobile applications. The goal is to exercise the user interface (UI) of an Android application to effectively trigger malicious behaviors, automatically. Our key intuition is to record and reproduce the UI interactions of a potential victim of the malware, so as to stimulate the relevant behaviors during dynamic analysis. To make our approach scale, we automatically re-execute the recorded UI interactions on apps that are similar to the original ones. These characteristics make our system orthogonal and complementary to current dynamic analysis and UI-exercising approaches. We developed our approach and experimentally shown that our stimulation allows to reach a higher code coverage than automatic UI exercisers, so to unveil interesting malicious behaviors that are not exposed when using other approaches. Our approach is also suitable for crowdsourcing scenarios, which would push further the collection of new stimulation traces. This can potentially change the way we conduct dynamic analysis of (mobile) applications, from fully automatic only, to user-centric and collaborative too.

preprint2013arXiv

Tracking and Characterizing Botnets Using Automatically Generated Domains

Modern botnets rely on domain-generation algorithms (DGAs) to build resilient command-and-control infrastructures. Recent works focus on recognizing automatically generated domains (AGDs) from DNS traffic, which potentially allows to identify previously unknown AGDs to hinder or disrupt botnets' communication capabilities. The state-of-the-art approaches require to deploy low-level DNS sensors to access data whose collection poses practical and privacy issues, making their adoption problematic. We propose a mechanism that overcomes the above limitations by analyzing DNS traffic data through a combination of linguistic and IP-based features of suspicious domains. In this way, we are able to identify AGD names, characterize their DGAs and isolate logical groups of domains that represent the respective botnets. Moreover, our system enriches these groups with new, previously unknown AGD names, and produce novel knowledge about the evolving behavior of each tracked botnet. We used our system in real-world settings, to help researchers that requested intelligence on suspicious domains and were able to label them as belonging to the correct botnet automatically. Additionally, we ran an evaluation on 1,153,516 domains, including AGDs from both modern (e.g., Bamital) and traditional (e.g., Conficker, Torpig) botnets. Our approach correctly isolated families of AGDs that belonged to distinct DGAs, and set automatically generated from non-automatically generated domains apart in 94.8 percent of the cases.

Lorenzo Cavallaro

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Transcending Transcend: Revisiting Malware Classification in the Presence of Concept Drift

Designing a Provenance Analysis for SGX Enclaves

Jigsaw Puzzle: Selective Backdoor Attack to Subvert Malware Classifiers

Realizable Universal Adversarial Perturbations for Malware

Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets

PuppetDroid: A User-Centric UI Exerciser for Automatic Dynamic Analysis of Similar Android Applications

Tracking and Characterizing Botnets Using Automatically Generated Domains