Researcher profile

Chen Hajaj

Chen Hajaj contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2022arXiv

Do You Think You Can Hold Me? The Real Challenge of Problem-Space Evasion Attacks

Android malware is a spreading disease in the virtual world. Anti-virus and detection systems continuously undergo patches and updates to defend against these threats. Most of the latest approaches in malware detection use Machine Learning (ML). Against the robustifying effort of detection systems, raise the \emph{evasion attacks}, where an adversary changes its targeted samples so that they are misclassified as benign. This paper considers two kinds of evasion attacks: feature-space and problem-space. \emph{Feature-space} attacks consider an adversary who manipulates ML features to evade the correct classification while minimizing or constraining the total manipulations. \textit{Problem-space} attacks refer to evasion attacks that change the actual sample. Specifically, this paper analyzes the gap between these two types in the Android malware domain. The gap between the two types of evasion attacks is examined via the retraining process of classifiers using each one of the evasion attack types. The experiments show that the gap between these two types of retrained classifiers is dramatic and may increase to 96\%. Retrained classifiers of feature-space evasion attacks have been found to be either less effective or completely ineffective against problem-space evasion attacks. Additionally, exploration of different problem-space evasion attacks shows that retraining of one problem-space evasion attack may be effective against other problem-space evasion attacks.

preprint2022arXiv

MaMaDroid2.0 -- The Holes of Control Flow Graphs

Android malware is a continuously expanding threat to billions of mobile users around the globe. Detection systems are updated constantly to address these threats. However, a backlash takes the form of evasion attacks, in which an adversary changes malicious samples such that those samples will be misclassified as benign. This paper fully inspects a well-known Android malware detection system, MaMaDroid, which analyzes the control flow graph of the application. Changes to the portion of benign samples in the train set and models are considered to see their effect on the classifier. The changes in the ratio between benign and malicious samples have a clear effect on each one of the models, resulting in a decrease of more than 40% in their detection rate. Moreover, adopted ML models are implemented as well, including 5-NN, Decision Tree, and Adaboost. Exploration of the six models reveals a typical behavior in different cases, of tree-based models and distance-based models. Moreover, three novel attacks that manipulate the CFG and their detection rates are described for each one of the targeted models. The attacks decrease the detection rate of most of the models to 0%, with regards to different ratios of benign to malicious apps. As a result, a new version of MaMaDroid is engineered. This model fuses the CFG of the app and static analysis of features of the app. This improved model is proved to be robust against evasion attacks targeting both CFG-based models and static analysis models, achieving a detection rate of more than 90% against each one of the attacks.

preprint2022arXiv

Open-Source Framework for Encrypted Internet and Malicious Traffic Classification

Internet traffic classification plays a key role in network visibility, Quality of Services (QoS), intrusion detection, Quality of Experience (QoE) and traffic-trend analyses. In order to improve privacy, integrity, confidentiality, and protocol obfuscation, the current traffic is based on encryption protocols, e.g., SSL/TLS. With the increased use of Machine-Learning (ML) and Deep-Learning (DL) models in the literature, comparison between different models and methods has become cumbersome and difficult due to a lack of a standardized framework. In this paper, we propose an open-source framework, named OSF-EIMTC, which can provide the full pipeline of the learning process. From the well-known datasets to extracting new and well-known features, it provides implementations of well-known ML and DL models (from the traffic classification literature) as well as evaluations. Such a framework can facilitate research in traffic classification domains, so that it will be more repeatable, reproducible, easier to execute, and will allow a more accurate comparison of well-known and novel features and models. As part of our framework evaluation, we demonstrate a variety of cases where the framework can be of use, utilizing multiple datasets, models, and feature sets. We show analyses of publicly available datasets and invite the community to participate in our open challenges using the OSF-EIMTC.

preprint2022arXiv

Problem-Space Evasion Attacks in the Android OS: a Survey

Android is the most popular OS worldwide. Therefore, it is a target for various kinds of malware. As a countermeasure, the security community works day and night to develop appropriate Android malware detection systems, with ML-based or DL-based systems considered as some of the most common types. Against these detection systems, intelligent adversaries develop a wide set of evasion attacks, in which an attacker slightly modifies a malware sample to evade its target detection system. In this survey, we address problem-space evasion attacks in the Android OS, where attackers manipulate actual APKs, rather than their extracted feature vector. We aim to explore this kind of attacks, frequently overlooked by the research community due to a lack of knowledge of the Android domain, or due to focusing on general mathematical evasion attacks - i.e., feature-space evasion attacks. We discuss the different aspects of problem-space evasion attacks, using a new taxonomy, which focuses on key ingredients of each problem-space attack, such as the attacker model, the attacker's mode of operation, and the functional assessment of post-attack applications.

preprint2022arXiv

When a RF Beats a CNN and GRU, Together -- A Comparison of Deep Learning and Classical Machine Learning Approaches for Encrypted Malware Traffic Classification

Internet traffic classification is widely used to facilitate network management. It plays a crucial role in Quality of Services (QoS), Quality of Experience (QoE), network visibility, intrusion detection, and traffic trend analyses. While there is no theoretical guarantee that deep learning (DL)-based solutions perform better than classic machine learning (ML)-based ones, DL-based models have become the common default. This paper compares well-known DL-based and ML-based models and shows that in the case of malicious traffic classification, state-of-the-art DL-based solutions do not necessarily outperform the classical ML-based ones. We exemplify this finding using two well-known datasets for a varied set of tasks, such as: malware detection, malware family classification, detection of zero-day attacks, and classification of an iteratively growing dataset. Note that, it is not feasible to evaluate all possible models to make a concrete statement, thus, the above finding is not a recommendation to avoid DL-based models, but rather empirical proof that in some cases, there are more simplistic solutions, that may perform even better.

preprint2020arXiv

Less is More: Robust and Novel Features for Malicious Domain Detection

Malicious domains are increasingly common and pose a severe cybersecurity threat. Specifically, many types of current cyber attacks use URLs for attack communications (e.g., C\&C, phishing, and spear-phishing). Despite the continuous progress in detecting these attacks, many alarming problems remain open, such as the weak spots of the defense mechanisms. Since machine learning has become one of the most prominent methods of malware detection, A robust feature selection mechanism is proposed that results in malicious domain detection models that are resistant to evasion attacks. This mechanism exhibits high performance based on empirical data. This paper makes two main contributions: First, it provides an analysis of robust feature selection based on widely used features in the literature. Note that even though the feature set dimensional space is reduced by half (from nine to four features), the performance of the classifier is still improved (an increase in the model's F1-score from 92.92\% to 95.81\%). Second, it introduces novel features that are robust to the adversary's manipulation. Based on an extensive evaluation of the different feature sets and commonly used classification models, this paper shows that models that are based on robust features are resistant to malicious perturbations, and at the same time useful for classifying non-manipulated data.

preprint2020arXiv

Robust Machine Learning for Encrypted Traffic Classification

Desktops and laptops can be maliciously exploited to violate privacy. In this paper, we consider the daily battle between the passive attacker who is targeting a specific user against a user that may be adversarial opponent. In this scenario, while the attacker tries to choose the best vector attack by surreptitiously monitoring the victims encrypted network traffic in order to identify users parameters such as the Operating System (OS), browser and apps. The user may use tools such as a Virtual Private Network (VPN) or even change protocols parameters to protect his/her privacy. We provide a large dataset of more than 20,000 examples for this task. We run a comprehensive set of experiments, that achieves high (above 85) classification accuracy, robustness and resilience to changes of features as a function of different network conditions at test time. We also show the effect of a small training set on the accuracy.

preprint2020arXiv

When the Guard failed the Droid: A case study of Android malware

Android malware is a persistent threat to billions of users around the world. As a countermeasure, Android malware detection systems are occasionally implemented. However, these systems are often vulnerable to \emph{evasion attacks}, in which an adversary manipulates malicious instances so that they are misidentified as benign. In this paper, we launch various innovative evasion attacks against several Android malware detection systems. The vulnerability inherent to all of these systems is that they are part of Androguard~\cite{desnos2011androguard}, a popular open source library used in Android malware detection systems. Some of the detection systems decrease to a 0\% detection rate after the attack. Therefore, the use of open source libraries in malware detection systems calls for caution. In addition, we present a novel evaluation scheme for evasion attack generation that exploits the weak spots of known Android malware detection systems. In so doing, we evaluate the functionality and maliciousness of the manipulated instances created by our evasion attacks. We found variations in both the maliciousness and functionality tests of our manipulated apps. We show that non-functional apps, while considered malicious, do not threaten users and are thus useless from an attacker's point of view. We conclude that evasion attacks must be assessed for both functionality and maliciousness to evaluate their impact, a step which is far from commonplace today.