Source author record

Mohamed Nabeel

Mohamed Nabeel appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Cryptography and Security; Machine Learning

Catalog footprint

What is connected

5 works

2 topics

4 close collaborators


Preprint, 2026, arXiv

Deep Dive into the Abuse of DL APIs To Create Malicious AI Models and How to Detect Them

According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. To reduce cost and foster innovation, pre-trained models are often fetched from model hubs such as Hugging Face or TensorFlow Hub. However, this introduces a security risk: attackers can inject malicious code into the models they upload to these hubs, leading to attacks including remote code execution (RCE), sensitive data exfiltration, and system file modification when these models are loaded or executed (for example, through the predict function). Since AI models play a critical role in digital transformation, this would drastically increase the number of software supply chain attacks. While there are several efforts at detecting malware when deserializing pickle-based saved models (hiding malware in model parameters), the risk of abusing DL APIs (e.g., TensorFlow APIs) is understudied. Specifically, we show how one can abuse hidden functionalities of TensorFlow APIs, such as file read/write and network send/receive, along with their persistence APIs to launch attacks. It is concerning that existing scanners in model hubs like Hugging Face and TensorFlow Hub are unable to detect some stealthy abuses of such APIs. This is because scanning tools analyse only a syntactically identified set of suspicious functionality; they often lack a semantic-level understanding of the functionality being used. After demonstrating the possible attacks, we show how one may identify potentially abusable hidden API functionalities using LLMs and build scanners to detect such abuses.
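To make the limitation of purely syntactic scanning concrete, the following is a minimal sketch, not the scanner described in the paper. It assumes a model's operations have already been extracted into a list of op-type strings, and it checks them against a small hypothetical denylist. `ReadFile` and `WriteFile` are the op types behind `tf.io.read_file` and `tf.io.write_file`; the function name, the denylist contents, and the `SomeUnlistedIoOp` entry are illustrative assumptions.

```python
# Hypothetical denylist of TensorFlow op types with file side effects.
# ReadFile / WriteFile back tf.io.read_file / tf.io.write_file.
# The list is illustrative and deliberately incomplete.
SUSPICIOUS_OPS = {"ReadFile", "WriteFile"}


def scan_op_types(op_types):
    """Return the sorted suspicious op types found in a model's op list."""
    return sorted(set(op_types) & SUSPICIOUS_OPS)


# Op lists as they might be extracted from three saved models (made-up data).
benign = ["Placeholder", "MatMul", "BiasAdd", "Relu"]
tampered = ["Placeholder", "ReadFile", "MatMul", "WriteFile"]
# "SomeUnlistedIoOp" is a hypothetical stand-in for any side-effecting op
# that the denylist does not name.
evasive = ["Placeholder", "SomeUnlistedIoOp", "MatMul"]

print(scan_op_types(benign))
print(scan_op_types(tampered))
print(scan_op_types(evasive))
```

The third case is the point of the sketch: a name-based check reports nothing for an op it has never been told about, which is the gap that a semantic understanding of API functionality is meant to close.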

Preprint, 2022, arXiv

A Large Scale Study and Classification of VirusTotal Reports on Phishing and Malware URLs

VirusTotal (VT) provides aggregated threat intelligence on various entities including URLs, IP addresses, and binaries. It is widely used by researchers and practitioners to collect ground truth and evaluate the maliciousness of entities. In this work, we provide a comprehensive analysis of VT URL scanning reports containing the results of 95 scanners for 1.577 billion URLs over two years. Individual VT scanners are known to be noisy in terms of their detection and attack type classification. To obtain high-quality ground truth for URLs and take appropriate action to mitigate different types of attacks, there are two challenges: (1) how to decide whether a given URL is malicious given noisy reports and (2) how to determine the attack types (e.g., phishing or malware hosting) that the URL is involved in, given conflicting attack labels from different scanners. In this work, we provide a systematic comparative study of the behavior of VT scanners for different attack types of URLs. A common practice to decide maliciousness is to use a cut-off threshold on the number of scanners that report the URL as malicious. However, we show that using a fixed threshold is suboptimal for several reasons: (1) correlations between scanners; (2) lead/lag behavior; (3) the specialty of scanners; (4) the quality and reliability of scanners. A common practice to determine an attack type is to use majority voting. However, we show that majority voting cannot accurately classify the attack type of a URL due to the bias from correlated scanners. Instead, we propose a machine learning-based approach to assign an attack type to URLs given the VT reports.
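The two baseline practices the abstract criticizes, a fixed cut-off threshold and majority voting, can be sketched as follows. This is an illustration only: the report layout (a mapping from scanner name to label), the scanner names, and the labels are made-up assumptions, not the VirusTotal report format or the authors' code.

```python
from collections import Counter


def is_malicious(report, threshold):
    """Fixed cut-off: malicious if at least `threshold` scanners flag the URL."""
    flagged = sum(1 for label in report.values() if label != "clean")
    return flagged >= threshold


def majority_attack_type(report):
    """Majority vote over non-clean labels; None if no scanner flags the URL."""
    votes = Counter(label for label in report.values() if label != "clean")
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical report for one URL. The two "mirror" scanners are assumed to
# copy scanner_a's verdict, so they add votes without adding evidence.
report = {
    "scanner_a": "malware",
    "scanner_a_mirror1": "malware",
    "scanner_a_mirror2": "malware",
    "scanner_b": "phishing",
    "scanner_c": "phishing",
    "scanner_d": "clean",
}

print(is_malicious(report, threshold=4))
print(majority_attack_type(report))
```

In this made-up report, two independent scanners say phishing and one independent scanner (plus its two copies) says malware, yet the vote goes to malware. That is the correlated-scanner bias the abstract names as the reason majority voting misclassifies attack types.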

Preprint, 2022, arXiv

cGraph: Graph Based Extensible Predictive Domain Threat Intelligence Platform

The ability to effectively investigate indicators of compromise and the associated network resources involved in cyber attacks is paramount, not only to identify affected network resources but also to detect related malicious resources. Today, most cyber threat intelligence platforms are reactive in that they can identify attack resources only after the attack is carried out. Further, these systems have limited functionality to investigate associated network resources. In this work, we propose an extensible predictive cyber threat intelligence platform called cGraph that addresses the above limitations. cGraph is built as a graph-first system where investigators can explore network resources using a graph-based API. Further, cGraph provides real-time predictive capabilities based on state-of-the-art inference algorithms to predict malicious domains from network graphs with a few known malicious and benign seeds. To the best of our knowledge, cGraph is the only threat intelligence platform to do so. cGraph is extensible in that additional network resources can be added to the system transparently.
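The idea of predicting malicious domains from a graph with a few labeled seeds can be illustrated with plain label propagation. This is a swapped-in, much simpler technique than whatever inference algorithm cGraph uses, which the abstract does not specify. The graph, the node names, and the `propagate_scores` function are hypothetical.

```python
def propagate_scores(edges, seeds, iterations=20):
    """Spread maliciousness scores in [0, 1] from seeds over an undirected graph."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    # Unlabeled nodes start at the neutral score 0.5; seeds keep their label.
    scores = {node: seeds.get(node, 0.5) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]
            else:
                # An unlabeled node takes the mean score of its neighbors.
                updated[node] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Made-up resolution graph: domains connected to the IPs they resolve to.
edges = [
    ("bad.example", "ip1"),
    ("unknown1.example", "ip1"),
    ("good.example", "ip2"),
    ("unknown2.example", "ip2"),
]
seeds = {"bad.example": 1.0, "good.example": 0.0}  # 1.0 malicious, 0.0 benign

scores = propagate_scores(edges, seeds)
for node in ("unknown1.example", "unknown2.example"):
    print(node, round(scores[node], 3))
```

Here the unlabeled domain that shares an IP with the malicious seed drifts toward 1.0, and the one that shares an IP with the benign seed drifts toward 0.0, which is the guilt-by-association intuition behind seed-based inference on network graphs.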

Preprint, 2022, arXiv

PDNS-Net: A Large Heterogeneous Graph Benchmark Dataset of Network Resolutions for Graph Learning

To advance the state of the art in graph learning algorithms, it is necessary to construct large real-world datasets. While there are many benchmark datasets for homogeneous graphs, only a few are available for heterogeneous graphs. Furthermore, the latter graphs are small, rendering them insufficient to understand how graph learning algorithms perform in terms of classification metrics and computational resource utilization. We introduce PDNS-Net, the largest public heterogeneous graph dataset, containing 447K nodes and 897K edges, for the malicious domain classification task. Compared to the popular heterogeneous datasets IMDB and DBLP, PDNS-Net is 38 and 17 times bigger, respectively. We provide a detailed analysis of PDNS-Net including the data collection methodology, heterogeneous graph construction, descriptive statistics and preliminary graph classification performance. The dataset is publicly available at https://github.com/qcri/PDNS-Net. Our preliminary evaluation of both popular homogeneous and heterogeneous graph neural networks on PDNS-Net reveals that further research is required to improve the performance of these models on large heterogeneous graphs.

Preprint, 2022, arXiv

PhishChain: A Decentralized and Transparent System to Blacklist Phishing URLs

Blacklists are a widely used Internet security mechanism to protect Internet users from financial scams, malicious web pages and other cyber attacks based on blacklisted URLs. In this demo, we introduce PhishChain, a transparent and decentralized system for blacklisting phishing URLs. At present, public/private domain blacklists, such as PhishTank, CryptoScamDB, and APWG, are maintained by a centralized authority but operate in a crowdsourcing fashion to create a manually verified blacklist periodically. In addition to being a single point of failure, the blacklisting process used by such systems is not transparent. We use blockchain technology to support transparency and decentralization, where no single authority controls the blacklist and all operations are recorded in an immutable distributed ledger. Further, we design a PageRank-based truth discovery algorithm to assign a phishing score to each URL based on crowdsourced assessments of URLs. As an incentive for voluntary participation, we assign skill points to each user based on their participation in URL verification.
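The abstract does not describe the PageRank-based algorithm in detail, so the following is only a generic truth discovery sketch that stands in for it. It alternates between scoring each URL by reliability-weighted votes and re-estimating each user's reliability from agreement with those scores. The vote data, user names, and the `truth_discovery` function are hypothetical, and the user weights here are not PhishChain's skill points.

```python
def truth_discovery(votes, iterations=10):
    """votes: {user: {url: 1 for phishing, 0 for benign}}.

    Returns (url_scores, user_weights), both as dicts of floats.
    """
    users = list(votes)
    urls = sorted({url for user_votes in votes.values() for url in user_votes})
    weights = {user: 1.0 for user in users}
    scores = {}
    for _ in range(iterations):
        # URL score: reliability-weighted fraction of "phishing" votes.
        for url in urls:
            voters = [u for u in users if url in votes[u]]
            total = sum(weights[u] for u in voters)
            scores[url] = sum(weights[u] * votes[u][url] for u in voters) / total
        # User weight: mean agreement with the current scores, floored above 0.
        for u in users:
            if not votes[u]:
                continue
            agreement = [1.0 - abs(votes[u][url] - scores[url]) for url in votes[u]]
            weights[u] = max(sum(agreement) / len(agreement), 1e-6)
    return scores, weights


# Made-up assessments: mallory consistently votes against the others.
votes = {
    "alice": {"u1": 1, "u2": 0, "u3": 1},
    "bob": {"u1": 1, "u2": 0, "u3": 1},
    "carol": {"u1": 1, "u2": 0, "u3": 0},
    "mallory": {"u1": 0, "u2": 1, "u3": 0},
}

scores, weights = truth_discovery(votes)
print({url: round(s, 2) for url, s in scores.items()})
print({user: round(w, 2) for user, w in weights.items()})
```

The design choice worth noting is the mutual reinforcement: users who agree with the emerging consensus gain weight, so a consistently contrarian voter has progressively less influence on the phishing scores.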
