Researcher profile

Sanjay Jha

Sanjay Jha contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
8 topics
4 close collaborators

Published work

10 published items

preprint, 2022, arXiv

De-anonymisation attacks on Tor: A Survey

Anonymity networks are becoming increasingly popular in today's online world as more users attempt to safeguard their online privacy. Tor is currently the most popular anonymity network in use and provides anonymity to both users and services (hidden services). However, the anonymity provided by Tor is also being misused in various ways. Hosting illegal sites for selling drugs, hosting command and control servers for botnets, and distributing censored content are but a few such examples. As a result, various parties, including governments and law enforcement agencies, are interested in attacks that assist in de-anonymising the Tor network, disrupting its operations, and bypassing its censorship circumvention mechanisms. In this survey paper, we review known Tor attacks and identify current techniques for the de-anonymisation of Tor users and hidden services. We discuss these techniques and analyse the practicality of their execution method. We conclude by discussing improvements to the Tor framework that help prevent the surveyed de-anonymisation attacks.

preprint, 2022, arXiv

Fake News Quick Detection on Dynamic Heterogeneous Information Networks

The spread of fake news has caused great harm to society in recent years, so the quick detection of fake news has become an important task. Current detection methods often model news articles and other related components as a static heterogeneous information network (HIN) and use expensive message-passing algorithms. However, in the real world, quickly identifying fake news is of great significance, and the network may vary over time in terms of dynamic nodes and edges. Therefore, in this paper, we propose a novel Dynamic Heterogeneous Graph Neural Network (DHGNN) for fake news quick detection. More specifically, we first use BERT and fine-tuned BERT to get a semantic representation of the news article contents and author profiles and convert it into graph data. Then, we construct the heterogeneous news-author graph to reflect contextual information and relationships. Additionally, we adapt ideas from personalized PageRank propagation and dynamic propagation to heterogeneous networks in order to reduce the time complexity of back-propagating through many nodes during training. Experiments on three real-world fake news datasets show that DHGNN can outperform other GNN-based models in terms of both effectiveness and efficiency.

preprint, 2022, arXiv

PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool

In this paper, we propose a feature-free method for detecting phishing websites using the Normalized Compression Distance (NCD), a parameter-free similarity measure which computes the similarity of two websites by compressing them, thus eliminating the need to perform any feature extraction. It also removes any dependence on a specific set of website features. This method examines the HTML of webpages and computes their similarity with known phishing websites, in order to classify them. We use the Furthest Point First algorithm to perform phishing prototype extractions, in order to select instances that are representative of a cluster of phishing webpages. We also introduce the use of an incremental learning algorithm as a framework for continuous and adaptive detection without extracting new features when concept drift occurs. On a large dataset, our proposed method significantly outperforms previous methods in detecting phishing websites, with an AUC score of 98.68% and a high true positive rate (TPR) of around 90%, while maintaining a low false positive rate (FPR) of 0.58%. Our approach uses prototypes, eliminating the need to retain long-term data in the future, and is feasible to deploy in real systems with a processing time of roughly 0.3 seconds.
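The Normalized Compression Distance mentioned in this abstract can be illustrated with a minimal sketch. It assumes zlib as the compressor (the abstract does not say which compressor the authors use), and the function names and sample pages are hypothetical. It is not the PhishSim implementation.

```python
import zlib


def compressed_size(data: bytes) -> int:
    # Size in bytes after compression; zlib is an assumed stand-in compressor.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    # Standard NCD formula: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample pages, for illustration only.
login_a = (
    b"<html><head><title>Sign in to your bank</title></head><body>"
    b"<form action='/login' method='post'><input name='user'>"
    b"<input name='password' type='password'><button>Sign in</button>"
    b"</form></body></html>"
)
login_b = login_a.replace(b"your bank", b"MyBank Online")
article = (
    b"The committee met on Tuesday to discuss rainfall figures, crop yields, "
    b"and the quarterly budget for irrigation projects across the northern valleys."
)

print("near-duplicate login pages:", round(ncd(login_a, login_b), 3))
print("login page vs. unrelated text:", round(ncd(login_a, article), 3))
```

A lower value means the two inputs share more compressible structure, so a page that scores low against a known phishing prototype would be a candidate for the phishing class.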

preprint, 2021, arXiv

All Infections are Not Created Equal: Time-Sensitive Prediction of Malware Generated Network Attacks

Many techniques have been proposed for quickly detecting and containing malware-generated network attacks such as large-scale denial of service attacks; unfortunately, much damage is already done within the first few minutes of an attack, before it is identified and contained. There is a need for an early warning system that can predict attacks before they actually manifest, so that upcoming attacks can be prevented altogether by blocking the hosts that are likely to engage in attacks. However, blocking responses may disrupt legitimate processes on blocked hosts; in order to minimise user inconvenience, it is important to also foretell the time when the predicted attacks will occur, so that only the most urgent threats result in auto-blocking responses, while less urgent ones are first manually investigated. To this end, we identify a typical infection sequence followed by modern malware; modelling this sequence as a Markov chain and training it on real malicious traffic, we are able to identify behaviour most likely to lead to attacks and predict 98% of real-world spamming and port-scanning attacks before they occur. Moreover, using a Semi-Markov chain model, we are able to foretell the time of upcoming attacks, a novel capability that allows accurately predicting the times of 97% of real-world malware attacks. Our work represents an important and timely step towards enabling flexible threat response models that minimise disruption to legitimate users.
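The idea of training a Markov chain on observed infection sequences can be sketched as follows. The stage names, the toy traces, and both function names are hypothetical; the abstract does not list the paper's actual states or training data, and this sketch does not cover the Semi-Markov timing model.

```python
from collections import Counter, defaultdict


def fit_transition_matrix(sequences):
    # Estimate first-order transition probabilities by counting adjacent pairs.
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def prob_reach(matrix, start, target, steps):
    # Probability of first hitting `target` within `steps` transitions from `start`.
    dist = {start: 1.0}
    reached = 0.0
    for _ in range(steps):
        next_dist = defaultdict(float)
        for state, p in dist.items():
            for nxt, q in matrix.get(state, {}).items():
                if nxt == target:
                    reached += p * q
                else:
                    next_dist[nxt] += p * q
        dist = next_dist
    return reached


# Hypothetical host traces; each list is one observed stage sequence.
traces = [
    ["scan", "exploit", "download", "c2", "attack"],
    ["scan", "exploit", "download", "c2", "idle"],
    ["scan", "idle"],
    ["scan", "exploit", "idle"],
]
matrix = fit_transition_matrix(traces)
print(matrix["scan"])
print(prob_reach(matrix, "scan", "attack", steps=4))
```

A host whose current stage has a high probability of reaching the attack state within a few steps would be the kind of candidate the abstract describes for early blocking.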

preprint, 2021, arXiv

Network Growth From Global and Local Influential Nodes

In graph theory and network analysis, node degree is a simple but powerful centrality measure of the local influence of a node in a complex network. Preferential attachment based on node degree has been widely adopted for modeling network growth. However, much evidence shows that real network growth deviates from what a pure degree-based model suggests. It seems that node degree is not a reliable measure for predicting the preference of newcomers in attaching to the network, or at least, it does not tell the whole story. In this paper, we argue that there is another dimension to network growth, one that we call node "coreness". The new dimension gives insights into the global influence of nodes, in comparison to the local view the degree metric provides. We found that the probability of existing nodes attracting new nodes generally follows an exponential dependence on node coreness while, at the same time, following a power-law dependence on node degree. That is to say, high-coreness nodes are more powerful than high-degree nodes in attracting newcomers. The new dimension further discloses some hidden phenomena in the process of network growth: the power of node degree in attracting newcomers increases over time while the influence of coreness decreases, until they finally reach a state of equilibrium. All these theories have been tested on real-world networks.
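To make the contrast between degree and coreness concrete, here is a minimal sketch. It assumes "coreness" means the standard k-core index (the largest k such that the node belongs to a subgraph in which every node has degree at least k). The abstract does not define the term, and the function name and toy graph are hypothetical.

```python
def coreness(adj):
    # k-core decomposition by repeatedly peeling the lowest-degree node.
    # `adj` maps each node to the set of its neighbours (undirected graph).
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # the core index never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Toy graph: a triangle (a, b, c) with one pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print({v: len(ns) for v, ns in graph.items()})  # degrees
print(coreness(graph))                           # coreness values
```

In this toy graph, node c has the highest degree (3) but the same coreness (2) as a and b, which shows how the two measures can rank nodes differently. The linear scan for the minimum keeps the sketch short; a bucket-based implementation would be preferable for large graphs.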

preprint, 2020, arXiv

A Survey of COVID-19 Contact Tracing Apps

The recent outbreak of COVID-19 has taken the world by surprise, forcing lockdowns and straining public health care systems. COVID-19 is known to be a highly infectious virus, and infected individuals do not initially exhibit symptoms, while some remain asymptomatic. Thus, a non-negligible fraction of the population can, at any given time, be a hidden source of transmissions. In response, many governments have shown great interest in smartphone contact tracing apps that help automate the difficult task of tracing all recent contacts of newly identified infected individuals. However, tracing apps have generated much discussion around their key attributes, including system architecture, data management, privacy, security, proximity estimation, and attack vulnerability. In this article, we provide the first comprehensive review of these much-discussed tracing app attributes. We also present an overview of many proposed tracing app examples, some of which have been deployed countrywide, and discuss the concerns users have reported regarding their usage. We close by outlining potential research directions for next-generation app design, which would facilitate improved tracing and security performance, as well as wide adoption by the population at large.

preprint, 2020, arXiv

B-FERL: Blockchain based Framework for Securing Smart Vehicles

The ubiquity of connecting technologies in smart vehicles and the incremental automation of their functionalities promise significant benefits, including a significant decline in congestion and road fatalities. However, increasing automation and connectedness broaden the attack surface and heighten the likelihood of a malicious entity successfully executing an attack. In this paper, we propose a Blockchain based Framework for sEcuring smaRt vehicLes (B-FERL). B-FERL uses permissioned blockchain technology to tailor information access to restricted entities in the connected vehicle ecosystem. It also uses a challenge-response data exchange between the vehicles and roadside units to monitor the internal state of the vehicle to identify cases of in-vehicle network compromise. In order to enable authentic and valid communication in the vehicular network, only vehicles with a verifiable record in the blockchain can exchange messages. Through qualitative arguments, we show that B-FERL is resilient to identified attacks. Also, quantitative evaluations in an emulated scenario show that B-FERL ensures a suitable response time and required storage size compatible with realistic scenarios. Finally, we demonstrate how B-FERL achieves various important functions relevant to the automotive ecosystem such as trust management, vehicular forensics and secure vehicular networks.

preprint, 2020, arXiv

Health Access Broker: Secure, Patient-Controlled Management of Personal Health Records in the Cloud

Secure and privacy-preserving management of Personal Health Records (PHRs) has proved to be a major challenge in modern healthcare. Current solutions generally do not offer patients a choice in where the data is actually stored and also rely on at least one fully trusted element that patients must also trust with their data. In this work, we present the Health Access Broker (HAB), a patient-controlled service for secure PHR sharing that (a) does not impose a specific storage location (uniquely for a PHR system), and (b) does not assume any of its components to be fully secure against adversarial threats. Instead, HAB introduces a novel auditing and intrusion-detection mechanism where its workflow is securely logged and continuously inspected to provide auditability of data access and quickly detect any intrusions.

preprint, 2020, arXiv

Leveraging lightweight blockchain to establish data integrity for surveillance cameras

The video footage produced by surveillance cameras is important evidence in support of criminal investigations. Video evidence can be sourced from public (trusted) as well as private (untrusted) surveillance systems. This raises the issue of establishing integrity and auditability for information provided by the untrusted video sources. In this paper, we focus on an airport ecosystem, where multiple entities with varying levels of trust are involved in producing and exchanging video surveillance information. We present a framework to ensure the data integrity of the stored videos, allowing authorities to verify that video footage has not been tampered with. Our proposal uses a lightweight blockchain technology to store the video metadata as blockchain transactions to support the validation of video integrity. The proposed framework also ensures video auditability and non-repudiation. Our evaluations show that employing the blockchain to create and query the transactions adds only a very minor latency of a few milliseconds.
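The general idea of recording video metadata in a tamper-evident ledger can be sketched with a plain hash chain. This is a simplified stand-in, not the lightweight blockchain the authors use: there is no consensus, networking, or signing, and every field name and function name below is hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first record's predecessor


def video_digest(video_bytes: bytes) -> str:
    # Fingerprint of the footage itself; only this digest goes on the chain.
    return hashlib.sha256(video_bytes).hexdigest()


def block_hash(prev_hash: str, metadata: dict) -> str:
    # Hash the metadata together with the previous record's hash.
    payload = json.dumps({"prev": prev_hash, "meta": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain: list, metadata: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"prev": prev_hash, "meta": metadata, "hash": block_hash(prev_hash, metadata)}
    )


def verify_chain(chain: list) -> bool:
    # Recompute every link; any edited record breaks the chain from that point on.
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(prev_hash, block["meta"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_record(chain, {"camera": "gate-12", "sha256": video_digest(b"clip one")})
append_record(chain, {"camera": "gate-14", "sha256": video_digest(b"clip two")})
print(verify_chain(chain))  # chain as recorded

chain[0]["meta"]["sha256"] = video_digest(b"edited clip")
print(verify_chain(chain))  # chain after tampering with the first record
```

An investigator would recompute the digest of the footage they were handed and compare it with the recorded value; the chain check shows whether the recorded value itself has been altered.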

preprint, 2020, arXiv

PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites

Phishing has grown significantly in the past few years and is predicted to increase further in the future. The dynamics of phishing introduce challenges in implementing a robust phishing detection system and in selecting features which can represent phishing despite changes in the attack. In this paper, we propose PhishZip, a novel phishing detection approach that uses a compression algorithm to perform website classification, and demonstrate a systematic way to construct the word dictionaries for the compression models using word occurrence likelihood analysis. PhishZip outperforms the use of best-performing HTML-based features in past studies, with a true positive rate of 80.04%. We also propose the use of compression ratio as a novel machine learning feature which significantly improves machine learning based phishing detection over previous studies. Using compression ratios as additional features, the true positive rate significantly improves by 30.3 percentage points (from 51.47% to 81.77%), while the accuracy increases by 11.84 percentage points (from 71.20% to 83.04%).
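Classification by compression with word dictionaries can be illustrated with a minimal sketch. It assumes zlib's preset-dictionary feature (`zdict`) as the compression model and uses tiny hand-written dictionaries; the abstract does not specify the compressor, and the dictionaries, sample pages, and function names here are hypothetical rather than taken from PhishZip.

```python
import zlib

# Hypothetical word dictionaries; the paper builds its own from likelihood analysis.
PHISHING_DICT = b"verify your account password login bank security update confirm"
LEGITIMATE_DICT = b"weather recipe garden holiday football music travel review"


def compressed_size(text: bytes, dictionary: bytes) -> int:
    # Compress `text` with DEFLATE primed by a preset dictionary.
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    return len(compressor.compress(text) + compressor.flush())


def compression_ratio(text: bytes, dictionary: bytes) -> float:
    # Compressed size over original size; lower means the dictionary fits better.
    return compressed_size(text, dictionary) / len(text)


def classify(text: bytes) -> str:
    # Pick the class whose dictionary compresses the page best (ties go to legitimate).
    if compressed_size(text, PHISHING_DICT) < compressed_size(text, LEGITIMATE_DICT):
        return "phishing"
    return "legitimate"


suspicious = (
    b"Please login and verify your account password to confirm "
    b"the security update from your bank"
)
benign = (
    b"Our travel review covers holiday weather, garden music "
    b"and a football recipe"
)
for page in (suspicious, benign):
    print(
        classify(page),
        round(compression_ratio(page, PHISHING_DICT), 3),
        round(compression_ratio(page, LEGITIMATE_DICT), 3),
    )
```

The two ratios printed per page are the kind of values that could also be passed to a separate classifier as additional features, in the spirit of the second contribution the abstract describes.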