Researcher profile

Rachit Agarwal

Rachit Agarwal contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
6 works
0 followers
7 topics
4 close collaborators


Published work

6 published items

preprint | 2022 | arXiv

DNS based In-Browser Cryptojacking Detection

The metadata aspect of Domain Names (DNs) enables us to perform a behavioral study of DNs and detect if a DN is involved in in-browser cryptojacking. Thus, we are motivated to study different temporal and behavioral aspects of DNs involved in cryptojacking. We use temporal features such as query frequency and query burst, graph-based features such as degree and diameter, and non-temporal features such as string-based ones to detect whether a DN is suspected of involvement in in-browser cryptojacking. We then use these features to train Machine Learning (ML) algorithms over different temporal granularities, such as 2-hour datasets and the complete dataset. Our results show that the DecisionTrees classifier performs the best, with 59.5% recall on cryptojacked DNs, while for unsupervised learning, K-Means with K=2 performs the best. Similarity analysis of the features reveals a minimal divergence between the cryptojacking DNs and other already known malicious DNs. It also reveals the need for improvements in the feature set of state-of-the-art methods to improve their accuracy in detecting in-browser cryptojacking. As an added analysis, our signature-based analysis identifies that none of the Indian Government websites were involved in cryptojacking during October-December 2021. However, based on resource utilization, we identify 10 DNs with different properties than the others.
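The abstract does not define its temporal features, so the following is only a minimal sketch of what "query frequency" and "query burst" could look like for a single domain name. The function name, the one-hour window, and the burst threshold are all assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter

def query_features(timestamps, window=3600, burst_threshold=3):
    """Return (mean queries per active window, number of bursty windows).

    Assumed definitions (not taken from the paper):
    - frequency: total queries divided by the number of windows with any query
    - burst: a window holding at least `burst_threshold` queries
    """
    if not timestamps:
        return 0.0, 0
    # Bucket each query time (in seconds) into a fixed-size window.
    counts = Counter(int(t // window) for t in timestamps)
    frequency = len(timestamps) / len(counts)
    bursts = sum(1 for c in counts.values() if c >= burst_threshold)
    return frequency, bursts

# Hypothetical DNS query times (seconds) observed for one domain name.
times = [10, 20, 30, 4000, 7300, 7310, 7320, 7330]
freq, bursts = query_features(times)
```

Features like these would be computed per domain name and per time granularity before being handed to a classifier.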

preprint | 2022 | arXiv

EPASAD: Ellipsoid decision boundary based Process-Aware Stealthy Attack Detector

Due to their importance to a nation's economy, Critical Infrastructures (CIs) have been lucrative targets for cyber attackers. These critical infrastructures are usually Cyber-Physical Systems (CPS) such as power grids, water and sewage treatment facilities, oil and gas pipelines, etc. In recent times, these systems have suffered from cyber attacks numerous times. Researchers have been developing cyber security solutions for CIs to avoid lasting damage. According to standard frameworks, identification, protection, detection, response, and recovery are at the core of this research. Detection of an ongoing attack that escapes standard protection such as firewall, anti-virus, and host/network intrusion detection has gained importance, as such attacks eventually affect the physical dynamics of the system. Therefore, anomaly detection in physical dynamics proves an effective means to implement defense-in-depth. PASAD is one example of anomaly detection in the sensor/actuator data, representing such systems' physical dynamics. We present EPASAD, which improves the detection technique used in PASAD to detect micro-stealthy attacks, which our experiments show PASAD's spherical boundary-based detection fails to detect. Our method EPASAD overcomes this by using ellipsoid boundaries, thereby tightening the boundaries in various dimensions, whereas a spherical boundary treats all dimensions equally. We validate EPASAD using the dataset produced by the TE-process simulator and the C-town datasets. The results show that EPASAD improves PASAD's average recall by 5.8% and 9.5% for the two datasets, respectively.
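The difference between a spherical and an ellipsoid boundary can be illustrated with a small sketch. This is not the EPASAD implementation: it works directly on raw two-dimensional points, uses an axis-aligned ellipsoid scaled by per-dimension spread, and all names and data are hypothetical.

```python
import math

def fit_boundaries(points):
    """Fit a sphere and an axis-aligned ellipsoid around normal training points."""
    n, dims = len(points), len(points[0])
    center = [sum(p[d] for p in points) / n for d in range(dims)]
    # Per-dimension spread (root mean squared deviation); assumed non-zero here.
    scale = [math.sqrt(sum((p[d] - center[d]) ** 2 for p in points) / n)
             for d in range(dims)]
    # Sphere: a single squared radius shared by every dimension.
    sphere_r2 = max(sum((p[d] - center[d]) ** 2 for d in range(dims))
                    for p in points)
    # Ellipsoid: distances are normalized per dimension before thresholding.
    ellipsoid_r2 = max(sum(((p[d] - center[d]) / scale[d]) ** 2 for d in range(dims))
                       for p in points)
    return center, scale, sphere_r2, ellipsoid_r2

def sphere_flags(x, center, sphere_r2):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center)) > sphere_r2

def ellipsoid_flags(x, center, scale, ellipsoid_r2):
    return sum(((xi - ci) / si) ** 2
               for xi, ci, si in zip(x, center, scale)) > ellipsoid_r2

# Hypothetical normal behavior: wide variation in dimension 0, narrow in dimension 1.
normal = [(-10.0, -0.1), (10.0, 0.1), (-5.0, 0.05), (5.0, -0.05)]
center, scale, sphere_r2, ellipsoid_r2 = fit_boundaries(normal)

# A small deviation along the narrow dimension, standing in for a stealthy attack.
stealthy = (0.0, 2.0)
missed_by_sphere = not sphere_flags(stealthy, center, sphere_r2)
caught_by_ellipsoid = ellipsoid_flags(stealthy, center, scale, ellipsoid_r2)
```

Because the sphere's radius is set by the widest dimension, a deviation that is large only relative to the narrow dimension stays inside it, while the per-dimension scaling of the ellipsoid exposes it.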

preprint | 2022 | arXiv

Incentives in Dominant Resource Fair Allocation under Dynamic Demands

Every computer system -- from schedulers in clouds (e.g. Amazon) to computer networks to operating systems -- performs resource allocation across system users. The de facto allocation policies are max-min fairness (MMF) for single resources and dominant resource fairness (DRF) for multiple resources, which guarantee properties like incentive compatibility, envy-freeness, and Pareto efficiency, assuming user demands are static (time-independent). However, in real-world systems, user demands are dynamic, i.e. time-dependent. As a result, there is now a fundamental mismatch between the goals of computer systems and the properties enabled by classic resource allocation policies. We aim to bridge this mismatch. When demands are dynamic, instant-by-instant MMF can be extremely unfair over longer periods of time, i.e. lead to unbalanced user allocations, as previous allocations have no effect in the present. We consider a natural generalization of MMF and DRF for multiple resources and users with dynamic demands: this algorithm ensures that user allocations are as max-min fair as possible up to any time instant, given past allocations. This dynamic mechanism remains Pareto optimal and envy-free, but not incentive compatible. However, our results show that the possible increase in utility by misreporting is bounded and, since this can lead to a significant decrease in overall useful allocation, this suggests that it is not a useful strategy. Our main result is to show that our dynamic DRF algorithm is $(1+ρ)$-incentive compatible, where $ρ$ quantifies the relative importance of a resource for different users; we show that this factor is tight even with only two resources. We also present a $3/2$ upper bound and a $\sqrt 2$ lower bound for incentive compatibility when there is only one resource. We also offer extensions for the case when users are weighted to prioritize them differently.
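The idea of making allocations "as max-min fair as possible up to any time instant, given past allocations" can be sketched for the single-resource case. The code below is an illustrative reading of that sentence, not the paper's algorithm: it raises a common water level over cumulative allocations, capping each user at the current demand. The function name and the bisection approach are assumptions.

```python
def dynamic_mmf_step(past, demands, capacity):
    """Allocate one time step so that cumulative allocations are max-min fair.

    past[i]    : total amount user i received in earlier steps
    demands[i] : amount user i asks for in this step
    capacity   : amount of the single resource available in this step
    """
    if sum(demands) <= capacity:
        return list(demands)  # everyone can be fully served

    def used(level):
        # Resource consumed if every cumulative allocation is raised to `level`,
        # never giving a user more than the current demand.
        return sum(min(d, max(0.0, level - a)) for a, d in zip(past, demands))

    # Bisection on the water level; `used` is non-decreasing in `level`.
    lo, hi = min(past), max(a + d for a, d in zip(past, demands))
    for _ in range(100):
        mid = (lo + hi) / 2
        if used(mid) < capacity:
            lo = mid
        else:
            hi = mid
    return [min(d, max(0.0, hi - a)) for a, d in zip(past, demands)]

# Two users share 10 units per step. Only user 0 has demand in step 1.
step1 = dynamic_mmf_step([0, 0], [10, 0], 10)
# In step 2 both demand 10; user 1 is compensated for having received nothing.
step2 = dynamic_mmf_step(step1, [10, 10], 10)
```

In this example, instant-by-instant MMF would split the second step evenly and leave cumulative allocations at 15 and 5, whereas the sketch equalizes them at 10 each.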

preprint | 2021 | arXiv

Detecting Malicious Accounts in Permissionless Blockchains using Temporal Graph Properties

Modeling the accounts of a blockchain as nodes and its transactions as directed edges in a temporal directed graph enables us to understand the behavior (malicious or benign) of the accounts. Predictive classification of accounts as malicious or benign could help users of permissionless blockchain platforms to operate in a secure manner. Motivated by this, we introduce temporal features such as burst and attractiveness on top of several already used graph properties such as the node degree and clustering coefficient. Using the identified features, we train various Machine Learning (ML) algorithms and identify the algorithm that performs the best in detecting which accounts are malicious. We then study the behavior of the accounts over different temporal granularities of the dataset before assigning them malicious tags. For the Ethereum blockchain, we identify that the ExtraTreesClassifier performs the best among supervised ML algorithms on the entire dataset. On the other hand, using cosine similarity on top of the results provided by unsupervised ML algorithms such as K-Means on the entire dataset, we were able to detect 554 more suspicious accounts. Further, using behavior change analysis for accounts, we identify 814 unique suspicious accounts across different temporal granularities.
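As a rough illustration of the graph side of this approach, the sketch below builds in-degree and out-degree features from a list of transactions and compares accounts with cosine similarity. The transaction tuples, account labels, and two-feature vectors are hypothetical; the paper's feature set is larger and its exact definitions are not given in the abstract.

```python
import math
from collections import defaultdict

def degree_features(transactions):
    """Map each account to [in_degree, out_degree] over distinct counterparties."""
    received_from = defaultdict(set)  # account -> accounts that sent to it
    sent_to = defaultdict(set)        # account -> accounts it sent to
    for src, dst, _timestamp in transactions:
        sent_to[src].add(dst)
        received_from[dst].add(src)
    accounts = set(received_from) | set(sent_to)
    return {a: [len(received_from[a]), len(sent_to[a])] for a in accounts}

def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Hypothetical transactions: (sender, receiver, timestamp).
txs = [("a", "b", 1), ("a", "c", 2), ("a", "d", 3),
       ("e", "b", 4), ("e", "c", 5), ("b", "a", 6)]
features = degree_features(txs)

# Accounts "a" and "e" mostly send; account "c" only receives.
sim_a_e = cosine_similarity(features["a"], features["e"])
sim_a_c = cosine_similarity(features["a"], features["c"])
```

An account whose feature vector lies close to those of known malicious accounts could then be marked as suspicious for further review.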

preprint | 2021 | arXiv

Detecting Malicious Accounts showing Adversarial Behavior in Permissionless Blockchains

Different types of malicious activities have been flagged in multiple permissionless blockchains such as Bitcoin, Ethereum, etc. While some malicious activities exploit vulnerabilities in the infrastructure of the blockchain, some target its users through social engineering techniques. To address these problems, we aim at automatically flagging blockchain accounts that originate such malicious exploitation of accounts of other participants. To that end, we identify a robust supervised machine learning (ML) algorithm that is resistant to any bias induced by an over-representation of certain malicious activity in the available dataset, and that is also robust against adversarial attacks. We find that most of the malicious activities reported thus far, for example, in the Ethereum blockchain ecosystem, behave statistically similarly. Further, the previously used ML algorithms for identifying malicious accounts show bias towards a particular malicious activity which is over-represented. In the sequel, we identify that Neural Networks (NN) hold up the best in the face of such a bias-inducing dataset while at the same time being robust against certain adversarial attacks.

preprint | 2020 | arXiv

Wide-Area Data Analytics

We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations. In some cases, the datasets are collected from multiple locations, such as sensors (e.g., mobile phones and street cameras) spread throughout a geographic region. The data may need to be analyzed close to where they are produced, particularly when the applications require low latency, high, low cost, user privacy, and regulatory constraints. In other cases, large datasets are distributed across public clouds, private clouds, or edge-cloud computing sites with more plentiful computation, storage, bandwidth, and energy resources. Often, some portion of the analysis may take place on the end-host or edge cloud (to respect user privacy and reduce the volume of data) while relying on remote clouds to complete the analysis (to leverage greater computation and storage resources). Wide-area data analytics is any analysis of data that is generated by, or stored at, geographically dispersed entities. Over the past few years, several parts of the computer science research community have started to explore effective ways to analyze data spread over multiple locations. In particular, several areas of "systems" research - including databases, distributed systems, computer networking, and security and privacy - have delved into these topics. These research subcommunities often focus on different aspects of the problem, consider different motivating applications and use cases, and design and evaluate their solutions differently. To address these challenges, the Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019. This report summarizes the challenges discussed and the conclusions generated at the workshop.