Source author record

Rachit Agarwal

Rachit Agarwal appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Networking and Internet Architecture; Machine Learning; Social and Information Networks; Cryptography and Security; Data Structures and Algorithms; Databases; Distributed, Parallel, and Cluster Computing; physics.soc-ph; Computer Science and Game Theory; cs.CY; Information Theory; math.IT

Catalog footprint: 18 works, 12 topics, 4 close collaborators

preprint, 2022, arXiv

DNS based In-Browser Cryptojacking Detection

The metadata aspect of Domain Names (DNs) enables us to perform a behavioral study of DNs and detect if a DN is involved in in-browser cryptojacking. Thus, we are motivated to study different temporal and behavioral aspects of DNs involved in cryptojacking. We use temporal features such as query frequency and query burst, graph-based features such as degree and diameter, and non-temporal features such as string-based features to detect whether a DN is suspected of being involved in in-browser cryptojacking. We then use these features to train Machine Learning (ML) algorithms over different temporal granularities, such as 2-hour datasets and the complete dataset. Our results show that the DecisionTrees classifier performs best, with 59.5% recall on cryptojacked DNs, while for unsupervised learning, K-Means with K=2 performs best. Similarity analysis of the features reveals a minimal divergence between the cryptojacking DNs and other already known malicious DNs. It also reveals the need for improvements in the feature set of state-of-the-art methods to improve their accuracy in detecting in-browser cryptojacking. As an added analysis, our signature-based analysis identifies that none of the Indian Government websites were involved in cryptojacking during October-December 2021. However, based on resource utilization, we identify 10 DNs with different properties than the others.
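To make the temporal features in this abstract concrete, here is a minimal sketch of how query frequency and query bursts could be derived from a DNS query log. The function name, the fixed-window burst definition, and the `window` and `burst_threshold` parameters are illustrative assumptions, not the paper's definitions.

```python
from collections import defaultdict


def temporal_features(queries, window=60.0, burst_threshold=3):
    """Compute simple per-domain temporal features from a query log.

    queries: iterable of (timestamp_seconds, domain_name) pairs.
    A "burst" is assumed here to be any fixed-length window that holds at
    least `burst_threshold` queries for the same domain.
    """
    per_window = defaultdict(lambda: defaultdict(int))
    for timestamp, domain in queries:
        per_window[domain][int(timestamp // window)] += 1

    features = {}
    for domain, windows in per_window.items():
        features[domain] = {
            "query_count": sum(windows.values()),
            "bursts": sum(1 for count in windows.values() if count >= burst_threshold),
        }
    return features
```

Feature dictionaries of this shape could then be turned into vectors for a classifier; the abstract does not say how the authors encoded theirs.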

preprint, 2022, arXiv

EPASAD: Ellipsoid decision boundary based Process-Aware Stealthy Attack Detector

Due to the importance of Critical Infrastructure (CI) in a nation's economy, they have been lucrative targets for cyber attackers. These critical infrastructures are usually Cyber-Physical Systems (CPS) such as power grids, water and sewage treatment facilities, oil and gas pipelines, etc. In recent times, these systems have suffered from cyber attacks numerous times. Researchers have been developing cyber security solutions for CIs to avoid lasting damage. According to standard frameworks, cyber security based on identification, protection, detection, response, and recovery is at the core of this research. Detection of an ongoing attack that escapes standard protection such as firewalls, anti-virus, and host/network intrusion detection has gained importance, as such attacks eventually affect the physical dynamics of the system. Therefore, anomaly detection in physical dynamics proves an effective means to implement defense-in-depth. PASAD is one example of anomaly detection in the sensor/actuator data that represents such systems' physical dynamics. We present EPASAD, which improves the detection technique used in PASAD so that it detects micro-stealthy attacks, which our experiments show PASAD's spherical boundary-based detection fails to detect. Our method EPASAD overcomes this by using ellipsoid boundaries, thereby tightening the boundaries in various dimensions, whereas a spherical boundary treats all dimensions equally. We validate EPASAD using the dataset produced by the TE-process simulator and the C-town datasets. The results show that EPASAD improves PASAD's average recall by 5.8% and 9.5% for the two datasets, respectively.
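The difference between a spherical and an ellipsoid decision boundary can be illustrated with a small sketch. It assumes an axis-aligned ellipsoid with hand-picked semi-axes, which is a simplification for illustration and not EPASAD's actual construction; all names are hypothetical.

```python
def inside_sphere(x, center, radius):
    """True if x lies within a sphere: every dimension is treated equally."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center)) <= radius ** 2


def inside_ellipsoid(x, center, semi_axes):
    """True if x lies within an axis-aligned ellipsoid (assumed form).

    Each dimension is scaled by its own semi-axis, so a dimension with
    little normal variation gets a tighter bound.
    """
    return sum(((xi - ci) / ai) ** 2
               for xi, ci, ai in zip(x, center, semi_axes)) <= 1.0


# Suppose normal behavior spans about +/-3 in dimension 0 but only +/-0.5 in dimension 1.
center = (0.0, 0.0)
deviation = (0.0, 2.0)  # a small drift in the low-variance dimension

print(inside_sphere(deviation, center, radius=3.0))                # True: not flagged
print(inside_ellipsoid(deviation, center, semi_axes=(3.0, 0.5)))   # False: flagged
```

The sphere has to be large enough to cover the widest dimension, so a deviation confined to a narrow dimension stays inside it. The ellipsoid does not have that slack.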

preprint, 2022, arXiv

Incentives in Dominant Resource Fair Allocation under Dynamic Demands

Every computer system -- from schedulers in clouds (e.g. Amazon) to computer networks to operating systems -- performs resource allocation across system users. The de facto allocation policies are max-min fairness (MMF) for single resources and dominant resource fairness (DRF) for multiple resources, which guarantee properties like incentive compatibility, envy-freeness, and Pareto efficiency, assuming user demands are static (time-independent). However, in real-world systems, user demands are dynamic, i.e. time-dependent. As a result, there is now a fundamental mismatch between the goals of computer systems and the properties enabled by classic resource allocation policies. We aim to bridge this mismatch. When demands are dynamic, instant-by-instant MMF can be extremely unfair over longer periods of time, i.e. lead to unbalanced user allocations, as previous allocations have no effect in the present. We consider a natural generalization of MMF and DRF for multiple resources and users with dynamic demands: this algorithm ensures that user allocations are as max-min fair as possible up to any time instant, given past allocations. This dynamic mechanism remains Pareto optimal and envy-free, but not incentive compatible. However, our results show that the possible increase in utility by misreporting is bounded and, since this can lead to a significant decrease in overall useful allocation, this suggests that it is not a useful strategy. Our main result is to show that our dynamic DRF algorithm is $(1+ρ)$-incentive compatible, where $ρ$ quantifies the relative importance of a resource for different users; we show that this factor is tight even with only two resources. We also present a $3/2$ upper bound and a $\sqrt 2$ lower bound for incentive compatibility when there is only one resource. We also offer extensions for the case when users are weighted to prioritize them differently.
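The idea of making allocations "as max-min fair as possible up to any time instant, given past allocations" can be sketched for the single-resource case. The following is an illustrative water-filling step over cumulative allocations, solved by bisection; the function name, its parameters, and the bisection approach are assumptions made for this sketch, and it does not cover the multi-resource DRF mechanism analyzed in the paper.

```python
def dynamic_mmf_step(past, demands, capacity, iters=60):
    """Split `capacity` for one time step, balancing cumulative allocations.

    past[i]    : total already allocated to user i in earlier steps
    demands[i] : user i's demand in this step (never exceeded)
    Returns the list of allocations for this step.
    """
    if sum(demands) <= capacity:
        return list(demands)  # no contention: every demand is met in full

    def used(level):
        # Lift each user's cumulative total toward `level`, capped by demand.
        return sum(min(d, max(0.0, level - p)) for p, d in zip(past, demands))

    lo, hi = 0.0, max(p + d for p, d in zip(past, demands))
    for _ in range(iters):  # bisection on the common "water level"
        mid = (lo + hi) / 2
        if used(mid) < capacity:
            lo = mid
        else:
            hi = mid
    return [min(d, max(0.0, hi - p)) for p, d in zip(past, demands)]
```

For example, with `past = [4.0, 0.0]`, `demands = [2.0, 2.0]` and `capacity = 2.0`, this step gives the whole unit of capacity to the second user, whereas instant-by-instant MMF would ignore history and give each user 1.0.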

preprint, 2021, arXiv

Detecting Malicious Accounts in Permissionless Blockchains using Temporal Graph Properties

The temporal nature of a blockchain graph, in which accounts are modeled as nodes and transactions as directed edges, enables us to understand the behavior (malicious or benign) of the accounts. Predictive classification of accounts as malicious or benign could help users of permissionless blockchain platforms to operate in a secure manner. Motivated by this, we introduce temporal features such as burst and attractiveness on top of several already used graph properties such as the node degree and clustering coefficient. Using the identified features, we train various Machine Learning (ML) algorithms and identify the algorithm that performs best in detecting which accounts are malicious. We then study the behavior of the accounts over different temporal granularities of the dataset before assigning them malicious tags. For the Ethereum blockchain, we identify that for the entire dataset, the ExtraTreesClassifier performs best among supervised ML algorithms. On the other hand, using cosine similarity on top of the results provided by unsupervised ML algorithms such as K-Means on the entire dataset, we were able to detect 554 more suspicious accounts. Further, using behavior change analysis for accounts, we identify 814 unique suspicious accounts across different temporal granularities.
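As a rough illustration of the cosine-similarity step mentioned in this abstract, the sketch below flags accounts whose feature vectors point in nearly the same direction as a known malicious one. The function names, the `threshold` value, and the feature layout are hypothetical; the abstract does not describe how the authors combined similarity with the clustering output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def flag_similar(accounts, known_malicious, threshold=0.95):
    """Return names of accounts close to any known malicious feature vector.

    accounts        : dict of account name -> feature vector
    known_malicious : list of feature vectors of already-tagged accounts
    threshold       : assumed similarity cutoff, chosen for illustration
    """
    return [
        name
        for name, vector in accounts.items()
        if any(cosine_similarity(vector, m) >= threshold for m in known_malicious)
    ]
```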

preprint, 2021, arXiv

Detecting Malicious Accounts showing Adversarial Behavior in Permissionless Blockchains

Different types of malicious activities have been flagged in multiple permissionless blockchains such as Bitcoin, Ethereum, etc. While some malicious activities exploit vulnerabilities in the infrastructure of the blockchain, some target its users through social engineering techniques. To address these problems, we aim at automatically flagging blockchain accounts that originate such malicious exploitation of accounts of other participants. To that end, we identify a robust supervised machine learning (ML) algorithm that is resistant to any bias induced by an over-representation of certain malicious activity in the available dataset, and is also robust against adversarial attacks. We find that most of the malicious activities reported thus far, for example in the Ethereum blockchain ecosystem, behave statistically similarly. Further, the previously used ML algorithms for identifying malicious accounts show bias towards a particular malicious activity which is over-represented. In the sequel, we identify that Neural Networks (NN) hold up best in the face of such a bias-inducing dataset while also being robust against certain adversarial attacks.

preprint, 2020, arXiv

Wide-Area Data Analytics

We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations. In some cases, the datasets are collected from multiple locations, such as sensors (e.g., mobile phones and street cameras) spread throughout a geographic region. The data may need to be analyzed close to where they are produced, particularly when the applications require low latency, high, low cost, user privacy, and regulatory constraints. In other cases, large datasets are distributed across public clouds, private clouds, or edge-cloud computing sites with more plentiful computation, storage, bandwidth, and energy resources. Often, some portion of the analysis may take place on the end-host or edge cloud (to respect user privacy and reduce the volume of data) while relying on remote clouds to complete the analysis (to leverage greater computation and storage resources). Wide-area data analytics is any analysis of data that is generated by, or stored at, geographically dispersed entities. Over the past few years, several parts of the computer science research community have started to explore effective ways to analyze data spread over multiple locations. In particular, several areas of "systems" research - including databases, distributed systems, computer networking, and security and privacy - have delved into these topics. These research subcommunities often focus on different aspects of the problem, consider different motivating applications and use cases, and design and evaluate their solutions differently. To address these challenges the Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019. This report summarizes the challenges discussed and the conclusions generated at the workshop.

preprint, 2015, arXiv

Universal Packet Scheduling

In this paper we address a seemingly simple question: Is there a universal packet scheduling algorithm? More precisely, we analyze (both theoretically and empirically) whether there is a single packet scheduling algorithm that, at a network-wide level, can match the results of any given scheduling algorithm. We find that in general the answer is "no". However, we show theoretically that the classical Least Slack Time First (LSTF) scheduling algorithm comes closest to being universal and demonstrate empirically that LSTF can closely, though not perfectly, replay a wide range of scheduling algorithms in realistic network settings. We then evaluate whether LSTF can be used in practice to meet various network-wide objectives by looking at three popular performance metrics (mean FCT, tail packet delays, and fairness); we find that LSTF performs comparably to the state-of-the-art for each of them.
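A single LSTF queue can be sketched as a priority queue keyed on remaining slack. The class and method names below are hypothetical, and the sketch covers one router queue only, not the network-wide replay setup studied in the paper. It relies on the observation that a packet enqueued at time `now` with slack `s` has remaining slack `s - (t - now)` at time `t`, so ordering by `now + s` always serves the packet with the least remaining slack.

```python
import heapq
import itertools


class LSTFQueue:
    """Least Slack Time First queue for a single output port (illustrative)."""

    def __init__(self):
        self._heap = []
        self._tie_breaker = itertools.count()  # keeps FIFO order among equal keys

    def enqueue(self, packet_id, slack, now):
        # The key (now + slack) is the time at which this packet's slack hits zero.
        heapq.heappush(self._heap, (now + slack, next(self._tie_breaker), packet_id))

    def dequeue(self, now):
        """Pop the packet with least remaining slack.

        Returns (packet_id, remaining_slack); the remaining slack is what the
        packet would carry in its header to the next hop.
        """
        expiry, _, packet_id = heapq.heappop(self._heap)
        return packet_id, expiry - now

    def __len__(self):
        return len(self._heap)
```

Time spent waiting in the queue is deducted from the slack automatically, because the value returned by `dequeue` is measured against the current time.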

preprint, 2014, arXiv

Large Scale Model for Information Dissemination with Device to Device Communication using Call Details Records

In a network of devices in close proximity, such as Device to Device (D2D) communication, we study the dissemination of public safety information at country scale. In order to provide a realistic model for the information dissemination, we extract a spatial distribution of the population of Ivory Coast from census data and determine migration patterns from the Call Detail Records (CDR) obtained during the Data for Development (D4D) challenge. We then apply an epidemic model to the information dissemination process based on the spatial properties of the user mobility extracted from the provided CDR. We then propose enhancements by adding latent states to the epidemic model in order to model more realistic user dynamics. Finally, we study the dynamics of the evolution of the information spreading through the population.
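The effect of adding a latent state to an epidemic-style dissemination model can be sketched with a simple discrete-time, well-mixed simulation. This is an assumption-heavy illustration: the three-state structure (unaware, latent, spreading), the parameter names `beta` and `sigma`, and the mean-field mixing are all choices made here, and the sketch ignores the spatial mobility that the paper extracts from the CDR data.

```python
def simulate_latent_spread(population, informed0, beta, sigma, steps):
    """Discrete-time spread of information with one latent state.

    population : total number of users (kept constant)
    informed0  : users actively spreading at step 0
    beta       : per-step contact/transmission rate, assumed in [0, 1]
    sigma      : per-step probability that a latent user starts spreading
    Returns a list of (unaware, latent, spreading) tuples, one per step.
    """
    unaware = float(population - informed0)
    latent = 0.0
    spreading = float(informed0)
    history = [(unaware, latent, spreading)]
    for _ in range(steps):
        newly_latent = beta * unaware * spreading / population
        newly_spreading = sigma * latent
        unaware -= newly_latent
        latent += newly_latent - newly_spreading
        spreading += newly_spreading
        history.append((unaware, latent, spreading))
    return history
```

Setting `sigma` close to 1 shortens the latent period, so that users start spreading almost as soon as they receive the information. Smaller values delay the spread.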

preprint, 2013, arXiv

Shortest Paths in Microseconds

Computing shortest paths is a fundamental primitive for several social network applications including socially-sensitive ranking, location-aware search, social auctions and social network privacy. Since these applications compute paths in response to a user query, the goal is to minimize latency while maintaining feasible memory requirements. We present ASAP, a system that achieves this goal by exploiting the structure of social networks. ASAP preprocesses a given network to compute and store a partial shortest path tree (PSPT) for each node. The PSPTs have the property that for any two nodes, each edge along the shortest path is with high probability contained in the PSPT of at least one of the nodes. We show that the structure of social networks enables the PSPT of each node to be an extremely small fraction of the entire network; hence, PSPTs can be stored efficiently and each shortest path can be computed extremely quickly. For a real network with 5 million nodes and 69 million edges, ASAP computes a shortest path for most node pairs in less than 49 microseconds per pair. ASAP, unlike any previous technique, also computes hundreds of paths (along with corresponding distances) between any node pair in less than 100 microseconds. Finally, ASAP admits efficient implementation on distributed programming frameworks like MapReduce.
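The general pattern of precomputing a small neighborhood per node and answering a query by combining the two endpoints' neighborhoods can be sketched as follows. This is not ASAP's PSPT construction: it substitutes a plain fixed-radius BFS ball on an unweighted graph, returns distances rather than paths, and uses hypothetical function names. When the two balls overlap, the returned distance is exact, because some node on a shortest path then lies within the radius of both endpoints.

```python
from collections import deque


def ball(graph, source, radius):
    """Distances from `source` to every node within `radius` hops (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand past the radius
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def preprocess(graph, radius):
    """Store one ball per node; `graph` maps node -> iterable of neighbors."""
    return {u: ball(graph, u, radius) for u in graph}


def query(balls, u, v):
    """Shortest distance via the intersection of the two stored balls.

    Returns None when the balls do not meet, in which case a caller would
    fall back to a full search.
    """
    small, large = balls[u], balls[v]
    if len(small) > len(large):
        small, large = large, small
    best = None
    for w, d in small.items():
        if w in large:
            total = d + large[w]
            if best is None or total < best:
                best = total
    return best
```

Each query only scans the smaller of the two stored balls, so the per-node storage size directly bounds the query cost.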

preprint, 2012, arXiv

A Self-Organization Framework for Wireless Ad Hoc Networks as Small Worlds

Motivated by the benefits of small world networks, we propose a self-organization framework for wireless ad hoc networks. We investigate the use of directional beamforming for creating long-range short cuts between nodes. Using simulation results for randomized beamforming as a guideline, we identify crucial design issues for algorithm design. Our results show that, while significant path length reduction is achievable, this is accompanied by the problem of asymmetric paths between nodes. Subsequently, we propose a distributed algorithm for small world creation that achieves path length reduction while maintaining connectivity. We define a new centrality measure that estimates the structural importance of nodes based on traffic flow in the network, which is used to identify the optimum nodes for beamforming. We show, using simulations, that this leads to significant reduction in path length while maintaining connectivity.

preprint, 2012, arXiv

Achieving Small World Properties using Bio-Inspired Techniques in Wireless Networks

It is highly desirable and challenging for a wireless ad hoc network to have self-organization properties in order to achieve network-wide characteristics. Studies have shown that Small World properties, primarily low average path length and high clustering coefficient, are desired properties for networks in general. However, due to the spatial nature of wireless networks, achieving small world properties remains highly challenging. Studies also show that wireless ad hoc networks with small world properties show a degree distribution that lies between geometric and power law. In this paper, we show that in a wireless ad hoc network with non-uniform node density and only local information, we can significantly reduce the average path length and retain the clustering coefficient. To achieve our goal, our algorithm first identifies logical regions using the Lateral Inhibition technique, then identifies the nodes that beamform, and finally determines the beam properties using Flocking. We use Lateral Inhibition and Flocking because they enable us to use local state information, as opposed to other techniques. We support our work with simulation results and analysis, which show that a reduction of up to 40% can be achieved for a high-density network. We also show the effect of the hop count used to create regions on average path length, clustering coefficient and connectivity.
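The two small-world metrics named in this abstract, average path length and clustering coefficient, can be computed on a small undirected, unweighted graph with the sketch below. The function names and the dict-of-sets graph representation are assumptions for illustration; the paper's simulation setup is not reproduced here.

```python
from collections import deque
from itertools import combinations


def average_path_length(graph):
    """Mean hop count over all ordered pairs of mutually reachable nodes."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(graph):
    """Average local clustering coefficient; `graph` maps node -> set of neighbors."""
    coefficients = []
    for node, neighbors in graph.items():
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)
            continue
        links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)


# A 6-node ring, then the same ring with one long-range shortcut (0 to 3).
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
shortcut = {node: set(neighbors) for node, neighbors in ring.items()}
shortcut[0].add(3)
shortcut[3].add(0)

print(average_path_length(ring) > average_path_length(shortcut))  # True
```

A single long-range link already lowers the average path length of the ring, which is the effect the beamforming shortcuts are meant to achieve.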

preprint, 2012, arXiv

Enhancing Information Dissemination in Dynamic Wireless Network using Stability and Beamforming

Mobility causes network structures to change. In PSNs where the underlying network structure is changing rapidly, we are interested in studying how information dissemination can be enhanced in a sparse disconnected network where nodes lack global knowledge about the network. We use beamforming to study the enhancement in the information dissemination process. In order to identify potential beamformers and the nodes to which beams should be directed, we use the concept of stability. We first predict the stability of a node in the dynamic network using the truncated Lévy walk nature of jump lengths of human mobility, and then use this measure to identify beamforming nodes and the nodes to which the beams are directed. We also develop our algorithm such that it does not require any global knowledge about the network and works in a distributed manner. We also show the effect of various parameters, such as number of sources, number of packets, mobility parameters, antenna parameters, type of stability used and density of the network, on information dissemination in the network. We validate our findings against three baseline models: no beamforming; beamforming using a different stability measure; and beamforming with no stability measure, where the same number of nodes beamform but the beamforming nodes are selected at random. Our simulation results show that information dissemination can be enhanced using our algorithm over the other models.

preprint, 2012, arXiv

Faster Approximate Distance Queries and Compact Routing in Sparse Graphs

A distance oracle is a compact representation of the shortest distance matrix of a graph. It can be queried to approximate shortest paths between any pair of vertices. Any distance oracle that returns paths of worst-case stretch $(2k-1)$ must require space $Ω(n^{1+1/k})$ for graphs of $n$ nodes. The hard cases that enforce this lower bound are, however, rather dense graphs with average degree $Ω(n^{1/k})$. We present distance oracles that, for sparse graphs, substantially break the lower bound barrier at the expense of higher query time. For any $1 \leq α \leq n$, our distance oracles can return stretch 2 paths using $O(m + n^2/α)$ space and stretch 3 paths using $O(m + n^2/α^2)$ space, at the expense of $O(αm/n)$ query time. By setting appropriate values of $α$, we get the first distance oracles that have size linear in the size of the graph, and return constant stretch paths in non-trivial query time. The query time can be further reduced to $O(α)$, by using an additional $O(mα)$ space for all our distance oracles, or at the cost of a small constant additive stretch. We use our stretch 2 distance oracle to present the first compact routing scheme with worst-case stretch 2. Any compact routing scheme with stretch less than 2 must require linear memory at some nodes even for sparse graphs; our scheme, hence, achieves the optimal stretch with non-trivial memory requirements. Moreover, supported by large-scale simulations on graphs including the AS-level Internet graph, we argue that our stretch-2 scheme would be simple and efficient to implement as a distributed compact routing protocol.
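The claim about linear-size oracles can be checked with a short calculation from the stated bounds. It takes $m$ to be the number of edges, which the abstract does not state explicitly, and assumes $n \leq m \leq n^2$ so that the chosen $α$ stays within $1 \leq α \leq n$.

```latex
\begin{align*}
\text{Stretch 2:}\quad
  & \frac{n^2}{\alpha} = m
    \;\Longrightarrow\; \alpha = \frac{n^2}{m},
  & \text{space } O(m),\quad
  & \text{query time } O\!\left(\frac{\alpha m}{n}\right) = O(n). \\[4pt]
\text{Stretch 3:}\quad
  & \frac{n^2}{\alpha^2} = m
    \;\Longrightarrow\; \alpha = \frac{n}{\sqrt{m}},
  & \text{space } O(m),\quad
  & \text{query time } O\!\left(\frac{\alpha m}{n}\right) = O\!\left(\sqrt{m}\right).
\end{align*}
```

In both cases the $n^2/α$ or $n^2/α^2$ term is balanced against the $m$ term, so the total space is linear in the graph size while the query time stays below the cost of a full graph traversal for the stretch 3 case.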

preprint, 2012, arXiv

Shortest Paths in Less Than a Millisecond

We consider the problem of answering point-to-point shortest path queries on massive social networks. The goal is to answer queries within tens of milliseconds while minimizing the memory requirements. We present a technique that achieves this goal for an extremely large fraction of path queries by exploiting the structure of the social networks. Using evaluations on real-world datasets, we argue that our technique offers a unique trade-off between latency, memory and accuracy. For instance, for the LiveJournal social network (roughly 5 million nodes and 69 million edges), our technique can answer 99.9% of the queries in less than a millisecond. In comparison to storing all-pairs shortest paths, our technique requires at least 550x less memory; the average query time is roughly 365 microseconds, 430x faster than the state-of-the-art shortest path algorithm. Furthermore, the relative performance of our technique improves with the size (and density) of the network. For the Orkut social network (3 million nodes and 220 million edges), for instance, our technique is roughly 2588x faster than the state-of-the-art algorithm for computing shortest paths.

preprint, 2012, arXiv

Slick Packets

Source-controlled routing has been proposed as a way to improve flexibility of future network architectures, as well as simplifying the data plane. However, if a packet specifies its path, this precludes fast local re-routing within the network. We propose SlickPackets, a novel solution that allows packets to slip around failures by specifying alternate paths in their headers, in the form of compactly-encoded directed acyclic graphs. We show that this can be accomplished with reasonably small packet headers for real network topologies, and results in responsiveness to failures that is competitive with past approaches that require much more state within the network. Our approach thus enables fast failure response while preserving the benefits of source-controlled routing.
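The forwarding idea described here, a packet carrying a directed acyclic graph of alternate paths so that routers can step around a failed link, can be sketched as follows. The header layout (a dict from node to an ordered list of next hops, primary first) and the function name are assumptions for illustration; the compact encoding used by SlickPackets is not modeled.

```python
def forward(dag, source, destination, failed_links):
    """Walk a packet through the DAG carried in its header.

    dag          : dict mapping node -> ordered list of next hops (primary first)
    failed_links : set of (node, next_hop) pairs that are currently down
    Returns the list of nodes visited, or None if the packet gets stuck.
    Termination relies on `dag` being acyclic.
    """
    path = [source]
    node = source
    while node != destination:
        for next_hop in dag.get(node, []):
            if (node, next_hop) not in failed_links:
                node = next_hop
                path.append(node)
                break
        else:
            return None  # every outgoing link listed in the header has failed
    return path


# Header DAG chosen by the source: primary path s -> a -> t, with alternates via b.
header_dag = {"s": ["a", "b"], "a": ["t", "b"], "b": ["t"]}

print(forward(header_dag, "s", "t", failed_links=set()))         # ['s', 'a', 't']
print(forward(header_dag, "s", "t", failed_links={("a", "t")}))  # ['s', 'a', 'b', 't']
```

Each router decides locally from the header alone, so no extra per-flow state is needed inside the network to react to the failure.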

preprint, 2011, arXiv

Self-organization of Nodes using Bio-Inspired Techniques for Achieving Small World Properties

In an autonomous wireless sensor network, self-organization of the nodes is essential to achieve network-wide characteristics. We believe that connectivity in wireless autonomous networks can be increased and overall average path length can be reduced by using beamforming and bio-inspired algorithms. Recent works on the use of beamforming in wireless networks mostly assume knowledge of the network in addition to either heterogeneous or hybrid deployment. We propose that, without global knowledge or the introduction of any special feature, the average path length can be reduced with the help of inspiration from nature and simple interactions between neighboring nodes. Our algorithm also reduces the number of disconnected components within the network. Our results show that a reduction in the average path length and in the number of disconnected components can be achieved using very simple local rules and without full network knowledge.

preprint, 2011, arXiv

Self-Organization of Wireless Ad Hoc Networks as Small Worlds Using Long Range Directional Beams

We study how long range directional beams can be used for self-organization of a wireless network to exhibit small world properties. Using simulation results for randomized beamforming as a guideline, we identify crucial design issues for algorithm design. Subsequently, we propose an algorithm for deterministic creation of small worlds. We define a new centrality measure that estimates the structural importance of nodes based on traffic flow in the network, which is used to identify the optimum nodes for beamforming. This results in significant reduction in path length while maintaining connectivity.

preprint, 2007, arXiv

A Low Complexity Algorithm and Architecture for Systematic Encoding of Hermitian Codes

We present an algorithm for systematic encoding of Hermitian codes. For a Hermitian code defined over GF(q^2), the proposed algorithm achieves a run time complexity of O(q^2) and is suitable for VLSI implementation. The encoder architecture uses as main blocks q varying-rate Reed-Solomon encoders and achieves a space complexity of O(q^2) in terms of finite field multipliers and memory elements.
