Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
13 topics
4 close collaborators

Published work

10 published items

preprint, 2022, arXiv

Cascading Failures in Smart Grids under Random, Targeted and Adaptive Attacks

We study cascading failures in smart grids, where an attacker selectively compromises nodes with probability proportional to their degree, betweenness, or clustering coefficient. This means that nodes with high degree, betweenness, or clustering coefficient are attacked with higher probability. We mathematically and experimentally analyze the sizes of the giant components of the networks under different types of targeted attacks, and compare the results with the corresponding sizes under random attacks. We show that networks disintegrate faster under targeted attacks than under random attacks. A targeted attack on a small fraction of high-degree nodes disintegrates one or both of the networks, whereas both networks retain giant components under a random attack on the same fraction of nodes. An important observation is that an attacker has an advantage if it compromises nodes based on their betweenness, rather than on their degree or clustering coefficient. We next study adaptive attacks, where an attacker compromises nodes in rounds. Here, some nodes are compromised in each round based on their degree, betweenness, or clustering coefficient, instead of all nodes being compromised together. In this case, the degree, betweenness, or clustering coefficient is recalculated before the start of each round, rather than only once at the beginning. We show experimentally that an adversary has an advantage in this adaptive approach, compared to compromising the same number of nodes all at once.
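
As a rough illustration of the comparison described in this abstract, the sketch below measures the largest connected component of a graph after random and degree-weighted node removal. It is a simplification under stated assumptions: a single network rather than the pair of networks the abstract refers to, no cascade dynamics, and degree as the only targeting criterion. The function names and the star-graph example are hypothetical and are not taken from the paper.

```python
import random
from collections import deque


def giant_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


def random_attack(adj, k, rng):
    """Pick k distinct nodes uniformly at random."""
    return set(rng.sample(list(adj), k))


def degree_weighted_attack(adj, k, rng):
    """Pick up to k distinct nodes, each draw weighted by the node's degree."""
    candidates = list(adj)
    weights = [len(adj[n]) for n in candidates]
    chosen = set()
    while len(chosen) < k and any(w > 0 for w in weights):
        pick = rng.choices(range(len(candidates)), weights=weights)[0]
        chosen.add(candidates[pick])
        weights[pick] = 0  # never draw the same node twice
    return chosen


# Hypothetical toy graph: one hub (node 0) connected to five leaves.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
rng = random.Random(0)
print("random:", giant_component_size(star, random_attack(star, 1, rng)))
print("degree-weighted:", giant_component_size(star, degree_weighted_attack(star, 1, rng)))
```

In the star graph, removing the hub leaves only isolated nodes, while removing a leaf leaves a component of five nodes. Degree weighting makes the hub far more likely to be chosen, which is the intuition behind targeted attacks being more damaging than random ones.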

preprint, 2022, arXiv

PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool

In this paper, we propose a feature-free method for detecting phishing websites using the Normalized Compression Distance (NCD), a parameter-free similarity measure which computes the similarity of two websites by compressing them, thus eliminating the need to perform any feature extraction. It also removes any dependence on a specific set of website features. This method examines the HTML of webpages and computes their similarity with known phishing websites in order to classify them. We use the Furthest Point First algorithm to perform phishing prototype extraction, in order to select instances that are representative of a cluster of phishing webpages. We also introduce the use of an incremental learning algorithm as a framework for continuous and adaptive detection without extracting new features when concept drift occurs. On a large dataset, our proposed method significantly outperforms previous methods in detecting phishing websites, with an AUC score of 98.68% and a true positive rate (TPR) of around 90%, while maintaining a low false positive rate (FPR) of 0.58%. Our approach uses prototypes, eliminating the need to retain long-term data, and is feasible to deploy in real systems with a processing time of roughly 0.3 seconds.
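
The Normalized Compression Distance mentioned in this abstract has a standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The sketch below is a minimal illustration of that formula, assuming zlib as the compressor (the abstract does not say which compressor the authors use). The sample pages and the `nearest_prototype` helper are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: small for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def nearest_prototype(page: bytes, prototypes):
    """Return (index, distance) of the prototype closest to `page` under NCD."""
    distances = [ncd(page, p) for p in prototypes]
    best = min(range(len(prototypes)), key=distances.__getitem__)
    return best, distances[best]


# Hypothetical HTML snippets, for illustration only.
known_phish = (
    b"<html><body><form action='http://example.test/login'>"
    b"<input name='user'><input name='password' type='password'>"
    b"<button>Sign in to your bank</button></form></body></html>"
)
suspect = known_phish.replace(b"your bank", b"your account")
unrelated = (
    b"<html><body><h1>Weekly gardening tips</h1>"
    b"<p>Water tomatoes early in the morning and mulch the beds.</p></body></html>"
)

print("suspect vs known phish:", ncd(known_phish, suspect))
print("unrelated vs known phish:", ncd(known_phish, unrelated))
```

A page that is a near copy of a known phishing prototype yields a much smaller distance than an unrelated page, so a classifier can compare new pages against a small set of prototypes without extracting any features.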

preprint, 2022, arXiv

Task Allocation using a Team of Robots

Task allocation using a team or coalition of robots is one of the most important problems in robotics, computer science, operational research, and artificial intelligence. In recent work, research has focused on handling complex objectives and feasibility constraints amongst other variations of the multi-robot task allocation problem. There are many examples of important research progress in these directions. We present a general formulation of the task allocation problem that generalizes several versions that are well-studied. Our formulation includes the states of robots, tasks, and the surrounding environment in which they operate. We describe how the problem can vary depending on the feasibility constraints, objective functions, and the level of dynamically changing information. In addition, we discuss existing solution approaches for the problem including optimization-based approaches, and market-based approaches.

preprint, 2020, arXiv

BB_Evac: Fast Location-Sensitive Behavior-Based Building Evacuation

Past work on evacuation planning assumes that evacuees will follow instructions -- however, there is ample evidence that this is not the case. While some people will follow instructions, others will follow their own desires. In this paper, we present a formal definition of a behavior-based evacuation problem (BBEP) in which a human behavior model is taken into account when planning an evacuation. We show that a specific form of constraints can be used to express such behaviors. We show that BBEPs can be solved exactly via an integer program called BB_IP, and inexactly by a much faster algorithm that we call BB_Evac. We conducted a detailed experimental evaluation of both algorithms applied to buildings (though in principle the algorithms can be applied to any graph) and show that the latter is an order of magnitude faster than BB_IP while producing results that are almost as good on one real-world building graph as well as on several synthetically generated graphs.

preprint, 2020, arXiv

Hier-SPCNet: A Legal Statute Hierarchy-based Heterogeneous Network for Computing Legal Case Document Similarity

Computing similarity between two legal case documents is an important and challenging task in Legal IR, for which text-based and network-based measures have been proposed in the literature. All prior network-based similarity methods considered only a precedent citation network among case documents (PCNet). However, this approach misses an important source of legal knowledge -- the hierarchy of legal statutes that are applicable in a given legal jurisdiction (e.g., country). We propose to augment the PCNet with the hierarchy of legal statutes, to form a heterogeneous network Hier-SPCNet, having citation links between case documents and statutes, as well as citation and hierarchy links among the statutes. Experiments over a set of Indian Supreme Court case documents show that our proposed heterogeneous network enables significantly better document similarity estimation than existing approaches using PCNet. We also show that the proposed network-based method can complement text-based measures for better estimation of legal document similarity.

preprint, 2020, arXiv

HushRelay: A Privacy-Preserving, Efficient, and Scalable Routing Algorithm for Off-Chain Payments

Payment channel networks (PCN) are used in cryptocurrencies to enhance the performance and scalability of off-chain transactions. Except for the opening and closing of a payment channel, no transaction requests accepted by a PCN are recorded in the Blockchain. Only the parties which have opened the channel know the exact amount of funds left at a given instant. In real scenarios, there might not exist a single path which can carry a high-value payment. In such cases, splitting the transaction value across multiple paths is a better approach. While there exist several approaches which route transactions via several paths, such techniques are quite inefficient, as the decision on the number of splits must be taken at the initial phase of the routing algorithm (e.g., SpeedyMurmur [42]). Algorithms which do not consider the residual capacity of each channel in the network are susceptible to failure. Other approaches leak sensitive information, and are quite computationally expensive [28]. To the best of our knowledge, our proposed scheme HushRelay is an efficient privacy-preserving routing algorithm, taking into account the funds left in each channel, while splitting the transaction value across several paths. Comparing the performance of our algorithm with existing routing schemes on real instances (e.g., Ripple Network), we observed that HushRelay attains a success ratio of 1, with an execution time of 2.4 sec. In contrast, SpeedyMurmur [42] attains a success ratio of 0.98 and takes 4.74 sec when the number of landmarks is 6. On testing our proposed routing algorithm on the Lightning Network, a success ratio of 0.99 is observed, with an execution time of 0.15 sec, which is 12 times smaller than the time taken by SpeedyMurmur.
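
To make the idea of splitting a payment by residual channel capacity concrete, here is a minimal greedy sketch. It is not HushRelay's algorithm, which the abstract does not specify. It assumes the candidate paths are edge-disjoint and that the residual capacity of every channel on each path is known, which ignores the privacy constraints the paper addresses. The function name and the input format are hypothetical.

```python
def split_payment(amount, paths):
    """Greedily split `amount` across paths.

    `paths` is a list of non-empty lists, each holding the residual capacities
    of the channels along one edge-disjoint path (an assumed input format).
    Returns the per-path allocation, or None if the paths cannot carry the
    full amount.
    """
    remaining = amount
    allocation = []
    for capacities in paths:
        bottleneck = min(capacities)  # a path can carry at most its weakest channel
        share = min(remaining, bottleneck)
        allocation.append(share)
        remaining -= share
    return allocation if remaining == 0 else None


# Two hypothetical paths with bottlenecks 5 and 3.
paths = [[5, 8], [3, 4]]
print(split_payment(7, paths))
print(split_payment(9, paths))
```

A payment of 7 fits because the two bottlenecks sum to 8, while a payment of 9 fails. This mirrors the abstract's point that routing which ignores residual capacity is prone to failure.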

preprint, 2020, arXiv

Identification, Tracking and Impact: Understanding the trade secret of catchphrases

Understanding the topical evolution in industrial innovation is a challenging problem. With the growth of digital repositories of patent documents, it is becoming increasingly feasible to understand the innovation secrets -- "catchphrases" of organizations. However, searching and understanding this enormous textual information is a natural bottleneck. In this paper, we propose an unsupervised method for the extraction of catchphrases from the abstracts of patents granted by the U.S. Patent and Trademark Office over the years. Our proposed system achieves substantial improvement, both in terms of precision and recall, against state-of-the-art techniques. As a second objective, we conduct an extensive empirical study to understand the temporal evolution of the catchphrases across various organizations. We also show how the overall innovation evolution, in the form of the introduction of newer catchphrases in an organization's patents, correlates with the future citations received by the patents filed by that organization. Our code and data sets will be placed in the public domain soon.

preprint, 2020, arXiv

Innovation and Revenue: Deep Diving into the Temporal Rank-shifts of Fortune 500 Companies

Research and innovation are an important agenda for any company to remain competitive in the market. The relationship between innovation and revenue is a key metric for companies to decide on the amount to be invested in future research. Two important parameters to evaluate innovation are the quantity and quality of scientific papers and patents. Our work studies the relationship between innovation and patenting activities for several Fortune 500 companies over a period of time. We perform a comprehensive study of the patent citation dataset available in the Reed Technology Index collected from the US Patent Office. We observe several interesting relations between parameters like the number of (i) patent applications, (ii) patent grants, (iii) patent citations and the Fortune 500 ranks of companies. We also study the trends of these parameters over the years and derive causal explanations for these with qualitative and intuitive reasoning. To facilitate reproducible research, we make the processed patent dataset publicly available at https://github.com/mayank4490/Innovation-and-revenue.

preprint, 2020, arXiv

Methods for Computing Legal Document Similarity: A Comparative Study

Computing similarity between two legal documents is an important and challenging task in the domain of Legal Information Retrieval. Finding similar legal documents has many applications in downstream tasks, including prior-case retrieval, recommendation of legal articles, and so on. Prior works have proposed two broad ways of measuring similarity between legal documents - analyzing the precedent citation network, and measuring similarity based on textual content similarity measures. But there has not been a comprehensive comparison of these existing methods on a common platform. In this paper, we perform the first systematic analysis of the existing methods. In addition, we explore two promising new similarity computation methods - one text-based and the other based on network embeddings, which have not been considered till now.

preprint, 2020, arXiv

PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites

Phishing has grown significantly in the past few years and is predicted to increase further in the future. The dynamics of phishing introduce challenges in implementing a robust phishing detection system and in selecting features which can represent phishing despite changes in attacks. In this paper, we propose PhishZip, a novel phishing detection approach which uses a compression algorithm to perform website classification, and demonstrate a systematic way to construct the word dictionaries for the compression models using word occurrence likelihood analysis. PhishZip outperforms the use of best-performing HTML-based features in past studies, with a true positive rate of 80.04%. We also propose the use of compression ratio as a novel machine learning feature which significantly improves machine learning based phishing detection over previous studies. Using compression ratios as additional features, the true positive rate improves by 30.3 percentage points (from 51.47% to 81.77%), while the accuracy increases by 11.84 percentage points (from 71.20% to 83.04%).
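
The idea of using a compression ratio as a feature can be sketched with DEFLATE preset dictionaries. This is an illustration only: it assumes zlib's `zdict` preset-dictionary mechanism as the compression model, and the two word lists are made up for the example rather than derived from the paper's word occurrence likelihood analysis. The function and constant names are hypothetical.

```python
import zlib


def compression_ratio(text: bytes, dictionary: bytes) -> float:
    """Uncompressed size divided by compressed size, using a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    compressed = compressor.compress(text) + compressor.flush()
    return len(text) / len(compressed)


# Hypothetical word dictionaries; a real system would build these from data.
PHISHING_WORDS = b"verify your account password login secure bank update suspended"
LEGITIMATE_WORDS = b"weather sports recipe travel music garden library schedule news"


def ratio_features(text: bytes):
    """Two candidate features: how well each dictionary compresses the text."""
    return (
        compression_ratio(text, PHISHING_WORDS),
        compression_ratio(text, LEGITIMATE_WORDS),
    )


page = b"please verify your account password to keep your bank login secure"
phish_ratio, legit_ratio = ratio_features(page)
print("compresses better with phishing dictionary:", phish_ratio > legit_ratio)
```

Text that shares vocabulary with a dictionary compresses better against it, so the two ratios can be compared directly or passed to a classifier as additional features alongside other inputs.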