Source author record

Arindam Pal

Arindam Pal appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Machine Learning Social and Information Networks Discrete Mathematics Information Retrieval Artificial Intelligence Cryptography and Security Computation and Language Digital Libraries cs.CY math.CO Multiagent Systems Networking and Internet Architecture physics.soc-ph Robotics

Catalog footprint

What is connected

14works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Cascading Failures in Smart Grids under Random, Targeted and Adaptive Attacks

We study cascading failures in smart grids, where an attacker selectively compromises the nodes with probabilities proportional to their degrees, betweenness, or clustering coefficient. This implies that nodes with high degrees, betweenness, or clustering coefficients are attacked with higher probability. We mathematically and experimentally analyze the sizes of the giant components of the networks under different types of targeted attacks, and compare the results with the corresponding sizes under random attacks. We show that networks disintegrate faster for targeted attacks compared to random attacks. A targeted attack on a small fraction of high degree nodes disintegrates one or both of the networks, whereas both the networks contain giant components for random attack on the same fraction of nodes. An important observation is that an attacker has an advantage if it compromises nodes based on their betweenness, rather than based on degree or clustering coefficient. We next study adaptive attacks, where an attacker compromises nodes in rounds. Here, some nodes are compromised in each round based on their degree, betweenness or clustering coefficients, instead of compromising all nodes together. In this case, the degree, betweenness, or clustering coefficient is calculated before the start of each round, instead of at the beginning. We show experimentally that an adversary has an advantage in this adaptive approach, compared to compromising the same number of nodes all at once.

preprint2022arXiv

PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool

In this paper, we propose a feature-free method for detecting phishing websites using the Normalized Compression Distance (NCD), a parameter-free similarity measure which computes the similarity of two websites by compressing them, thus eliminating the need to perform any feature extraction. It also removes any dependence on a specific set of website features. This method examines the HTML of webpages and computes their similarity with known phishing websites, in order to classify them. We use the Furthest Point First algorithm to perform phishing prototype extractions, in order to select instances that are representative of a cluster of phishing webpages. We also introduce the use of an incremental learning algorithm as a framework for continuous and adaptive detection without extracting new features when concept drift occurs. On a large dataset, our proposed method significantly outperforms previous methods in detecting phishing websites, with an AUC score of 98.68%, a high true positive rate (TPR) of around 90%, while maintaining a low false positive rate (FPR) of 0.58%. Our approach uses prototypes, eliminating the need to retain long term data in the future, and is feasible to deploy in real systems with a processing time of roughly 0.3 seconds.

preprint2022arXiv

Task Allocation using a Team of Robots

Task allocation using a team or coalition of robots is one of the most important problems in robotics, computer science, operational research, and artificial intelligence. In recent work, research has focused on handling complex objectives and feasibility constraints amongst other variations of the multi-robot task allocation problem. There are many examples of important research progress in these directions. We present a general formulation of the task allocation problem that generalizes several versions that are well-studied. Our formulation includes the states of robots, tasks, and the surrounding environment in which they operate. We describe how the problem can vary depending on the feasibility constraints, objective functions, and the level of dynamically changing information. In addition, we discuss existing solution approaches for the problem including optimization-based approaches, and market-based approaches.

preprint2020arXiv

BB_Evac: Fast Location-Sensitive Behavior-Based Building Evacuation

Past work on evacuation planning assumes that evacuees will follow instructions -- however, there is ample evidence that this is not the case. While some people will follow instructions, others will follow their own desires. In this paper, we present a formal definition of a behavior-based evacuation problem (BBEP) in which a human behavior model is taken into account when planning an evacuation. We show that a specific form of constraints can be used to express such behaviors. We show that BBEPs can be solved exactly via an integer program called BB_IP, and inexactly by a much faster algorithm that we call BB_Evac. We conducted a detailed experimental evaluation of both algorithms applied to buildings (though in principle the algorithms can be applied to any graphs) and show that the latter is an order of magnitude faster than BB_IP while producing results that are almost as good on one real-world building graph and as well as on several synthetically generated graphs.

preprint2020arXiv

Hier-SPCNet: A Legal Statute Hierarchy-based Heterogeneous Network for Computing Legal Case Document Similarity

Computing similarity between two legal case documents is an important and challenging task in Legal IR, for which text-based and network-based measures have been proposed in literature. All prior network-based similarity methods considered a precedent citation network among case documents only (PCNet). However, this approach misses an important source of legal knowledge -- the hierarchy of legal statutes that are applicable in a given legal jurisdiction (e.g., country). We propose to augment the PCNet with the hierarchy of legal statutes, to form a heterogeneous network Hier-SPCNet, having citation links between case documents and statutes, as well as citation and hierarchy links among the statutes. Experiments over a set of Indian Supreme Court case documents show that our proposed heterogeneous network enables significantly better document similarity estimation, as compared to existing approaches using PCNet. We also show that the proposed network-based method can complement text-based measures for better estimation of legal document similarity.

preprint2020arXiv

HushRelay: A Privacy-Preserving, Efficient, and Scalable Routing Algorithm for Off-Chain Payments

Payment channel networks (PCN) are used in cryptocurrencies to enhance the performance and scalability of off-chain transactions. Except for opening and closing of a payment channel, no other transaction requests accepted by a PCN are recorded in the Blockchain. Only the parties which have opened the channel will know the exact amount of fund left at a given instant. In real scenarios, there might not exist a single path which can enable transfer of high value payments. For such cases, splitting up the transaction value across multiple paths is a better approach. While there exists several approaches which route transactions via several paths, such techniques are quite inefficient, as the decision on the number of splits must be taken at the initial phase of the routing algorithm (e.g., SpeedyMurmur [42]). Algorithms which do not consider the residual capacity of each channel in the network are susceptible to failure. Other approaches leak sensitive information, and are quite computationally expensive [28]. To the best of our knowledge, our proposed scheme HushRelay is an efficient privacy preserving routing algorithm, taking into account the funds left in each channel, while splitting the transaction value across several paths. Comparing the performance of our algorithm with existing routing schemes on real instances (e.g., Ripple Network), we observed that HushRelay attains a success ratio of 1, with an execution time of 2.4 sec. However, SpeedyMurmur [42] attains a success ratio of 0.98 and takes 4.74 sec when the number of landmarks is 6. On testing our proposed routing algorithm on the Lightning Network, a success ratio of 0.99 is observed, having an execution time of 0.15 sec, which is 12 times smaller than the time taken by SpeedyMurmur.

preprint2020arXiv

Identification, Tracking and Impact: Understanding the trade secret of catchphrases

Understanding the topical evolution in industrial innovation is a challenging problem. With the advancement in the digital repositories in the form of patent documents, it is becoming increasingly more feasible to understand the innovation secrets -- "catchphrases" of organizations. However, searching and understanding this enormous textual information is a natural bottleneck. In this paper, we propose an unsupervised method for the extraction of catchphrases from the abstracts of patents granted by the U.S. Patent and Trademark Office over the years. Our proposed system achieves substantial improvement, both in terms of precision and recall, against state-of-the-art techniques. As a second objective, we conduct an extensive empirical study to understand the temporal evolution of the catchphrases across various organizations. We also show how the overall innovation evolution in the form of introduction of newer catchphrases in an organization's patents correlates with the future citations received by the patents filed by that organization. Our code and data sets will be placed in the public domain soon.

preprint2020arXiv

Innovation and Revenue: Deep Diving into the Temporal Rank-shifts of Fortune 500 Companies

Research and innovation is important agenda for any company to remain competitive in the market. The relationship between innovation and revenue is a key metric for companies to decide on the amount to be invested for future research. Two important parameters to evaluate innovation are the quantity and quality of scientific papers and patents. Our work studies the relationship between innovation and patenting activities for several Fortune 500 companies over a period of time. We perform a comprehensive study of the patent citation dataset available in the Reed Technology Index collected from the US Patent Office. We observe several interesting relations between parameters like the number of (i) patent applications, (ii) patent grants, (iii) patent citations and Fortune 500 ranks of companies. We also study the trends of these parameters varying over the years and derive causal explanations for these with qualitative and intuitive reasoning. To facilitate reproducible research, we make all the processed patent dataset publicly available at https://github.com/mayank4490/Innovation-and-revenue.

preprint2020arXiv

Methods for Computing Legal Document Similarity: A Comparative Study

Computing similarity between two legal documents is an important and challenging task in the domain of Legal Information Retrieval. Finding similar legal documents has many applications in downstream tasks, including prior-case retrieval, recommendation of legal articles, and so on. Prior works have proposed two broad ways of measuring similarity between legal documents - analyzing the precedent citation network, and measuring similarity based on textual content similarity measures. But there has not been a comprehensive comparison of these existing methods on a common platform. In this paper, we perform the first systematic analysis of the existing methods. In addition, we explore two promising new similarity computation methods - one text-based and the other based on network embeddings, which have not been considered till now.

preprint2020arXiv

PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites

Phishing has grown significantly in the past few years and is predicted to further increase in the future. The dynamics of phishing introduce challenges in implementing a robust phishing detection system and selecting features which can represent phishing despite the change of attack. In this paper, we propose PhishZip which is a novel phishing detection approach using a compression algorithm to perform website classification and demonstrate a systematic way to construct the word dictionaries for the compression models using word occurrence likelihood analysis. PhishZip outperforms the use of best-performing HTML-based features in past studies, with a true positive rate of 80.04%. We also propose the use of compression ratio as a novel machine learning feature which significantly improves machine learning based phishing detection over previous studies. Using compression ratios as additional features, the true positive rate significantly improves by 30.3% (from 51.47% to 81.77%), while the accuracy increases by 11.84% (from 71.20% to 83.04%).

preprint2016arXiv

Improved Algorithms for the Evacuation Route Planning Problem

Emergency evacuation is the process of movement of people away from the threat or actual occurrence of hazards such as natural disasters, terrorist attacks, fires and bombs. In this paper, we focus on evacuation from a building, but the ideas can be applied to city and region evacuation. We define the problem and show how it can be modeled using graphs. The resulting optimization problem can be formulated as an integer linear program. Though this can be solved exactly, this approach does not scale well for graphs with thousands of nodes and several hundred thousands of edges. This is impractical for large graphs. We study a special case of this problem, where there is only a single source and a single sink. For this case, we give an improved algorithm \emph{Single Source Single Sink Evacuation Route Planner (SSEP)}, whose evacuation time is always at most that of a famous algorithm \emph{Capacity Constrained Route Planner (CCRP)}, and whose running time is strictly less than that of CCRP. We prove this mathematically and give supporting results by extensive experiments. We also study randomized behavior model of people and give some interesting results.

preprint2016arXiv

Preferential Attachment Model with Degree Bound and its Application to Key Predistribution in WSN

Preferential attachment models have been widely studied in complex networks, because they can explain the formation of many networks like social networks, citation networks, power grids, and biological networks, to name a few. Motivated by the application of key predistribution in wireless sensor networks (WSN), we initiate the study of preferential attachment with degree bound. Our paper has two important contributions to two different areas. The first is a contribution in the study of complex networks. We propose preferential attachment model with degree bound for the first time. In the normal preferential attachment model, the degree distribution follows a power law, with many nodes of low degree and a few nodes of high degree. In our scheme, the nodes can have a maximum degree $d_{\max}$, where $d_{\max}$ is an integer chosen according to the application. The second is in the security of wireless sensor networks. We propose a new key predistribution scheme based on the above model. The important features of this model are that the network is fully connected, it has fewer keys, has larger size of the giant component and lower average path length compared with traditional key predistribution schemes and comparable resilience to random node attacks. We argue that in many networks like key predistribution and Internet of Things, having nodes of very high degree will be a bottle-neck in communication. Thus, studying preferential attachment model with degree bound will open up new directions in the study of complex networks, and will have many applications in real world scenarios.

preprint2012arXiv

Scheduling Resources for Executing a Partial Set of Jobs

In this paper, we consider the problem of choosing a minimum cost set of resources for executing a specified set of jobs. Each input job is an interval, determined by its start-time and end-time. Each resource is also an interval determined by its start-time and end-time; moreover, every resource has a capacity and a cost associated with it. We consider two versions of this problem. In the partial covering version, we are also given as input a number k, specifying the number of jobs that must be performed. The goal is to choose k jobs and find a minimum cost set of resources to perform the chosen k jobs (at any point of time the capacity of the chosen set of resources should be sufficient to execute the jobs active at that time). We present an O(log n)-factor approximation algorithm for this problem. We also consider the prize collecting version, wherein every job also has a penalty associated with it. The feasible solution consists of a subset of the jobs, and a set of resources, to perform the chosen subset of jobs. The goal is to find a feasible solution that minimizes the sum of the costs of the selected resources and the penalties of the jobs that are not selected. We present a constant factor approximation algorithm for this problem

preprint2011arXiv

Minimum multicuts and Steiner forests for Okamura-Seymour graphs

We study the problem of finding minimum multicuts for an undirected planar graph, where all the terminal vertices are on the boundary of the outer face. This is known as an Okamura-Seymour instance. We show that for such an instance, the minimum multicut problem can be reduced to the minimum-cost Steiner forest problem on a suitably defined dual graph. The minimum-cost Steiner forest problem has a 2-approximation algorithm. Hence, the minimum multicut problem has a 2-approximation algorithm for an Okamura-Seymour instance.

Arindam Pal

What is connected

Connect this record

See the researcher in context

Building this map preview

14 published item(s)

Cascading Failures in Smart Grids under Random, Targeted and Adaptive Attacks

PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool

Task Allocation using a Team of Robots

BB_Evac: Fast Location-Sensitive Behavior-Based Building Evacuation

Hier-SPCNet: A Legal Statute Hierarchy-based Heterogeneous Network for Computing Legal Case Document Similarity

HushRelay: A Privacy-Preserving, Efficient, and Scalable Routing Algorithm for Off-Chain Payments

Identification, Tracking and Impact: Understanding the trade secret of catchphrases

Innovation and Revenue: Deep Diving into the Temporal Rank-shifts of Fortune 500 Companies

Methods for Computing Legal Document Similarity: A Comparative Study

PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites

Improved Algorithms for the Evacuation Route Planning Problem

Preferential Attachment Model with Degree Bound and its Application to Key Predistribution in WSN

Scheduling Resources for Executing a Partial Set of Jobs

Minimum multicuts and Steiner forests for Okamura-Seymour graphs