Source author record

Paulo Shakarian

Paulo Shakarian appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Social and Information Networks · physics.soc-ph · Artificial Intelligence · Cryptography and Security · cs.CY · Logic in Computer Science · Machine Learning · Multiagent Systems · Computer Science and Game Theory · Data Structures and Algorithms · Populations and Evolution

Catalog footprint

What is connected

37 works

11 topics

4 close collaborators

preprint · 2026 · arXiv

Machine Learning Model Integration with Open World Temporal Logic for Process Automation

Recent advances in Machine Learning (ML) have produced models that extract structured information from complex data. However, a significant challenge lies in translating these perceptual or extractive outputs into actionable and explainable decisions within complex operational workflows. To address these challenges, this paper introduces a novel approach that integrates the outputs of various machine learning models directly with the PyReason framework, an open-world temporal logic programming reasoning engine. PyReason's foundation in generalized annotated logic allows for the incorporation of real-valued outputs (e.g., probabilities, confidence scores) from a diverse set of ML models, treating them as truth intervals within its logical framework. Crucially, PyReason provides mechanisms, implemented in Python, to continuously poll ML model outputs, convert them into logical facts, and dynamically recompute the minimal model to enable decision-making in real-time. Furthermore, its native support for temporal reasoning, knowledge graph integration, and fully explainable interface traces enables an analysis of time-sensitive process data and existing organizational knowledge. By combining the strengths of perception and extraction from ML models with the logical deduction and transparency of PyReason, we aim to create a powerful system for automating complex processes. This integration is well suited for use cases in numerous domains, including manufacturing, healthcare, and business operations.
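The idea of treating a real-valued model output as a truth interval can be sketched in plain Python. This is an illustration only and does not use the PyReason API: the names `Fact`, `to_interval` and `rule_fires`, the margin of 0.05, and the rule threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Fact:
    """A ground atom annotated with a truth interval [lower, upper]."""
    predicate: str
    subject: str
    lower: float
    upper: float


def to_interval(score: float, margin: float = 0.05) -> Tuple[float, float]:
    """Map a model confidence score to an interval clipped to [0, 1].

    The symmetric margin is an assumption for illustration, not a PyReason rule.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return max(0.0, score - margin), min(1.0, score + margin)


def rule_fires(fact: Fact, min_lower: float) -> bool:
    """Treat a rule body as satisfied when the lower bound clears a threshold."""
    return fact.lower >= min_lower


# Hypothetical classifier output for one item on a production line.
lower, upper = to_interval(0.93)
defect = Fact("defective", "part_17", lower, upper)
print(defect, rule_fires(defect, min_lower=0.8))
```

Keeping both bounds, rather than a single probability, lets a downstream rule distinguish "confidently true" from "unknown", which is the property an open-world setting relies on.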

preprint · 2026 · arXiv

Position: Artificial Intelligence Needs Meta Intelligence -- the Case for Metacognitive AI

This position paper argues for metacognition as a general design principle for creating more accurate, secure, and efficient AI. The metacognitive solution involves systems monitoring their own states and judiciously allocating resources depending on each problem instance's difficulty or cost of mistakes. Drawing inspiration both from past work on resource-rational AI and from well-documented metacognitive strategies in psychology and cognitive science, we identify specific challenges in embedding these strategies into AI design and highlight open theoretical and implementation problems. We showcase these principles through a tangible example of improved learning efficiency, effectiveness, and security in a Federated Learning (FL) case study. We show how these principles can be translated into practice with a novel software framework developed specifically to allow the community to design, deploy, and experiment with metacognition-enabled AI applications.

preprint · 2026 · arXiv

Reasoning about Medical Triage Optimization with Logic Programming

We present a logic programming framework that orchestrates multiple variants of an optimization problem and reasons about their results to support high-stakes medical decision-making. The logic programming layer coordinates the construction and evaluation of multiple optimization formulations, translating solutions into logical facts that support further symbolic reasoning and ensure efficient resource allocation -- specifically targeting the "right patient, right platform, right escort, right time, right destination" principle. This capability is integrated into GuardianTwin, a decision support system for Forward Medical Evacuation (MEDEVAC), where rapid and explainable resource allocation is critical. Through a series of experiments, our framework demonstrates an average reduction in casualties by 35.75% compared to standard baselines. Additionally, we explore how users engage with the system via an intuitive interface that delivers explainable insights, ultimately enhancing decision-making in critical situations. This work demonstrates how logic programming can serve as a foundation for modular, interpretable, and operationally effective optimization in mission-critical domains.

preprint · 2026 · arXiv

Tokens-per-Parameter Coverage Is Critical for Robust LLM Scaling Law Extrapolation

Neural scaling laws approximate a language model's loss as a power-law function of parameter count $N$ and token count $D$. Following Chinchilla-style compute-optimal training, many studies fit scaling laws from runs performed under a fixed tokens-per-parameter (TPP) ratio $k$ and set $D = kN$. We show that this collinear design, combined with the empirically common near-equality of the exponents governing $N$ and $D$, induces an inherent ill-conditioning in the Gauss-Newton least-squares problem: the condition number of the design grows as the inverse square of the gap between the $N$- and $D$-exponents. The scale coefficients become practically unidentifiable, with confidence intervals inflating by an order of magnitude or more, yielding a "sloppy" model whose extrapolations degrade sharply off the training ray. We prove this for four scaling-law formalisms and derive a closed-form TPP-diversity threshold that is necessary and sufficient for well-conditioned estimation. Empirically, non-collinear designs outperform collinear ones on held-out splits with a 97.3% win rate across four laws, five corpora, and multiple floating-point precision modes. We further show the degeneracy is rooted in Jacobian geometry and is not an artifact of the loss function: any smooth estimation objective whose curvature involves the Jacobian inherits the same ill-conditioning.
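The collinearity described above can be illustrated numerically. The sketch below assumes the Chinchilla-style form L = E + A·N^(-alpha) + B·D^(-beta), which may differ from the four formalisms the paper analyzes. It looks only at the two Jacobian columns for the scale coefficients A and B. The model sizes, TPP ratios and exponent values are hypothetical.

```python
import math


def gram_condition(ns, ks, alpha, beta):
    """Condition number of the normalised 2x2 Gram matrix for coefficients A and B.

    Assumes L = E + A * N**-alpha + B * D**-beta with D = k * N, so the
    Jacobian columns for A and B are N**-alpha and D**-beta.
    """
    u = [n ** -alpha for n in ns]
    v = [(k * n) ** -beta for n, k in zip(ns, ks)]
    # Normalise the columns so the result reflects collinearity, not column scale.
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    c = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)  # cosine of the angle
    # The eigenvalues of [[1, c], [c, 1]] are 1 + c and 1 - c.
    return (1 + c) / (1 - c)


ns = [1e7, 1e8, 1e9, 1e10]          # hypothetical model sizes
fixed = [20.0] * len(ns)            # a single tokens-per-parameter ratio
varied = [5.0, 80.0, 20.0, 320.0]   # hypothetical diverse ratios

for gap in (0.1, 0.01):
    print(gap,
          gram_condition(ns, fixed, 0.34, 0.34 + gap),
          gram_condition(ns, varied, 0.34, 0.34 + gap))
```

With a fixed ratio the two columns differ only through the exponent gap, so shrinking the gap drives the cosine toward 1 and the condition number up. Varying the ratio across runs breaks that alignment.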

preprint · 2020 · arXiv

A Feature-Driven Approach for Identifying Pathogenic Social Media Accounts

Over the past few years, we have observed different media outlets' attempts to shift public opinion by framing information to support a narrative that facilitates their goals. Malicious users referred to as "pathogenic social media" (PSM) accounts are more likely to amplify this phenomenon by spreading misinformation to viral proportions. Understanding the spread of misinformation from an account-level perspective is thus a pressing problem. In this work, we present a feature-driven approach to detect PSM accounts in social media. Inspired by the literature, we assess PSMs from three broad perspectives: (1) user-related information (e.g., user activity, profile characteristics), (2) source-related information (i.e., information linked via URLs shared by users) and (3) content-related information (e.g., tweet characteristics). For the user-related information, we investigate malicious signals using causality analysis (i.e., whether a user is frequently a cause of viral cascades) and profile characteristics (e.g., number of followers). For the source-related information, we explore various malicious properties linked to URLs (e.g., URL address, content of the associated website). Finally, for the content-related information, we examine attributes (e.g., number of hashtags, suspicious hashtags) of tweets posted by users. Experiments on real-world Twitter data from different countries demonstrate the effectiveness of the proposed approach in identifying PSM users.

preprint · 2020 · arXiv

Leveraging Motifs to Model the Temporal Dynamics of Diffusion Networks

Information diffusion mechanisms based on social influence models are mainly studied using the likelihood of adoption when active neighbors expose a user to a message. The problem arises primarily from the fact that, for the most part, explicit information about who exposed whom among a group of active neighbors in a social network before a susceptible node is infected is not available. In this paper, we attempt to understand the diffusion process through information cascades by studying the temporal network structure of the cascades. In doing so, we accommodate the effect of exposures from active neighbors of a node through a network pruning technique that leverages network motifs to identify potential infectors responsible for exposures from among those active neighbors. We attempt to evaluate the effectiveness of the components used in modeling cascade dynamics, and especially whether the additional effect of the exposure information is useful. Following this model, we develop an inference algorithm, namely InferCut, that uses parameters learned from the model and the exposure information to predict the actual parent node of each potentially susceptible user in a given cascade. Empirical evaluation on a real-world dataset from the Weibo social network demonstrates the significance of incorporating exposure information in recovering the exact parents of the exposed users at the early stages of the diffusion process.

preprint · 2020 · arXiv

Mining user interaction patterns in the darkweb to predict enterprise cyber incidents

With the rise in security breaches over the past few years, there has been an increasing need to mine insights from social media platforms to raise alerts of possible attacks in an attempt to defend conflict during competition. In this study, we attempt to build a framework that utilizes unconventional signals from darkweb forums by leveraging the reply network structure of user interactions, with the goal of predicting enterprise-related external cyber attacks. We use both unsupervised and supervised learning models that address the challenges that come with the lack of enterprise attack metadata for ground truth validation as well as insufficient data for training the models. We validate our models on a binary classification problem that attempts to predict cyber attacks on a daily basis for an organization. Using several controlled studies on features leveraging the network structure, we measure the extent to which the indicators from the darkweb forums can be successfully used to predict attacks. We use information from 53 forums in the darkweb over a span of 17 months for the task. Our results on predicting real-world organizational cyber attacks for 3 different security events suggest that focusing on the reply path structure between groups of users, based on random walk transitions and community structures, performs better than relying solely on forum or user posting statistics prior to attacks.

preprint · 2020 · arXiv

Understanding and forecasting lifecycle events in information cascades

Most social network sites allow users to reshare a piece of information posted by a user. As time progresses, the cascade of reshares grows, eventually saturating after a certain time period. While previous studies have focused heavily on one aspect of the cascade phenomenon, specifically predicting when the cascade would go viral, in this paper, we take a more holistic approach by analyzing the occurrence of two events within the cascade lifecycle - the period of maximum growth in terms of surge in reshares and the period where the cascade starts declining in adoption. We address the challenges in identifying these periods and then proceed to make a comparative analysis of these periods from the perspective of network topology. We study the effect of several node-centric structural measures on the reshare responses using Granger causality which helps us quantify the significance of the network measures and understand the extent to which the network topology impacts the growth dynamics. This evaluation is performed on a dataset of 7407 cascades extracted from the Weibo social network. Using our causality framework, we found that an entropy measure based on nodal degree causally affects the occurrence of these events in 93.95% of cascades. Surprisingly, this outperformed clustering coefficient and PageRank which we hypothesized would be more indicative of the growth dynamics based on earlier studies. We also extend the Granger-causality Vector Autoregression (VAR) model to forecast the times at which the events occur in the cascade lifecycle.
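The abstract does not define its degree-based entropy measure. One common reading, assumed here purely for illustration, is the Shannon entropy of the degree distribution of the cascade graph. The function name and the toy graphs are hypothetical.

```python
import math
from collections import Counter


def degree_entropy(edges):
    """Shannon entropy (in bits) of the degree distribution of an undirected graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())   # how many nodes share each degree value
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy graphs: a star (one hub, four leaves) and a four-node ring.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(degree_entropy(star), degree_entropy(ring))
```

Under this reading the ring scores zero because every node has the same degree, while the star scores higher because hub and leaves differ. To feed a Granger-causality test, the measure would be recomputed per time window to form a time series.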

preprint · 2020 · arXiv

Use of a controlled experiment and computational models to measure the impact of sequential peer exposures on decision making

It is widely believed that one's peers influence product adoption behaviors. This relationship has been linked to the number of signals a decision-maker receives in a social network. But it is unclear whether these same principles hold when the pattern by which the decision-maker receives these signals varies and when peer influence is directed towards choices which are not optimal. To investigate this, we manipulate social signal exposure in an online controlled experiment using a game with human participants. Each participant in the game makes a decision among choices with differing utilities. We observe the following: (1) even in the presence of monetary risks and previously acquired knowledge of the choices, decision-makers tend to deviate from the obvious optimal decision when their peers make a similar decision, which we call the influence decision; (2) when the quantity of social signals varies over time, the forwarding probability of the influence decision, and therefore responsiveness to social influence, does not necessarily correlate proportionally with the absolute quantity of signals. To better understand how these rules of peer influence could be used in modeling applications of real-world diffusion and in networked environments, we use our behavioral findings to simulate spreading dynamics in real-world case studies. We specifically try to see how cumulative influence plays out in the presence of user uncertainty and measure its outcome on rumor diffusion, which we model as an example of sub-optimal choice diffusion. Together, our simulation results indicate that sequential peer effects from the influence decision overcome individual uncertainty to guide faster rumor diffusion over time. However, when the rate of diffusion is slow in the beginning, user uncertainty can have a substantial role compared to peer influence in deciding the adoption trajectory of a piece of questionable information.

preprint · 2016 · arXiv

A Comparison of Methods for Cascade Prediction

Information cascades exist on a wide variety of platforms on the Internet. A very important real-world problem is to identify which information cascades can go viral. A system addressing this problem can be used in a variety of applications including public health, marketing and counter-terrorism. A cascade can be considered a compound of the social network and the time series. However, in the related literature where methods for solving the cascade prediction problem were proposed, the experimental settings were often limited to a single metric for a specific problem formulation. Moreover, little attention was paid to the run time of those methods. In this paper, we first formulate the cascade prediction problem as both classification and regression. Then we compare three categories of cascade prediction methods: centrality based, feature based and point process based. We carry out the comparison by evaluating the methods on both accuracy metrics and run time. The results show that feature based methods can outperform others in terms of prediction accuracy but suffer from heavy overhead, especially for large datasets. Point process based methods can also run into long run times when the model cannot adapt well to the data. This paper seeks to address these issues in order to allow developers of systems for social network analysis to select the most appropriate method for predicting viral information cascades.

preprint · 2016 · arXiv

A Non-Parametric Learning Approach to Identify Online Human Trafficking

Human trafficking is among the most challenging law enforcement problems and demands a persistent fight from all over the globe. In this study, we leverage readily available data from the website "Backpage", used for classified advertisements, to discern potential patterns of human trafficking activities which manifest online and to identify the most likely trafficking-related advertisements. Due to the lack of ground truth, we rely on two human analysts, one a human trafficking survivor and one from law enforcement, to hand-label a small portion of the crawled data. We then present a semi-supervised learning approach that is trained on the available labeled and unlabeled data and evaluated on unseen data with further verification by experts.

preprint · 2016 · arXiv

An Empirical Evaluation Of Social Influence Metrics

Predicting when an individual will adopt a new behavior is an important problem in application domains such as marketing and public health. This paper examines the performance of a wide variety of social network based measurements proposed in the literature - which have not been previously compared directly. We study the probability of an individual becoming influenced based on measurements derived from neighborhood (i.e. number of influencers, personal network exposure), structural diversity, locality, temporal measures, cascade measures, and metadata. We also examine the ability to predict influence based on choice of classifier and how the ratio of positive to negative samples in both training and testing affect prediction results - further enabling practical use of these concepts for social influence applications.

preprint · 2016 · arXiv

Argumentation Models for Cyber Attribution

A major challenge in cyber-threat analysis is combining information from different sources to find the person or the group responsible for the cyber-attack. It is one of the most important technical and policy challenges in cyber-security. The lack of ground truth for an individual responsible for an attack has limited previous studies. In this paper, we take a first step towards overcoming this limitation by building a dataset from the capture-the-flag event held at DEFCON, and propose an argumentation model based on a formal reasoning framework called DeLP (Defeasible Logic Programming) designed to aid an analyst in attributing a cyber-attack. We build models from latent variables to reduce the search space of culprits (attackers), and show that this reduction significantly improves the performance of classification-based approaches from 37% to 62% in identifying the attacker.

preprint · 2016 · arXiv

Darknet and Deepnet Mining for Proactive Cybersecurity Threat Intelligence

In this paper, we present an operational system for cyber threat intelligence gathering from various social platforms on the Internet, particularly sites on the darknet and deepnet. We focus our attention on collecting information from hacker forum discussions and marketplaces offering products and services focusing on malicious hacking. We have developed an operational system for obtaining information from these sites for the purposes of identifying emerging cyber threats. Currently, this system collects on average 305 high-quality cyber threat warnings each week. These threat warnings include information on newly developed malware and exploits that have not yet been deployed in a cyber-attack. This provides a significant service to cyber-defenders. The system is significantly augmented through the use of various data mining and machine learning techniques. With the use of machine learning models, we are able to recall 92% of products in marketplaces and 80% of discussions on forums relating to malicious hacking with high precision. We perform preliminary analysis on the data collected, demonstrating its application to aid a security expert for better threat analysis.

preprint · 2016 · arXiv

MIST: Missing Person Intelligence Synthesis Toolkit

Each day, approximately 500 missing persons cases occur that go unsolved/unresolved in the United States. The non-profit organization known as the Find Me Group (FMG), led by former law enforcement professionals, is dedicated to solving or resolving these cases. This paper introduces the Missing Person Intelligence Synthesis Toolkit (MIST), which leverages a data-driven variant of geospatial abductive inference. This system takes search locations provided by a group of experts and rank-orders them based on the probability assigned to areas based on the prior performance of the experts taken as a group. We evaluated our approach against the current practices employed by the Find Me Group and found that it significantly reduces the search area, leading to a reduction of 31 square miles over the 24 cases we examined in our experiments. Currently, we are using MIST to aid the Find Me Group in an active missing person case.

preprint · 2016 · arXiv

Product Offerings in Malicious Hacker Markets

Marketplaces specializing in malicious hacking products - including malware and exploits - have recently become more prominent on the darkweb and deepweb. We scrape 17 such sites and collect information about such products in a unified database schema. Using a combination of manual labeling and unsupervised clustering, we examine a corpus of products in order to understand their various categories and how they become specialized with respect to vendor and marketplace. This initial study presents how we effectively applied unsupervised techniques to this data as well as the types of insights we gained on various categories of malicious hacking products.

preprint · 2016 · arXiv

Toward Early and Order-of-Magnitude Cascade Prediction in Social Networks

When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to viral proportions - where viral can be defined as an order-of-magnitude increase. However, several previous studies have established that cascade size and frequency are related through a power-law - which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on structural diversity - the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate these measures are able to distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with precision of 0.69 and recall of 0.52 for the viral class - despite this class comprising under 2% of samples. This significantly outperforms our baseline approach as well as the current state-of-the-art. We also show that this approach performs well for identifying whether cascades observed for 60 minutes will grow to 500 reposts, and demonstrate how we can trade off between precision and recall.
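A minimal sketch of the two ingredients named above: a structural-diversity feature and the precision and recall used to score the rare viral class. It takes the simplest reading of structural diversity, namely the number of distinct communities among a cascade's adopters. The paper's actual suite of measurements is not specified in the abstract. All names and toy data are hypothetical.

```python
def community_count(adopters, community_of):
    """Number of distinct communities represented among a cascade's adopters."""
    return len({community_of[a] for a in adopters})


def precision_recall(predicted, actual):
    """Precision and recall for one class, given sets of predicted and true members."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Toy data: user -> community label, and the early adopters of one cascade.
community_of = {"a": 0, "b": 0, "c": 1, "d": 2}
print(community_count(["a", "b", "c"], community_of))

# Toy evaluation: cascade ids predicted viral versus those that truly went viral.
print(precision_recall({1, 2, 3}, {2, 3, 4, 5}))
```

Reporting precision and recall for the viral class alone, rather than overall accuracy, matters here because a classifier that always predicts "non-viral" would already be right on more than 98% of samples.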

preprint · 2015 · arXiv

Cyber Attacks and Public Embarrassment: A Survey of Some Notable Hacks

We hear it all too often in the media: an organization is attacked, its data, often containing personally identifying information, is made public, and a hacking group emerges to claim credit. In this excerpt, we discuss how such groups operate and describe the details of a few major cyber-attacks of this sort in the wider context of how they occurred. We feel that understanding how such groups have operated in the past will give organizations ideas of how to defend against them in the future.

preprint · 2015 · arXiv

Cyber-Deception and Attribution in Capture-the-Flag Exercises

Attributing the culprit of a cyber-attack is widely considered one of the major technical and policy challenges of cyber-security. The lack of ground truth for an individual responsible for a given attack has limited previous studies. Here, we overcome this limitation by leveraging DEFCON capture-the-flag (CTF) exercise data where the actual ground-truth is known. In this work, we use various classification techniques to identify the culprit in a cyberattack and find that deceptive activities account for the majority of misclassified samples. We also explore several heuristics to alleviate some of the misclassification caused by deception.

preprint · 2015 · arXiv

Early Identification of Violent Criminal Gang Members

Gang violence is a major problem in the United States, accounting for a large fraction of homicides and other violent crime. In this paper, we study the problem of early identification of violent gang members. Our approach relies on modified centrality measures that take into account additional data on the individuals in the social network of co-arrestees, which together with other arrest metadata provide a rich set of features for a classification algorithm. We show our approach obtains high precision and recall (0.89 and 0.78 respectively) in the case where the entire network is known, and outperforms current approaches used by law enforcement in the case where the network is discovered over time by virtue of new arrests - mimicking real-world law-enforcement operations. Operational issues are also discussed as we are preparing to leverage this method in an operational environment.

preprint · 2015 · arXiv

Malware Task Identification: A Data Driven Approach

Identifying the tasks a given piece of malware was designed to perform (e.g. logging keystrokes, recording video, establishing remote access, etc.) is a difficult and time-consuming operation that is largely human-driven in practice. In this paper, we present an automated method to identify malware tasks. Using two different malware collections, we explore various circumstances for each - including cases where the training data differs significantly from test; where the malware being evaluated employs packing to thwart analytical techniques; and conditions with sparse training data. We find that this approach consistently outperforms the current state-of-the-art software for malware task identification as well as standard machine learning approaches - often achieving an unbiased F1 score of over 0.9. In the near future, we look to deploy our approach for use by analysts in an operational cyber-security environment.

preprint · 2015 · arXiv

Mining for Causal Relationships: A Data-Driven Study of the Islamic State

The Islamic State of Iraq and al-Sham (ISIS) is a dominant insurgent group operating in Iraq and Syria that rose to prominence when it took over Mosul in June, 2014. In this paper, we present a data-driven approach to analyzing this group using a dataset consisting of 2200 incidents of military activity surrounding ISIS and the forces that oppose it (including Iraqi, Syrian, and the American-led coalition). We combine ideas from logic programming and causal reasoning to mine for association rules for which we present evidence of causality. We present relationships that link ISIS vehicle-borne improvised explosive device (VBIED) activity in Syria with military operations in Iraq, coalition air strikes, and ISIS IED activity, as well as rules that may serve as indicators of spikes in indirect fire, suicide attacks, and arrests.

preprint · 2015 · arXiv

Toward Order-of-Magnitude Cascade Prediction

When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to "viral" proportions -- where "viral" is defined as an order-of-magnitude increase. However, several previous studies have established that cascade size and frequency are related through a power-law - which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on "structural diversity" -- the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate these measures are able to distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with precision of 0.69 and recall of 0.52 for the viral class - despite this class comprising under 2% of samples. This significantly outperforms our baseline approach as well as the current state-of-the-art. Our work also demonstrates how we can trade off between precision and recall.

preprint · 2014 · arXiv

An Argumentation-Based Framework to Address the Attribution Problem in Cyber-Warfare

Attributing a cyber-operation through the use of multiple pieces of technical evidence (i.e., malware reverse-engineering and source tracking) and conventional intelligence sources (i.e., human or signals intelligence) is a difficult problem not only due to the effort required to obtain evidence, but the ease with which an adversary can plant false evidence. In this paper, we introduce a formal reasoning system called the InCA (Intelligent Cyber Attribution) framework that is designed to aid an analyst in the attribution of a cyber-operation even when the available information is conflicting and/or uncertain. Our approach combines argumentation-based reasoning, logic programming, and probabilistic models to not only attribute an operation but also explain to the analyst why the system reaches its conclusions.

preprint · 2014 · arXiv

Belief Revision in Structured Probabilistic Argumentation

In real-world applications, knowledge bases consisting of all the information at hand for a specific domain, along with the current state of affairs, are bound to contain contradictory data coming from different sources, as well as data with varying degrees of uncertainty attached. Likewise, an important aspect of the effort associated with maintaining knowledge bases is deciding what information is no longer useful; pieces of information (such as intelligence reports) may be outdated, may come from sources that have recently been discovered to be of low quality, or abundant evidence may be available that contradicts them. In this paper, we propose a probabilistic structured argumentation framework that arises from the extension of Presumptive Defeasible Logic Programming (PreDeLP) with probabilistic models, and argue that this formalism is capable of addressing the basic issues of handling contradictory and uncertain data. Then, to address the last issue, we focus on the study of non-prioritized belief revision operations over probabilistic PreDeLP programs. We propose a set of rationality postulates -- based on well-known ones developed for classical knowledge bases -- that characterize how such operations should behave, and study a class of operators along with theoretical relationships with the proposed postulates, including a representation theorem stating the equivalence between this class and the class of operators characterized by the postulates.

preprint · 2014 · arXiv

Power Grid Defense Against Malicious Cascading Failure

An adversary looking to disrupt a power grid may look to target certain substations and sources of power generation to initiate a cascading failure that maximizes the number of customers without electricity. This is a particularly important concern when the enemy has the capability to launch cyber-attacks, as practical concerns (i.e. avoiding disruption of service, presence of legacy systems, etc.) may hinder security. Hence, a defender can harden the security posture at certain power stations but may lack the time and resources to do this for the entire power grid. We model a power grid as a graph and introduce the cascading failure game in which both the defender and attacker choose a subset of power stations so as to minimize (maximize) the number of consumers having access to producers of power. We formalize problems for identifying both mixed and deterministic strategies for both players, prove complexity results under a variety of different scenarios, identify tractable cases, and develop algorithms for these problems. We also perform an experimental evaluation of the model and game on a real-world power grid network. Empirically, we noted that the game favors the attacker as he benefits more from increased resources than the defender. Further, the minimax defense produces roughly the same expected payoff as an easy-to-compute deterministic load based (DLB) defense when played against a minimax attack strategy. However, DLB performs more poorly than minimax defense when faced with the attacker's best response to DLB. This is likely due to the presence of low-load yet high-payoff nodes, which we also found in our empirical analysis.
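The payoff at the heart of the game, the number of consumers that can still reach a producer, can be sketched as a graph search. This simplified version only removes the attacked nodes and checks connectivity. It does not model the load redistribution that drives a cascading failure, and the graph, node names and function name are hypothetical.

```python
from collections import deque


def served_consumers(graph, producers, consumers, removed):
    """Count consumers still connected to at least one producer after node removal."""
    removed = set(removed)
    queue = deque(p for p in producers if p not in removed)
    reached = set(queue)
    # Breadth-first search outward from every surviving producer.
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in removed and neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return sum(1 for c in consumers if c in reached)


# Toy grid: generators g1, g2; substations s1, s2; consumers c1..c3.
graph = {
    "g1": ["s1"], "g2": ["s2"],
    "s1": ["g1", "c1", "c2", "s2"], "s2": ["g2", "c3", "s1"],
    "c1": ["s1"], "c2": ["s1"], "c3": ["s2"],
}
producers, consumers = ["g1", "g2"], ["c1", "c2", "c3"]
for attack in ([], ["s1"], ["g1"]):
    print(attack, served_consumers(graph, producers, consumers, attack))
```

In this toy grid, removing the substation `s1` cuts off two consumers while removing the generator `g1` cuts off none, which hints at why a load-based defense can miss low-load yet high-payoff nodes.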

preprint2013arXiv

A Novel Analytical Method for Evolutionary Graph Theory Problems

Evolutionary graph theory studies the evolutionary dynamics of populations structured on graphs. A central problem is determining the probability that a small number of mutants overtake a population. Currently, Monte Carlo simulations are used for estimating such fixation probabilities on general directed graphs, since no good analytical methods exist. In this paper, we introduce a novel deterministic framework for computing fixation probabilities for strongly connected, directed, weighted evolutionary graphs under neutral drift. We show how this framework can also be used to calculate the expected number of mutants at a given time step (even if we relax the assumption that the graph is strongly connected), how it can extend to other related models (e.g., the voter model), how it can provide non-trivial bounds for fixation probability in the case of an advantageous mutant, and how it can be used to find a non-trivial lower bound on the mean time to fixation. We provide various experimental results determining fixation probabilities and expected number of mutants on different graphs. Among these, we show that our method consistently outperforms Monte Carlo simulations in speed by several orders of magnitude. Finally, we show how our approach can provide insight into synaptic competition in neurology.
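For orientation, the Monte Carlo baseline that the abstract contrasts with can be sketched as below. This is not the paper's deterministic framework; it is a minimal simulation assuming a birth-death (Moran) update on an unweighted directed graph, and the names `fixation_trial`, `estimate_fixation`, and `complete4` are hypothetical.

```python
import random

def fixation_trial(out_neighbors, start, r, rng):
    """Run one birth-death process; return True if the mutants take over."""
    nodes = list(out_neighbors)
    mutants = {start}
    while 0 < len(mutants) < len(nodes):
        # Pick a parent with probability proportional to fitness
        # (mutants have fitness r, residents 1; r = 1 is neutral drift).
        weights = [r if v in mutants else 1.0 for v in nodes]
        parent = rng.choices(nodes, weights=weights)[0]
        # The offspring replaces a uniformly chosen out-neighbor.
        child = rng.choice(out_neighbors[parent])
        if parent in mutants:
            mutants.add(child)
        else:
            mutants.discard(child)
    return len(mutants) == len(nodes)

def estimate_fixation(out_neighbors, r=1.0, trials=2000, seed=0):
    """Estimate the fixation probability of one uniformly placed mutant."""
    rng = random.Random(seed)
    nodes = list(out_neighbors)
    wins = sum(
        fixation_trial(out_neighbors, rng.choice(nodes), r, rng)
        for _ in range(trials)
    )
    return wins / trials

# Assumed toy input: the complete directed graph on four nodes.
complete4 = {v: [u for u in range(4) if u != v] for v in range(4)}
estimate = estimate_fixation(complete4)
```

On a complete graph under neutral drift a single mutant fixes with probability 1/N, so `estimate` should land near 0.25 here. The number of trials needed for a tight estimate is what makes this baseline slow on large graphs.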

preprint2013arXiv

A Scalable Heuristic for Viral Marketing Under the Tipping Model

In a "tipping" model, each node in a social network, representing an individual, adopts a property or behavior if a certain number of his incoming neighbors currently exhibit the same. In viral marketing, a key problem is to select an initial "seed" set from the network such that the entire network adopts any behavior given to the seed. Here we introduce a method for quickly finding seed sets that scales to very large networks. Our approach finds a set of nodes that guarantees spreading to the entire network under the tipping model. After experimentally evaluating 31 real-world networks, we found that our approach often finds seed sets that are several orders of magnitude smaller than the population size and outperform nodal centrality measures in most cases. In addition, our approach scales well - on a Friendster social network consisting of 5.6 million nodes and 28 million edges we found a seed set in under 3.6 hours. Our experiments also indicate that our algorithm provides small seed sets even if high-degree nodes are removed. Lastly, we find that highly clustered local neighborhoods, together with dense network-wide community structures, suppress a trend's ability to spread under the tipping model.
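The tipping rule described above can be checked on a toy network with a short simulation. The sketch below only verifies whether a given seed set spreads to the whole network; it is not the seed-finding heuristic from the paper. The function name `tipping_spread`, the dictionary layout, and the example thresholds are assumptions for illustration.

```python
def tipping_spread(in_neighbors, thresholds, seeds):
    """Return the set of adopters once the tipping process stops changing."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, preds in in_neighbors.items():
            if node in active:
                continue
            # A node adopts once enough of its incoming neighbors have adopted.
            if sum(1 for p in preds if p in active) >= thresholds[node]:
                active.add(node)
                changed = True
    return active

# Assumed toy network: edges a->b, a->c, b->c, b->d, c->d.
in_neighbors = {"a": [], "b": ["a"], "c": ["a", "b"], "d": ["b", "c"]}
thresholds = {"a": 1, "b": 1, "c": 2, "d": 2}

full = tipping_spread(in_neighbors, thresholds, {"a"})
stuck = tipping_spread(in_neighbors, thresholds, {"d"})
```

Seeding `a` tips `b`, then `c`, then `d`, so `full` contains all four nodes, while seeding `d` reaches nobody else. Because adoption is never undone in this model, the order in which nodes are visited does not change the final set.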

preprint2013arXiv

Geospatial Optimization Problems

There are numerous applications which require the ability to take certain actions (e.g. distribute money, medicines, people etc.) over a geographic region. A disaster relief organization must allocate people and supplies to parts of a region after a disaster. A public health organization must allocate limited vaccine to people across a region. In both cases, the organization is trying to optimize something (e.g. minimize expected number of people with a disease). We introduce "geospatial optimization problems" (GOPs) where an organization has limited resources and budget to take actions in a geographic area. The actions result in one or more properties changing for one or more locations. There are also certain constraints on the combinations of actions that can be taken. We study two types of GOPs - goal-based and benefit-maximizing (GBGOP and BMGOP respectively). A GBGOP ensures that certain properties must be true at specified locations after the actions are taken while a BMGOP optimizes a linear benefit function. We show both problems to be NP-hard (with membership in NP for the associated decision problems). Additionally, we prove limits on approximation for both problems. We present integer programs for both GOPs that provide exact solutions. We also correctly reduce the number of variables in the GBGOP integer constraints. For BMGOP, we present the BMGOP-Compute algorithm that runs in PTIME and provides a reasonable approximation guarantee in most cases.

preprint2013arXiv

Large Social Networks can be Targeted for Viral Marketing with Small Seed Sets

In a "tipping" model, each node in a social network, representing an individual, adopts a behavior if a certain number of his incoming neighbors previously held that property. A key problem for viral marketers is to determine an initial "seed" set in a network such that if given a property then the entire network adopts the behavior. Here we introduce a method for quickly finding seed sets that scales to very large networks. Our approach finds a set of nodes that guarantees spreading to the entire network under the tipping model. After experimentally evaluating 31 real-world networks, we found that our approach often finds such sets that are several orders of magnitude smaller than the population size. Our approach also scales well - on a Friendster social network consisting of 5.6 million nodes and 28 million edges we found a seed set in under 3.6 hours. We also find that highly clustered local neighborhoods and dense network-wide community structure together suppress the ability of a trend to spread under the tipping model.

preprint2013arXiv

MANCaLog: A Logic for Multi-Attribute Network Cascades (Technical Report)

The modeling of cascade processes in multi-agent systems in the form of complex networks has in recent years become an important topic of study due to its many applications: the adoption of commercial products, spread of disease, the diffusion of an idea, etc. In this paper, we begin by identifying seven desiderata that a framework for modeling such processes should satisfy: the ability to represent attributes of both nodes and edges, an explicit representation of time, the ability to represent non-Markovian temporal relationships, representation of uncertain information, the ability to represent competing cascades, allowance of non-monotonic diffusion, and computational tractability. We then present the MANCaLog language, a formalism based on logic programming that satisfies all these desiderata, and focus on algorithms for finding minimal models (from which the outcome of cascades can be obtained) as well as how this formalism can be applied in real world scenarios. We are not aware of any other formalism in the literature that meets all of the above requirements.

preprint2013arXiv

Mining for Geographically Disperse Communities in Social Networks by Leveraging Distance Modularity

Social networks where the actors occupy geospatial locations are prevalent in military, intelligence, and policing operations such as counter-terrorism, counter-insurgency, and combating organized crime. These networks are often derived from a variety of intelligence sources. The discovery of communities that are geographically disperse stems from the requirement to identify higher-level organizational structures, such as a logistics group that provides support to various geographically disperse terrorist cells. We apply a variant of Newman-Girvan modularity to this problem known as distance modularity. To address the problem of finding geographically disperse communities, we modify the well-known Louvain algorithm to find partitions of networks that provide near-optimal solutions to this quantity. We apply this algorithm to numerous samples from two real-world social networks and a terrorism network data set whose nodes have associated geospatial locations. Our experiments show this to be an effective approach and highlight various practical considerations when applying the algorithm to distance modularity maximization. Several military, intelligence, and law-enforcement organizations are working with us to further test and field software for this emerging application.

preprint2013arXiv

Mining for Spatially-Near Communities in Geo-Located Social Networks

Current approaches to community detection in social networks often ignore the spatial location of the nodes. In this paper, we look to extract spatially-near communities in a social network. We introduce a new metric to measure the quality of a community partition in a geolocated social network, called "spatially-near modularity", a value that increases based on aspects of the network structure but decreases based on the distance between nodes in the communities. We then look to find an optimal partition with respect to this measure - which should be an "ideal" community with respect to both social ties and geographic location. Though this is an NP-hard problem, we introduce two heuristic algorithms that attempt to maximize this measure and outperform non-geographic community finding by an order of magnitude. Applications to counter-terrorism are also discussed.

preprint2013arXiv

Social Network Intelligence Analysis to Combat Street Gang Violence

In this paper we introduce the Organization, Relationship, and Contact Analyzer (ORCA), which is designed to aid intelligence analysis for law enforcement operations against violent street gangs. ORCA is designed to address several police analytical needs concerning street gangs using new techniques in social network analysis. Specifically, it can determine "degree of membership" for individuals who do not admit to membership in a street gang, quickly identify sets of influential individuals (under the tipping model), and identify criminal ecosystems by decomposing gangs into sub-groups. We describe this software and the design decisions considered in building an intelligence analysis tool created specifically for countering violent street gangs. We also provide results based on analysis of real-world police data provided by a major American metropolitan police department that is partnering with us and currently deploying this system for real-world use.

preprint2013arXiv

The Dragon and the Computer: Why Intellectual Property Theft is Compatible with Chinese Cyber-Warfare Doctrine

Along with the USA and Russia, China is often considered one of the leading cyber-powers in the world. In this excerpt, we explore how Chinese military thought, developed in the 1990s, influenced their cyber-operations in the early 2000s. In particular, we examine the ideas of "Unrestricted Warfare" and "Active Offense" and discuss how they can permit the theft of intellectual property. We then specifically look at how the case study of Operation Aurora, a cyber-operation directed against many major U.S. technology and defense firms, reflects some of these ideas.

preprint2012arXiv

Shaping Operations to Attack Robust Terror Networks

Security organizations often attempt to disrupt terror or insurgent networks by targeting "high value targets" (HVT's). However, there have been numerous examples that illustrate how such networks are able to quickly re-generate leadership after such an operation. Here, we introduce the notion of a "shaping" operation in which the terrorist network is first targeted for the purpose of reducing its leadership re-generation ability before targeting HVT's. We look to conduct shaping by maximizing the network-wide degree centrality through node removal. We formally define this problem and prove that solving it is NP-Complete. We introduce a mixed integer-linear program that solves this problem exactly as well as a greedy heuristic for more practical use. We implemented the greedy heuristic and found, in examining five real-world terrorist networks, that removing only 12% of nodes can increase the network-wide centrality between 17% and 45%. We also show our algorithm can scale to large social networks of 1,133 nodes and 5,541 edges on commodity hardware.
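A greedy node-removal loop of the kind mentioned above might look like the following sketch. It assumes that "network-wide degree centrality" means Freeman's degree centralization on an undirected graph, which the abstract does not spell out, and it is not the paper's implementation. The names `degree_centralization`, `remove_node`, `greedy_shaping`, and the toy network are hypothetical.

```python
def degree_centralization(adj):
    """Freeman degree centralization of an undirected graph (assumed measure)."""
    n = len(adj)
    if n < 3:
        return 0.0
    degrees = [len(nbrs) for nbrs in adj.values()]
    d_max = max(degrees)
    # Normalized by the value attained by a star graph on n nodes.
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))

def remove_node(adj, node):
    """Return a copy of the graph without `node` and its incident edges."""
    return {u: nbrs - {node} for u, nbrs in adj.items() if u != node}

def greedy_shaping(adj, budget):
    """Remove up to `budget` nodes, each time taking the best single removal."""
    removed, current = [], adj
    for _ in range(budget):
        best_node, best_score = None, degree_centralization(current)
        for node in current:
            score = degree_centralization(remove_node(current, node))
            if score > best_score:
                best_node, best_score = node, score
        if best_node is None:  # no removal improves centralization
            break
        removed.append(best_node)
        current = remove_node(current, best_node)
    return removed, current

# Assumed toy network: a star centered on "c" plus one extra edge x-y.
network = {"c": {"x", "y", "z"}, "x": {"c", "y"}, "y": {"c", "x"}, "z": {"c"}}
removed, shaped = greedy_shaping(network, budget=1)
```

In this toy network, removing either `x` or `y` leaves a pure star, so one removal raises the centralization from 2/3 to 1.0. Each greedy step re-evaluates every remaining node, which is simple but quadratic per step; a practical version would update degrees incrementally.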

preprint2012arXiv

Spreaders in the Network SIR Model: An Empirical Study

We use the susceptible-infected-recovered (SIR) model for disease spread over a network, and empirically study how well various centrality measures perform at identifying which nodes in a network will be the best spreaders of disease on 10 real-world networks. We find that the relative performance of degree, shell number and other centrality measures can be sensitive to B, the probability that an infected node will transmit the disease to a susceptible node. We also find that eigenvector centrality performs very well in general for values of B above the epidemic threshold.
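To make the role of the transmission probability concrete, a discrete-time network SIR simulation can be sketched as follows. The details are assumptions, not taken from the study: each infected node gets one attempt per susceptible neighbor and then recovers after a single step, `beta` plays the role of B, and the names `sir_outbreak_size`, `mean_outbreak_size`, and the toy graph are hypothetical.

```python
import random

def sir_outbreak_size(adj, seed_node, beta, rng):
    """Simulate one outbreak from seed_node; return how many nodes were infected."""
    infected = {seed_node}
    recovered = set()
    while infected:
        newly = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                # Each infected node transmits to a susceptible neighbor
                # with probability beta.
                if susceptible and v not in newly and rng.random() < beta:
                    newly.add(v)
        # Assumption: infected nodes recover after exactly one time step.
        recovered |= infected
        infected = newly
    return len(recovered)

def mean_outbreak_size(adj, node, beta, trials=500, seed=0):
    """Average outbreak size when the epidemic starts at `node`."""
    rng = random.Random(seed)
    total = sum(sir_outbreak_size(adj, node, beta, rng) for _ in range(trials))
    return total / trials

# Assumed toy network: hub "a" with a short tail d-e.
adj = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a", "e"], "e": ["d"]}
hub_score = mean_outbreak_size(adj, "a", beta=0.3)
leaf_score = mean_outbreak_size(adj, "e", beta=0.3)
```

Ranking nodes by `mean_outbreak_size` gives an empirical spreading score that a centrality measure such as degree or shell number can then be compared against, and repeating the ranking for several values of `beta` shows how sensitive that comparison is.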

37 published item(s)

Machine Learning Model Integration with Open World Temporal Logic for Process Automation

Position: Artificial Intelligence Needs Meta Intelligence -- the Case for Metacognitive AI

Reasoning about Medical Triage Optimization with Logic Programming

Tokens-per-Parameter Coverage Is Critical for Robust LLM Scaling Law Extrapolation

A Feature-Driven Approach for Identifying Pathogenic Social Media Accounts

Leveraging Motifs to Model the Temporal Dynamics of Diffusion Networks

Mining user interaction patterns in the darkweb to predict enterprise cyber incidents

Understanding and forecasting lifecycle events in information cascades

Use of a controlled experiment and computational models to measure the impact of sequential peer exposures on decision making

A Comparison of Methods for Cascade Prediction

A Non-Parametric Learning Approach to Identify Online Human Trafficking

An Empirical Evaluation Of Social Influence Metrics

Argumentation Models for Cyber Attribution

Darknet and Deepnet Mining for Proactive Cybersecurity Threat Intelligence

MIST: Missing Person Intelligence Synthesis Toolkit

Product Offerings in Malicious Hacker Markets

Toward Early and Order-of-Magnitude Cascade Prediction in Social Networks

Cyber Attacks and Public Embarrassment: A Survey of Some Notable Hacks

Cyber-Deception and Attribution in Capture-the-Flag Exercises

Early Identification of Violent Criminal Gang Members

Malware Task Identification: A Data Driven Approach

Mining for Causal Relationships: A Data-Driven Study of the Islamic State

Toward Order-of-Magnitude Cascade Prediction

An Argumentation-Based Framework to Address the Attribution Problem in Cyber-Warfare

Belief Revision in Structured Probabilistic Argumentation

Power Grid Defense Against Malicious Cascading Failure

A Novel Analytical Method for Evolutionary Graph Theory Problems

A Scalable Heuristic for Viral Marketing Under the Tipping Model

Geospatial Optimization Problems

Large Social Networks can be Targeted for Viral Marketing with Small Seed Sets

MANCaLog: A Logic for Multi-Attribute Network Cascades (Technical Report)

Mining for Geographically Disperse Communities in Social Networks by Leveraging Distance Modularity

Mining for Spatially-Near Communities in Geo-Located Social Networks

Social Network Intelligence Analysis to Combat Street Gang Violence

The Dragon and the Computer: Why Intellectual Property Theft is Compatible with Chinese Cyber-Warfare Doctrine

Shaping Operations to Attack Robust Terror Networks

Spreaders in the Network SIR Model: An Empirical Study