Source author record

Alexander Gutfraind

Alexander Gutfraind appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Discrete Mathematics, math.CO, math.DS, math.OC, nlin.AO, physics.soc-ph, Social and Information Networks, Computer Science and Game Theory, cond-mat.stat-mech, Data Structures and Algorithms, Machine Learning, math.PR, physics.data-an

Catalog footprint

What is connected

11 works

13 topics

4 close collaborators

preprint, 2019, arXiv

Using massive health insurance claims data to predict very high-cost claimants: a machine learning approach

Due to escalating healthcare costs, accurately predicting which patients will incur high costs is an important task for payers and providers of healthcare. High-cost claimants (HiCCs) are patients who have annual costs above $250,000 and who represent just 0.16% of the insured population but currently account for 9% of all healthcare costs. In this study, we aimed to develop a high-performance algorithm to predict HiCCs to inform a novel care management system. Using health insurance claims from 48 million people and augmented with census data, we applied machine learning to train binary classification models to calculate the personal risk of HiCC. To train the models, we developed a platform starting with 6,006 variables across all clinical and demographic dimensions and constructed over one hundred candidate models. The best model achieved an area under the receiver operating characteristic curve of 91.2%. The model exceeds the highest published performance (84%) and remains high for patients with no prior history of high-cost status (89%), who have less than a full year of enrollment (87%), or lack pharmacy claims data (88%). It attains an area under the precision-recall curve of 23.1%, and precision of 74% at a threshold of 0.99. A care management program enrolling 500 people with the highest HiCC risk is expected to treat 199 true HiCCs and generate a net savings of $7.3 million per year. Our results demonstrate that high-performing predictive models can be constructed using claims data and publicly available data alone, even for rare high-cost claimants exceeding $250,000. Our model demonstrates the transformational power of machine learning and artificial intelligence in care management, which would allow healthcare payers and providers to introduce the next generation of care management programs.
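To make the reported metrics concrete, here is a minimal sketch of how an area under the ROC curve and a precision-at-k figure are computed from risk scores. The scores and labels are toy values invented for illustration, and the function names are hypothetical; nothing here reproduces the paper's model or data. The only figure taken from the abstract is the enrollment example (199 true HiCCs among 500 enrolled).

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive case is scored
    above a random negative case (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(scores, labels, k):
    """Fraction of true positives among the k highest-scored cases."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Toy data (assumed, not from the paper): 2 positives among 6 people.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 0, 0]

toy_auroc = auroc(toy_scores, toy_labels)
toy_precision = precision_at_k(toy_scores, toy_labels, k=2)

# Precision implied by the abstract's enrollment example: 199 of 500.
program_precision = 199 / 500
```

The enrollment example corresponds to a precision of about 39.8% among the 500 highest-risk people, which is far above the 0.16% base rate quoted in the abstract.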

preprint, 2014, arXiv

Network installation and recovery: approximation lower bounds and faster exact formulations

We study the Neighbor Aided Network Installation Problem (NANIP) introduced previously which asks for a minimal cost ordering of the vertices of a graph, where the cost of visiting a node is a function of the number of neighbors that have already been visited. This problem has applications in resource management and disaster recovery. In this paper we analyze the computational hardness of NANIP. In particular we show that this problem is NP-hard even when restricted to convex decreasing cost functions, give a linear approximation lower bound for the greedy algorithm, and prove a general sub-constant approximation lower bound. Then we give a new integer programming formulation of NANIP and empirically observe its speedup over the original integer program.
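The problem statement above can be sketched in a few lines. The greedy rule below (always install the node that is currently cheapest) is an assumed, plausible reading of "the greedy algorithm", not necessarily the one analyzed in the paper, and the exact solver is plain brute force rather than the paper's integer program. The graph and cost table are hypothetical.

```python
from itertools import permutations


def sequence_cost(adj, order, cost_fn):
    """Total cost of installing nodes in `order`; each node pays cost_fn(k),
    where k is the number of its neighbors already installed."""
    installed, total = set(), 0
    for v in order:
        total += cost_fn(sum(1 for u in adj[v] if u in installed))
        installed.add(v)
    return total


def greedy_order(adj, cost_fn):
    """Assumed greedy rule: repeatedly install the currently cheapest node."""
    installed, order = set(), []
    remaining = sorted(adj)
    while remaining:
        v = min(
            remaining,
            key=lambda x: cost_fn(sum(1 for u in adj[x] if u in installed)),
        )
        order.append(v)
        installed.add(v)
        remaining.remove(v)
    return order


def exact_optimum(adj, cost_fn):
    """Brute-force minimum over all orderings (feasible only for tiny graphs)."""
    return min(sequence_cost(adj, p, cost_fn) for p in permutations(adj))


# Hypothetical 4-node graph with edges 0-1, 1-2, 2-3, 0-2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Hypothetical convex decreasing cost: successive drops of 4, 2, 1.
COST_TABLE = [10, 6, 4, 3]


def convex_cost(k):
    return COST_TABLE[k]


greedy_total = sequence_cost(adj, greedy_order(adj, convex_cost), convex_cost)
optimal_total = exact_optimum(adj, convex_cost)
```

On this small instance the greedy ordering happens to match the optimum; the abstract's lower bound says that this cannot be relied on in general.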

preprint, 2012, arXiv

Multiscale Network Generation

Networks are widely used in science and technology to represent relationships between entities, such as social or ecological links between organisms, enzymatic interactions in metabolic systems, or computer infrastructure. Statistical analyses of networks can provide critical insights into the structure, function, dynamics, and evolution of those systems. However, the structures of real-world networks are often not known completely, and they may exhibit considerable variation so that no single network is sufficiently representative of a system. In such situations, researchers may turn to proxy data from related systems, sophisticated methods for network inference, or synthetic networks. Here, we introduce a flexible method for synthesizing realistic ensembles of networks starting from a known network, through a series of mappings that coarsen and later refine the network structure by randomized editing. The method, MUSKETEER, preserves structural properties with minimal bias, including unknown or unspecified features, while introducing realistic variability at multiple scales. Using examples from several domains, we show that MUSKETEER produces the intended stochasticity while achieving greater fidelity across a suite of network properties than do other commonly used network generation algorithms.

preprint, 2012, arXiv

Optimal recovery of damaged infrastructure network

Natural disasters or attacks may disrupt infrastructure networks on a vast scale. Parts of the damaged network are interdependent, making it difficult to plan and optimally execute the recovery operations. To study how interdependencies affect the recovery schedule, we introduce a new discrete optimization problem where the goal is to minimize the total cost of installing (or recovering) a given network. This cost is determined by the structure of the network and the sequence in which the nodes are installed. Namely, the cost of installing a node is a function of the number of its neighbors that have been installed before it. We analyze the natural case where the cost function is decreasing and convex, and provide bounds on the cost of the optimal solution. We also show that all sequences have the same cost when the cost function is linear and provide an upper bound on the cost of a random solution for an Erdős-Rényi random graph. Examining the computational complexity, we show that the problem is NP-hard when the cost function is arbitrary. Finally, we provide a formulation as an integer program, an exact dynamic programming algorithm, and a greedy heuristic which gives high quality solutions.
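The claim that all sequences cost the same under a linear cost function can be checked directly. If a node with k installed neighbors costs a - b*k, then every edge is counted exactly once (at whichever endpoint is installed later), so the total is n*a - b*|E| regardless of order. The sketch below verifies this by enumeration on a hypothetical 4-node graph; the graph, the coefficients, and the function names are illustrative assumptions.

```python
from itertools import permutations


def installation_cost(adj, order, cost_fn):
    """Sum of cost_fn(k) over nodes in `order`, where k counts the
    neighbors of the node that were installed before it."""
    installed, total = set(), 0
    for v in order:
        k = sum(1 for u in adj[v] if u in installed)
        total += cost_fn(k)
        installed.add(v)
    return total


# Hypothetical graph: n = 4 nodes, |E| = 4 edges (0-1, 1-2, 2-3, 0-2).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}


def linear_cost(k):
    return 10 - 2 * k  # a = 10, b = 2 (assumed values)


def convex_cost(k):
    return [10, 6, 4, 3][k]  # decreasing and convex (assumed values)


linear_totals = {installation_cost(adj, p, linear_cost) for p in permutations(adj)}
convex_totals = {installation_cost(adj, p, convex_cost) for p in permutations(adj)}
```

Here `linear_totals` contains the single value 4*10 - 2*4 = 32, while `convex_totals` contains several distinct values, which is why ordering only matters for nonlinear cost functions.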

preprint, 2011, arXiv

Evader Interdiction and Collateral Damage

In network interdiction problems, evaders (e.g., hostile agents or data packets) may be moving through a network towards targets and we wish to choose locations for sensors in order to intercept the evaders before they reach their destinations. The evaders might follow deterministic routes or Markov chains, or they may be reactive, i.e., able to change their routes in order to avoid sensors placed to detect them. The challenge in such problems is to choose sensor locations economically, balancing security gains with costs, including the inconvenience sensors inflict upon innocent travelers. We study the objectives of 1) maximizing the number of evaders captured when limited by a budget on sensing cost and 2) capturing all evaders as cheaply as possible. We give optimal sensor placement algorithms for several classes of special graphs and hardness and approximation results for general graphs, including for deterministic or Markov chain-based and reactive or oblivious evaders. In a similar-sounding but fundamentally different problem setting posed by Rubinstein and Glazer where both evaders and innocent travelers are reactive, we again give optimal algorithms for special cases and hardness and approximation results on general graphs.

preprint, 2011, arXiv

Lanchester Theory and the Fate of Armed Revolts

Major revolts have recently erupted in parts of the Middle East with substantial international repercussions. Predicting, coping with and winning those revolts have become a grave problem for many regimes and for world powers. We propose a new model of such revolts that describes their evolution by building on the classic Lanchester theory of combat. The model accounts for the split in the population between those loyal to the regime and those favoring the rebels. We show that, contrary to classical Lanchesterian insights regarding traditional force-on-force engagements, the outcome of a revolt is independent of the initial force sizes; it only depends on the fraction of the population supporting each side and their combat effectiveness. We also consider the effects of foreign intervention and of shifting loyalties of the two populations during the conflict. The model's predictions are consistent with the situations currently observed in Afghanistan, Libya and Syria (Spring 2011) and it offers tentative guidance on policy.
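For contrast with the result stated above, the classical Lanchester square law for force-on-force combat can be written as follows. This is the textbook model, not the revolt model proposed in the paper; the symbols $A$, $B$ (force sizes) and $\alpha$, $\beta$ (per-unit effectiveness of $A$ and $B$) are notation chosen here for illustration.

```latex
\begin{aligned}
\frac{dA}{dt} &= -\beta B, \qquad \frac{dB}{dt} = -\alpha A, \\
\frac{d}{dt}\left(\alpha A^{2} - \beta B^{2}\right)
  &= 2\alpha A\,(-\beta B) - 2\beta B\,(-\alpha A) = 0 .
\end{aligned}
```

Because $\alpha A^{2} - \beta B^{2}$ is conserved, side $A$ prevails exactly when $\alpha A_0^{2} > \beta B_0^{2}$, so initial force sizes enter quadratically in the classical setting. The abstract's claim is that this dependence on initial sizes disappears in the revolt model.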

preprint, 2011, arXiv

Monotonic and Non-Monotonic Epidemiological Models on Networks

Contact networks can significantly change the course of epidemics, affecting the rate of new infections and the mean size of an outbreak. Despite this dependence, some characteristics of epidemics are not contingent on the contact network and are probably predictable based only on the pathogen. Here we consider SIR-like pathogens and give an elementary proof that for any network increasing the probability of transmission increases the mean outbreak size. We also introduce a simple model, termed 2FleeSIR, in which susceptibles protect themselves by avoiding contacts with infectees. The 2FleeSIR model is non-monotonic: for some networks, increasing transmissibility actually decreases the final extent. The dynamics of 2FleeSIR are fundamentally different from SIR because 2FleeSIR exhibits no outbreak transition in densely-connected networks. We show that in non-monotonic epidemics, public health officials might be able to intervene in a fundamentally new way to change the network so as to control the effect of unexpectedly-high virulence. However, interventions that decrease transmissibility might actually cause more people to become infected.
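The monotonicity statement for SIR-like pathogens can be illustrated numerically on a small graph. The sketch below assumes a Reed-Frost style process in which each infected node gets one independent chance, with probability p, to transmit along each of its edges; under that assumption the final outbreak is the seed's connected component after keeping each edge independently with probability p. This is an illustration of the claim, not the paper's proof, and the graph and function names are hypothetical.

```python
from itertools import product


def component_size(n, open_edges, seed):
    """Number of nodes reachable from `seed` using only `open_edges`."""
    adj = {v: [] for v in range(n)}
    for u, v in open_edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {seed}, [seed]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen)


def mean_outbreak_size(n, edges, p, seed=0):
    """Exact expected outbreak size, enumerating all 2^|E| edge outcomes."""
    total = 0.0
    for mask in product([0, 1], repeat=len(edges)):
        open_edges = [e for e, bit in zip(edges, mask) if bit]
        k = len(open_edges)
        prob = p ** k * (1 - p) ** (len(edges) - k)
        total += prob * component_size(n, open_edges, seed)
    return total


# Hypothetical contact network: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
sizes = [mean_outbreak_size(4, edges, p) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

On this graph `sizes` rises from 1 (no transmission) to 4 (certain transmission) without ever decreasing. The enumeration is exponential in the number of edges, so it only serves as a check on very small networks.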

preprint, 2010, arXiv

Interdiction of a Markovian Evader

Shortest path network interdiction is a combinatorial optimization problem on an activity network arising in a number of important security-related applications. It is classically formulated as a bilevel maximin problem representing an "interdictor" and an "evader". The evader tries to move from a source node to the target node along a path of the least cost while the interdictor attempts to frustrate this motion by cutting edges or nodes. The interdiction objective is to find the optimal set of edges to cut given that there is a finite interdiction budget and the interdictor must move first. We reformulate the interdiction problem for stochastic evaders by introducing a model in which the evader follows a Markovian random walk guided by the least-cost path to the target. This model can represent incomplete knowledge about the evader, and the resulting model is a nonlinear 0-1 optimization problem. We then introduce an optimization heuristic based on betweenness centrality that can rapidly find high-quality interdiction solutions by providing a global view of the network.
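The classical bilevel formulation described at the start of this abstract (the interdictor cuts edges first, then the evader takes the cheapest remaining path) can be sketched by brute force on a tiny graph. This covers only the deterministic baseline; it does not implement the Markovian evader model or the betweenness-based heuristic from the paper. The graph, weights, and function names are hypothetical.

```python
import heapq
from itertools import combinations


def shortest_path_cost(edges, source, target):
    """Dijkstra on a directed graph given as {(u, v): weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            return d
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # target unreachable


def best_interdiction(edges, source, target, budget):
    """Try every set of at most `budget` edges and keep the cut that
    maximizes the evader's shortest-path cost (max-min objective)."""
    best_cut, best_cost = (), shortest_path_cost(edges, source, target)
    for k in range(1, budget + 1):
        for cut in combinations(edges, k):
            remaining = {e: w for e, w in edges.items() if e not in cut}
            cost = shortest_path_cost(remaining, source, target)
            if cost > best_cost:
                best_cut, best_cost = cut, cost
    return best_cut, best_cost


# Hypothetical network: cheapest route s-a-t costs 2, the alternatives cost 4.
edges = {
    ("s", "a"): 1,
    ("a", "t"): 1,
    ("s", "b"): 2,
    ("b", "t"): 2,
    ("a", "b"): 1,
}
cut, cost = best_interdiction(edges, "s", "t", budget=1)
```

With a budget of one edge, the best the interdictor can do here is raise the evader's cost from 2 to 4. Enumeration grows combinatorially with the budget, which is one reason fast heuristics are attractive for larger networks.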

preprint, 2010, arXiv

Optimizing topological cascade resilience based on the structure of terrorist networks

Complex socioeconomic networks such as information, finance and even terrorist networks need resilience to cascades - to prevent the failure of a single node from causing a far-reaching domino effect. We show that terrorist and guerrilla networks are uniquely cascade-resilient while maintaining high efficiency, but they become more vulnerable beyond a certain threshold. We also introduce an optimization method for constructing networks with high passive cascade resilience. The optimal networks are found to be based on cells, where each cell has a star topology. Counterintuitively, we find that there are conditions where networks should not be modified to stop cascades because doing so would come at a disproportionate loss of efficiency. Implementation of these findings can lead to more cascade-resilient networks in many diverse areas.

preprint, 2010, arXiv

Targeting by Transnational Terrorist Groups

Many successful terrorist groups operate across international borders where different countries host different stages of terrorist operations. Often the recruits for the group come from one country or countries, while the targets of the operations are in another. Stopping such attacks is difficult because intervention in any region or route might merely shift the terrorists elsewhere. Here we propose a model of transnational terrorism based on the theory of activity networks. The model represents attacks on different countries as paths in a network. The group is assumed to prefer paths of lowest cost (or risk) and maximal yield from attacks. The parameters of the model are computed for the Islamist-Salafi terrorist movement based on open source data and then used for estimation of risks of future attacks. The central finding is that the USA has an enduring appeal as a target, due to lack of other nations of matching geopolitical weight or openness. It is also shown that countries in Africa and Asia that have been overlooked as terrorist bases may become highly significant threats in the future. The model quantifies the dilemmas facing countries in the effort to cut such networks, and points to a limitation of deterrence against transnational terrorists.

preprint, 2009, arXiv

Understanding Terrorist Organizations with a Dynamic Model

Terrorist organizations change over time because of processes such as recruitment and training as well as counter-terrorism (CT) measures, but the effects of these processes are typically studied qualitatively and in separation from each other. Seeking a more quantitative and integrated understanding, we constructed a simple dynamic model where equations describe how these processes change an organization's membership. Analysis of the model yields a number of intuitive as well as novel findings. Most importantly, it becomes possible to predict whether counter-terrorism measures would be sufficient to defeat the organization. Furthermore, we can prove in general that an organization would collapse if its strength and its pool of foot soldiers decline simultaneously. In contrast, a simultaneous decline in its strength and its pool of leaders is often insufficient and short-lived. These results and others like them demonstrate the great potential of dynamic models for informing terrorism scholarship and counter-terrorism policy making.