Source author record

Arunesh Sinha

Arunesh Sinha appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Science and Game Theory; Machine Learning; Cryptography and Security; Artificial Intelligence; Computational Engineering, Finance, and Science; cs.CY; econ.GN; Human-Computer Interaction; math.OC; q-fin.EC; q-fin.ST

Catalog footprint

What is connected

17 works

11 topics

4 close collaborators


preprint 2024 arXiv

Handling Long and Richly Constrained Tasks through Constrained Hierarchical Reinforcement Learning

Safety in goal-directed Reinforcement Learning (RL) settings has typically been handled through constraints over trajectories, and such methods have demonstrated good performance primarily in short-horizon tasks. In this paper, we are specifically interested in solving temporally extended decision-making problems in the presence of complex safety constraints, such as robots cleaning different areas in a house while avoiding slippery and unsafe areas (e.g., stairs) and retaining enough charge to move to a charging dock. Our key contribution is a (safety) Constrained Search with Hierarchical Reinforcement Learning (CoSHRL) mechanism that combines an upper-level constrained search agent (which computes a reward-maximizing policy from a given start to a far away goal state while satisfying cost constraints) with a low-level goal-conditioned RL agent (which estimates cost and reward values to move between nearby states). A major advantage of CoSHRL is that it can handle constraints on the cost value distribution (e.g., on Conditional Value at Risk, CVaR) and can adjust to flexible constraint thresholds without retraining. We perform extensive experiments with different types of safety constraints to demonstrate the utility of our approach over leading approaches in constrained and hierarchical RL.
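The CVaR constraint mentioned in this abstract can be illustrated with a small sketch. It is not the CoSHRL implementation: it only shows one common empirical estimate of CVaR (the mean of the worst (1 - alpha) fraction of sampled costs) and a check against a threshold that can be changed without recomputing the samples. The names `empirical_cvar`, `satisfies_cvar_constraint`, `alpha`, and `threshold` are assumptions made for illustration.

```python
import math


def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of sampled costs (illustrative estimator)."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs, reverse=True)  # largest (worst) costs first
    # Rounding guards against floating-point noise such as 3.0000000000000004.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


def satisfies_cvar_constraint(costs, alpha, threshold):
    """Hypothetical check: is the tail risk of the sampled costs within the threshold?"""
    return empirical_cvar(costs, alpha) <= threshold


if __name__ == "__main__":
    sampled_costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(empirical_cvar(sampled_costs, alpha=0.8))
    # The same samples can be checked against different thresholds.
    print(satisfies_cvar_constraint(sampled_costs, alpha=0.8, threshold=9.0))
    print(satisfies_cvar_constraint(sampled_costs, alpha=0.8, threshold=10.0))
```

Setting `alpha=0` reduces the estimate to the plain mean cost, which is the usual expected-cost constraint.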

preprint 2022 arXiv

AI for Social Impact: Learning and Planning in the Data-to-Deployment Pipeline

With the maturing of AI and multiagent systems research, we have a tremendous opportunity to direct these advances towards addressing complex societal problems. In pursuit of this goal of AI for Social Impact, we as AI researchers must go beyond improvements in computational methodology; it is important to step out in the field to demonstrate social impact. To this end, we focus on the problems of public safety and security, wildlife conservation, and public health in low-resource communities, and present research advances in multiagent systems to address one key cross-cutting challenge: how to effectively deploy our limited intervention resources in these problem domains. We present case studies from our deployments around the world as well as lessons learned that we hope are of use to researchers who are interested in AI for Social Impact. In pushing this research agenda, we believe AI can indeed play an important role in fighting social injustice and improving society.

preprint 2022 arXiv

Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models

Realistic fine-grained multi-agent simulation of real-world complex systems is crucial for many downstream tasks such as reinforcement learning. Recent work has used generative models (GANs in particular) for providing high-fidelity simulation of real-world systems. However, such generative models are often monolithic and miss out on modeling the interaction in multi-agent systems. In this work, we take a first step towards building multiple interacting generative models (GANs) that reflect the interaction in the real world. We build and analyze a hierarchical set-up where a higher-level GAN is conditioned on the output of multiple lower-level GANs. We present a technique of using feedback from the higher-level GAN to improve the performance of the lower-level GANs. We mathematically characterize the conditions under which our technique is impactful, including understanding the transfer learning nature of our set-up. We present three distinct experiments on synthetic data, time series data, and the image domain, revealing the wide applicability of our technique.

preprint 2022 arXiv

Proceedings of the Artificial Intelligence for Cyber Security (AICS) Workshop at AAAI 2022

The workshop will focus on the application of AI to problems in cyber security. Cyber systems generate large volumes of data, and utilizing this data effectively is beyond human capabilities. Additionally, adversaries continue to develop new attacks. Hence, AI methods are required to understand and protect the cyber domain. These challenges are widely studied in enterprise networks, but there are many gaps in research and practice as well as novel problems in other domains. In general, AI techniques are still not widely adopted in the real world. Reasons include: (1) a lack of certification of AI for security, (2) a lack of formal study of the implications of practical constraints (e.g., power, memory, storage) for AI systems in the cyber domain, (3) known vulnerabilities such as evasion and poisoning attacks, (4) a lack of meaningful explanations for security analysts, and (5) a lack of analyst trust in AI solutions. There is a need for the research community to develop novel solutions for these practical issues.

preprint 2022 arXiv

Safe Delivery of Critical Services in Areas with Volatile Security Situation via a Stackelberg Game Approach

Vaccine delivery in under-resourced locations with security risks is not just challenging but also life threatening. The current COVID pandemic and the need to vaccinate have added even more urgency to this issue. Motivated by this problem, we propose a general framework to set up limited temporary (vaccination) centers that balance physical security and desired (vaccine) service coverage with limited resources. We set up the problem as a Stackelberg game between the centers' operator (defender) and an adversary, where the set of centers is not fixed a priori but is part of the decision output. This results in a mixed combinatorial and continuous optimization problem. As part of our scalable approximation of this problem, we provide a fundamental contribution by identifying general duality conditions for switching max and min when both discrete and continuous variables are involved. We perform detailed experiments to show that the proposed solution is scalable in practice.

preprint 2022 arXiv

Scalable Distributional Robustness in a Class of Non Convex Optimization with Guarantees

Distributionally robust optimization (DRO) has shown a lot of promise in providing robustness in learning as well as sample-based optimization problems. We endeavor to provide DRO solutions for a class of non-convex sum-of-fractionals optimization problems used for decision making in prominent areas such as facility location and security games. In contrast to previous work, we find it more tractable to optimize the equivalent variance regularized form of DRO rather than the minimax form. We transform the variance regularized form to a mixed-integer second order cone program (MISOCP), which, while guaranteeing near global optimality, does not scale enough to solve problems with real world data-sets. We further propose two abstraction approaches based on clustering and stratified sampling to increase scalability, which we then use for real world data-sets. Importantly, we provide near global optimality guarantees for our approach and show experimentally that our solution quality is better than the locally optimal ones achieved by state-of-the-art gradient-based methods. We experimentally compare our different approaches and baselines, and reveal nuanced properties of a DRO solution.

preprint 2022 arXiv

The Art of Manipulation: Threat of Multi-Step Manipulative Attacks in Security Games

This paper studies the problem of multi-step manipulative attacks in Stackelberg security games, in which a clever attacker attempts to orchestrate its attacks over multiple time steps to mislead the defender's learning of the attacker's behavior. This attack manipulation eventually influences the defender's patrol strategy towards the attacker's benefit. Previous work along this line of research only focuses on one-shot games in which the defender learns the attacker's behavior and then designs a corresponding strategy only once. Our work, on the other hand, investigates the long-term impact of the attacker's manipulation in which current attack and defense choices of players determine the future learning and patrol planning of the defender. This paper has three key contributions. First, we introduce a new multi-step manipulative attack game model that captures the impact of sequential manipulative attacks carried out by the attacker over the entire time horizon. Second, we propose a new algorithm to compute an optimal manipulative attack plan for the attacker, which tackles the challenge of multiple connected optimization components involved in the computation across multiple time steps. Finally, we present extensive experimental results on the impact of such misleading attacks, showing a significant benefit for the attacker and loss for the defender.

preprint 2020 arXiv

Generating Realistic Stock Market Order Streams

We propose an approach to generate realistic and high-fidelity stock market data based on generative adversarial networks (GANs). Our Stock-GAN model employs a conditional Wasserstein GAN to capture history dependence of orders. The generator design includes specially crafted aspects, including components that approximate the market's auction mechanism, augmenting the order history with order-book constructions to improve the generation task. We perform an ablation study to verify the usefulness of aspects of our network structure. We provide a mathematical characterization of the distribution learned by the generator. We also propose statistics to measure the quality of generated orders. We test our approach with synthetic and actual market data, compare to many baseline generative models, and find the generated data to be close to real data.

preprint 2020 arXiv

Proceedings of the Artificial Intelligence for Cyber Security (AICS) Workshop 2020

The workshop will focus on the application of artificial intelligence to problems in cyber security. The emphasis of AICS 2020 will be on human-machine teaming within the context of cyber security problems, and the workshop will specifically explore collaboration between human operators and AI technologies. The workshop will address applicable areas of AI, such as machine learning, game theory, natural language processing, knowledge representation, automated and assistive reasoning, and human-machine interactions. Further, the focus will be on cyber security application areas, with a particular emphasis on the characterization and deployment of human-machine teaming.

preprint 2016 arXiv

Towards the Science of Security and Privacy in Machine Learning

Advances in machine learning (ML) in recent years have enabled a dizzying array of applications such as data analytics, autonomous systems, and security diagnostics. ML is now pervasive: new systems and models are being deployed in every domain imaginable, leading to rapid and widespread deployment of software-based inference and decision making. There is growing recognition that ML exposes new vulnerabilities in software systems, yet the technical community's understanding of the nature and extent of these vulnerabilities remains limited. We systematize recent findings on ML security and privacy, focusing on attacks identified on these systems and defenses crafted to date. We articulate a comprehensive threat model for ML, and categorize attacks and defenses within an adversarial framework. Key insights resulting from works both in the ML and security communities are identified, and the effectiveness of approaches is related to structural elements of ML algorithms and the data used to train them. We conclude by formally exploring the opposing relationship between model accuracy and resilience to adversarial manipulation. Through these explorations, we show that there are (possibly unavoidable) tensions between model complexity, accuracy, and resilience that must be calibrated for the environments in which they will be used.

preprint 2015 arXiv

Audit Games with Multiple Defender Resources

Modern organizations (e.g., hospitals, social networks, government agencies) rely heavily on audit to detect and punish insiders who inappropriately access and disclose confidential information. Recent work on audit games models the strategic interaction between an auditor with a single audit resource and auditees as a Stackelberg game, augmenting associated well-studied security games with a configurable punishment parameter. We significantly generalize this audit game model to account for multiple audit resources where each resource is restricted to audit a subset of all potential violations, thus enabling application to practical auditing scenarios. We provide an FPTAS that computes an approximately optimal solution to the resulting non-convex optimization problem. The main technical novelty is in the design and correctness proof of an optimization transformation that enables the construction of this FPTAS. In addition, we experimentally demonstrate that this transformation significantly speeds up computation of solutions for a class of audit games and security games.

preprint 2015 arXiv

Learning Adversary Behavior in Security Games: A PAC Model Perspective

Recent applications of Stackelberg Security Games (SSG), from wildlife crime to urban crime, have employed machine learning tools to learn and predict adversary behavior using available data about defender-adversary interactions. Given these recent developments, this paper commits to an approach of directly learning the response function of the adversary. Using the PAC model, this paper lays a firm theoretical foundation for learning in SSGs (e.g., theoretically answering questions about the number of samples required to learn adversary behavior) and provides utility guarantees when the learned adversary model is used to plan the defender's strategy. The paper also aims to answer practical questions such as how much more data is needed to improve an adversary model's accuracy. Additionally, we explain a recently observed phenomenon that prediction accuracy of learned adversary behavior is not enough to discover the utility maximizing defender strategy. We provide four main contributions: (1) a PAC model of learning adversary response functions in SSGs; (2) PAC-model analysis of the learning of key, existing bounded rationality models in SSGs; (3) an entirely new approach to adversary modeling based on a non-parametric class of response functions with PAC-model analysis; and (4) identification of conditions under which computing the best defender strategy against the learned adversary behavior is indeed the optimal strategy. Finally, we conduct experiments with real-world data from a national park in Uganda, showing the benefit of our new adversary modeling approach and verifying our PAC model predictions.

preprint 2015 arXiv

Program Actions as Actual Causes: A Building Block for Accountability

Protocols for tasks such as authentication, electronic voting, and secure multiparty computation ensure desirable security properties if agents follow their prescribed programs. However, if some agents deviate from their prescribed programs and a security property is violated, it is important to hold agents accountable by determining which deviations actually caused the violation. Motivated by these applications, we initiate a formal study of program actions as actual causes. Specifically, we define in an interacting program model what it means for a set of program actions to be an actual cause of a violation. We present a sound technique for establishing program actions as actual causes. We demonstrate the value of this formalism in two ways. First, we prove that violations of a specific class of safety properties always have an actual cause. Thus, our definition applies to relevant security properties. Second, we provide a cause analysis of a representative protocol designed to address weaknesses in the current public key certification infrastructure.

preprint 2015 arXiv

Programs as Actual Causes: A Building Block for Accountability

An updated version of this paper is available at http://arxiv.org/abs/1505.01131

preprint 2015 arXiv

Security Games with Information Leakage: Modeling and Computation

Most models of Stackelberg security games assume that the attacker only knows the defender's mixed strategy, but is not able to observe (even partially) the instantiated pure strategy. Such partial observation of the deployed pure strategy, an issue we refer to as information leakage, is a significant concern in practical applications. While previous research on patrolling games has considered the attacker's real-time surveillance, our setting, and therefore our models and techniques, are fundamentally different. More specifically, after describing the information leakage model, we start with an LP formulation to compute the defender's optimal strategy in the presence of leakage. Perhaps surprisingly, we show that a key subproblem to solve this LP (more precisely, the defender oracle) is NP-hard even for the simplest of security game models. We then approach the problem from three possible directions: efficient algorithms for restricted cases, approximation algorithms, and heuristic algorithms for sampling that improve upon the status quo. Our experiments confirm the necessity of handling information leakage and the advantage of our algorithms.
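As a point of reference for the baseline assumption described in this abstract (the attacker sees only the defender's mixed strategy), here is a minimal sketch of a two-target Stackelberg security game with one defender resource. It is not the paper's LP formulation and does not model leakage. The payoff field names, the grid search, and the tie-breaking in the defender's favor are assumptions chosen for illustration.

```python
def attacker_utility(coverage, target):
    """Expected attacker payoff when the target is covered with the given probability."""
    return coverage * target["att_penalty"] + (1.0 - coverage) * target["att_reward"]


def defender_utility(coverage, target):
    """Expected defender payoff when this target is the one attacked."""
    return coverage * target["def_reward"] + (1.0 - coverage) * target["def_penalty"]


def best_commitment(targets, steps=1000):
    """Grid search over coverage splits (c, 1 - c) of one resource across two targets."""
    best_value, best_coverage = None, None
    for i in range(steps + 1):
        c0 = i / steps
        coverage = [c0, 1.0 - c0]
        att = [attacker_utility(c, t) for c, t in zip(coverage, targets)]
        top = max(att)
        # The attacker best-responds to the mixed strategy; ties are broken
        # in the defender's favor (an assumed tie-breaking convention).
        responses = [k for k, u in enumerate(att) if abs(u - top) < 1e-9]
        value = max(defender_utility(coverage[k], targets[k]) for k in responses)
        if best_value is None or value > best_value:
            best_value, best_coverage = value, coverage
    return best_value, best_coverage


if __name__ == "__main__":
    # Hypothetical symmetric payoffs for two targets.
    symmetric = [
        {"att_reward": 1.0, "att_penalty": -1.0, "def_reward": 1.0, "def_penalty": -1.0},
        {"att_reward": 1.0, "att_penalty": -1.0, "def_reward": 1.0, "def_penalty": -1.0},
    ]
    print(best_commitment(symmetric))
```

With symmetric payoffs the search settles on an even split of coverage, since any imbalance gives the attacker a strictly better target. A grid search is used only to keep the sketch dependency-free.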

preprint 2013 arXiv

Adaptive Regret Minimization in Bounded-Memory Games

Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g. a driver deciding what route to drive to work every day. While regret minimization has been extensively studied in repeated games, we study regret minimization for a richer class of games called bounded memory games. In each round of a two-player bounded memory-m game, both players simultaneously play an action, observe an outcome and receive a reward. The reward may depend on the last m outcomes as well as the actions of the players in the current round. The standard notion of regret for repeated games is no longer suitable because actions and rewards can depend on the history of play. To account for this generality, we introduce the notion of k-adaptive regret, which compares the reward obtained by playing actions prescribed by the algorithm against a hypothetical k-adaptive adversary with the reward obtained by the best expert in hindsight against the same adversary. Roughly, a hypothetical k-adaptive adversary adapts her strategy to the defender's actions exactly as the real adversary would within each window of k rounds. Our definition is parametrized by a set of experts, which can include both fixed and adaptive defender strategies. We investigate the inherent complexity of and design algorithms for adaptive regret minimization in bounded memory games of perfect and imperfect information. We prove a hardness result showing that, with imperfect information, any k-adaptive regret minimizing algorithm (with fixed strategies as experts) must be inefficient unless NP=RP even when playing against an oblivious adversary. In contrast, for bounded memory games of perfect and imperfect information we present approximate 0-adaptive regret minimization algorithms against an oblivious adversary running in time n^{O(1)}.
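For readers unfamiliar with the standard notion of regret that this abstract generalizes, the sketch below runs the well-known Hedge (multiplicative weights) rule against a fixed, oblivious reward sequence and reports regret relative to the best fixed action in hindsight. It does not implement k-adaptive regret or bounded-memory games. The function name `hedge`, the learning rate `eta`, and the example reward sequence are assumptions for illustration.

```python
import math


def hedge(reward_rounds, eta):
    """Run Hedge on per-round reward vectors in [0, 1].

    Returns (expected total reward of the algorithm,
             regret versus the best fixed action in hindsight).
    """
    n_actions = len(reward_rounds[0])
    weights = [1.0] * n_actions
    expected_total = 0.0
    for rewards in reward_rounds:
        total_weight = sum(weights)
        probs = [w / total_weight for w in weights]
        # Expected reward of sampling an action from the current distribution.
        expected_total += sum(p * r for p, r in zip(probs, rewards))
        # Multiplicative update: actions that earned more gain weight.
        weights = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    best_fixed = max(
        sum(rewards[a] for rewards in reward_rounds) for a in range(n_actions)
    )
    return expected_total, best_fixed - expected_total


if __name__ == "__main__":
    # Hypothetical sequence: action 0 always pays 1, action 1 always pays 0.
    rounds = [[1.0, 0.0] for _ in range(100)]
    total, regret = hedge(rounds, eta=0.5)
    print(total, regret)
```

The regret stays small relative to the 100 rounds because the weight on the better action grows exponentially. The abstract's point is that this comparison against a fixed reward sequence stops being meaningful once rewards depend on the history of play.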

preprint 2013 arXiv

Audit Games

Effective enforcement of laws and policies requires expending resources to prevent and detect offenders, as well as appropriate punishment schemes to deter violators. In particular, enforcement of privacy laws and policies in modern organizations that hold large volumes of personal information (e.g., hospitals, banks, and Web services providers) relies heavily on internal audit mechanisms. We study economic considerations in the design of these mechanisms, focusing in particular on effective resource allocation and appropriate punishment schemes. We present an audit game model that is a natural generalization of a standard security game model for resource allocation with an additional punishment parameter. Computing the Stackelberg equilibrium for this game is challenging because it involves solving an optimization problem with non-convex quadratic constraints. We present an additive FPTAS that efficiently computes a solution that is arbitrarily close to the optimal solution.
