Researcher profile

Subbarao Kambhampati

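Subbarao Kambhampati's listed work spans human-AI teaming, explainable AI planning, human-robot interaction, moving target defense for network security, and data integration over web and probabilistic databases.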

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
14 works
0 followers
9 topics
4 close collaborators

Published work

14 published items

Preprint · 2026 · arXiv

Who is Helping Whom? Analyzing Inter-dependencies to Evaluate Cooperation in Human-AI Teaming

State-of-the-art methods for Human-AI Teaming and Zero-shot Cooperation focus on task completion, i.e., task rewards, as the sole evaluation metric while being agnostic to how the two agents work with each other. Furthermore, subjective user studies offer only limited insight into the quality of cooperation within the team. Specifically, we are interested in understanding the cooperative behaviors that arise within the team when trained agents are paired with humans, a problem that has been overlooked by the existing literature. To formally address this problem, we propose the concept of constructive interdependence -- measuring how much agents rely on each other's actions to achieve the shared goal -- as a key metric for evaluating cooperation in human-agent teams. We interpret interdependence in terms of action interactions in a STRIPS formalism, and define metrics that allow us to assess the degree of reliance between the agents' actions. We pair state-of-the-art HAT agents with learned human models as well as human participants in a user study for the popular Overcooked domain, and evaluate the task reward and teaming performance for these human-agent teams. Our results demonstrate that although trained agents attain high task rewards, they fail to induce cooperative behavior, showing very low levels of interdependence across teams. Furthermore, our analysis reveals that teaming performance is not necessarily correlated with task reward, highlighting that task reward alone cannot reliably measure cooperation arising in a team.
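The abstract frames interdependence in terms of action interactions in a STRIPS formalism. As a rough illustration only, the sketch below assumes a simplified reading: a step counts as dependent on the partner when one of its preconditions was most recently added by the partner's action. The `Step` type, the `interdependence_ratio` function and the toy kitchen trace are hypothetical. They are not the paper's metric definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One executed STRIPS action (hypothetical trace format)."""
    agent: str                       # teammate that executed the action
    name: str                        # action label, for readability only
    pre: frozenset = frozenset()     # preconditions
    add: frozenset = frozenset()     # add effects
    delete: frozenset = frozenset()  # delete effects


def interdependence_ratio(trace):
    """Fraction of steps with a precondition last achieved by the *other* agent.

    Hypothetical simplification of "constructive interdependence"; facts true in
    the initial state have no achiever and never count as a dependency.
    """
    achiever = {}  # fact -> agent whose action most recently added it
    dependent = 0
    for step in trace:
        if any(achiever.get(f) not in (None, step.agent) for f in step.pre):
            dependent += 1
        for f in step.delete:
            achiever.pop(f, None)
        for f in step.add:
            achiever[f] = step.agent
    return dependent / len(trace) if trace else 0.0


# Toy Overcooked-style trace: the robot uses the onion the human put down,
# and the human serves the soup the robot prepared.
trace = [
    Step("human", "place-onion", add=frozenset({"onion-on-counter"})),
    Step("robot", "pick-onion", pre=frozenset({"onion-on-counter"}),
         add=frozenset({"robot-has-onion"}), delete=frozenset({"onion-on-counter"})),
    Step("robot", "fill-pot", pre=frozenset({"robot-has-onion"}),
         add=frozenset({"soup-ready"}), delete=frozenset({"robot-has-onion"})),
    Step("human", "serve-soup", pre=frozenset({"soup-ready"}),
         add=frozenset({"soup-served"})),
]
print(interdependence_ratio(trace))  # 0.5
```

Two of the four steps rely on a fact the partner produced, so the ratio is 0.5. A team whose members each complete the task alone would score 0 even with a high task reward, which is the gap the abstract points to.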

Preprint · 2022 · arXiv

Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Inscrutable Representations

As increasingly complex AI systems are introduced into our daily lives, it becomes important for such systems to be capable of explaining the rationale for their decisions and allowing users to contest these decisions. A significant hurdle to allowing for such explanatory dialogue could be the vocabulary mismatch between the user and the AI system. This paper introduces methods for providing contrastive explanations in terms of user-specified concepts for sequential decision-making settings where the system's model of the task may be best represented as an inscrutable model. We do this by building partial symbolic models of a local approximation of the task that can be leveraged to answer the user queries. We test these methods on a popular Atari game (Montezuma's Revenge) and variants of Sokoban (a well-known planning benchmark) and report the results of user studies to evaluate whether people find explanations generated in this form useful.

Preprint · 2022 · arXiv

Computing Policies That Account For The Effects Of Human Agent Uncertainty During Execution In Markov Decision Processes

When humans are given a policy to execute, there can be policy execution errors and deviations from the policy if there is uncertainty in identifying a state. This can happen due to the human agent's cognitive limitations and/or perceptual errors. An algorithm that computes a policy for a human to execute therefore ought to consider these effects in its computations. An optimal Markov Decision Process (MDP) policy that is poorly executed (because of a human agent) may be much worse than another policy that is suboptimal in the MDP but considers the human agent's execution behavior. In this paper we consider two problems that arise from state uncertainty: erroneous state inference, and extra sensing actions that a person might take as a result of their uncertainty. We present a framework that models the human agent's behavior with respect to state uncertainty and can be used to compute MDP policies that account for these problems. This is followed by a hill-climbing algorithm to search for good policies given our model of the human agent. We also present a branch-and-bound algorithm that can find the optimal policy for such problems. We show experimental results in a Gridworld domain and a warehouse-worker domain. Finally, we present human-subject studies that support our human model assumptions.
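To illustrate why execution errors matter, the sketch below evaluates a fixed policy when the executor sometimes misidentifies the current state. It rests on several assumptions:

- A confusion matrix `confusion[s][s_hat]` gives the probability of believing the state is `s_hat` when it is truly `s`.
- Only erroneous state inference is modeled. The extra sensing actions mentioned in the abstract are not.
- The function name and the toy two-state MDP are hypothetical. This is not the paper's framework, hill-climbing search or branch-and-bound algorithm.

```python
import numpy as np


def evaluate_with_confusion(P, R, policy, confusion, gamma=0.95):
    """Discounted value of `policy` when the executor may misidentify states.

    P[a][s][s2]          : probability of moving from s to s2 under action a
    R[s][a]              : immediate reward for taking a in true state s
    policy[s_hat]        : action prescribed for the state the human *believes* it is in
    confusion[s][s_hat]  : probability of believing s_hat when the true state is s
    """
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    C = np.asarray(confusion, dtype=float)
    n = R.shape[0]
    P_eff = np.zeros((n, n))  # transitions induced by the (mis)executed policy
    r_eff = np.zeros(n)       # expected one-step reward in each true state
    for s in range(n):
        for s_hat in range(n):
            a = policy[s_hat]
            P_eff[s] += C[s, s_hat] * P[a, s]
            r_eff[s] += C[s, s_hat] * R[s, a]
    # Policy evaluation: solve V = r_eff + gamma * P_eff @ V.
    return np.linalg.solve(np.eye(n) - gamma * P_eff, r_eff)


# Toy MDP: two absorbing states; action a earns reward 1 only in state a.
P = [np.eye(2), np.eye(2)]
R = [[1.0, 0.0], [0.0, 1.0]]
policy = [0, 1]  # optimal if states are perceived correctly
perfect = evaluate_with_confusion(P, R, policy, np.eye(2))
noisy = evaluate_with_confusion(P, R, policy, [[0.7, 0.3], [0.3, 0.7]])
print(perfect, noisy)
```

With perfect perception each state is worth 20. With 30% confusion the same MDP-optimal policy drops to 14 per state. That kind of gap is what a planner aware of human execution behavior would try to close.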

Preprint · 2022 · arXiv

Inference of Human's Observation Strategy for Monitoring Robot's Behavior based on a Game-Theoretic Model of Trust

We consider scenarios where a worker robot, which may be unaware of the human's exact expectations, may have the incentive to deviate from a preferred plan (e.g., safe but costly) when a human supervisor is not monitoring it. On the other hand, continuous monitoring of the robot's behavior is often difficult for humans because it costs them valuable resources (e.g., time, cognitive overload). Thus, to optimize the cost of monitoring while ensuring that the robots follow the safe behavior, and to help the human deal with possibly unsafe robots, we model this problem in a game-theoretic framework of trust. In settings where the human does not initially trust the robot, a pure-strategy Nash equilibrium provides a useful policy for the human. Unfortunately, we show that the formulated game often lacks a pure-strategy Nash equilibrium. Thus, we define the concept of a trust boundary over the mixed-strategy space of the human and show that it helps to discover optimal monitoring strategies. We conduct human-subject studies that demonstrate (1) the need for coming up with optimal monitoring strategies, and (2) the benefits of using strategies suggested by our approach.

Preprint · 2022 · arXiv

Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity

Creating reinforcement learning (RL) agents that are capable of accepting and leveraging task-specific knowledge from humans has long been identified as a possible strategy for developing scalable approaches for solving long-horizon problems. While previous works have looked at the possibility of using symbolic models along with RL approaches, they tend to assume that the high-level action models are executable at low level and that the fluents can exclusively characterize all desirable MDP states. Symbolic models of real-world tasks are, however, often incomplete. To this end, we introduce Approximate Symbolic-Model Guided Reinforcement Learning, wherein we formalize the relationship between the symbolic model and the underlying MDP in a way that allows us to characterize the incompleteness of the symbolic model. We use these models to extract high-level landmarks that are used to decompose the task. At the low level, we learn a set of diverse policies for each possible task subgoal identified by the landmarks, which are then stitched together. We evaluate our system on three different benchmark domains and show that, even with incomplete symbolic model information, our approach is able to discover the task structure and efficiently guide the RL agent towards the goal.

Preprint · 2022 · arXiv

Minimizing Robot Navigation-Graph For Position-Based Predictability By Humans

In situations where humans and robots are moving in the same space while performing their own tasks, predictable paths taken by mobile robots can not only make the environment feel safer, but also let humans help with navigation in the space by avoiding path conflicts or not blocking the way. Predictable paths therefore become vital. The cognitive effort for the human to predict the robot's path becomes untenable as the number of robots increases. As the number of humans increases, it also becomes harder for the robots to move while considering the motion of multiple humans. Additionally, if new people are entering the space -- as in restaurants, banks, and hospitals -- they will have less familiarity with the trajectories typically taken by the robots; this further increases the need for predictable robot motion along paths. With this in mind, we propose to minimize the navigation-graph of the robot for position-based predictability, which is predictability from just the current position of the robot. This is important since the human cannot be expected to keep track of the goals and prior actions of the robot in addition to doing their own tasks. In this paper, we define measures for position-based predictability, then present and evaluate a hill-climbing algorithm to minimize the navigation-graph (directed graph) of robot motion. This is followed by the results of our human-subject experiments, which support our proposed methodology.

Preprint · 2022 · arXiv

RADAR-X: An Interactive Mixed Initiative Planning Interface Pairing Contrastive Explanations and Revised Plan Suggestions

Decision support systems seek to enable informed decision-making. In recent years, automated planning techniques have been leveraged to empower such systems to better aid the human-in-the-loop. The central idea for such decision support systems is to augment the capabilities of the human-in-the-loop with automated planning techniques and enhance the quality of decision-making. In addition to providing planning support, effective decision support systems must be able to provide their end users with intuitive explanations for proposed decisions, based on specific user queries. Using this as motivation, we present our decision support system RADAR-X, which showcases the ability to engage the user in an interactive explanatory dialogue by first enabling them to specify an alternative to a proposed decision (which we refer to as a foil), and then providing contrastive explanations for these user-specified foils that help the user understand why a specific plan was chosen over the alternative (or foil). Furthermore, the system uses this dialogue to elicit the user's latent preferences and provides revised plan suggestions through three different interaction strategies.

Preprint · 2020 · arXiv

A Survey of Moving Target Defenses for Network Security

Network defenses based on traditional tools, techniques, and procedures fail to account for the attacker's inherent advantage arising from the static nature of network services and configurations. To take away this asymmetric advantage, Moving Target Defense (MTD) continuously shifts the configuration of the underlying system, in turn reducing the success rate of cyberattacks. In this survey, we analyze the recent advancements made in the development of MTDs and define categorizations that capture the key aspects of such defenses. We first categorize these defenses into different sub-classes depending on what they move, when they move and how they move. In trying to answer the latter question, we show how domain knowledge and game-theoretic modeling can help the defender come up with effective and efficient movement strategies. Second, to understand the practicality of these defense methods, we discuss how various MTDs have been implemented and find that networking technologies such as Software Defined Networking and Network Function Virtualization act as key enablers for implementing these dynamic defenses. We then briefly highlight MTD test-beds and case studies to aid readers who want to examine or deploy existing MTD techniques. Third, our survey categorizes proposed MTDs based on the qualitative and quantitative metrics they utilize to evaluate their effectiveness in terms of security and performance. We use well-defined metrics such as risk analysis and performance costs for qualitative evaluation and metrics based on Confidentiality, Integrity, Availability (CIA), attack representation, QoS impact, and targeted threat models for quantitative evaluation. Finally, we show that our categorization of MTDs is effective in identifying novel research areas and highlight directions for future research.

Preprint · 2020 · arXiv

Designing Environments Conducive to Interpretable Robot Behavior

Designing robots capable of generating interpretable behavior is a prerequisite for achieving effective human-robot collaboration. This means that the robots need to be capable of generating behavior that aligns with human expectations and, when required, provide explanations to the humans in the loop. However, exhibiting such behavior in arbitrary environments could be quite expensive for robots, and in some cases, the robot may not even be able to exhibit the expected behavior. Given structured environments (like warehouses and restaurants), it may be possible to design the environment so as to boost the interpretability of the robot's behavior or to shape the human's expectations of the robot's behavior. In this paper, we investigate the opportunities and limitations of environment design as a tool to promote a type of interpretable behavior -- known in the literature as explicable behavior. We formulate a novel environment design framework that considers design over multiple tasks and over a time horizon. In addition, we explore the longitudinal aspect of explicable behavior and the trade-off that arises between the cost of design and the cost of generating explicable behavior over a time horizon.

Preprint · 2020 · arXiv

Multi-agent Reinforcement Learning in Bayesian Stackelberg Markov Games for Adaptive Moving Target Defense

The field of cybersecurity has mostly been a cat-and-mouse game with the discovery of new attacks leading the way. To take away an attacker's advantage of reconnaissance, researchers have proposed proactive defense methods such as Moving Target Defense (MTD). To find good movement strategies, researchers have modeled MTD as leader-follower games between the defender and a cyber-adversary. We argue that existing models are inadequate in sequential settings when there is incomplete information about a rational adversary, and that they yield sub-optimal movement strategies. Further, while there exists an array of work on learning defense policies in sequential settings for cyber-security, these approaches are either unpopular due to scalability issues arising out of incomplete information or tend to ignore the strategic nature of the adversary, simplifying the scenario so that single-agent reinforcement learning techniques can be used. To address these concerns, we propose (1) a unifying game-theoretic model, called the Bayesian Stackelberg Markov Games (BSMGs), that can model uncertainty over attacker types and the nuances of an MTD system and (2) a Bayesian Strong Stackelberg Q-learning (BSS-Q) approach that can, via interaction, learn the optimal movement policy for BSMGs within a reasonable time. We situate BSMGs in the landscape of incomplete-information Markov games and characterize the notion of Strong Stackelberg Equilibrium (SSE) in them. We show that our learning approach converges to an SSE of a BSMG and then highlight that the learned movement policy (1) improves the state-of-the-art in MTD for web-application security and (2) converges to an optimal policy in MTD domains with incomplete information about adversaries even when prior information about rewards and transitions is absent.
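BSMGs and BSS-Q are sequential and learning-based. The sketch below only illustrates the one-shot building block they generalize: a Bayesian Stackelberg game. The defender commits to a mixed strategy over two configurations, and each attacker type best-responds. Ties are broken in the defender's favour, which is the "strong" in Strong Stackelberg Equilibrium.

The sketch searches a brute-force grid over the defender's mixing probability rather than using the usual linear-programming formulation, so the result is approximate. The function name, payoffs and priors are hypothetical.

```python
import numpy as np


def bayesian_sse_grid(leader_u, follower_u, priors, steps=1001):
    """Approximate Strong Stackelberg Equilibrium for a leader with two pure strategies.

    leader_u[t][i][j]   : defender payoff against attacker type t (defender action i, attack j)
    follower_u[t][i][j] : payoff of attacker type t for the same outcome
    priors[t]           : prior probability of attacker type t
    Returns (p, value): p is the probability of defender action 0.
    """
    leader_u = [np.asarray(u, dtype=float) for u in leader_u]
    follower_u = [np.asarray(u, dtype=float) for u in follower_u]
    best_p, best_value = 0.0, -np.inf
    for p in np.linspace(0.0, 1.0, steps):
        mix = np.array([p, 1.0 - p])
        value = 0.0
        for lu, fu, prior in zip(leader_u, follower_u, priors):
            attacker_payoff = mix @ fu   # expected payoff of each attack
            defender_payoff = mix @ lu   # defender's expected payoff for each attack
            best = np.isclose(attacker_payoff, attacker_payoff.max())
            # Strong Stackelberg tie-breaking: among the attacker's best
            # responses, assume the one that is best for the defender.
            value += prior * defender_payoff[best].max()
        if value > best_value:
            best_p, best_value = float(p), value
    return best_p, best_value


# Two configurations; attack j succeeds only against configuration j (zero-sum).
# Type 0 values both successes equally; type 1 values breaking configuration 0 twice as much.
follower_u = [[[1, 0], [0, 1]], [[2, 0], [0, 1]]]
leader_u = [[[-1, 0], [0, -1]], [[-2, 0], [0, -1]]]
p, value = bayesian_sse_grid(leader_u, follower_u, priors=[0.5, 0.5])
print(round(p, 3), round(value, 3))
```

With these payoffs the defender should play configuration 0 with probability close to 1/3, for an expected payoff of about -2/3. A deterministic choice of either configuration does worse. This is the basic reason MTD movement strategies are randomized.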

Preprint · 2020 · arXiv

Signaling Friends and Head-Faking Enemies Simultaneously: Balancing Goal Obfuscation and Goal Legibility

In order to be useful in the real world, AI agents need to plan and act in the presence of others, who may include adversarial and cooperative entities. In this paper, we consider the problem where an autonomous agent needs to act in a manner that clarifies its objectives to cooperative entities while preventing adversarial entities from inferring those objectives. We show that this problem is solvable when cooperative entities and adversarial entities use different types of sensors and/or prior knowledge. We develop two new solution approaches for computing such plans. One approach provides an optimal solution to the problem by using an IP solver to provide maximum obfuscation for adversarial entities while providing maximum legibility for cooperative entities in the environment, whereas the other approach provides a satisficing solution using heuristic-guided forward search to achieve preset levels of obfuscation and legibility for adversarial and cooperative entities respectively. We show the feasibility and utility of our algorithms through extensive empirical evaluation on problems derived from planning benchmarks.

Preprint · 2020 · arXiv

The Emerging Landscape of Explainable AI Planning and Decision Making

In this paper, we provide a comprehensive outline of the different threads of work in Explainable AI Planning (XAIP) that has emerged as a focus area in the last couple of years and contrast that with earlier efforts in the field in terms of techniques, target users, and delivery mechanisms. We hope that the survey will provide guidance to new researchers in automated planning towards the role of explanations in the effective design of human-in-the-loop systems, as well as provide the established researcher with some perspective on the evolution of the exciting world of explainable planning.

Preprint · 2011 · arXiv

SmartInt: Using Mined Attribute Dependencies to Integrate Fragmented Web Databases

Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about individual entities that is fragmented over multiple sources. At first blush this is just the inverse of the traditional database normalization problem: rather than go from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often lack primary-key/foreign-key relations. While tables may share attributes, naive joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. Our system, SmartInt, is aimed at addressing the problem of data integration in such scenarios. Given a query, our system uses Approximate Functional Dependencies (AFDs) to piece together a tree of relevant tables to answer it. The result tuples produced by our system strike a favorable balance between precision and recall.
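SmartInt relies on approximate functional dependencies (AFDs). One common way to score an AFD X → Y is based on the g3 error measure: the score is the largest fraction of tuples that can be kept so that X → Y holds exactly. The sketch below assumes that definition and represents rows as Python dicts. It is not claimed to be SmartInt's own scoring code, and the attribute names and data are hypothetical.

```python
from collections import Counter, defaultdict


def afd_confidence(rows, lhs, rhs):
    """Confidence of the approximate functional dependency lhs -> rhs.

    g3-style definition (an assumption here): the largest fraction of rows that
    can be kept so that lhs -> rhs holds exactly, i.e. for every lhs value keep
    only the rows carrying its most frequent rhs value.
    """
    if not rows:
        return 1.0  # an empty table satisfies every dependency
    groups = defaultdict(Counter)  # lhs value -> counts of rhs values
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][tuple(row[a] for a in rhs)] += 1
    kept = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return kept / len(rows)


# Hypothetical fragment of a used-car source.
rows = [
    {"model": "Civic", "make": "Honda", "body": "sedan"},
    {"model": "Civic", "make": "Honda", "body": "coupe"},
    {"model": "Civic", "make": "Honda", "body": "sedan"},
    {"model": "Camry", "make": "Toyota", "body": "sedan"},
]
print(afd_confidence(rows, ["model"], ["make"]))  # 1.0
print(afd_confidence(rows, ["model"], ["body"]))  # 0.75
```

A high-confidence dependency such as model → make suggests that `model` carries enough information to link records about the same car. A weaker one such as model → body flags attributes where a naive join would mix up entities. How SmartInt actually combines such scores into a join tree is described in the paper, not here.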

Preprint · 2010 · arXiv

Defining and Mining Functional Dependencies in Probabilistic Databases

Functional dependencies (traditional, approximate and conditional) are of critical importance in relational databases, as they inform us about the relationships between attributes. They are useful in schema normalization, data rectification and source selection. Most of these were, however, developed in the context of deterministic data. Although uncertain databases have started receiving attention, these dependencies have not been defined for them, nor are fast algorithms available to evaluate their confidences. This paper defines the logical extensions of various forms of functional dependencies for probabilistic databases and explores the connections between them. We propose a pruning-based exact algorithm to evaluate the confidence of functional dependencies, a Monte-Carlo based algorithm to evaluate the confidence of approximate functional dependencies, and algorithms for their conditional counterparts in probabilistic databases. Experiments are performed on both synthetic and real data, evaluating the performance of these algorithms in assessing the confidence of dependencies and mining them from data. We believe that having these dependencies and algorithms available for probabilistic databases will drive adoption of probabilistic data storage in the industry.
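As a rough illustration of Monte Carlo estimation over possible worlds, the sketch below assumes a tuple-independent probabilistic relation, where each tuple exists independently with a stated probability. It estimates the probability that an exact functional dependency holds by sampling worlds. This is a generic possible-worlds estimator under that assumed model, not the paper's pruning-based or Monte-Carlo algorithms. The function names and data are hypothetical.

```python
import random


def fd_holds(rows, lhs, rhs):
    """True if the exact functional dependency lhs -> rhs holds in `rows`."""
    seen = {}  # lhs value -> first rhs value observed for it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def mc_fd_probability(prob_rows, lhs, rhs, samples=20_000, seed=0):
    """Monte Carlo estimate of P(lhs -> rhs holds) in a tuple-independent relation.

    prob_rows is a list of (row, p) pairs; each row is present in a sampled
    possible world independently with probability p (an assumed model).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = [row for row, p in prob_rows if rng.random() < p]
        hits += fd_holds(world, lhs, rhs)
    return hits / samples


# Hypothetical extracted tuples with existence probabilities.
prob_rows = [
    ({"model": "Civic", "make": "Honda"}, 0.9),
    ({"model": "Civic", "make": "Acura"}, 0.2),  # conflicting, low-confidence extraction
    ({"model": "Camry", "make": "Toyota"}, 0.5),
]
print(mc_fd_probability(prob_rows, ["model"], ["make"]))
```

Here the dependency model → make fails only when both Civic rows are present, so the exact answer is 1 - 0.9 * 0.2 = 0.82, and the sampled estimate should land close to it. Exact enumeration of worlds grows exponentially with the number of tuples, which is why sampling (or pruning) becomes attractive on real data.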