Source author record

Kartik Talamadupula

Kartik Talamadupula appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence; Computation and Language; Human-Computer Interaction; Machine Learning; Software Engineering; Information Retrieval; Multiagent Systems; Neural and Evolutionary Computing

Catalog footprint

What is connected

12 works

8 topics

4 close collaborators


preprint, 2022, arXiv

Better Together? An Evaluation of AI-Supported Code Translation

Generative machine learning models have recently been applied to source code, for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet, state-of-the-art models often produce code that is erroneous or incomplete. In a controlled study with 32 software engineers, we examined whether such imperfect outputs are helpful in the context of Java-to-Python code translation. When aided by the outputs of a code translation model, participants produced code with fewer errors than when working alone. We also examined how the quality and quantity of AI translations affected the work process and quality of outcomes, and observed that providing multiple translations had a larger impact on the translation process than varying the quality of provided translations. Our results tell a complex, nuanced story about the benefits of generative code models and the challenges software engineers face when working with their outputs. Our work motivates the need for intelligent user interfaces that help software engineers effectively work with generative code models in order to understand and evaluate their outputs and achieve superior outcomes to working alone.

preprint, 2022, arXiv

Investigating Explainability of Generative AI for Code through Scenario-based Design

What does it mean for a generative AI model to be explainable? The emergent discipline of explainable AI (XAI) has made great strides in helping people understand discriminative models. Less attention has been paid to generative models that produce artifacts, rather than decisions, as output. Meanwhile, generative AI (GenAI) technologies are maturing and being applied to application domains such as software engineering. Using scenario-based design and question-driven XAI design approaches, we explore users' explainability needs for GenAI in three software engineering use cases: natural language to code, code translation, and code auto-completion. We conducted 9 workshops with 43 software engineers in which real examples from state-of-the-art generative AI models were used to elicit users' explainability needs. Drawing from prior work, we also propose 4 types of XAI features for GenAI for code and gather additional design ideas from participants. Our work explores explainability needs for GenAI for code and demonstrates how human-centered approaches can drive the technical development of XAI in novel domains.

preprint, 2021, arXiv

Type-augmented Relation Prediction in Knowledge Graphs

Knowledge graphs (KGs) are of great importance to many real-world applications, but they generally suffer from incomplete information in the form of missing relations between entities. Knowledge graph completion (also known as relation prediction) is the task of inferring missing facts given existing ones. Most existing methods work by maximizing the likelihood of observed instance-level triples. Not much attention, however, is paid to the ontological information, such as type information of entities and relations. In this work, we propose a type-augmented relation prediction (TaRP) method, where we apply both the type information and instance-level information for relation prediction. In particular, type information and instance-level information are encoded as prior probabilities and likelihoods of relations respectively, and are combined by following Bayes' rule. Our proposed TaRP method achieves significantly better performance than state-of-the-art methods on four benchmark datasets: FB15K, FB15K-237, YAGO26K-906, and DB111K-174. In addition, we show that TaRP achieves significantly improved data efficiency. More importantly, the type information extracted from a specific dataset can generalize well to other datasets through the proposed TaRP model.
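
The combination step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `combine_with_bayes`, the relation names, and all numbers are hypothetical, and the sketch assumes the prior (from type information) and the likelihood (from an instance-level model) are already available as plain dictionaries keyed by relation.

```python
def combine_with_bayes(prior, likelihood):
    """Return a posterior over relations proportional to prior * likelihood.

    prior:      relation -> probability derived from entity/relation types (assumed given)
    likelihood: relation -> score from an instance-level model (assumed given)
    """
    unnormalized = {
        relation: prior.get(relation, 0.0) * likelihood[relation]
        for relation in likelihood
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("prior and likelihood share no relation with non-zero mass")
    # Normalize so the posterior sums to 1.
    return {relation: value / total for relation, value in unnormalized.items()}


# Hypothetical values for a single (head, tail) entity pair.
prior = {"born_in": 0.6, "works_for": 0.3, "capital_of": 0.1}
likelihood = {"born_in": 0.2, "works_for": 0.5, "capital_of": 0.3}

posterior = combine_with_bayes(prior, likelihood)
best_relation = max(posterior, key=posterior.get)
```

In this toy case the instance-level evidence outweighs the type-based prior, so the highest-posterior relation differs from the one the prior alone would pick.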

preprint, 2020, arXiv

Enhancing Text-based Reinforcement Learning Agents with Commonsense Knowledge

In this paper, we consider the recent trend of evaluating progress on reinforcement learning technology by using text-based environments and games as evaluation environments. This reliance on text brings advances in natural language processing into the ambit of these agents, with a recurring thread being the use of external knowledge to mimic and better human-level performance. We present one such instantiation of agents that use commonsense knowledge from ConceptNet to show promising performance on two text-based environments.

preprint, 2020, arXiv

Looking Beyond Sentence-Level Natural Language Inference for Downstream Tasks

In recent years, the Natural Language Inference (NLI) task has garnered significant attention, with new datasets and models achieving near human-level performance on it. However, the full promise of NLI -- particularly that it learns knowledge that should be generalizable to other downstream NLP tasks -- has not been realized. In this paper, we study this unfulfilled promise through the lens of two downstream tasks: question answering (QA) and text summarization. We conjecture that a key difference between the NLI datasets and these downstream tasks concerns the length of the premise, and that creating new long-premise NLI datasets out of existing QA datasets is a promising avenue for training a truly generalizable NLI model. We validate our conjecture by showing competitive results on the task of QA and obtaining the best reported results on the task of Checking Factual Correctness of Summaries.

preprint, 2020, arXiv

Path-Based Contextualization of Knowledge Graphs for Textual Entailment

In this paper, we introduce the problem of knowledge graph contextualization -- that is, given a specific NLP task, the problem of extracting meaningful and relevant sub-graphs from a given knowledge graph. The task in the case of this paper is the textual entailment problem, and the context is a relevant sub-graph for an instance of the textual entailment problem -- where given two sentences p and h, the entailment relationship between them has to be predicted automatically. We base our methodology on finding paths in a cost-customized external knowledge graph, and building the most relevant sub-graph that connects p and h. We show that our path selection mechanism to generate sub-graphs not only reduces noise, but also retrieves meaningful information from large knowledge graphs. Our evaluation shows that using information on entities as well as the relationships between them improves on the performance of purely text-based systems.

preprint, 2020, arXiv

Project CLAI: Instrumenting the Command Line as a New Environment for AI Agents

This whitepaper reports on Project CLAI (Command Line AI), which aims to bring the power of AI to the command line interface (CLI). The CLAI platform sets up the CLI as a new environment for AI researchers to conquer by surfacing the command line as a generic environment that researchers can interface to using a simple sense-act API, much like the traditional AI agent architecture. In this paper, we discuss the design and implementation of the platform in detail, through illustrative use cases of new end user interaction patterns enabled by this design, and through quantitative evaluation of the system footprint of a CLAI-enabled terminal. We also report on some early user feedback on CLAI's features from an internal survey.

preprint, 2016, arXiv

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is sufficient to capture a wealth of high-level discourse semantics. Such a procedure allows training the multiresolution recurrent neural network by maximizing the exact joint log-likelihood over both sequences. In contrast to the standard log-likelihood objective w.r.t. natural language tokens (word perplexity), optimizing the joint log-likelihood biases the model towards modeling high-level abstractions. We apply the proposed model to the task of dialogue response generation in two challenging domains: the Ubuntu technical support domain, and Twitter conversations. On Ubuntu, the model outperforms competing approaches by a substantial margin, achieving state-of-the-art results according to both automatic evaluation metrics and a human evaluation study. On Twitter, the model appears to generate more relevant and on-topic responses according to automatic evaluation metrics. Finally, our experiments demonstrate that the proposed model is more adept at overcoming the sparsity of natural language and is better able to capture long-term structure.
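
One way to write the joint objective mentioned in this abstract is sketched below. The notation is an assumption made for illustration and is not taken from the abstract: $z_n$ denotes the coarse token sequence and $w_n$ the natural language token sequence of the $n$-th utterance, with $N$ utterances in a dialogue, and the coarse sequence is assumed to be generated first and then conditioned on.

```latex
\log P(z_{1:N}, w_{1:N})
  = \sum_{n=1}^{N} \log P\left(z_n \mid z_{<n}\right)
  + \sum_{n=1}^{N} \log P\left(w_n \mid w_{<n},\, z_{\le n}\right)
```

Under this factorization the first sum rewards modeling the high-level coarse tokens directly, which is consistent with the abstract's remark that the joint objective biases the model towards high-level abstractions, whereas a word-level objective would contain only a term over the $w_n$.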

preprint, 2014, arXiv

The Metrics Matter! On the Incompatibility of Different Flavors of Replanning

When autonomous agents are executing in the real world, the state of the world as well as the objectives of the agent may change from the agent's original model. In such cases, the agent's planning process must modify the plan under execution to make it amenable to the new conditions, and to resume execution. This brings up the replanning problem, and the various techniques that have been proposed to solve it. In all, three main techniques -- based on three different metrics -- have been proposed in prior automated planning work. An open question is whether these metrics are interchangeable; answering this requires a normalized comparison of the various replanning quality metrics. In this paper, we show that it is possible to support such a comparison by compiling all the respective techniques into a single substrate. Using this novel compilation, we demonstrate that these different metrics are not interchangeable, and that they are not good surrogates for each other. Thus we focus attention on the incompatibility of the various replanning flavors with each other, founded in the differences between the metrics that they respectively seek to optimize.

preprint, 2013, arXiv

Herding the Crowd: Automated Planning for Crowdsourced Planning

There has been significant interest in crowdsourcing and human computation. One subclass of human computation applications is directed at tasks that involve planning (e.g. travel planning) and scheduling (e.g. conference scheduling). Much of this work appears outside the traditional automated planning forums, and at the outset it is not clear whether automated planning has much of a role to play in these human computation systems. Interestingly, however, work on these systems shows that even primitive forms of automated oversight of the human planner do help in significantly improving the effectiveness of the humans/crowd. In this paper, we will argue that the automated oversight used in these systems can be viewed as a primitive automated planner, and that there are several opportunities for more sophisticated automated planning in effectively steering crowdsourced planning. Straightforward adaptation of current planning technology is, however, hampered by the mismatch between the capabilities of human workers and automated planners. We identify two important challenges that need to be overcome before such adaptation of planning technology can occur: (i) interpreting the inputs of the human workers (and the requester) and (ii) steering or critiquing the plans being produced by the human workers armed only with incomplete domain and preference models. In this paper, we discuss approaches for handling these challenges, and characterize existing human computation systems in terms of the specific choices they make in handling these challenges.

preprint, 2013, arXiv

RAProp: Ranking Tweets by Exploiting the Tweet/User/Web Ecosystem and Inter-Tweet Agreement

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweets' content alone. We present a novel ranking method, called RAProp, which combines two orthogonal measures of relevance and trustworthiness of a tweet. The first, called Feature Score, measures the trustworthiness of the source of the tweet. This is done by extracting features from a 3-layer Twitter ecosystem, consisting of users, tweets and the pages referred to in the tweets. The second measure, called agreement analysis, estimates the trustworthiness of the content of the tweet, by analyzing how and whether the content is independently corroborated by other tweets. We view the candidate result set of tweets as the vertices of a graph, with the edges measuring the estimated agreement between each pair of tweets. The feature score is propagated over this agreement graph to compute the top-k tweets that have both trustworthy sources and independent corroboration. The evaluation of our method on 16 million tweets from the TREC 2011 Microblog Dataset shows that, for top-30 precision, we achieve 53% higher precision than the current best-performing method on the dataset and over 300% higher than current Twitter Search. We also present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by us.
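
The propagation idea in this abstract can be sketched as follows. This is an illustrative assumption, not the RAProp formula from the paper: it uses a single propagation step in which each tweet keeps its own feature score and gains support from agreeing tweets, weighted by pairwise agreement. The function name `propagate_scores` and all values are hypothetical.

```python
def propagate_scores(feature_scores, agreement, k):
    """One-step propagation of feature scores over an agreement graph.

    feature_scores: list of per-tweet source-trust scores (assumed given)
    agreement:      symmetric matrix, agreement[i][j] in [0, 1] (assumed given)
    k:              number of top-ranked tweet indices to return
    """
    n = len(feature_scores)
    propagated = []
    for i in range(n):
        # Support a tweet receives from the other tweets that corroborate it.
        support = sum(
            agreement[i][j] * feature_scores[j] for j in range(n) if j != i
        )
        propagated.append(feature_scores[i] + support)
    ranked = sorted(range(n), key=lambda i: propagated[i], reverse=True)
    return ranked[:k], propagated


# Hypothetical candidate set: tweet 0 has the most trusted source but no
# corroboration; tweets 1 and 2 partially agree with each other.
feature_scores = [0.7, 0.6, 0.5]
agreement = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]

top_two, propagated = propagate_scores(feature_scores, agreement, k=2)
```

Here the two mutually corroborating tweets overtake the uncorroborated one, which mirrors the abstract's goal of favoring tweets with both trustworthy sources and independent corroboration.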

preprint, 2013, arXiv

Strategic Planning for Network Data Analysis

As network traffic monitoring software for cybersecurity, malware detection, and other critical tasks becomes increasingly automated, the rate of alerts and supporting data gathered, as well as the complexity of the underlying model, regularly exceed human processing capabilities. Many of these applications require complex models and constituent rules in order to come up with decisions that influence the operation of entire systems. In this paper, we motivate the novel "strategic planning" problem -- one of gathering data from the world and applying the underlying model of the domain in order to come up with decisions that will monitor the system in an automated manner. We describe our application of automated planning methods to this problem, including the technique that we used to solve it in a manner that would scale to the demands of a real-time, real-world scenario. We then present a PDDL model of one such application scenario related to network administration and monitoring, followed by a description of a novel integrated system that was built to accept generated plans and to continue the execution process. Finally, we present evaluations of two different automated planners and their different capabilities with our integrated system, both on a six-month window of network data, and using a simulator.
