Source author record

Sumit Bhatia

Sumit Bhatia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computation and Language, Artificial Intelligence, Information Retrieval, Social and Information Networks, cs.CY, Databases, Machine Learning, Multimedia

Catalog footprint

What is connected

7 works

8 topics

4 close collaborators

Preprint, 2022, arXiv

CoSe-Co: Text Conditioned Generative CommonSense Contextualizer

Pre-trained Language Models (PTLMs) have been shown to perform well on natural language tasks. Many prior works have leveraged structured commonsense, present in the form of entities linked through labeled relations in Knowledge Graphs (KGs), to assist PTLMs. Retrieval approaches use the KG as a separate static module, which limits coverage since KGs contain finite knowledge. Generative methods train PTLMs on KG triples to improve the scale at which knowledge can be obtained. However, training on symbolic KG entities limits their applicability in tasks involving natural language text, where they ignore overall context. To mitigate this, we propose a CommonSense Contextualizer (CoSe-Co) conditioned on sentences as input to make it generically usable in tasks for generating knowledge relevant to the overall context of input text. To train CoSe-Co, we propose a novel dataset comprising sentence and commonsense knowledge pairs. The knowledge inferred by CoSe-Co is diverse and contains novel entities not present in the underlying KG. We augment generated knowledge in Multi-Choice QA and Open-ended CommonSense Reasoning tasks, leading to improvements over current best methods on the CSQA, ARC, QASC and OBQA datasets. We also demonstrate its applicability in improving the performance of a baseline model on the paraphrase generation task.

Preprint, 2022, arXiv

Expressive Reasoning Graph Store: A Unified Framework for Managing RDF and Property Graph Databases

Resource Description Framework (RDF) and Property Graph (PG) are the two most commonly used data models for representing, storing, and querying graph data. We present Expressive Reasoning Graph Store (ERGS) -- a graph store built on top of JanusGraph (a Property Graph store) that also allows storing and querying of RDF datasets. First, we describe how RDF data can be translated into a Property Graph representation and then describe a query translation module that converts SPARQL queries into a series of Gremlin traversals. The converters and translators thus developed can allow any Apache TinkerPop-compliant graph database to store and query RDF datasets. We demonstrate the effectiveness of our proposed approach using JanusGraph as the base Property Graph store and compare its performance with standard RDF systems.

Preprint, 2022, arXiv

LM-CORE: Language Models with Contextually Relevant External Knowledge

Large transformer-based pre-trained language models have achieved impressive performance on a variety of knowledge-intensive tasks and can capture factual knowledge in their parameters. We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements. We posit that a more efficient alternative is to provide explicit access to contextually relevant structured knowledge to the model and train it to use that knowledge. We present LM-CORE -- a general framework to achieve this -- that allows decoupling of the language model training from the external knowledge source and allows the latter to be updated without affecting the already trained model. Experimental results show that LM-CORE, having access to external knowledge, achieves significant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks; can effectively handle knowledge updates; and performs well on two downstream tasks. We also present a thorough error analysis highlighting the successes and failures of LM-CORE.

Preprint, 2022, arXiv

Why Did You Not Compare With That? Identifying Papers for Use as Baselines

We propose the task of automatically identifying papers used as baselines in a scientific article. We frame the problem as a binary classification task where all the references in a paper are to be classified as either baselines or non-baselines. This is a challenging problem due to the numerous ways in which a baseline reference can appear in a paper. We develop a dataset of 2,075 papers from the ACL Anthology corpus with all their references manually annotated as one of the two classes. We develop a multi-module attention-based neural classifier for the baseline classification task that outperforms four state-of-the-art citation role classification methods when applied to the same task. We also present an analysis of the errors made by the proposed classifier, highlighting the factors that make baseline identification difficult.

Preprint, 2020, arXiv

ECIR 2020 Workshops: Assessing the Impact of Going Online

ECIR 2020 (https://ecir2020.org/) was one of the many conferences affected by the COVID-19 pandemic. The Conference Chairs decided to keep the initially planned dates (April 14-17, 2020) and move to a fully online event. In this report, we describe the experience of organizing the ECIR 2020 Workshops in this scenario from two perspectives: the workshop organizers and the workshop participants. We provide a report on the organizational aspect of these events and the consequences for participants. Covering the scientific dimension of each workshop is outside the scope of this article.

Preprint, 2020, arXiv

Link Prediction using Graph Neural Networks for Master Data Management

Learning graph representations of n-ary relational data has a number of real-world applications, such as anti-money laundering, fraud detection, and customer due diligence. Contact tracing of COVID-19 positive persons could also be posed as a Link Prediction problem. Predicting links between people using Graph Neural Networks requires more careful ethical and privacy considerations than in domains where GNNs have typically been applied so far. We introduce novel methods for anonymizing data, model training, explainability and verification for Link Prediction in Master Data Management, and discuss our results.

Preprint, 2015, arXiv

A Picture Tells a Thousand Words -- About You! User Interest Profiling from User Generated Visual Content

Inference of online social network users' attributes and interests has been an active research topic. Accurate identification of users' attributes and interests is crucial for improving the performance of personalization and recommender systems. Most existing works have focused on textual content generated by the users and have successfully used it for predicting users' interests and other identifying attributes. However, little attention has been paid to user-generated visual content (images), which has become increasingly popular and pervasive in recent times. We posit that images posted by users on online social networks are a reflection of the topics they are interested in, and propose an approach to infer user attributes from the images they post. We analyze the content of individual images and then aggregate the image-level knowledge to infer a user-level interest distribution. We employ image-level similarity to propagate the label information between images, and also utilize the image category information derived from the user-created organization structure to further propagate the category-level knowledge to all images. A real-life social network dataset created from Pinterest is used for evaluation, and the experimental results demonstrate the effectiveness of our proposed approach.
