Source author record

Debasis Ganguly

Debasis Ganguly appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Information Retrieval Computation and Language Artificial Intelligence Computer Vision cs.CY Machine Learning physics.soc-ph Social and Information Networks

Catalog footprint

What is connected

8works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

RAQG-QPP: Query Performance Prediction with Retrieved Query Variants and Retrieval Augmented Query Generation

Query Performance Prediction (QPP) estimates the retrieval quality of ranking models without the use of any human-assessed relevance judgements, and finds applications in query-specific selective decision making to improve overall retrieval effectiveness. Although unsupervised QPP approaches are effective for lexical retrieval models, they usually perform weaker for neural rankers. Recent work shows that leveraging query variants (QVs), i.e., queries with potentially similar information needs to a given query, can enhance unsupervised QPP accuracy. However, existing QV-based prediction methods rely on query variants generated by term expansion of the input query, which is likely to yield incoherent, hallucinatory and off-topic QVs. In this paper, we propose to make use of queries retrieved from a log of past queries as QVs to be subsequently used for QPP. In addition to directly applying retrieved QVs in QPP, we further propose to leverage large language models (LLMs) to generate QVs conditioned on the retrieved QVs, thus mitigating the limitation of relying only on existing queries in a log. Experiments on TREC DL'19 and DL'20 show that QPP enhanced with RAQG outperform the best-performing existing QV-based prediction approach by as much as 30% on neural ranking models such as MonoT5.

preprint2022arXiv

An Analysis of Variations in the Effectiveness of Query Performance Prediction

A query performance predictor estimates the retrieval effectiveness of an IR system for a given query. An important characteristic of QPP evaluation is that, since the ground truth retrieval effectiveness for QPP evaluation can be measured with different metrics, the ground truth itself is not absolute, which is in contrast to other retrieval tasks, such as that of ad-hoc retrieval. Motivated by this argument, the objective of this paper is to investigate how such variances in the ground truth for QPP evaluation can affect the outcomes of QPP experiments. We consider this not only in terms of the absolute values of the evaluation metrics being reported (e.g. Pearson's $r$, Kendall's $τ$), but also with respect to the changes in the ranks of different QPP systems when ordered by the QPP metric scores. Our experiments reveal that the observed QPP outcomes can vary considerably, both in terms of the absolute evaluation metric values and also in terms of the relative system ranks. Through our analysis, we report the optimal combinations of QPP evaluation metric and experimental settings that are likely to lead to smaller variations in the observed results.

preprint2022arXiv

Deep-QPP: A Pairwise Interaction-based Deep Learning Model for Supervised Query Performance Prediction

Motivated by the recent success of end-to-end deep neural models for ranking tasks, we present here a supervised end-to-end neural approach for query performance prediction (QPP). In contrast to unsupervised approaches that rely on various statistics of document score distributions, our approach is entirely data-driven. Further, in contrast to weakly supervised approaches, our method also does not rely on the outputs from different QPP estimators. In particular, our model leverages information from the semantic interactions between the terms of a query and those in the top-documents retrieved with it. The architecture of the model comprises multiple layers of 2D convolution filters followed by a feed-forward layer of parameters. Experiments on standard test collections demonstrate that our proposed supervised approach outperforms other state-of-the-art supervised and unsupervised approaches.

preprint2021arXiv

TDMSci: A Specialized Corpus for Scientific Literature Entity Tagging of Tasks Datasets and Metrics

Tasks, Datasets and Evaluation Metrics are important concepts for understanding experimental scientific papers. However, most previous work on information extraction for scientific literature mainly focuses on the abstracts only, and does not treat datasets as a separate type of entity (Zadeh and Schumann, 2016; Luan et al., 2018). In this paper, we present a new corpus that contains domain expert annotations for Task (T), Dataset (D), Metric (M) entities on 2,000 sentences extracted from NLP papers. We report experiment results on TDM extraction using a simple data augmentation strategy and apply our tagger to around 30,000 NLP papers from the ACL Anthology. The corpus is made publicly available to the community for fostering research on scientific publication summarization (Erera et al., 2019) and knowledge discovery.

preprint2020arXiv

ALEX: Active Learning based Enhancement of a Model's Explainability

An active learning (AL) algorithm seeks to construct an effective classifier with a minimal number of labeled examples in a bootstrapping manner. While standard AL heuristics, such as selecting those points for annotation for which a classification model yields least confident predictions, there has been no empirical investigation to see if these heuristics lead to models that are more interpretable to humans. In the era of data-driven learning, this is an important research direction to pursue. This paper describes our work-in-progress towards developing an AL selection function that in addition to model effectiveness also seeks to improve on the interpretability of a model during the bootstrapping steps. Concretely speaking, our proposed selection function trains an `explainer' model in addition to the classifier model, and favours those instances where a different part of the data is used, on an average, to explain the predicted class. Initial experiments exhibited encouraging trends in showing that such a heuristic can lead to developing more effective and more explainable end-to-end data-driven classifiers.

preprint2020arXiv

Community Structure aware Embedding of Nodes in a Network

Detecting communities or the modular structure of real-life networks (e.g. a social network or a product purchase network) is an important task because the way a network functions is often determined by its communities. Traditional approaches to community detection involve modularity-based algorithms, which generally speaking, construct partitions based on heuristics that seek to maximize the ratio of the edges within the partitions to those between them. On the other hand, node embedding approaches represent each node in a graph as a real-valued vector and is thereby able to transform the problem of community detection in a graph to that of clustering a set of vectors. Existing node embedding approaches are primarily based on, first, initiating random walks from each node to construct a context of a node, and then make the vector representation of a node close to its context. However, standard node embedding approaches do not directly take into account the community structure of a network while constructing the context around each node. To alleviate this, we explore two different threads of work. First, we investigate the use of maximum entropy-based random walks to obtain more centrality preserving embedding of nodes, which may lead to more effective clusters in the embedded space. Second, we propose a community structure-aware node embedding approach, where we incorporate modularity-based partitioning heuristics into the objective function of node embedding. We demonstrate that our proposed combination of the combinatorial and the embedding approaches for community detection outperforms a number of modularity-based baselines and K-means clustering on a standard node-embedded (node2vec) vector space on a wide range of real-life and synthetic networks of different sizes and densities.

preprint2020arXiv

Towards Socially Responsible AI: Cognitive Bias-Aware Multi-Objective Learning

Human society had a long history of suffering from cognitive biases leading to social prejudices and mass injustice. The prevalent existence of cognitive biases in large volumes of historical data can pose a threat of being manifested as unethical and seemingly inhuman predictions as outputs of AI systems trained on such data. To alleviate this problem, we propose a bias-aware multi-objective learning framework that given a set of identity attributes (e.g. gender, ethnicity etc.) and a subset of sensitive categories of the possible classes of prediction outputs, learns to reduce the frequency of predicting certain combinations of them, e.g. predicting stereotypes such as `most blacks use abusive language', or `fear is a virtue of women'. Our experiments conducted on an emotion prediction task with balanced class priors shows that a set of baseline bias-agnostic models exhibit cognitive biases with respect to gender, such as women are prone to be afraid whereas men are more prone to be angry. In contrast, our proposed bias-aware multi-objective learning methodology is shown to reduce such biases in the predictied emotions.

preprint2016arXiv

Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval

A major difficulty in applying word vector embeddings in IR is in devising an effective and efficient strategy for obtaining representations of compound units of text, such as whole documents, (in comparison to the atomic words), for the purpose of indexing and scoring documents. Instead of striving for a suitable method for obtaining a single vector representation of a large document of text, we rather aim for developing a similarity metric that makes use of the similarities between the individual embedded word vectors in a document and a query. More specifically, we represent a document and a query as sets of word vectors, and use a standard notion of similarity measure between these sets, computed as a function of the similarities between each constituent word pair from these sets. We then make use of this similarity measure in combination with standard IR based similarities for document ranking. The results of our initial experimental investigations shows that our proposed method improves MAP by up to $5.77\%$, in comparison to standard text-based language model similarity, on the TREC ad-hoc dataset.

Debasis Ganguly

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

RAQG-QPP: Query Performance Prediction with Retrieved Query Variants and Retrieval Augmented Query Generation

An Analysis of Variations in the Effectiveness of Query Performance Prediction

Deep-QPP: A Pairwise Interaction-based Deep Learning Model for Supervised Query Performance Prediction

TDMSci: A Specialized Corpus for Scientific Literature Entity Tagging of Tasks Datasets and Metrics

ALEX: Active Learning based Enhancement of a Model's Explainability

Community Structure aware Embedding of Nodes in a Network

Towards Socially Responsible AI: Cognitive Bias-Aware Multi-Objective Learning

Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval