Researcher profile

Saptarshi Ghosh

Saptarshi Ghosh contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
16 works
0 followers
13 topics
4 close collaborators

Published work

16 published items

preprint, 2025, arXiv

Unraveling the Nature of HAWC J1844-034 with Fermi-LAT Data Analysis and Multi-wavelength Modeling

The extended ultra-high-energy (UHE) gamma-ray source HAWC J1844-034 is closely associated with two other sources, HAWC J1843-032 and HWC J1846-025. Moreover, other gamma-ray observatories like H.E.S.S., LHAASO, and Tibet AS$\gamma$ have detected UHE gamma-ray sources whose spatial positions coincide with the position of HAWC J1844-034. The UHE gamma-ray data from several observatories help analyse the spectral features of this source in detail at TeV energies. Of the four pulsars near HAWC J1844-034, PSR J1844-0346 is closest to it and possibly supplies the cosmic-ray leptons to power this source. We have analysed the Fermi-LAT data to explore this source's morphology and identify its spectral feature in the Fermi-LAT energy band. After removing the contribution of the pulsar to the gamma-ray spectral energy distribution by pulsar phased analysis, we have obtained upper limits on the photon flux and identified the GeV counterpart PS J1844.2-0342 in the Fermi-LAT energy band with more than $5\sigma$ significance, which may be a pulsar wind nebula (PWN). Finally, the multi-wavelength spectral energy distribution is modeled, assuming HAWC J1844-034 is a PWN.

preprint, 2023, arXiv

Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets

Generating natural language questions from visual scenes, known as Visual Question Generation (VQG), has been explored in the recent past where large amounts of meticulously labeled data provide the training corpus. However, in practice, it is not uncommon to have only a few images with question annotations corresponding to a few types of answers. In this paper, we propose a new and challenging Few-Shot Visual Question Generation (FS-VQG) task and provide a comprehensive benchmark for it. Specifically, we evaluate various existing VQG approaches as well as popular few-shot solutions based on meta-learning and self-supervised strategies for the FS-VQG task. We conduct experiments on two popular existing datasets, VQG and Visual7w. In addition, we have also cleaned and extended the VQG dataset for use in a few-shot scenario, with additional image-question pairs as well as additional answer categories. We call this new dataset VQG-23. Several important findings emerge from our experiments that shed light on the limits of current models in few-shot vision and language generation tasks. We find that trivially extending existing VQG approaches with transfer learning or meta-learning may not be enough to tackle the inherent challenges in few-shot VQG. We believe that this work will contribute to accelerating the progress in few-shot learning research.

preprint, 2022, arXiv

Alexa, in you, I trust! Fairness and Interpretability Issues in E-commerce Search through Smart Speakers

In traditional (desktop) e-commerce search, a customer issues a specific query and the system returns a ranked list of products in order of relevance to the query. An increasingly popular alternative in e-commerce search is to issue a voice-query to a smart speaker (e.g., Amazon Echo) powered by a voice assistant (VA, e.g., Alexa). In this situation, the VA usually spells out the details of only one product, an explanation citing the reason for its selection, and a default action of adding the product to the customer's cart. This reduced autonomy of the customer in the choice of a product during voice-search makes it necessary for a VA to be far more responsible and trustworthy in its explanation and default action. In this paper, we ask whether the explanation presented for a product selection by the Alexa VA installed on an Amazon Echo device is consistent with human understanding as well as with the observations on other traditional mediums (e.g., desktop e-commerce search). Through a user survey, we find that in 81% of cases the users' interpretation of 'a top result' is different from that of Alexa. While investigating the fairness of the default action, we observe that over a set of as many as 1000 queries, in nearly 68% of cases, there exist one or more products which are more relevant (as per Amazon's own desktop search results) than the product chosen by Alexa. Finally, we conducted a survey over 30 queries for which the Alexa-selected product was different from the top desktop search result, and observed that in nearly 73% of cases, the participants preferred the top desktop search result as opposed to the product chosen by Alexa. Our results raise several concerns and necessitate more discussion around the related fairness and interpretability issues of VAs for e-commerce search.

preprint, 2022, arXiv

FaiRIR: Mitigating Exposure Bias from Related Item Recommendations in Two-Sided Platforms

Related Item Recommendations (RIRs) are ubiquitous in most online platforms today, including e-commerce and content streaming sites. These recommendations not only help users compare items related to a given item, but also play a major role in bringing traffic to individual items, thus deciding the exposure that different items receive. With a growing number of people depending on such platforms to earn their livelihood, it is important to understand whether different items are receiving their desired exposure. To this end, our experiments on multiple real-world RIR datasets reveal that the existing RIR algorithms often result in very skewed exposure distribution of items, and the quality of items is not a plausible explanation for such skew in exposure. To mitigate this exposure bias, we introduce multiple flexible interventions (FaiRIR) in the RIR pipeline. We instantiate these mechanisms with two well-known algorithms for constructing related item recommendations -- rating-SVD and item2vec -- and show on real-world data that our mechanisms allow for a fine-grained control on the exposure distribution, often at a small or no cost in terms of recommendation quality, measured in terms of relatedness and user satisfaction.
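To make "skewed exposure distribution" concrete, the following is a minimal sketch, not the FaiRIR method itself. It assumes that an item's exposure can be approximated by the number of other items whose related-item lists include it, and it summarizes the skew with the Gini coefficient. The function names and the toy `related` mapping are hypothetical and chosen only for illustration.

```python
from collections import Counter


def exposure_counts(related, items):
    """Count how many other items list each item as related (assumed exposure proxy)."""
    counts = Counter({item: 0 for item in items})
    for source, recs in related.items():
        for target in recs:
            if target != source:  # ignore self-recommendations
                counts[target] += 1
    return dict(counts)


def gini(values):
    """Gini coefficient: 0 means equal exposure, values near 1 mean concentration."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical toy data: three items all point at item "d".
items = ["a", "b", "c", "d"]
related = {"a": ["d"], "b": ["d"], "c": ["d"], "d": ["a"]}

exposure = exposure_counts(related, items)
skew = gini(exposure.values())
print(exposure, round(skew, 3))
```

A perfectly even exposure distribution gives a Gini coefficient of 0, so comparing this value before and after an intervention is one simple way to check whether exposure has become less concentrated.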

preprint, 2021, arXiv

An Unsupervised Normalization Algorithm for Noisy Text: A Case Study for Information Retrieval and Stance Detection

A large fraction of textual data available today contains various types of 'noise', such as OCR noise in digitized documents, noise due to informal writing style of users on microblogging sites, and so on. To enable tasks such as search/retrieval and classification over all the available data, we need robust algorithms for text normalization, i.e., for cleaning different kinds of noise in the text. There have been several efforts towards cleaning or normalizing noisy text; however, many of the existing text normalization methods are supervised and require language-dependent resources or large amounts of training data that is difficult to obtain. We propose an unsupervised algorithm for text normalization that does not need any training data / human intervention. The proposed algorithm is applicable to text over different languages, and can handle both machine-generated and human-generated noise. Experiments over several standard datasets show that text normalization through the proposed algorithm enables better retrieval and stance detection, as compared to that using several baseline text normalization methods. Implementation of our algorithm can be found at https://github.com/ranarag/UnsupClean.

preprint, 2021, arXiv

Fairness for Whom? Understanding the Reader's Perception of Fairness in Text Summarization

With the surge in user-generated textual information, there has been a recent increase in the use of summarization algorithms for providing an overview of the extensive content. Traditional metrics for evaluation of these algorithms (e.g., ROUGE scores) rely on matching algorithmic summaries to human-generated ones. However, it has been shown that when the textual contents are heterogeneous, e.g., when they come from different socially salient groups, most existing summarization algorithms represent the social groups very differently compared to their distribution in the original data. To mitigate such adverse impacts, some fairness-preserving summarization algorithms have also been proposed. All of these studies have considered normative notions of fairness from the perspective of writers of the contents, neglecting the readers' perceptions of the underlying fairness notions. To bridge this gap, in this work, we study the interplay between the fairness notions and how readers perceive them in textual summaries. Through our experiments, we show that readers' perception of fairness is often context-sensitive. Moreover, standard ROUGE evaluation metrics are unable to quantify the perceived (un)fairness of the summaries. To this end, we propose a human-in-the-loop metric and an automated graph-based methodology to quantify the perceived bias in textual summaries. We demonstrate their utility by quantifying the (un)fairness of several summaries of heterogeneous socio-political microblog datasets.

preprint, 2021, arXiv

Machine Learning assisted Chimera and Solitary states in Networks

Chimera and Solitary states have captivated scientists and engineers due to their peculiar dynamical states corresponding to the co-existence of coherent and incoherent dynamical evolution in coupled units in various natural and artificial systems. It has been further demonstrated that such states can be engineered in systems of coupled oscillators by the suitable implementation of communication delays. Here, using supervised machine learning, we predict (a) the precise value of delay which is sufficient for engineering chimera and solitary states for a given set of system parameters, as well as (b) the intensity of incoherence for such engineered states. The results are demonstrated for two different examples consisting of single-layer and multi-layer networks. First, the chimera states (solitary states) are engineered by establishing delays in the neighboring links of a node (the interlayer links) in a 2-D lattice (multiplex network) of oscillators. Then, different machine learning classifiers, KNN, SVM and MLP-Neural Network, are employed by feeding the data obtained from the network models. Once a machine learning model is trained using a limited amount of data, it makes predictions for unknown system parameter values. Testing accuracy, sensitivity, and specificity analysis reveal that the MLP-NN classifier is better suited than the KNN or SVM classifier for the prediction of parameter values for engineered chimera and solitary states. The technique provides an easy methodology to predict critical delay values as well as the intensity of incoherence for designing an experimental setup to create solitary and chimera states.

preprint, 2021, arXiv

When the Umpire is also a Player: Bias in Private Label Product Recommendations on E-commerce Marketplaces

Algorithmic recommendations mediate interactions between millions of customers and products (in turn, their producers and sellers) on large e-commerce marketplaces like Amazon. In recent years, the producers and sellers have raised concerns about the fairness of black-box recommendation algorithms deployed on these marketplaces. Many complaints are centered around marketplaces biasing the algorithms to preferentially favor their own 'private label' products over competitors. These concerns are exacerbated as marketplaces increasingly de-emphasize or replace 'organic' recommendations with ad-driven 'sponsored' recommendations, which include their own private labels. While these concerns have been covered in popular press and have spawned regulatory investigations, to our knowledge, there has not been any public audit of these marketplace algorithms. In this study, we bridge this gap by performing an end-to-end systematic audit of related item recommendations on Amazon. We propose a network-centric framework to quantify and compare the biases across organic and sponsored related item recommendations. Along a number of our proposed bias measures, we find that the sponsored recommendations are significantly more biased toward Amazon private label products compared to organic recommendations. While our findings are primarily interesting to producers and sellers on Amazon, our proposed bias measures are generally useful for measuring link formation bias in any social or content networks.

preprint, 2020, arXiv

A Survey on Disaster: Understanding the After-effects of Super-cyclone Amphan and Helping Hand of Social Media

The super-cyclonic storm "Amphan" hit Eastern India, specifically the states of West Bengal and Odisha, and parts of Bangladesh in May 2020, and caused severe damage to the regions. In this study, we aim to understand the self-reported effects of this natural disaster on residents of the state of West Bengal. To that end, we conducted an online survey to understand the effects of the cyclone. In total, 201 participants (spanning five districts) from the worst-affected state of West Bengal participated in the survey. This report describes our findings from the survey, with respect to the damages caused by the cyclone, how it affected the population in various districts of West Bengal, and how prepared the authorities were in responding to the disaster. We found that the participants were most adversely affected in this disaster due to disruption of services like electricity, phone and internet (as opposed to uprooting of trees and water-logging). Furthermore, we found that receiving responses to Amphan-related queries is highly positively correlated with the favorable perception of people about preparedness of authorities. Additionally, we study the usage of online social media by the affected population in the days immediately after the disaster. Our results strongly suggest how social media platforms can help authorities to better prepare for future disasters. In summary, our study analyzes self-reported data collected from grassroots, and brings out several key insights that can help authorities deal better with disaster events in future.

preprint, 2020, arXiv

Hier-SPCNet: A Legal Statute Hierarchy-based Heterogeneous Network for Computing Legal Case Document Similarity

Computing similarity between two legal case documents is an important and challenging task in Legal IR, for which text-based and network-based measures have been proposed in literature. All prior network-based similarity methods considered a precedent citation network among case documents only (PCNet). However, this approach misses an important source of legal knowledge -- the hierarchy of legal statutes that are applicable in a given legal jurisdiction (e.g., country). We propose to augment the PCNet with the hierarchy of legal statutes, to form a heterogeneous network Hier-SPCNet, having citation links between case documents and statutes, as well as citation and hierarchy links among the statutes. Experiments over a set of Indian Supreme Court case documents show that our proposed heterogeneous network enables significantly better document similarity estimation, as compared to existing approaches using PCNet. We also show that the proposed network-based method can complement text-based measures for better estimation of legal document similarity.
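The effect of adding statute and hierarchy links can be illustrated with a toy graph. This is a minimal sketch under stated assumptions: the node names are invented, and a simple two-hop neighbourhood overlap (Jaccard) stands in for the similarity computation actually used in the paper, which the abstract does not specify.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) edge pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def neighborhood(graph, node, hops):
    """Return all nodes reachable from `node` within `hops` steps, excluding itself."""
    seen = {node}
    frontier = {node}
    for _ in range(hops):
        frontier = {n for f in frontier for n in graph.get(f, ())} - seen
        seen |= frontier
    return seen - {node}


def jaccard_similarity(graph, a, b, hops=2):
    """Overlap of the two nodes' neighbourhoods (illustrative stand-in measure)."""
    na = neighborhood(graph, a, hops) - {b}
    nb = neighborhood(graph, b, hops) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Hypothetical toy data: two cases with no shared precedent.
precedent_edges = [("case1", "case3"), ("case2", "case4")]
# Each case cites a different section of the same (made-up) Act.
statute_edges = [("case1", "actX_sec10"), ("case2", "actX_sec12")]
hierarchy_edges = [("actX_sec10", "actX"), ("actX_sec12", "actX")]

pcnet = build_graph(precedent_edges)
hetero = build_graph(precedent_edges + statute_edges + hierarchy_edges)

sim_pcnet = jaccard_similarity(pcnet, "case1", "case2")
sim_hetero = jaccard_similarity(hetero, "case1", "case2")
print(sim_pcnet, sim_hetero)
```

In the precedent-only graph the two cases look unrelated, while the heterogeneous graph connects them through the shared parent Act. That is the kind of signal the statute hierarchy is meant to contribute.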

preprint, 2020, arXiv

Identification of Chimera using Machine Learning

A chimera state refers to the coexistence of coherent and non-coherent phases in identically coupled dynamical units found in various complex dynamical systems. Identification of chimera states is, on the one hand, essential due to its applicability in various areas including neuroscience and, on the other hand, challenging due to their widely varied appearance in different systems and the peculiar nature of their profiles. Therefore, a simple yet universal method for their identification remains an open problem. Here, we present a distinctive approach using machine learning techniques to characterize different dynamical phases and identify the chimera state from given spatial profiles generated using various models. The experimental results show that the performance of the classification algorithms varies for different dynamical models. The machine learning algorithms, namely random forest and oblique random forest based on Tikhonov, parallel-axis split and null-space regularization, achieved more than 96% accuracy for the Kuramoto model. For the logistic maps, random forest and Tikhonov regularization based oblique random forest showed more than 90% accuracy, and for the Hénon map model, random forest and null-space and axis-parallel split regularization based oblique random forest achieved more than 80% accuracy. The oblique random forest with null-space regularization achieved consistent performance (more than 83% accuracy) across different dynamical models, while the auto-encoder based random vector functional link neural network showed relatively lower performance. This work provides a direction for employing machine learning techniques to identify dynamical patterns arising in coupled non-linear units at large scale, and for characterizing complex spatio-temporal patterns in real-world systems for various applications.
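As background for what a spatial profile with coexisting coherent and incoherent regions looks like numerically, here is a minimal sketch of the standard local Kuramoto order parameter, $r_i = \left|\frac{1}{2w+1}\sum_{k=-w}^{w} e^{\mathrm{i}\theta_{i+k}}\right|$. It is not the classification pipeline from the paper; the window size and the synthetic phase profile are assumptions made purely for illustration.

```python
import cmath
import math


def local_order_parameter(phases, half_window):
    """Local coherence r_i in [0, 1] for oscillators on a ring (periodic boundary)."""
    n = len(phases)
    result = []
    for i in range(n):
        # Phases of the oscillator and its neighbours on either side.
        window = [phases[(i + k) % n]
                  for k in range(-half_window, half_window + 1)]
        mean_vec = sum(cmath.exp(1j * p) for p in window) / len(window)
        result.append(abs(mean_vec))
    return result


# Hypothetical profile: 10 phase-locked oscillators followed by 10 whose
# phases alternate between 0 and pi (a crude stand-in for incoherence).
profile = [0.5] * 10 + [0.0 if k % 2 == 0 else math.pi for k in range(10)]

r = local_order_parameter(profile, half_window=2)
coherent_value = r[5]     # window lies entirely in the locked region
incoherent_value = r[15]  # window lies entirely in the alternating region
print(round(coherent_value, 3), round(incoherent_value, 3))
```

Values near 1 mark locally synchronized regions and lower values mark incoherent ones. Features of this kind are one possible input to a classifier, though the abstract does not say which features the authors used.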

preprint, 2020, arXiv

Methods for Computing Legal Document Similarity: A Comparative Study

Computing similarity between two legal documents is an important and challenging task in the domain of Legal Information Retrieval. Finding similar legal documents has many applications in downstream tasks, including prior-case retrieval, recommendation of legal articles, and so on. Prior works have proposed two broad ways of measuring similarity between legal documents - analyzing the precedent citation network, and measuring similarity based on textual content similarity measures. But there has not been a comprehensive comparison of these existing methods on a common platform. In this paper, we perform the first systematic analysis of the existing methods. In addition, we explore two promising new similarity computation methods - one text-based and the other based on network embeddings, which have not been considered till now.

preprint, 2020, arXiv

NARMADA: Need and Available Resource Managing Assistant for Disasters and Adversities

Although a lot of research has been done on utilising Online Social Media during disasters, there exists no system for a specific task that is critical in a post-disaster scenario -- identifying resource-needs and resource-availabilities in the disaster-affected region, coupled with their subsequent matching. To this end, we present NARMADA, a semi-automated platform which leverages the crowd-sourced information from social media posts for assisting post-disaster relief coordination efforts. The system employs Natural Language Processing and Information Retrieval techniques for identifying resource-needs and resource-availabilities from microblogs, extracting resources from the posts, and also matching the needs to suitable availabilities. The system is thus capable of facilitating the judicious management of resources during post-disaster relief operations.

preprint, 2020, arXiv

Stance Detection in Web and Social Media: A Comparative Study

Online forums and social media platforms are increasingly being used to discuss topics of varying polarities where different people take different stances. Several methodologies for automatic stance detection from text have been proposed in literature. To our knowledge, there has not been any systematic investigation towards their reproducibility, and their comparative performances. In this work, we explore the reproducibility of several existing stance detection models, including both neural models and classical classifier-based models. Through experiments on two datasets -- (i) the popular SemEval microblog dataset, and (ii) a set of health-related online news articles -- we also perform a detailed comparative analysis of various methods and explore their shortcomings. Implementations of all algorithms discussed in this paper are available at https://github.com/prajwal1210/Stance-Detection-in-Web-and-Social-Media.

preprint, 2020, arXiv

Utilizing Microblogs for Assisting Post-Disaster Relief Operations via Matching Resource Needs and Availabilities

During a disaster event, two types of information that are especially useful for coordinating relief operations are needs and availabilities of resources (e.g., food, water, medicines) in the affected region. Information posted on microblogging sites is increasingly being used for assisting post-disaster relief operations. In this context, two practical challenges are (i) to identify tweets that inform about resource needs and availabilities (termed as need-tweets and availability-tweets respectively), and (ii) to automatically match needs with appropriate availabilities. While several works have addressed the first problem, there has been little work on automatically matching needs with availabilities. The few prior works that attempted matching only considered the resources, and no attempt has been made to understand other aspects of needs/availabilities that are essential for matching in practice. In this work, we develop a methodology for understanding five important aspects of need-tweets and availability-tweets, including what resource and what quantity is needed/available, the geographical location of the need/availability, and who needs / is providing the resource. Understanding these aspects helps us to address the need-availability matching problem considering not only the resources, but also other factors such as the geographical proximity between the need and the availability. To our knowledge, this study is the first attempt to develop methods for understanding the semantics of need-tweets and availability-tweets. We also develop a novel methodology for matching need-tweets with availability-tweets, considering both resource similarity and geographical proximity. Experiments on two datasets corresponding to two disaster events demonstrate that our proposed methods perform substantially better matching than those in prior works.
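The idea of matching on both resource similarity and geographical proximity can be sketched as follows. This is an illustrative assumption rather than the authors' scoring method: resource similarity is taken as Jaccard overlap of already-extracted resource sets, proximity as an exponential decay of the haversine distance, and the weight `alpha`, the decay scale `scale_km`, the record layout and the coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_score(need, avail, alpha=0.5, scale_km=50.0):
    """Weighted mix of resource overlap and geographic closeness (both in [0, 1])."""
    union = need["resources"] | avail["resources"]
    overlap = len(need["resources"] & avail["resources"]) / len(union) if union else 0.0
    distance = haversine_km(need["lat"], need["lon"], avail["lat"], avail["lon"])
    closeness = math.exp(-distance / scale_km)
    return alpha * overlap + (1 - alpha) * closeness


def best_match(need, availabilities):
    """Return the availability record with the highest score for this need."""
    return max(availabilities, key=lambda avail: match_score(need, avail))


# Hypothetical records, as if already extracted from tweets.
need = {"id": "N1", "resources": {"water", "food"}, "lat": 22.57, "lon": 88.36}
availabilities = [
    {"id": "A", "resources": {"water"}, "lat": 22.60, "lon": 88.40},          # partial, nearby
    {"id": "B", "resources": {"water", "food"}, "lat": 28.61, "lon": 77.21},  # full, far away
]

chosen = best_match(need, availabilities)
print(chosen["id"])
```

With these illustrative weights the nearby partial offer outranks the distant complete one. Setting `alpha=1.0` reduces the score to resource overlap alone, which corresponds to matching on resources only.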

preprint, 2019, arXiv

Taming Chimeras in Networks through Multiplexing Delays

Chimera, referring to a coexistence of coherent and incoherent states, is traditionally very difficult to control due to its peculiar nature. Here, we provide a recipe to construct chimera states in multiplex networks with the aid of multiplexing delays. The chimera state in multiplex networks is produced by introducing heterogeneous delays, referred to as multiplexing delays, in a fraction of the inter-layer links in a sequence. Additionally, the emergence of incoherence in the chimera state can be regulated by an appropriate choice of both inter- and intra-layer coupling strengths, whereas the extent and the position of the incoherence regime can be regulated by the appropriate placement and strength of the multiplexing delays. The proposed technique for constructing such engineered chimeras equips us with the multiplex network's structural parameters as tools for gaining both qualitative and quantitative control over the incoherent section of the chimera states and, in turn, the chimera. Our investigation can be of worth in controlling the dynamics of multi-level delayed systems and attaining desired chimeric patterns.