Researcher profile

Abhinaba Roy

Abhinaba Roy contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2026arXiv

KARMA-MV: A Benchmark for Causal Question Answering on Music Videos

While significant progress has been made in Video Question Answering and cross-modal understanding, causal reasoning about how visual dynamics drive musical structure in music videos remains under-explored. We introduce KARMA-MV, a large-scale multiple-choice QA dataset derived from 2,682 YouTube music videos, designed to test models' ability to integrate temporal audio-visual cues and reason about visual-to-musical influence across reasoning, prediction, and counterfactual questions. Unlike traditional datasets requiring manual annotation, KARMA-MV leverages LLM reasoning for scalable generation and validation, yielding 37,737 MCQs. We propose a causal knowledge graph (CKG) approach that augments vision-language models (VLMs) with structured retrieval of cross-modal dependencies. Experiments on state-of-the-art VLMs and LLMs show consistent gains from CKG grounding -- especially for smaller models -- establishing the value of explicit causal structure for music-video reasoning. KARMA-MV provides a new benchmark for advancing causal audio-visual understanding beyond correlation.

preprint2020arXiv

Discriminative Dictionary Design for Action Classification in Still Images and Videos

In this paper, we address the problem of action recognition from still images and videos. Traditional local features such as SIFT, STIP etc. invariably pose two potential problems: 1) they are not evenly distributed in different entities of a given category and 2) many of such features are not exclusive of the visual concept the entities represent. In order to generate a dictionary taking the aforementioned issues into account, we propose a novel discriminative method for identifying robust and category specific local features which maximize the class separability to a greater extent. Specifically, we pose the selection of potent local descriptors as filtering based feature selection problem which ranks the local features per category based on a novel measure of distinctiveness. The underlying visual entities are subsequently represented based on the learned dictionary and this stage is followed by action classification using the random forest model followed by label propagation refinement. The framework is validated on the action recognition datasets based on still images (Stanford-40) as well as videos (UCF-50) and exhibits superior performances than the representative methods from the literature.

preprint2020arXiv

KinGDOM: Knowledge-Guided DOMain adaptation for sentiment analysis

Cross-domain sentiment analysis has received significant attention in recent years, prompted by the need to combat the domain gap between different applications that make use of sentiment analysis. In this paper, we take a novel perspective on this task by exploring the role of external commonsense knowledge. We introduce a new framework, KinGDOM, which utilizes the ConceptNet knowledge graph to enrich the semantics of a document by providing both domain-specific and domain-general background concepts. These concepts are learned by training a graph convolutional autoencoder that leverages inter-domain concepts in a domain-invariant manner. Conditioning a popular domain-adversarial baseline method with these learned concepts helps improve its performance over state-of-the-art approaches, demonstrating the efficacy of our proposed framework.