Source author record

Amanmeet Garg

Amanmeet Garg appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language · Multimedia · Applications · Artificial Intelligence · Computer Vision · Machine Learning · Neurons and Cognition

Catalog footprint

What is connected

3 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

CounterVid: Counterfactual Video Generation for Mitigating Action and Temporal Hallucinations in Video-Language Models

Video-language models (VLMs) achieve strong multimodal understanding but remain prone to hallucinations, especially when reasoning about actions and temporal order. Existing mitigation strategies, such as textual filtering or random video perturbations, often fail to address the root cause: over-reliance on language priors rather than fine-grained visual dynamics. We propose a scalable framework for counterfactual video generation that synthesizes videos differing only in actions or temporal structure while preserving scene context. Our pipeline combines multimodal LLMs for action proposal and editing guidance with diffusion-based image and video models to generate semantic hard negatives at scale. Using this framework, we build CounterVid, a synthetic dataset of ~26k preference pairs targeting action recognition and temporal reasoning. We further introduce MixDPO, a unified Direct Preference Optimization approach that jointly leverages textual and visual preferences. Fine-tuning Qwen2.5-VL with MixDPO yields consistent improvements, notably in temporal ordering, and transfers effectively to standard video hallucination benchmarks. Code and models will be made publicly available.
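
The abstract does not say how MixDPO combines textual and visual preferences, so the following is only a minimal sketch under stated assumptions. `dpo_loss` is the standard Direct Preference Optimization loss for one preference pair. `mixed_preference_loss` and its `visual_weight` parameter are hypothetical names that illustrate one possible way to mix the two kinds of pairs (a weighted average), not the authors' method.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities under the policy being trained
    (logp_*) and under the frozen reference model (ref_logp_*).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def mixed_preference_loss(text_pairs, visual_pairs, visual_weight=0.5, beta=0.1):
    """Hypothetical mix of textual and visual preference pairs.

    Assumption: each pair is a 4-tuple of log-probabilities in the argument
    order of dpo_loss, and the two groups are combined by a weighted average.
    """
    def mean_loss(pairs):
        return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)

    return ((1.0 - visual_weight) * mean_loss(text_pairs)
            + visual_weight * mean_loss(visual_pairs))
```

In this sketch a textual pair would compare two answers for the same video, while a visual pair would compare the same answer conditioned on the original video and on its counterfactual. Both reduce to the same four log-probabilities, which is why one loss function can serve both.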

preprint · 2020 · arXiv

PodSumm -- Podcast Audio Summarization

The diverse nature, scale, and specificity of podcasts present a unique challenge to content discovery systems. Listeners often rely on text descriptions of episodes provided by the podcast creators to discover new content. Some factors like the presentation style of the narrator and production quality are significant indicators of subjective user preference but are difficult to quantify and not reflected in the text descriptions provided by the podcast creators. We propose the automated creation of podcast audio summaries to aid in content discovery and help listeners to quickly preview podcast content before investing time in listening to an entire episode. In this paper, we present a method to automatically construct a podcast summary via guidance from the text-domain. Our method performs two key steps, namely, audio to text transcription and text summary generation. Motivated by a lack of datasets for this task, we curate an internal dataset, find an effective scheme for data augmentation, and design a protocol to gather summaries from annotators. We fine-tune a PreSumm[10] model with our augmented dataset and perform an ablation study. Our method achieves ROUGE-F(1/2/L) scores of 0.63/0.53/0.63 on our dataset. We hope these results may inspire future research in this direction.
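
For readers unfamiliar with the ROUGE-F scores quoted above, here is a minimal sketch of an n-gram ROUGE F-measure. It assumes plain whitespace tokenization and lowercasing. The official ROUGE toolkit applies further options such as stemming, and ROUGE-L relies on the longest common subsequence, which is not shown. The function name `rouge_n_f1` is illustrative and is not taken from the paper.

```python
from collections import Counter

def rouge_n_f1(candidate, reference, n=1):
    """F1 of clipped n-gram overlap between a candidate and a reference summary."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Calling `rouge_n_f1(summary, reference, n=2)` gives the bigram variant under the same assumptions.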

preprint · 2016 · arXiv

Cortical Geometry Network and Topology Markers for Parkinson's Disease

Neurodegeneration affects cortical gray matter, leading to loss of cortical mantle volume. As a result of such volume loss, the geometrical arrangement of the regions on the cortical surface is expected to be altered in comparison to healthy brains. Here we present a novel method to study the alterations in brain cortical surface geometry in Parkinson's disease (PD) subjects with a Geometry Networks (GN) framework. The local geometrical arrangement of the cortical surface is captured as the 3D coordinates of the centroids of anatomically defined parcels on the surface. The inter-regional distance between cortical patches is the signal of interest and is captured as a geometry network. We study its topology by computing the dimensionality of simplicial complexes induced on a filtration of binary undirected networks for each geometry network. In a permutation statistics test, a statistically significant (p < 0.05) difference was observed in the homology features between PD and healthy control groups, highlighting the features' potential to differentiate between the groups and their potential utility in disease diagnosis.
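
The pipeline described above (parcel centroids, pairwise distances, a filtration of binary networks) can be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: the abstract computes the dimensionality of simplicial complexes, whereas this sketch substitutes the simplest homology feature, the number of connected components (Betti-0) at each threshold. All function names and the threshold values are hypothetical.

```python
import math

def distance_matrix(centroids):
    """Pairwise Euclidean distances between 3D parcel centroids (the geometry network)."""
    return [[math.dist(a, b) for b in centroids] for a in centroids]

def binary_network(dist, threshold):
    """Binary undirected network: connect parcels whose distance is within the threshold."""
    n = len(dist)
    return [[1 if i != j and dist[i][j] <= threshold else 0 for j in range(n)]
            for i in range(n)]

def connected_components(adj):
    """Count connected components with a small union-find."""
    n = len(adj)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j]:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

def betti0_curve(centroids, thresholds):
    """Betti-0 of the thresholded network at each step of the filtration."""
    dist = distance_matrix(centroids)
    return [connected_components(binary_network(dist, t)) for t in thresholds]
```

A curve like this, computed per subject, is the kind of feature that could then be compared between groups with a permutation test, as the abstract describes for its homology features.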