Researcher profile

Tanuja Ganu

Tanuja Ganu contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified - Verification L1 - Unclaimed author
5 works
0 followers
7 topics
4 close collaborators

Published work

5 published items

Preprint - 2026 - arXiv

A Nash Equilibrium Framework For Training-Free Multimodal Step Verification

Multimodal large language models often generate reasoning chains containing subtle errors that lead to incorrect answers. Current verification approaches have notable limitations. Learned critics need extensive labeled data and show inconsistent performance across different tasks. Meanwhile, existing training-free methods simply average scores from different sources, missing a key insight: when these scores disagree, that disagreement itself carries important information about whether a reasoning step is truly valid or not. We propose a training-free verification approach that treats step-wise verification as a coordination problem among specialized judges. We formalize these judges' interaction as a Nash equilibrium game where agreement signals valid steps while disagreement reveals instability. Our method computes equilibrium scores through a closed-form solution, enabling both disagreement-aware filtering and stability-conscious ranking of reasoning steps. Evaluated across six benchmarks, our approach achieves consistent improvements of 2.4% to 5.2% over baseline models and shows competitive performance against learned critics, demonstrating that cross-modal agreement (not just average confidence) provides robust verification signals without task-specific adaptation.
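The point about averaging can be illustrated with a small sketch. This is not the paper's equilibrium solution, whose closed form the abstract does not give; it only shows, with hypothetical judge scores and a hypothetical helper named `summarize_judges`, that two steps can share the same mean score while differing sharply in how much the judges disagree.

```python
from statistics import mean, pstdev


def summarize_judges(scores):
    """Return (average score, spread) for one reasoning step.

    `scores` is a hypothetical list of per-judge validity scores in [0, 1].
    The spread (population standard deviation) is a simple stand-in for
    "disagreement"; it is an illustrative choice, not the paper's measure.
    """
    return mean(scores), pstdev(scores)


# Two hypothetical steps with identical averages.
agreeing_step = [0.5, 0.5, 0.5]   # every judge gives the same score
disputed_step = [1.0, 0.5, 0.0]   # judges are split

avg_a, spread_a = summarize_judges(agreeing_step)
avg_d, spread_d = summarize_judges(disputed_step)

# Averaging alone cannot tell the two steps apart; the spread can.
print(f"agreeing: mean={avg_a:.2f} spread={spread_a:.2f}")
print(f"disputed: mean={avg_d:.2f} spread={spread_d:.2f}")
```

A verifier that looks only at the mean would rank both steps equally, which is the limitation the abstract attributes to existing training-free methods.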

Preprint - 2022 - arXiv

Broken News: Making Newspapers Accessible to Print-Impaired

Accessing daily news content remains a major challenge for people with print impairments, including blind and low-vision individuals, because printed content is opaque to them and online sources present their own barriers. In this paper, we present our approach for digitizing print newspapers into an accessible file format such as HTML. We use an ensemble of instance segmentation and detection frameworks for newspaper layout analysis, followed by OCR to recognize text elements such as headlines and article text. Additionally, we propose an EdgeMask loss function for the Mask-RCNN framework to improve segmentation mask boundaries and hence the accuracy of the downstream OCR task. Empirically, we show that our proposed loss function reduces the Word Error Rate (WER) of news article text by 32.5%.
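For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) between the OCR output and a reference transcript, divided by the number of reference words. The sketch below implements that standard definition; the function name `word_error_rate` and the whitespace tokenization are assumptions, and the abstract does not say how the authors computed WER or whether the 32.5% reduction is relative or absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Tokenization by whitespace is an assumption made for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick brown box"))
```

Because insertions count as errors, WER can exceed 1.0 when the output contains many more words than the reference.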

Preprint - 2022 - arXiv

Document Navigability: A Need for Print-Impaired

Printed documents continue to be a challenge for blind, low-vision, and other print-disabled (BLV) individuals. In this paper, we focus on the specific problem of (in-)accessibility of internal references to citations, footnotes, figures, tables and equations. While sighted users can flip to the referenced content and flip back in seconds, the linear audio narration that BLV individuals rely on makes following these references extremely hard. We propose a vision-based technique to locate the referenced content and extract the metadata needed to (in subsequent work) inline a content summary into the audio narration. We apply our technique to citations in scientific documents and find that it works well on both born-digital and scanned documents.

Preprint - 2022 - arXiv

Towards Optimizing OCR for Accessibility

Visual cues such as structure, emphasis, and icons play an important role in efficient information foraging by sighted individuals and make for a pleasurable reading experience. Blind, low-vision and other print-disabled individuals miss out on these cues because current OCR and text-to-speech software ignores them, resulting in a tedious reading experience. We identify four semantic goals for an enjoyable listening experience, and identify syntactic visual cues that help make progress towards these goals. Empirically, we find that preserving even one or two visual cues in aural form significantly enhances the experience of listening to print content.

Preprint - 2020 - arXiv

Optimising Lockdown Policies for Epidemic Control using Reinforcement Learning

In the context of the ongoing Covid-19 pandemic, several reports and studies have attempted to model and predict the spread of the disease. There is also intense debate about policies for limiting the damage, both to health and to the economy. On the one hand, the health and safety of the population is the principal consideration for most countries. On the other hand, we cannot ignore the potential for long-term economic damage caused by strict nation-wide lockdowns. In this working paper, we present a quantitative way to compute lockdown decisions for individual cities or regions, while balancing health and economic considerations. Furthermore, these policies are learnt automatically by the proposed algorithm, as a function of disease parameters (infectiousness, gestation period, duration of symptoms, probability of death) and population characteristics (density, movement propensity). We account for realistic considerations such as imperfect lockdowns, and show that the policy obtained using reinforcement learning is a viable quantitative approach towards lockdowns.