Researcher profile

Amogh Joshi

Amogh Joshi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

Visual serial processing deficits explain divergences in human and VLM reasoning

Why do Vision Language Models (VLMs), despite success on standard benchmarks, often fail to match human performance on surprisingly simple visual reasoning tasks? While the underlying computational principles are still debated, we hypothesize that a crucial factor is a deficit in visually-grounded serial processing. To test this hypothesis, we compared human and VLM performance across tasks designed to vary serial processing demands in three distinct domains: geometric reasoning, perceptual enumeration, and mental rotation. Tasks within each domain varied serial processing load by manipulating factors such as geometric concept complexity, perceptual individuation load, and transformation difficulty. Across all domains, our results revealed a consistent pattern: decreased VLM accuracy was strongly correlated with increased human reaction time (used as a proxy for serial processing load). As tasks require more demanding serial processing -- whether composing concepts, enumerating items, or performing mental transformations -- the VLM-human performance gap widens reliably. These findings support our hypothesis, indicating that limitations in serial, visually grounded reasoning represent a fundamental bottleneck that distinguishes current VLMs from humans.

preprint2022arXiv

Audio Matters Too: How Audial Avatar Customization Enhances Visual Avatar Customization

Avatar customization is known to positively affect crucial outcomes in numerous domains. However, it is unknown whether audial customization can confer the same benefits as visual customization. We conducted a preregistered 2 x 2 (visual choice vs. visual assignment x audial choice vs. audial assignment) study in a Java programming game. Participants with visual choice experienced higher avatar identification and autonomy. Participants with audial choice experienced higher avatar identification and autonomy, but only within the group of participants who had visual choice available. Visual choice led to an increase in time spent, and indirectly led to increases in intrinsic motivation, immersion, time spent, future play motivation, and likelihood of game recommendation. Audial choice moderated the majority of these effects. Our results suggest that audial customization plays an important enhancing role vis-à-vis visual customization. However, audial customization appears to have a weaker effect compared to visual customization. We discuss the implications for avatar customization more generally across digital applications.

preprint2022arXiv

Exploiting the Right: Inferring Ideological Alignment in Online Influence Campaigns Using Shared Images

This work advances investigations into the visual media shared by agents in disinformation campaigns by characterizing the images shared by accounts identified by Twitter as being part of such campaigns. Using images shared by US politicians' Twitter accounts as a baseline and training set, we build models for inferring the ideological presentation of accounts using the images they share. Results show that, while our models recover the expected bimodal ideological distribution of US politicians, we find that, on average, four separate influence campaigns -- attributed to Iran, Russia, China, and Venezuela -- all present conservative ideological presentations in the images they share. Given that prior work has shown Twitter accounts used by Russian disinformation agents are ideologically diverse in the text and news they share, these image-oriented findings provide new insights into potential axes of coordination and suggest these accounts may not present consistent ideological positions across modalities.

preprint2022arXiv

Standardizing and Centralizing Datasets to Enable Efficient Training of Agricultural Deep Learning Models

In recent years, deep learning models have become the standard for agricultural computer vision. Such models are typically fine-tuned to agricultural tasks using model weights that were originally fit to more general, non-agricultural datasets. This lack of agriculture-specific fine-tuning potentially increases training time and resource use, and decreases model performance, leading an overall decrease in data efficiency. To overcome this limitation, we collect a wide range of existing public datasets for three distinct tasks, standardize them, and construct standard training and evaluation pipelines, providing us with a set of benchmarks and pretrained models. We then conduct a number of experiments using methods which are commonly used in deep learning tasks, but unexplored in their domain-specific applications for agriculture. Our experiments guide us in developing a number of approaches to improve data efficiency when training agricultural deep learning models, without large-scale modifications to existing pipelines. Our results demonstrate that even slight training modifications, such as using agricultural pretrained model weights, or adopting specific spatial augmentations into data processing pipelines, can significantly boost model performance and result in shorter convergence time, saving training resources. Furthermore, we find that even models trained on low-quality annotations can produce comparable levels of performance to their high-quality equivalents, suggesting that datasets with poor annotations can still be used for training, expanding the pool of currently available datasets. Our methods are broadly applicable throughout agricultural deep learning, and present high potential for significant data efficiency improvements.