Source author record

Amogh Joshi

Amogh Joshi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence Computer Vision cs.CY Human-Computer Interaction

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

preprint · 2026 · arXiv

Visual serial processing deficits explain divergences in human and VLM reasoning

Why do Vision Language Models (VLMs), despite success on standard benchmarks, often fail to match human performance on surprisingly simple visual reasoning tasks? While the underlying computational principles are still debated, we hypothesize that a crucial factor is a deficit in visually-grounded serial processing. To test this hypothesis, we compared human and VLM performance across tasks designed to vary serial processing demands in three distinct domains: geometric reasoning, perceptual enumeration, and mental rotation. Tasks within each domain varied serial processing load by manipulating factors such as geometric concept complexity, perceptual individuation load, and transformation difficulty. Across all domains, our results revealed a consistent pattern: decreased VLM accuracy was strongly correlated with increased human reaction time (used as a proxy for serial processing load). As tasks require more demanding serial processing -- whether composing concepts, enumerating items, or performing mental transformations -- the VLM-human performance gap widens reliably. These findings support our hypothesis, indicating that limitations in serial, visually grounded reasoning represent a fundamental bottleneck that distinguishes current VLMs from humans.

preprint · 2022 · arXiv

Audio Matters Too: How Audial Avatar Customization Enhances Visual Avatar Customization

Avatar customization is known to positively affect crucial outcomes in numerous domains. However, it is unknown whether audial customization can confer the same benefits as visual customization. We conducted a preregistered 2 x 2 (visual choice vs. visual assignment x audial choice vs. audial assignment) study in a Java programming game. Participants with visual choice experienced higher avatar identification and autonomy. Participants with audial choice experienced higher avatar identification and autonomy, but only within the group of participants who had visual choice available. Visual choice led to an increase in time spent, and indirectly led to increases in intrinsic motivation, immersion, time spent, future play motivation, and likelihood of game recommendation. Audial choice moderated the majority of these effects. Our results suggest that audial customization plays an important enhancing role vis-à-vis visual customization. However, audial customization appears to have a weaker effect compared to visual customization. We discuss the implications for avatar customization more generally across digital applications.

preprint · 2022 · arXiv

Exploiting the Right: Inferring Ideological Alignment in Online Influence Campaigns Using Shared Images

This work advances investigations into the visual media shared by agents in disinformation campaigns by characterizing the images shared by accounts identified by Twitter as being part of such campaigns. Using images shared by US politicians' Twitter accounts as a baseline and training set, we build models for inferring the ideological presentation of accounts using the images they share. Results show that, while our models recover the expected bimodal ideological distribution of US politicians, on average, four separate influence campaigns -- attributed to Iran, Russia, China, and Venezuela -- all present conservative ideological presentations in the images they share. Given that prior work has shown Twitter accounts used by Russian disinformation agents are ideologically diverse in the text and news they share, these image-oriented findings provide new insights into potential axes of coordination and suggest these accounts may not present consistent ideological positions across modalities.

preprint · 2022 · arXiv

Standardizing and Centralizing Datasets to Enable Efficient Training of Agricultural Deep Learning Models

In recent years, deep learning models have become the standard for agricultural computer vision. Such models are typically fine-tuned to agricultural tasks using model weights that were originally fit to more general, non-agricultural datasets. This lack of agriculture-specific fine-tuning potentially increases training time and resource use, and decreases model performance, leading to an overall decrease in data efficiency. To overcome this limitation, we collect a wide range of existing public datasets for three distinct tasks, standardize them, and construct standard training and evaluation pipelines, providing us with a set of benchmarks and pretrained models. We then conduct a number of experiments using methods which are commonly used in deep learning tasks, but unexplored in their domain-specific applications for agriculture. Our experiments guide us in developing a number of approaches to improve data efficiency when training agricultural deep learning models, without large-scale modifications to existing pipelines. Our results demonstrate that even slight training modifications, such as using agricultural pretrained model weights, or adopting specific spatial augmentations into data processing pipelines, can significantly boost model performance and result in shorter convergence time, saving training resources. Furthermore, we find that even models trained on low-quality annotations can produce comparable levels of performance to their high-quality equivalents, suggesting that datasets with poor annotations can still be used for training, expanding the pool of currently available datasets. Our methods are broadly applicable throughout agricultural deep learning, and present high potential for significant data efficiency improvements.
