Researcher profile

Ehsan Hoque

Ehsan Hoque contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

Can LLMs Emulate Human Belief Dynamics?

Can LLMs simulate how humans form and change beliefs in social networks? We put this to the test by replicating an established study on belief dynamics, evaluating 12 LLMs across multiple model families and parameter sizes. The answer is a clear no, and in systematic ways. LLMs fail to capture initial human belief distributions and tend to be overall more conformist than humans, shifting their responses to align with those around them. They also take a nuanced approach to emulating human homophilic tendencies within networks. Our findings carry a double payoff: they highlight fundamental properties of LLM behavior, and they raise a sharp warning against deploying LLMs as human proxies in social simulations.

preprint2026arXiv

HAL: Inducing Human-likeness in LLMs with Alignment

Conversational human-likeness plays a central role in human-AI interaction, yet it has remained difficult to define, measure, and optimize. As a result, improvements in human-like behavior are largely driven by scale or broad supervised training, rather than targeted alignment. We introduce Human Aligning LLMs (HAL), a framework for aligning language models to conversational human-likeness using an interpretable, data-driven reward. HAL derives explicit conversational traits from contrastive dialogue data, combines them into a compact scalar score, and uses this score as a transparent reward signal for alignment with standard preference optimization methods. Using this approach, we align models of varying sizes without affecting their overall performance. In large-scale human evaluations, models aligned with HAL are more frequently perceived as human-like in conversation. Because HAL operates over explicit, interpretable traits, it enables inspection of alignment behavior and diagnosis of unintended effects. More broadly, HAL demonstrates how soft, qualitative properties of language--previously outside the scope for alignment--can be made measurable and aligned in an interpretable and explainable way.

preprint2026arXiv

Hi5: Synthetic Data for Inclusive, Robust, Hand Pose Estimation

Hand pose estimation plays a vital role in capturing subtle nonverbal cues essential for understanding human affect. However, collecting diverse, expressive real-world data remains challenging due to labor-intensive manual annotation that often underrepresents demographic diversity and natural expressions. To address this issue, we introduce a cost-effective approach to generating synthetic data using high-fidelity 3D hand models and a wide range of affective hand poses. Our method includes varied skin tones, genders, dynamic environments, realistic lighting conditions, and diverse naturally occurring gesture animations. The resulting dataset, Hi5, contains 583,000 pose-annotated images, carefully balanced to reflect natural diversity and emotional expressiveness. Models trained exclusively on Hi5 achieve performance comparable to human-annotated datasets, exhibiting superior robustness to occlusions and consistent accuracy across diverse skin tones -- which is crucial for reliably recognizing expressive gestures in affective computing applications. Our results demonstrate that synthetic data effectively addresses critical limitations of existing datasets, enabling more inclusive, expressive, and reliable gesture recognition systems while achieving competitive performance in pose estimation benchmarks. The Hi5 dataset, data synthesis pipeline, source code, and game engine project are publicly released to support further research in synthetic hand-gesture applications.

preprint2026arXiv

VisualActBench: Can VLMs See and Act like a Human?

Vision-Language Models (VLMs) have achieved impressive progress in perceiving and describing visual environments. However, their ability to proactively reason and act based solely on visual inputs, without explicit textual prompts, remains underexplored. We introduce a new task, Visual Action Reasoning, and propose VisualActBench, a large-scale benchmark comprising 1,074 videos and 3,733 human-annotated actions across four real-world scenarios. Each action is labeled with an Action Prioritization Level (APL) and a proactive-reactive type to assess models' human-aligned reasoning and value sensitivity. We evaluate 29 VLMs on VisualActBench and find that while frontier models like GPT4o demonstrate relatively strong performance, a significant gap remains compared to human-level reasoning, particularly in generating proactive, high-priority actions. Our results highlight limitations in current VLMs' ability to interpret complex context, anticipate outcomes, and align with human decision-making frameworks. VisualActBench establishes a comprehensive foundation for assessing and improving the real-world readiness of proactive, vision-centric AI agents.

preprint2023arXiv

Don't follow the leader: Independent thinkers create scientific innovation

Academic success is distributed unequally; a few top scientists receive the bulk of attention, citations, and resources. However, do these ``superstars" foster leadership in scientific innovation? We introduce three information-theoretic measures that quantify novelty, innovation, and impact from scholarly citation networks, and compare the scholarly output of scientists who are either not connected or strongly connected to superstar scientists. We find that while connected scientists do indeed publish more, garner more citations, and produce more diverse content, this comes at a cost of lower innovation and higher redundancy of ideas. Further, once one removes papers co-authored with superstars, the academic output of these connected scientists diminishes. In contrast, authors that produce innovative content without the benefit of collaborations with scientific superstars produce papers that connect a greater diversity of concepts, publish more, and have comparable citation rates, once one controls for transferred prestige of superstars. On balance, our results indicate that academia pays a price by focusing attention and resources on superstars.

preprint2022arXiv

A Flexible Schema-Guided Dialogue Management Framework: From Friendly Peer to Virtual Standardized Cancer Patient

A schema-guided approach to dialogue management has been shown in recent work to be effective in creating robust customizable virtual agents capable of acting as friendly peers or task assistants. However, successful applications of these methods in open-ended, mixed-initiative domains remain elusive -- particularly within medical domains such as virtual standardized patients, where such complex interactions are commonplace -- and require more extensive and flexible dialogue management capabilities than previous systems provide. In this paper, we describe a general-purpose schema-guided dialogue management framework used to develop SOPHIE, a virtual standardized cancer patient that allows a doctor to conveniently practice for interactions with patients. We conduct a crowdsourced evaluation of conversations between medical students and SOPHIE. Our agent is judged to produce responses that are natural, emotionally appropriate, and consistent with her role as a cancer patient. Furthermore, it significantly outperforms an end-to-end neural model fine-tuned on a human standardized patient corpus, attesting to the advantages of a schema-guided approach.

preprint2022arXiv

Auto-Gait: Automatic Ataxia Risk Assessment with Computer Vision on Gait Task Videos

In this paper, we investigated whether we can 1) detect participants with ataxia-specific gait characteristics (risk-prediction), and 2) assess severity of ataxia from gait (severity-assessment) using computer vision. We created a dataset of 155 videos from 89 participants, 24 controls and 65 diagnosed with (or are pre-manifest) spinocerebellar ataxias (SCAs), performing the gait task of the Scale for the Assessment and Rating of Ataxia (SARA) from 11 medical sites located in 8 different states across the United States. We develop a computer vision pipeline to detect, track, and separate out the participants from their surroundings and construct several features from their body pose coordinates to capture gait characteristics like step width, step length, swing, stability, speed, etc. Our risk-prediction model achieves 83.06% accuracy and an 80.23% F1 score. Similarly, our severity-assessment model achieves a mean absolute error (MAE) score of 0.6225 and a Pearson's correlation coefficient score of 0.7268. Our models still performed competitively when evaluated on data from sites not used during training. Furthermore, through feature importance analysis, we found that our models associate wider steps, decreased walking speed, and increased instability with greater ataxia severity, which is consistent with previously established clinical knowledge. Our models create possibilities for remote ataxia assessment in non-clinical settings in the future, which could significantly improve accessibility of ataxia care. Furthermore, our underlying dataset was assembled from a geographically diverse cohort, highlighting its potential to further increase equity. The code used in this study is open to the public, and the anonymized body pose landmark dataset is also available upon request.

preprint2022arXiv

SEER: Sustainable E-commerce with Environmental-impact Rating

With online shopping gaining massive popularity over the past few years, e-commerce platforms can play a significant role in tackling climate change and other environmental problems. In this study, we report that the "attitude-behavior" gap identified by prior sustainable consumption literature also exists in an online setting. We propose SEER, a concept design for online shopping websites to help consumers make more sustainable choices. We introduce explainable environmental impact ratings to increase knowledge, trust, and convenience for consumers willing to purchase eco-friendly products. In our quasi-randomized case-control experiment with 98 subjects across the United States, we found that the case group using SEER demonstrates significantly more eco-friendly consumption behavior than the control group using a traditional e-commerce setting. While there are challenges in generating reliable explanations and environmental ratings for products, if implemented, in the United States alone, SEER has the potential to reduce approximately 2.88 million tonnes of carbon emission every year.

preprint2021arXiv

A Mental Trespass? Unveiling Truth, Exposing Thoughts and Threatening Civil Liberties with Non-Invasive AI Lie Detection

Imagine an app on your phone or computer that can tell if you are being dishonest, just by processing affective features of your facial expressions, body movements, and voice. People could ask about your political preferences, your sexual orientation, and immediately determine which of your responses are honest and which are not. In this paper we argue why artificial intelligence-based, non-invasive lie detection technologies are likely to experience a rapid advancement in the coming years, and that it would be irresponsible to wait any longer before discussing its implications. Legal and popular perspectives are reviewed to evaluate the potential for these technologies to cause societal harm. To understand the perspective of a reasonable person, we conducted a survey of 129 individuals, and identified consent and accuracy as the major factors in their decision-making process regarding the use of these technologies. In our analysis, we distinguish two types of lie detection technology, accurate truth metering and accurate thought exposing. We generally find that truth metering is already largely within the scope of existing US federal and state laws, albeit with some notable exceptions. In contrast, we find that current regulation of thought exposing technologies is ambiguous and inadequate to safeguard civil liberties. In order to rectify these shortcomings, we introduce the legal concept of mental trespass and use this concept as the basis for proposed regulation.

preprint2021arXiv

Could you become more credible by being White? Assessing Impact of Race on Credibility with Deepfakes

Computer mediated conversations (e.g., videoconferencing) is now the new mainstream media. How would credibility be impacted if one could change their race on the fly in these environments? We propose an approach using Deepfakes and a supporting GAN architecture to isolate visual features and alter racial perception. We then crowd-sourced over 800 survey responses to measure how credibility was influenced by changing the perceived race. We evaluate the effect of showing a still image of a Black person versus a still image of a White person using the same audio clip for each survey. We also test the effect of showing either an original video or an altered video where the appearance of the person in the original video is modified to appear more White. We measure credibility as the percent of participant responses who believed the speaker was telling the truth. We found that changing the race of a person in a static image has negligible impact on credibility. However, the same manipulation of race on a video increases credibility significantly (61\% to 73\% with p $<$ 0.05). Furthermore, a VADER sentiment analysis over the free response survey questions reveals that more positive sentiment is used to justify the credibility of a White individual in a video.

preprint2020arXiv

The Relationship between Deteriorating Mental Health Conditions and Longitudinal Behavioral Changes in Google and YouTube Usages among College Students in the United States during COVID-19: Observational Study

Mental health problems among the global population are worsened during the coronavirus disease (COVID-19). How individuals engage with online platforms such as Google Search and YouTube undergoes drastic shifts due to pandemic and subsequent lockdowns. Such ubiquitous daily behaviors on online platforms have the potential to capture and correlate with clinically alarming deteriorations in mental health profiles in a non-invasive manner. The goal of this study is to examine, among college students, the relationship between deteriorating mental health conditions and changes in user behaviors when engaging with Google Search and YouTube during COVID-19. This study recruited a cohort of 49 students from a U.S. college campus during January 2020 (prior to the pandemic) and measured the anxiety and depression levels of each participant. This study followed up with the same cohort during May 2020 (during the pandemic), and the anxiety and depression levels were assessed again. The longitudinal Google Search and YouTube history data were anonymized and collected. From individual-level Google Search and YouTube histories, we developed 5 signals that can quantify shifts in online behaviors during the pandemic. We then assessed the differences between groups with and without deteriorating mental health profiles in terms of these features. Significant features included late-night online activities, continuous usages, and time away from the internet, porn consumptions, and keywords associated with negative emotions, social activities, and personal affairs. Though further studies are required, our results demonstrated the feasibility of utilizing pervasive online data to establish non-invasive surveillance systems for mental health conditions that bypasses many disadvantages of existing screening methods.