Source author record

Munmun De Choudhury

Munmun De Choudhury appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

cs.CY, Social and Information Networks, Human-Computer Interaction, Artificial Intelligence, Computation and Language, Machine Learning, Multimedia, physics.soc-ph

Catalog footprint

What is connected

12 works

8 topics

4 close collaborators


Preprint · 2026 · arXiv

CALM-IT: Generating Realistic Long-Form Motivational Interviewing Dialogues with Dual-Actor Conversational Dynamics Tracking

Large Language Models (LLMs) are increasingly used in mental health-related settings, yet they struggle to sustain realistic, goal-directed dialogue over extended interactions. While LLMs generate fluent responses, they optimize locally for the next turn rather than maintaining a coherent model of therapeutic progress, leading to brittleness and long-horizon drift. We introduce CALM-IT, a framework for generating and evaluating long-form Motivational Interviewing (MI) dialogues that explicitly models dual-actor conversational dynamics. CALM-IT represents therapist-client interaction as a bidirectional state-space process, in which both agents continuously update inferred alignment, mental states, and short-term goals to guide strategy selection and utterance generation. Across large-scale evaluations, CALM-IT consistently outperforms strong baselines in Effectiveness and Goal Alignment and remains substantially more stable as conversation length increases. Although CALM-IT initiates fewer therapist redirections, it achieves the highest client acceptance rate (64.3%), indicating more precise and therapeutically aligned intervention timing. Overall, CALM-IT provides evidence that modeling evolving conversational state is essential for generating high-quality long-form synthetic conversations.

Preprint · 2022 · arXiv

How is Vaping Framed on Online Knowledge Dissemination Platforms?

We analyze 1,888 articles and 1,119,453 vaping posts to study how vaping is framed across multiple knowledge dissemination platforms (Wikipedia, Quora, Medium, Reddit, Stack Exchange, wikiHow). We use various NLP techniques to understand how this framing differs across platforms. For example, n-grams, emotion recognition, and question answering results indicate that Medium, Quora, and Stack Exchange are appropriate venues for those looking to transition from smoking to vaping. Other platforms (Reddit, wikiHow) cater more to vaping hobbyists and may not sufficiently dissuade youth vaping. Conversely, Wikipedia may exaggerate vaping harms, dissuading smokers from transitioning. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to design informational tools to reinforce or mitigate vaping (mis)perceptions online.

Preprint · 2022 · arXiv

Overcoming Language Disparity in Online Content Classification with Multimodal Learning

Advances in Natural Language Processing (NLP) have revolutionized the way researchers and practitioners address crucial societal problems. Large language models are now the standard for developing state-of-the-art solutions for text detection and classification tasks. However, the development of advanced computational techniques and resources is disproportionately focused on the English language, sidelining a majority of the languages spoken globally. While existing research has developed better multilingual and monolingual language models to bridge this language disparity between English and non-English languages, we explore the promise of incorporating the information contained in images via multimodal machine learning. Our comparative analyses on three detection tasks focusing on crisis information, fake news, and emotion recognition, as well as five high-resource non-English languages, demonstrate that: (a) detection frameworks based on pre-trained large language models like BERT and multilingual-BERT systematically perform better on English than on non-English languages, and (b) including images via multimodal learning bridges this performance gap. We situate our findings with respect to existing work on the pitfalls of large language models, and discuss their theoretical and practical implications. Resources for this paper are available at https://multimodality-language-disparity.github.io/.

Preprint · 2022 · arXiv

Partisan US News Media Representations of Syrian Refugees

We investigate how representations of Syrian refugees (2011-2021) differ across US partisan news outlets. We analyze 47,388 articles from the online US media about Syrian refugees to detail differences in reporting between left- and right-leaning media. We use various NLP techniques to understand these differences. Our polarization and question answering results indicated that left-leaning media tended to represent refugees as child victims who are welcome in the US, whereas right-leaning media cast refugees as Islamic terrorists. We noted similar results with our sentiment and offensive speech scores over time, which suggest potentially unfavorable representations of refugees in right-leaning media. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to intervene around refugee representations, and design communications campaigns that improve the way society sees refugees and possibly aid refugee outcomes.

Preprint · 2022 · arXiv

US News and Social Media Framing around Vaping

In this paper, we investigate how vaping is framed differently (2008-2021) between US news and social media. We analyze 15,711 news articles and 1,231,379 Facebook posts about vaping to study the differences in framing between media varieties. We use word embeddings to provide two-dimensional visualizations of the semantic changes around vaping for news and for social media. We detail that news media framing of vaping shifted over time in line with emergent regulatory trends, such as flavored vaping bans, with little discussion around vaping as a smoking cessation tool. We found that social media discussions were far more varied, with transitions toward vaping both as a public health harm and as a smoking cessation tool. Our cloze test, dynamic topic model, and question answering showed similar patterns, where social media, but not news media, characterizes vaping as a combustible cigarette substitute. We use n-grams to detail that social media data first centered on vaping as a smoking cessation tool, and in 2019 moved toward narratives around vaping regulation, similar to news media frames. Overall, social media tracks the evolution of vaping as a social practice, while news media reflects more risk-based concerns. A strength of our work is how the different techniques we have applied validate each other. Stakeholders may utilize our findings to intervene around the framing of vaping, and may design communications campaigns that improve the way society sees vaping, thus possibly aiding smoking cessation and reducing youth vaping.

Preprint · 2020 · arXiv

Computational Support for Substance Use Disorder Prevention, Detection, Treatment, and Recovery

Substance Use Disorders (SUDs) involve the misuse of any or several of a wide array of substances, such as alcohol, opioids, marijuana, and methamphetamine. SUDs are characterized by an inability to decrease use despite severe social, economic, and health-related consequences to the individual. A 2017 national survey identified that 1 in 12 US adults have or have had a substance use disorder. The National Institute on Drug Abuse estimates that SUDs relating to alcohol, prescription opioids, and illicit drug use cost the United States over $520 billion annually due to crime, lost work productivity, and health care expenses. Most recently, the US Department of Health and Human Services has declared the national opioid crisis a public health emergency to address the growing number of opioid overdose deaths in the United States. In this interdisciplinary workshop, we explored how computational support - digital systems, algorithms, and sociotechnical approaches (which consider how technology and people interact as complex systems) - may enhance and enable innovative interventions for prevention, detection, treatment, and long-term recovery from SUDs. The Computing Community Consortium (CCC) sponsored a two-day workshop titled "Computational Support for Substance Use Disorder Prevention, Detection, Treatment, and Recovery" on November 14-15, 2019 in Washington, DC. As outcomes from this visioning process, we identified three broad opportunity areas for computational support in the SUD context: 1. Detecting and mitigating risk of SUD relapse, 2. Establishing and empowering social support networks, and 3. Collecting and sharing data meaningfully across ecologies of formal and informal care.

Preprint · 2020 · arXiv

Jointly Predicting Job Performance, Personality, Cognitive Ability, Affect, and Well-Being

Assessment of job performance, personalized health and psychometric measures are domains where data-driven and ubiquitous computing has the potential for a profound impact in the future. Existing techniques use data extracted from questionnaires, sensors (wearable, computer, etc.), or other traits to assess the well-being and cognitive attributes of individuals. However, these techniques can neither predict individuals' well-being and psychological traits in a global manner nor address the challenges associated with processing the available data, which is incomplete and noisy. In this paper, we create a benchmark for predictive analysis of individuals from a perspective that integrates physical and physiological behavior, psychological states and traits, and job performance. We design data mining techniques as a benchmark and use real noisy and incomplete data derived from wearable sensors to predict 19 constructs based on 12 standardized, well-validated tests. The study included 757 participants who were knowledge workers in organizations across the USA with varied work roles. We developed a data mining framework to extract the meaningful predictors for each of the 19 variables under consideration. Our model is the first benchmark that combines these various instrument-derived variables in a single framework to understand people's behavior by leveraging real uncurated data from wearable, mobile, and social media sources. We verify our approach experimentally using the data obtained from our longitudinal study. The results show that our framework is consistently reliable and predicts the variables under study better than the baselines when prediction is restricted to the noisy, incomplete data.

Preprint · 2016 · arXiv

Quote RTs on Twitter: Usage of the New Feature for Political Discourse

Social media platforms provide several social interactional features. Due to the large-scale reach of social media, these interactional features help enable various types of political discourse. Constructive and diversified discourse is important for sustaining healthy communities and reducing the impact of echo chambers. In this paper, we empirically examine the role of a newly introduced Twitter feature, 'quote retweets' (or 'quote RTs'), in political discourse, specifically whether it has led to improved, civil, and balanced exchange. Quote RTs allow users to quote the tweet they retweet while adding a short comment. Our analysis using content, network, and crowd-labeled data indicates that the feature has increased political discourse and its diffusion, compared to existing features. We discuss the implications of our findings in understanding and reducing online polarization.

Preprint · 2016 · arXiv

Smart Societies: From Citizens as Sensors to Collective Action

Social media has become globally ubiquitous, transforming how people are networked and mobilized. This forum explores research and applications of these new networked publics at individual, organizational, and societal levels.

Preprint · 2015 · arXiv

"Narco" Emotions: Affect and Desensitization in Social Media during the Mexican Drug War

Social media platforms have emerged as prominent information sharing ecosystems in the context of a variety of recent crises, ranging from mass emergencies to wars and political conflicts. We study affective responses in social media and how they might indicate desensitization to violence experienced in communities embroiled in an armed conflict. Specifically, we examine three established affect measures (negative affect, activation, and dominance) as observed on Twitter in relation to a number of statistics on protracted violence in four major cities afflicted by the Mexican Drug War. During a two-year period (Aug 2010-Dec 2012), while violence was on the rise in these regions, our findings show a decline in negative emotional expression as well as a rise in emotional arousal and dominance in Twitter posts: aspects known to be psychological markers of desensitization. We discuss the implications of our work for behavioral health, for facilitating rehabilitation efforts in communities enmeshed in acute and persistent urban warfare, and for civic engagement.

Preprint · 2015 · arXiv

The New War Correspondents: the Rise of Civic Media Curation in Urban Warfare

In this paper we examine the information sharing practices of people living in cities amid armed conflict. We describe the volume and frequency of microblogging activity on Twitter from four cities afflicted by the Mexican Drug War, showing how citizens use social media to alert one another and to comment on the violence that plagues their communities. We then investigate the emergence of civic media "curators," individuals who act as "war correspondents" by aggregating and disseminating information to large numbers of people on social media. We conclude by outlining the implications of our observations for the design of civic media systems in wartime.

Preprint · 2010 · arXiv

"Birds of a Feather": Does User Homophily Impact Information Diffusion in Social Media?

This article investigates the impact of user homophily on the social process of information diffusion in online social media. Over several decades, social scientists have been interested in the idea that similarity breeds connection, a principle known as "homophily". Homophily has been extensively studied in the social sciences and refers to the idea that users in a social system tend to bond more with those who are similar to them than with those who are dissimilar. The key observation is that homophily structures the ego-networks of individuals and impacts their communication behavior. It is therefore likely to affect the mechanisms by which information propagates among them. To this end, we investigate the interplay between homophily along diverse user attributes and the information diffusion process on social media. In our approach, we first extract diffusion characteristics corresponding to the baseline social graph as well as to graphs filtered on different user attributes (e.g. location, activity). Second, we propose a Dynamic Bayesian Network based framework to predict diffusion characteristics at a future time. Third, the impact of attribute homophily is quantified by the ability of the predicted characteristics to explain actual diffusion and external variables, including trends in search and news. Experimental results on a large Twitter dataset demonstrate that the choice of the homophilous attribute can impact the prediction of information diffusion, given a specific metric and a topic. In most cases, attribute homophily explains actual diffusion and external trends ~15-25% better than when homophily is not considered.
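The first step of the approach above contrasts a baseline social graph with graphs filtered on a user attribute. The sketch below illustrates that contrast under loudly stated assumptions: it keeps only edges whose two endpoints share the same attribute value, and it uses "number of users reached from a seed set" as a hypothetical stand-in for a diffusion characteristic. The abstract does not specify the filtering rule or the characteristics used, and this is not the authors' Dynamic Bayesian Network framework; the function names `filter_by_attribute` and `reach` and the toy data are illustrative only.

```python
from collections import deque


def filter_by_attribute(edges, attrs, key):
    """Hypothetical filter: keep a directed edge (u, v) only if both users
    have a value for `key` and those values are equal (attribute homophily)."""
    kept = []
    for u, v in edges:
        a_u = attrs.get(u, {}).get(key)
        a_v = attrs.get(v, {}).get(key)
        if a_u is not None and a_u == a_v:
            kept.append((u, v))
    return kept


def reach(edges, seeds):
    """Illustrative diffusion characteristic: how many users can be reached
    from the seed users by following directed edges (breadth-first search)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


# Toy follower graph (made-up data): information flows along each edge.
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e")]
attrs = {
    "a": {"location": "ATL"},
    "b": {"location": "ATL"},
    "c": {"location": "ATL"},
    "d": {"location": "NYC"},
    "e": {"location": "NYC"},
}

baseline = reach(edges, ["a"])                                          # 5
homophilous = reach(filter_by_attribute(edges, attrs, "location"), ["a"])  # 3
print(baseline, homophilous)
```

Comparing the two numbers shows how restricting diffusion to same-attribute ties can change a measured characteristic; in the study itself, such characteristics feed a predictive model rather than being compared directly.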
