Source author record

Hanjia Lyu

Hanjia Lyu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Social and Information Networks, Computer Vision, Information Retrieval, cs.CY, Machine Learning

Catalog footprint

What is connected

8 works

5 topics

4 close collaborators

Preprint, 2024, arXiv

CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs

In the pursuit of Artificial General Intelligence (AGI), a critical task for models is interpreting and processing information from multiple image inputs. However, Large Multimodal Models (LMMs) encounter two issues in such scenarios: (1) a lack of fine-grained perception, and (2) a tendency to blend information across multiple images. We first extensively investigate the capability of LMMs to perceive fine-grained visual details when dealing with multiple input images. The research focuses on two aspects: first, image-to-image matching (to evaluate whether LMMs can effectively reason and pair relevant images), and second, multi-image-to-text matching (to assess whether LMMs can accurately capture and summarize detailed image information). We conduct evaluations on a range of both open-source and closed-source large models, including GPT-4V, Gemini, OpenFlamingo, and MMICL. To enhance model performance, we further develop a Contrastive Chain-of-Thought (CoCoT) prompting approach based on multi-input multimodal models. This method requires LMMs to compare the similarities and differences among multiple image inputs, and then guides the models to answer detailed questions about multi-image inputs based on the identified similarities and differences. Our experimental results showcase CoCoT's proficiency in enhancing the multi-image comprehension capabilities of large multimodal models.
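The compare-then-answer structure described in this abstract can be illustrated with a small prompt builder. This is only a sketch: the function name `build_cocot_prompt` and the prompt wording are hypothetical and are not taken from the paper, which may phrase its prompts differently.

```python
def build_cocot_prompt(question: str, num_images: int = 2) -> str:
    """Build a contrastive chain-of-thought style prompt (hypothetical wording).

    The model is first asked to list similarities and differences among the
    images, and only then to answer the question using those observations.
    """
    if num_images < 2:
        raise ValueError("contrastive prompting needs at least two images")
    image_refs = ", ".join(f"image {i}" for i in range(1, num_images + 1))
    return (
        f"You are given {num_images} images ({image_refs}).\n"
        "Step 1: Describe the similarities among the images.\n"
        "Step 2: Describe the differences among the images.\n"
        "Step 3: Using the similarities and differences you identified, "
        f"answer the question: {question}"
    )


if __name__ == "__main__":
    print(build_cocot_prompt("Which image shows a red car?", num_images=3))
```

The resulting string would be sent alongside the images to whichever multimodal model is being evaluated; that call is left out because it depends on the model's own API.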

Preprint, 2022, arXiv

American Twitter Users Revealed Social Determinants-related Oral Health Disparities amid the COVID-19 Pandemic

Objectives: To assess self-reported population oral health conditions amid the COVID-19 pandemic using user reports on Twitter. Method and Material: We collected oral health-related tweets during the COVID-19 pandemic from 9,104 Twitter users across 26 states (with sufficient samples) in the United States between November 12, 2020 and June 14, 2021. We inferred user demographics by leveraging the visual information from the user profile images. Other characteristics including income, population density, poverty rate, health insurance coverage rate, community water fluoridation rate, and relative change in the number of daily confirmed COVID-19 cases were acquired or inferred based on retrieved information from user profiles. We performed logistic regression to examine whether discussions vary across user characteristics. Results: Overall, 26.70% of the Twitter users discuss wisdom tooth pain/jaw hurt, 23.86% tweet about dental service/cavity, 18.97% discuss chipped tooth/tooth break, 16.23% talk about dental pain, and the rest discuss tooth decay/gum bleeding. Women and younger adults (19-29) are more likely to talk about oral health problems. Health insurance coverage rate is the most significant predictor in logistic regression for topic prediction. Conclusion: Tweets inform social disparities in oral health during the pandemic. For instance, people from counties at a higher risk of COVID-19 talk more about tooth decay/gum bleeding and chipped tooth/tooth break. Older adults, who are vulnerable to COVID-19, are more likely to discuss dental pain. Topics of interest vary across user characteristics. Through the lens of social media, our findings may provide insights for oral health practitioners and policy makers.

Preprint, 2022, arXiv

Learning to Aggregate and Refine Noisy Labels for Visual Sentiment Analysis

Visual sentiment analysis has received increasing attention in recent years. However, dataset quality is a concern because the sentiment labels are crowd-sourced, subjective, and prone to mistakes, which poses a severe threat to data-driven models, especially deep neural networks. Deep models generalize poorly on test cases when they are trained to over-fit training samples with noisy sentiment labels. Inspired by the recent progress on learning with noisy labels, we propose a robust learning method for visual sentiment analysis. Our method relies on an external memory to aggregate and filter noisy labels during training. The memory is composed of prototypes with corresponding labels, which can be updated online. The learned prototypes and their labels can be regarded as denoising features and labels for the local regions and can guide the training process to prevent the model from overfitting the noisy cases. We establish a benchmark for visual sentiment analysis with label noise using publicly available datasets. Experimental results on the proposed benchmark settings comprehensively show the effectiveness of our method.

Preprint, 2022, arXiv

Misinformation versus Facts: Understanding the Influence of News Regarding COVID-19 Vaccines on Vaccine Uptake

Online discussions about the COVID-19 vaccines contain a large amount of both fact-based information and misinformation. Using a sample of nearly four million geotagged English tweets and the data from the CDC COVID Data Tracker, we conducted the Fama-MacBeth regression with the Newey-West adjustment to understand the influence of both misinformation and fact-based news on Twitter on the COVID-19 vaccine uptake in the U.S. from April 19, when U.S. adults were vaccine eligible, to June 30, 2021, after controlling for state-level factors such as demographics, education, and the pandemic severity. We identified the tweets related to either misinformation or fact-based news by analyzing the URLs. A one percent increase in fact-related Twitter users is associated with an approximately 0.87 decrease (B = -0.87, SE = 0.25, p<.001) in the number of daily new vaccinated people per hundred. No significant relationship was found between the percentage of fake-news-related users and the vaccination rate. The negative association between the percentage of fact-related users and the vaccination rate might be due to a combination of a larger user-level influence and the negative impact of online social endorsement on vaccination intent.
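For readers unfamiliar with the estimator named in this abstract, the following is a minimal sketch of a Fama-MacBeth regression with a Newey-West (Bartlett kernel) standard error, using only NumPy. The function name `fama_macbeth`, the array layout, the lag choice, and the synthetic data are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def fama_macbeth(y, X, lags=1):
    """Fama-MacBeth estimate with Newey-West standard errors (illustrative).

    y: array of shape (T, N), outcome for N units (e.g. states) over T periods.
    X: array of shape (T, N, K), K regressors per unit and period.
    Returns (mean_beta, se), each of length K + 1 with the intercept first.
    """
    T, N, K = X.shape
    betas = np.empty((T, K + 1))
    for t in range(T):
        # Step 1: one cross-sectional OLS per period.
        design = np.column_stack([np.ones(N), X[t]])
        betas[t] = np.linalg.lstsq(design, y[t], rcond=None)[0]

    # Step 2: average the per-period coefficients over time.
    mean_beta = betas.mean(axis=0)

    # Newey-West long-run variance of each coefficient's time series.
    dev = betas - mean_beta
    lrv = (dev * dev).sum(axis=0) / T
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1)  # Bartlett kernel weight
        gamma = (dev[lag:] * dev[:-lag]).sum(axis=0) / T
        lrv += 2.0 * weight * gamma
    se = np.sqrt(lrv / T)
    return mean_beta, se


if __name__ == "__main__":
    # Synthetic data with a known slope of 2.0, purely for demonstration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 50, 1))
    y = 1.0 + 2.0 * X[..., 0] + 0.1 * rng.normal(size=(30, 50))
    beta, se = fama_macbeth(y, X, lags=2)
    print("coefficients:", beta)
    print("standard errors:", se)
```

Dividing a coefficient by its standard error gives the test statistic; for the figures quoted in the abstract, that ratio is -0.87 / 0.25 = -3.48.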

Preprint, 2022, arXiv

Understanding Political Polarization via Jointly Modeling Users, Connections and Multimodal Contents on Heterogeneous Graphs

Understanding political polarization on social platforms is important as public opinions may become increasingly extreme when they are circulated in homogeneous communities, thus potentially causing damage in the real world. Automatically detecting the political ideology of social media users can help better understand political polarization. However, it is challenging due to the scarcity of ideology labels, the complexity of multimodal content, and the cost of a time-consuming data collection process. In this study, we adopt a heterogeneous graph neural network to jointly model user characteristics, multimodal post contents as well as user-item relations in a bipartite graph to learn a comprehensive and effective user embedding without requiring ideology labels. We apply our framework to online discussions about economy and public health topics. The learned embeddings are then used to detect political ideology and understand political polarization. Our framework outperforms the unimodal, early/late fusion baselines, and homogeneous GNN frameworks by a margin of at least 9% absolute gain in the area under the receiver operating characteristic curve on two social media datasets. More importantly, our work does not require a time-consuming data collection process, which allows faster detection and in turn allows policy makers to conduct analysis and design policies in time to respond to crises. We also show that our framework learns meaningful user embeddings and can help better understand political polarization. Notable differences in user descriptions, topics, images, and levels of retweet/quote activities are observed. Our framework for decoding user-content interaction shows wide applicability in understanding political polarization. Furthermore, it can be extended to user-item bipartite information networks for other applications such as content and product recommendation.

Preprint, 2021, arXiv

Understanding Patterns of Users Who Repost Censored Posts on Weibo

In this study, we focus on understanding patterns of users whose reposted content was later censored on Weibo, a Chinese social media platform comparable to Twitter. Little is known about the way regulations and censorship work on this indigenous platform and what role they play in affecting users' expression of ideas. We collect over a million reposts from over 18,000 users and investigate the patterns of users whose reposts contain content that is no longer visible to the public, from the perspective of user location, device, gender, social capital as well as certified status. We find that user characteristics play an important role in affecting behaviors on Weibo.

Preprint, 2020, arXiv

In the Eyes of the Beholder: Analyzing Social Media Use of Neutral and Controversial Terms for COVID-19

During the COVID-19 pandemic, "Chinese Virus" emerged as a controversial term for coronavirus. To some, it may seem like a neutral term referring to the physical origin of the virus. To many others, however, the term is in fact attaching ethnicity to the virus. While both arguments appear reasonable, quantitative analysis of the term's real-world usage, which could shed light on the issues behind the controversy, is lacking. In this paper, we attempt to fill this gap. To model the substantive difference between tweets with controversial terms and those with non-controversial terms, we apply topic modeling and LIWC-based sentiment analysis. To test whether "Chinese Virus" and "COVID-19" are interchangeable, we formulate it as a classification task, mask out these terms, and classify them using state-of-the-art transformer models. Our experiments consistently show that the term "Chinese Virus" is associated with different substantive topics and sentiment compared with "COVID-19" and that the two terms are easily distinguishable by looking at their context.

Preprint, 2020, arXiv

Monitoring Depression Trend on Twitter during the COVID-19 Pandemic

The COVID-19 pandemic has severely affected people's daily lives and caused tremendous economic loss worldwide. However, its influence on people's mental health conditions has not received as much attention. To study this subject, we choose social media as our main data source and create by far the largest English Twitter depression dataset containing 2,575 distinct identified depression users with their past tweets. To examine the effect of depression on people's Twitter language, we train three transformer-based depression classification models on the dataset, evaluate their performance with progressively increased training sizes, and compare the models' "tweet chunk"-level and user-level performances. Furthermore, inspired by psychological studies, we create a fusion classifier that combines deep learning model scores with psychological text features and users' demographic information and investigate these features' relations to depression signals. Finally, we demonstrate our model's capability of monitoring both group-level and population-level depression trends by presenting two of its applications during the COVID-19 pandemic. We hope this study can raise awareness among researchers and the general public of COVID-19's impact on people's mental health.
