Researcher profile

Kai-Cheng Yang

Kai-Cheng Yang's listed work covers social bot detection, online misinformation, and network science.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
4 topics
4 close collaborators

Published work

8 published items

preprint, 2023, arXiv

Social Bots: Detection and Challenges

While social media are a key source of data for computational social science, their ease of manipulation by malicious actors threatens the integrity of online information exchanges and their analysis. In this Chapter, we focus on malicious social bots, a prominent vehicle for such manipulation. We start by discussing recent studies about the presence and actions of social bots in various online discussions to show their real-world implications and the need for detection methods. Then we discuss the challenges of bot detection methods and use Botometer, a publicly available bot detection tool, as a case study to describe recent developments in this area. We close with a practical guide on how to handle social bots in social media research.

preprint, 2022, arXiv

Botometer 101: Social bot practicum for computational social scientists

Social bots have become an important component of online social media. Deceptive bots, in particular, can manipulate online discussions of important issues ranging from elections to public health, threatening the constructive exchange of information. Their ubiquity makes them an interesting research subject and requires researchers to properly handle them when conducting studies using social media data. Therefore, it is important for researchers to gain access to bot detection tools that are reliable and easy to use. This paper aims to provide an introductory tutorial of Botometer, a public tool for bot detection on Twitter, for readers who are new to this topic and may not be familiar with programming and machine learning. We introduce how Botometer works, the different ways users can access it, and present a case study as a demonstration. Readers can use the case study code as a template for their own research. We also discuss recommended practice for using Botometer.

preprint, 2022, arXiv

Network localization strength regulates innovation diffusion with macro-level social influence

Innovation diffusion in the networked population is an essential process that drives the progress of human society. Despite the recent advances in network science, a fundamental understanding of network properties that regulate such processes is still lacking. Focusing on an innovation diffusion model with pairwise transmission and macro-level social influence, i.e., more adopters in the networked population lead to a higher adoption tendency among the remaining individuals, we observe discontinuous phase transitions when the influence is sufficiently strong. Through extensive analyses of a large corpus of empirical networks, we show that the tricritical point depends on the network localization strength, which our newly proposed metric can effectively quantify. The metric reveals the deep connection between the critical and tricritical points and further indicates a trade-off: networks that allow less attractive products to prevail tend to yield slower diffusion and lower market penetration, and vice versa. Guided by this trade-off, we demonstrate how marketers can rewire the networks to modulate product diffusion according to their needs.

preprint, 2022, arXiv

Online misinformation is linked to early COVID-19 vaccination hesitancy and refusal

Widespread uptake of vaccines is necessary to achieve herd immunity. However, uptake rates have varied across U.S. states during the first six months of the COVID-19 vaccination program. Misbeliefs may play an important role in vaccine hesitancy, and there is a need to understand relationships between misinformation, beliefs, behaviors, and health outcomes. Here we investigate the extent to which COVID-19 vaccination rates and vaccine hesitancy are associated with levels of online misinformation about vaccines. We also look for evidence of directionality from online misinformation to vaccine hesitancy. We find a negative relationship between misinformation and vaccination uptake rates. Online misinformation is also correlated with vaccine hesitancy rates taken from survey data. Associations between vaccine outcomes and misinformation remain significant when accounting for political as well as demographic and socioeconomic factors. While vaccine hesitancy is strongly associated with Republican vote share, we observe that the effect of online misinformation on hesitancy is strongest across Democratic rather than Republican counties. Granger causality analysis shows evidence for a directional relationship from online misinformation to vaccine hesitancy. Our results support a need for interventions that address misbeliefs, allowing individuals to make better-informed health decisions.

preprint, 2021, arXiv

Model-free hidden geometry of complex networks

The fundamental idea of embedding a network in a metric space is rooted in the principle of proximity preservation. Nodes are mapped into points of the space with pairwise distance that reflects their proximity in the network. Popular methods employed in network embedding either rely on implicit approximations of the principle of proximity preservation or implement it by enforcing the geometry of the embedding space, thus hindering geometric properties that networks may spontaneously exhibit. Here, we take advantage of a model-free embedding method explicitly devised for preserving pairwise proximity, and characterize the geometry emerging from the mapping of several networks, both real and synthetic. We show that the learned embedding has simple and intuitive interpretations: the distance of a node from the geometric center is representative for its closeness centrality, and the relative positions of nodes reflect the community structure of the network. Proximity can be preserved in relatively low-dimensional embedding spaces, and the hidden geometry displays optimal performance in guiding greedy navigation regardless of the specific network topology. We finally show that the mapping provides a natural description of contagion processes on networks, with complex spatiotemporal patterns represented by waves propagating from the geometric center to the periphery. The findings deepen our understanding of the model-free hidden geometry of complex networks.

preprint, 2020, arXiv

How Twitter Data Sampling Biases U.S. Voter Behavior Characterizations

Online social media are key platforms for the public to discuss political issues. As a result, researchers have used data from these platforms to analyze public opinions and forecast election results. Recent studies reveal the existence of inauthentic actors such as malicious social bots and trolls, suggesting that not every message is a genuine expression from a legitimate user. However, the prevalence of inauthentic activities in social data streams is still unclear, making it difficult to gauge biases of analyses based on such data. In this paper, we aim to close this gap using Twitter data from the 2018 U.S. midterm elections. Hyperactive accounts are over-represented in volume samples. We compare their characteristics with those of randomly sampled accounts and self-identified voters using a fast and low-cost heuristic. We show that hyperactive accounts are more likely to exhibit various suspicious behaviors and share low-credibility information compared to likely voters. Random accounts are more similar to likely voters, although they have slightly higher chances to display suspicious behaviors. Our work provides insights into biased voter characterizations when using online observations, underlining the importance of accounting for inauthentic actors in studies of political issues based on social media data.

preprint, 2019, arXiv

Bot Electioneering Volume: Visualizing Social Bot Activity During Elections

It has been widely recognized that automated bots may have a significant impact on the outcomes of national events. It is important to raise public awareness about the threat of bots on social media during these important events, such as the 2018 US midterm election. To this end, we deployed a web application to help the public explore the activities of likely bots on Twitter on a daily basis. The application, called Bot Electioneering Volume (BEV), reports on the level of likely bot activities and visualizes the topics targeted by them. With this paper we release our code base for the BEV framework, with the goal of facilitating future efforts to combat malicious bots on social media.

preprint, 2019, arXiv

Scalable and Generalizable Social Bot Detection through Data Selection

Efficient and reliable social bot classification is crucial for detecting information manipulation on social media. Despite rapid development, state-of-the-art bot detection models still face generalization and scalability challenges, which greatly limit their applications. In this paper we propose a framework that uses minimal account metadata, enabling efficient analysis that scales up to handle the full stream of public tweets of Twitter in real time. To ensure model accuracy, we build a rich collection of labeled datasets for training and validation. We deploy a strict validation system so that model performance on unseen datasets is also optimized, in addition to traditional cross-validation. We find that strategically selecting a subset of training data yields better model accuracy and generalization than exhaustively training on all available data. Thanks to the simplicity of the proposed model, its logic can be interpreted to provide insights into social bot characteristics.