Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
16topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2022arXiv

A Survey on Bias and Fairness in Machine Learning

With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.

preprint2022arXiv

Emergent Instabilities in Algorithmic Feedback Loops

Algorithms that aid human tasks, such as recommendation systems, are ubiquitous. They appear in everything from social media to streaming videos to online shopping. However, the feedback loop between people and algorithms is poorly understood and can amplify cognitive and social biases (algorithmic confounding), leading to unexpected outcomes. In this work, we explore algorithmic confounding in collaborative filtering-based recommendation algorithms through teacher-student learning simulations. Namely, a student collaborative filtering-based model, trained on simulated choices, is used by the recommendation algorithm to recommend items to agents. Agents might choose some of these items, according to an underlying teacher model, with new choices then fed back into the student model as new training data (approximating online machine learning). These simulations demonstrate how algorithmic confounding produces erroneous recommendations which in turn lead to instability, i.e., wide variations in an item's popularity between each simulation realization. We use the simulations to demonstrate a novel approach to training collaborative filtering models that can create more stable and accurate recommendations. Our methodology is general enough that it can be extended to other socio-technical systems in order to better quantify and improve the stability of algorithms. These results highlight the need to account for emergent behaviors from interactions between people and algorithms.

preprint2022arXiv

Infusing Knowledge from Wikipedia to Enhance Stance Detection

Stance detection infers a text author's attitude towards a target. This is challenging when the model lacks background knowledge about the target. Here, we show how background knowledge from Wikipedia can help enhance the performance on stance detection. We introduce Wikipedia Stance Detection BERT (WS-BERT) that infuses the knowledge into stance encoding. Extensive results on three benchmark datasets covering social media discussions and online debates indicate that our model significantly outperforms the state-of-the-art methods on target-specific stance detection, cross-target stance detection, and zero/few-shot stance detection.

preprint2022arXiv

Road Network Evolution in the Urban and Rural United States Since 1900

Road networks represent a key component of human settlements, such as cities, towns, and villages, that mediate pollution and congestion, as well as economic development. However, little is known about the long-term development trajectories of road networks in rural and urban settings. We leverage novel spatial data sources to reconstruct and analyze road networks in more than 850 US cities and over 2,500 US counties since 1900. Our analysis reveals significant variations in the structure of roads both within cities and across the conterminous US. Despite differences in the evolution of these networks, there are commonalities and strong geographic patterns. These results persist across the rural-urban continuum and are therefore not just a product of accelerated urban growth. These findings refine and extend existing knowledge and illuminate the need for policies for urban and rural planning including the critical assessment of new development trends.

preprint2021arXiv

A Model of Densifying Collaboration Networks

Research collaborations provide the foundation for scientific advances, but we have only recently begun to understand how they form and grow on a global scale. Here we analyze a model of the growth of research collaboration networks to explain the empirical observations that the number of collaborations scales superlinearly with institution size, though at different rates (heterogeneous densification), the number of institutions grows as a power of the number of researchers (Heaps' law) and institution sizes approximate Zipf's law. This model has three mechanisms: (i) researchers are preferentially hired by large institutions, (ii) new institutions trigger more potential institutions, and (iii) researchers collaborate with friends-of-friends. We show agreement between these assumptions and empirical data, through analysis of co-authorship networks spanning two centuries. We then develop a theoretical understanding of this model, which reveals emergent heterogeneous scaling such that the number of collaborations between institutions scale with an institution's size.

preprint2021arXiv

Individualized Context-Aware Tensor Factorization for Online Games Predictions

Individual behavior and decisions are substantially influenced by their contexts, such as location, environment, and time. Changes along these dimensions can be readily observed in Multiplayer Online Battle Arena games (MOBA), where players face different in-game settings for each match and are subject to frequent game patches. Existing methods utilizing contextual information generalize the effect of a context over the entire population, but contextual information tailored to each individual can be more effective. To achieve this, we present the Neural Individualized Context-aware Embeddings (NICE) model for predicting user performance and game outcomes. Our proposed method identifies individual behavioral differences in different contexts by learning latent representations of users and contexts through non-negative tensor factorization. Using a dataset from the MOBA game League of Legends, we demonstrate that our model substantially improves the prediction of winning outcome, individual user performance, and user engagement.

preprint2021arXiv

The Emergence of Heterogeneous Scaling in Research Institutions

Research institutions provide the infrastructure for scientific discovery, yet their role in the production of knowledge is not well characterized. To address this gap, we analyze interactions of researchers within and between institutions from millions of scientific papers. Our analysis reveals that the number of collaborations scales superlinearly with institution size, though at different rates (heterogeneous densification). We also find that the number of institutions scales with the number of researchers as a power law (Heaps' law) and institution sizes approximate Zipf's law. These patterns can be reproduced by a simple model with three mechanisms: (i) researchers collaborate with friends-of-friends, (ii) new institutions trigger more potential institutions, and (iii) researchers are preferentially hired by large institutions. This model reveals an economy of scale in research: larger institutions grow faster and amplify collaborations. Our work provides a new understanding of emergent behavior in research institutions and how they facilitate innovation.

preprint2020arXiv

Challenges in Forecasting Malicious Events from Incomplete Data

The ability to accurately predict cyber-attacks would enable organizations to mitigate their growing threat and avert the financial losses and disruptions they cause. But how predictable are cyber-attacks? Researchers have attempted to combine external data -- ranging from vulnerability disclosures to discussions on Twitter and the darkweb -- with machine learning algorithms to learn indicators of impending cyber-attacks. However, successful cyber-attacks represent a tiny fraction of all attempted attacks: the vast majority are stopped, or filtered by the security appliances deployed at the target. As we show in this paper, the process of filtering reduces the predictability of cyber-attacks. The small number of attacks that do penetrate the target's defenses follow a different generative process compared to the whole data which is much harder to learn for predictive models. This could be caused by the fact that the resulting time series also depends on the filtering process in addition to all the different factors that the original time series depended on. We empirically quantify the loss of predictability due to filtering using real-world data from two organizations. Our work identifies the limits to forecasting cyber-attacks from highly filtered data.

preprint2020arXiv

Learning Behavioral Representations from Wearable Sensors

Continuous collection of physiological data from wearable sensors enables temporal characterization of individual behaviors. Understanding the relation between an individual's behavioral patterns and psychological states can help identify strategies to improve quality of life. One challenge in analyzing physiological data is extracting the underlying behavioral states from the temporal sensor signals and interpreting them. Here, we use a non-parametric Bayesian approach to model sensor data from multiple people and discover the dynamic behaviors they share. We apply this method to data collected from sensors worn by a population of hospital workers and show that the learned states can cluster participants into meaningful groups and better predict their cognitive and psychological states. This method offers a way to learn interpretable compact behavioral representations from multivariate sensor signals.

preprint2020arXiv

Predictability limit of partially observed systems

Applications from finance to epidemiology and cyber-security require accurate forecasts of dynamic phenomena, which are often only partially observed. We demonstrate that a system's predictability degrades as a function of temporal sampling, regardless of the adopted forecasting model. We quantify the loss of predictability due to sampling, and show that it cannot be recovered by using external signals. We validate the generality of our theoretical findings in real-world partially observed systems representing infectious disease outbreaks, online discussions, and software development projects. On a variety of prediction tasks---forecasting new infections, the popularity of topics in online discussions, or interest in cryptocurrency projects---predictability irrecoverably decays as a function of sampling, unveiling fundamental predictability limits in partially observed systems.

preprint2020arXiv

Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set

At the time of this writing, the novel coronavirus (COVID-19) pandemic outbreak has already put tremendous strain on many countries' citizens, resources and economies around the world. Social distancing measures, travel bans, self-quarantines, and business closures are changing the very fabric of societies worldwide. With people forced out of public spaces, much conversation about these phenomena now occurs online, e.g., on social media platforms like Twitter. In this paper, we describe a multilingual coronavirus (COVID-19) Twitter dataset that we have been continuously collecting since January 22, 2020. We are making our dataset available to the research community (https://github.com/echen102/COVID-19-TweetIDs). It is our hope that our contribution will enable the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak of unprecedented proportions and implications. This dataset could also help track scientific coronavirus misinformation and unverified rumors, or enable the understanding of fear and panic -- and undoubtedly more. Ultimately, this dataset may contribute towards enabling informed solutions and prescribing targeted policy interventions to fight this global crisis.

preprint2020arXiv

Unequal Impact and Spatial Aggregation Distort COVID-19 Growth Rates

The COVID-19 pandemic has emerged as a global public health crisis. To make decisions about mitigation strategies and to understand the disease dynamics, policy makers and epidemiologists must know how the disease is spreading in their communities. We analyze confirmed infections and deaths over multiple geographic scales to show that COVID-19's impact is highly unequal: many subregions have nearly zero infections, and others are hot spots. We attribute the effect to a Reed-Hughes-like mechanism in which disease arrives at different times and grows exponentially. Hot spots, however, appear to grow faster than neighboring subregions and dominate spatially aggregated statistics, thereby amplifying growth rates. The staggered spread of COVID-19 can also make aggregated growth rates appear higher even when subregions grow at the same rate. Public policy, economic analysis and epidemic modeling need to account for potential distortions introduced by spatial aggregation.

preprint2019arXiv

Friendship Paradox Biases Perceptions in Directed Networks

How popular a topic or an opinion appears to be in a network can be very different from its actual popularity. For example, in an online network of a social media platform, the number of people who mention a topic in their posts---i.e., its global popularity---can be dramatically different from how people see it in their social feeds---i.e., its perceived popularity---where the feeds aggregate their friends' posts. We trace the origin of this discrepancy to the friendship paradox in directed networks, which states that people are less popular than their friends (or followers) are, on average. We identify conditions on network structure that give rise to this perception bias, and validate the findings empirically using data from Twitter. Within messages posted by Twitter users in our sample, we identify topics that appear more frequently within the users' social feeds, than they do globally, i.e., among all posts. In addition, we present a polling algorithm that leverages the friendship paradox to obtain a statistically efficient estimate of a topic's global prevalence from biased perceptions of individuals. We characterize the bias of the polling estimate, provide an upper bound for its variance, and validate the algorithm's efficiency through synthetic polling experiments on our Twitter data. Our paper elucidates the non-intuitive ways in which the structure of directed networks can distort social perceptions and resulting behaviors.