Source author record

Kenneth Joseph

Kenneth Joseph appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

cs.CY, Computation and Language, Machine Learning, Multimedia, Social and Information Networks

Catalog footprint

What is connected

7 works

5 topics

4 close collaborators

preprint, 2022, arXiv

A Data-Driven Simulation of the New York State Foster Care System

We introduce an analytic pipeline to model and simulate youth trajectories through the New York state foster care system. Our goal in doing so is to forecast how proposed interventions may impact the foster care system's ability to achieve its stated goals before these interventions are actually implemented and impact the lives of thousands of youth. Here, we focus on two specific stated goals of the system: racial equity, and, as codified most recently by the 2018 Family First Prevention Services Act (FFPSA), a focus on keeping all youth out of foster care. We also focus on one specific potential intervention -- a predictive model, proposed in prior work and implemented elsewhere in the U.S., which aims to determine whether or not a youth is in need of care. We use our method to explore how the implementation of this predictive model in New York would impact racial equity and the number of youth in care. While our findings, as in any simulation model, ultimately rely on modeling assumptions, we find evidence that the model would not necessarily achieve either goal. Primarily, then, we aim to further promote the use of data-driven simulation to help understand the ramifications of algorithmic interventions in public systems.

preprint, 2022, arXiv

Mutual Information Scoring: Increasing Interpretability in Categorical Clustering Tasks with Applications to Child Welfare Data

Youth in the American foster care system are significantly more likely than their peers to face a number of negative life outcomes, from homelessness to incarceration. Administrative data on these youth have the potential to provide insights that can help identify ways to improve their path towards a better life. However, such data also suffer from a variety of biases, from missing data to reflections of systemic inequality. The present work proposes a novel, prescriptive approach to using these data to provide insights about both data biases and the systems and youth they track. Specifically, we develop a novel categorical clustering and cluster summarization methodology that allows us to gain insights into subtle biases in existing data on foster youth, and to provide insight into where further (often qualitative) research is needed to identify potential ways of assisting youth.

preprint, 2022, arXiv

NELA-Local: A Dataset of U.S. Local News Articles for the Study of County-level News Ecosystems

In this paper, we present a dataset of over 1.4M online news articles from 313 local U.S. news outlets published over 20 months (between April 4th, 2020 and December 31st, 2021). These outlets cover a geographically diverse set of communities across the United States. In order to estimate characteristics of the local audience, included with this news article data is a wide range of county-level metadata, including demographics, 2020 Presidential Election vote shares, and community resilience estimates from the U.S. Census Bureau. The NELA-Local dataset can be found at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GFE66K.

preprint, 2021, arXiv

An Agent-based Model to Evaluate Interventions on Online Dating Platforms to Decrease Racial Homogamy

Perhaps the most controversial questions in the study of online platforms today surround the extent to which platforms can intervene to reduce the societal ills perpetrated on them. Up for debate is whether there exist any effective and lasting interventions a platform can adopt to address, e.g., online bullying, or if other, more far-reaching change is necessary to address such problems. Empirical work is critical to addressing such questions. But it is also challenging, because it is time-consuming, expensive, and sometimes limited to the questions companies are willing to ask. To help focus and inform this empirical work, we here propose an agent-based modeling (ABM) approach. As an application, we analyze the impact of a set of interventions on a simulated online dating platform on the lack of long-term interracial relationships in an artificial society. In the real world, a lack of interracial relationships is a critical vehicle through which inequality is maintained. Our work shows that many previously hypothesized interventions online dating platforms could take to increase the number of interracial relationships from their website have limited effects, and that the effectiveness of any intervention is subject to assumptions about sociocultural structure. Further, interventions that are effective in increasing diversity in long-term relationships are at odds with platforms' profit-oriented goals. At a general level, the present work shows the value of using an ABM approach to help understand the potential effects and side effects of different interventions that a platform could take.

preprint, 2020, arXiv

MDR Cluster-Debias: A Nonlinear WordEmbedding Debiasing Pipeline

Existing methods for debiasing word embeddings often do so only superficially, in that words that are stereotypically associated with, e.g., a particular gender in the original embedding space can still be clustered together in the debiased space. However, there has yet to be a study that explores why this residual clustering exists, and how it might be addressed. The present work fills this gap. We identify two potential reasons for which residual bias exists and develop a new pipeline, MDR Cluster-Debias, to mitigate this bias. We explore the strengths and weaknesses of our method, finding that it significantly outperforms other existing debiasing approaches on a variety of upstream bias tests but achieves limited improvement on decreasing gender bias in a downstream task. This indicates that word embeddings encode gender bias in still other ways, not necessarily captured by upstream tests.

preprint, 2020, arXiv

Theory In, Theory Out: The uses of social theory in machine learning for social science

Research at the intersection of machine learning and the social sciences has provided critical new insights into social behavior. At the same time, a variety of critiques have been raised, ranging from technical issues with the data used and features constructed, to problematic assumptions built into models, to their limited interpretability and their contribution to bias and inequality. We argue such issues arise primarily because of the lack of social theory at various stages of the model building and analysis. In the first half of this paper, we walk through how social theory can be used to answer the basic methodological and interpretive questions that arise at each stage of the machine learning pipeline. In the second half, we show how theory can be used to assess and compare the quality of different social learning models, including interpreting, generalizing, and assessing the fairness of models. We believe this paper can act as a guide for computer and social scientists alike to navigate the substantive questions involved in applying the tools of machine learning to social data.

preprint, 2020, arXiv

When do Word Embeddings Accurately Reflect Surveys on our Beliefs About People?

Social biases are encoded in word embeddings. This presents a unique opportunity to study society historically and at scale, and a unique danger when embeddings are used in downstream applications. Here, we investigate the extent to which publicly-available word embeddings accurately reflect beliefs about certain kinds of people as measured via traditional survey methods. We find that biases found in word embeddings do, on average, closely mirror survey data across seventeen dimensions of social meaning. However, we also find that biases in embeddings are much more reflective of survey data for some dimensions of meaning (e.g. gender) than others (e.g. race), and that we can be highly confident that embedding-based measures reflect survey data only for the most salient biases.
