Source author record

Kristian Lum

Kristian Lum appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Applications; cs.CY; Machine Learning; Methodology; Artificial Intelligence; Information Retrieval; Populations and Evolution; Social and Information Networks; stat.OT

Catalog footprint: 14 works, 9 topics, 4 close collaborators


preprint, 2022, arXiv

Characterizing patterns in police stops by race in Minneapolis from 2016-2021

The murder of George Floyd centered Minneapolis, Minnesota, in conversations on racial injustice in the US. We leverage open data from the Minneapolis Police Department to analyze individual, geographic, and temporal patterns in more than 170,000 police stops since 2016. We evaluate person and vehicle searches at the individual level by race using generalized estimating equations with neighborhood clustering, directly addressing neighborhood differences in police activity. Minneapolis exhibits clear patterns of disproportionate policing by race, wherein Black people are searched at higher rates compared to White people. Temporal visualizations indicate that police stops declined following the murder of George Floyd. This analysis provides contemporary evidence on the state of policing for a major metropolitan area in the United States.

preprint, 2022, arXiv

De-biasing "bias" measurement

When a model's performance differs across socially or culturally relevant groups--like race, gender, or the intersections of many such groups--it is often called "biased." While much of the work in algorithmic fairness over the last several years has focused on developing various definitions of model fairness (the absence of group-wise model performance disparities) and eliminating such "bias," much less work has gone into rigorously measuring it. In practice, it is important to have high-quality, human-digestible measures of model performance disparities and associated uncertainty quantification about them that can serve as inputs into multi-faceted decision-making processes. In this paper, we show both mathematically and through simulation that many of the metrics used to measure group-wise model performance disparities are themselves statistically biased estimators of the underlying quantities they purport to represent. We argue that this can cause misleading conclusions about the relative group-wise model performance disparities along different dimensions, especially in cases where some sensitive variables consist of categories with few members. We propose the "double-corrected" variance estimator, which provides unbiased estimates and uncertainty quantification of the variance of model performance across groups. It is conceptually simple and easily implementable without a statistical software package or numerical optimization. We demonstrate the utility of this approach through simulation and show on a real dataset that while statistically biased estimators of group-wise model performance disparities indicate statistically significant differences, when accounting for statistical bias in the estimator, the estimated between-group disparities are no longer statistically significant.
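To make the statistical-bias point in the abstract above concrete, here is a minimal simulation sketch. It is not the paper's "double-corrected" estimator; it only illustrates, under assumed group sizes and an assumed common true accuracy of 0.8, that the naive variance of observed group accuracies is inflated by sampling noise, and that subtracting an estimate of that noise brings the figure back toward the true value of zero. The function names `disparity_variance` and `simulate` are hypothetical.

```python
import random
from statistics import mean, variance


def disparity_variance(outcomes_by_group):
    """Return (naive, corrected) between-group variance of accuracy.

    outcomes_by_group: one list of 0/1 correctness indicators per group
    (each group needs at least two observations).
    """
    accs = [mean(g) for g in outcomes_by_group]
    # Sample variance of the observed group accuracies (the naive measure).
    naive = variance(accs)
    # p_hat * (1 - p_hat) / (n - 1) is an unbiased estimate of each group's
    # sampling variance p * (1 - p) / n; the average over groups is the
    # amount by which the naive figure is inflated in expectation.
    noise = mean([a * (1 - a) / (len(g) - 1)
                  for a, g in zip(accs, outcomes_by_group)])
    return naive, naive - noise


def simulate(reps=300, sizes=(20, 20, 30, 50, 100, 200, 500, 1000),
             p=0.8, seed=0):
    """Average both estimates over repeated draws.

    Every group has the same true accuracy p (an assumption of this
    sketch), so the true between-group variance is exactly zero.
    """
    rng = random.Random(seed)
    naive_vals, corrected_vals = [], []
    for _ in range(reps):
        groups = [[1 if rng.random() < p else 0 for _ in range(n)]
                  for n in sizes]
        naive, corrected = disparity_variance(groups)
        naive_vals.append(naive)
        corrected_vals.append(corrected)
    return mean(naive_vals), mean(corrected_vals)


if __name__ == "__main__":
    naive_avg, corrected_avg = simulate()
    print(f"naive: {naive_avg:.5f}  corrected: {corrected_avg:.5f}")
```

In this setup the small groups dominate the inflation, which matches the abstract's remark that categories with few members are where misleading conclusions are most likely.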

preprint, 2022, arXiv

Flipping the Script on Criminal Justice Risk Assessment: An actuarial model for assessing the risk the federal sentencing system poses to defendants

In the criminal justice system, algorithmic risk assessment instruments are used to predict the risk a defendant poses to society; examples include the risk of recidivating or the risk of failing to appear at future court dates. However, defendants are also at risk of harm from the criminal justice system. To date, there exists no risk assessment instrument that considers the risk the system poses to the individual. We develop a risk assessment instrument that "flips the script." Using data about U.S. federal sentencing decisions, we build a risk assessment instrument that predicts the likelihood an individual will receive an especially lengthy sentence given factors that should be legally irrelevant to the sentencing decision. To do this, we develop a two-stage modeling approach. Our first-stage model is used to determine which sentences were "especially lengthy." We then use a second-stage model to predict the defendant's risk of receiving a sentence that is flagged as especially lengthy given factors that should be legally irrelevant. The factors that should be legally irrelevant include, for example, race, court location, and other socio-demographic information about the defendant. Our instrument achieves comparable predictive accuracy to risk assessment instruments used in pretrial and parole contexts. We discuss the limitations of our modeling approach and use the opportunity to highlight how traditional risk assessment instruments in various criminal justice settings also suffer from many of the same limitations and embedded value systems of their creators.

preprint, 2022, arXiv

It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks

Risk assessment instrument (RAI) datasets, particularly ProPublica's COMPAS dataset, are commonly used in algorithmic fairness papers due to benchmarking practices of comparing algorithms on datasets used in prior work. In many cases, this data is used as a benchmark to demonstrate good performance without accounting for the complexities of criminal justice (CJ) processes. However, we show that pretrial RAI datasets can contain numerous measurement biases and errors, and due to disparities in discretion and deployment, algorithmic fairness applied to RAI datasets is limited in making claims about real-world outcomes. These reasons make the datasets a poor fit for benchmarking under assumptions of ground truth and real-world impact. Furthermore, conventional practices of simply replicating previous data experiments may implicitly inherit or edify normative positions without explicitly interrogating value-laden assumptions. Without context of how interdisciplinary fields have engaged in CJ research and context of how RAIs operate upstream and downstream, algorithmic fairness practices are misaligned for meaningful contribution in the context of CJ, and would benefit from transparent engagement with normative considerations and values related to fairness, justice, and equality. These factors prompt questions about whether benchmarks for intrinsically socio-technical systems like the CJ system can exist in a beneficial and ethical way.

preprint, 2022, arXiv

Measuring and mitigating voting access disparities: a study of race and polling locations in Florida and North Carolina

Voter suppression and associated racial disparities in access to voting are long-standing civil rights concerns in the United States. Barriers to voting have taken many forms over the decades. A history of violent explicit discouragement has shifted to more subtle access limitations that can include long lines and wait times, long travel times to reach a polling station, and other logistical barriers to voting. Our focus in this work is on quantifying disparities in voting access pertaining to the overall time-to-vote, and how they could be remedied via a better choice of polling location or provisioning more sites where voters can cast ballots. However, appropriately calibrating access disparities is difficult because of the need to account for factors such as population density and different community expectations for reasonable travel times. In this paper, we quantify access to polling locations, developing a methodology for the calibrated measurement of racial disparities in polling location "load" and distance to polling locations. We apply this methodology to a study of real-world data from Florida and North Carolina to identify disparities in voting access from the 2020 election. We also introduce algorithms, with modifications to handle scale, that can reduce these disparities by suggesting new polling locations from a given list of identified public locations (including schools and libraries). Applying these algorithms to the 2020 election location data also helps to expose and explore tradeoffs between the cost of allocating more polling locations and the potential impact on access disparities. The developed voting access measurement methodology and algorithmic remediation technique are a first step toward better polling location assignment.

preprint, 2022, arXiv

Measuring Disparate Outcomes of Content Recommendation Algorithms with Distributional Inequality Metrics

The harmful impacts of algorithmic decision systems have recently come into focus, with many examples of systems such as machine learning (ML) models amplifying existing societal biases. Most metrics attempting to quantify disparities resulting from ML algorithms focus on differences between groups, dividing users based on demographic identities and comparing model performance or overall outcomes between these groups. However, in industry settings, such information is often not available, and inferring these characteristics carries its own risks and biases. Moreover, typical metrics that focus on a single classifier's output ignore the complex network of systems that produce outcomes in real-world settings. In this paper, we evaluate a set of metrics originating from economics, distributional inequality metrics, and their ability to measure disparities in content exposure in a production recommendation system, the Twitter algorithmic timeline. We define desirable criteria for metrics to be used in an operational setting, specifically by ML practitioners. We characterize different types of engagement with content on Twitter using these metrics, and use these results to evaluate the metrics with respect to the desired criteria. We show that we can use these metrics to identify content suggestion algorithms that contribute more strongly to skewed outcomes between users. Overall, we conclude that these metrics can be useful tools for understanding disparate outcomes in online social networks.
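As an illustration of what a distributional inequality metric looks like in code, the sketch below computes the Gini coefficient, a standard measure from economics. The abstract above does not name the specific metrics evaluated, so choosing Gini here is an assumption, and the `impressions` data is invented for the example.

```python
def gini(values):
    """Gini coefficient of a list of non-negative numbers.

    Returns 0.0 for a perfectly equal distribution and approaches 1.0 as
    a single member receives everything. Empty or all-zero input is
    treated as perfectly equal.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with ranks i = 1..n taken over the values in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical per-author impression counts, for illustration only.
    impressions = [0, 0, 0, 10]
    print(gini(impressions))    # 0.75
    print(gini([5, 5, 5, 5]))   # 0.0
```

Because the metric needs only the distribution of outcomes across users, it can be computed without demographic labels, which is the operational constraint the abstract describes.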

preprint, 2022, arXiv

Random Isn't Always Fair: Candidate Set Imbalance and Exposure Inequality in Recommender Systems

Traditionally, recommender systems operate by returning a user a set of items, ranked in order of estimated relevance to that user. In recent years, methods relying on stochastic ordering have been developed to create "fairer" rankings that reduce inequality in who or what is shown to users. Complete randomization -- ordering candidate items randomly, independent of estimated relevance -- is largely considered a baseline procedure that results in the most equal distribution of exposure. In industry settings, recommender systems often operate via a two-step process in which candidate items are first produced using computationally inexpensive methods and then a full ranking model is applied only to those candidates. In this paper, we consider the effects of inequality at the first step and show that, paradoxically, complete randomization at the second step can result in a higher degree of inequality relative to deterministic ordering of items by estimated relevance scores. In light of this observation, we then propose a simple post-processing algorithm in pursuit of reducing exposure inequality that works both when candidate sets have a high level of imbalance and when they do not. The efficacy of our method is illustrated on both simulated data and a common benchmark data set used in studying fairness in recommender systems.

preprint, 2021, arXiv

Closer than they appear: A Bayesian perspective on individual-level heterogeneity in risk assessment

Risk assessment instruments are used across the criminal justice system to estimate the probability of some future behavior given covariates. The estimated probabilities are then used in making decisions at the individual level. In the past, there has been controversy about whether the probabilities derived from group-level calculations can meaningfully be applied to individuals. Using Bayesian hierarchical models applied to a large longitudinal dataset from the court system in the state of Kentucky, we analyze variation in individual-level probabilities of failing to appear for court and the extent to which it is captured by covariates. We find that individuals within the same risk group vary widely in their probability of the outcome. In practice, this means that allocating individuals to risk groups based on standard approaches to risk assessment, in large part, results in creating distinctions among individuals who are not meaningfully different in terms of their likelihood of the outcome. This is because uncertainty about the probability that any particular individual will fail to appear is large relative to the difference in average probabilities among any reasonable set of risk groups.

preprint, 2020, arXiv

Estimating the number of SARS-CoV-2 infections and the impact of social distancing in the United States

Understanding the number of individuals who have been infected with the novel coronavirus SARS-CoV-2, and the extent to which social distancing policies have been effective at limiting its spread, are critical for effective policy going forward. Here we present estimates of the extent to which confirmed cases in the United States undercount the true number of infections, and analyze how effective social distancing measures have been at mitigating or suppressing the virus. Our analysis uses a Bayesian model of COVID-19 fatalities with a likelihood based on an underlying differential equation model of the epidemic. We provide analysis for four states with significant epidemics: California, Florida, New York, and Washington. Our short-term forecasts suggest that these states may be following somewhat different trajectories for growth of the number of cases and fatalities.

preprint, 2020, arXiv

The impact of overbooking on a pre-trial risk assessment tool

Pre-trial risk assessment tools are used to make recommendations to judges about appropriate conditions of pre-trial supervision for people who have been arrested. Increasingly, there is concern about whether these models are operating fairly, including concerns about whether the models' input factors are fair measures of one's criminal activity. In this paper, we assess the impact of booking charges that do not result in a conviction on a popular risk assessment tool, the Arnold Public Safety Assessment. Using data from a pilot run of the tool in San Francisco, CA, we find that booking charges that do not result in a conviction (i.e., charges that are dropped or end in an acquittal) increased the recommended level of pre-trial supervision in around 27% of cases evaluated by the tool.

preprint, 2016, arXiv

A statistical framework for fair predictive algorithms

Predictive modeling is increasingly being employed to assist human decision-makers. One purported advantage of replacing human judgment with computer models in high stakes settings-- such as sentencing, hiring, policing, college admissions, and parole decisions-- is the perceived "neutrality" of computers. It is argued that because computer models do not hold personal prejudice, the predictions they produce will be equally free from prejudice. There is growing recognition that employing algorithms does not remove the potential for bias, and can even amplify it, since training data were inevitably generated by a process that is itself biased. In this paper, we provide a probabilistic definition of algorithmic bias. We propose a method to remove bias from predictive models by removing all information regarding protected variables from the permitted training data. Unlike previous work in this area, our framework is general enough to accommodate arbitrary data types, e.g. binary, continuous, etc. Motivated by models currently in use in the criminal justice system that inform decisions on pre-trial release and paroling, we apply our proposed method to a dataset on the criminal histories of individuals at the time of sentencing to produce "race-neutral" predictions of re-arrest. In the process, we demonstrate that the most common approach to creating "race-neutral" models-- omitting race as a covariate-- still results in racially disparate predictions. We then demonstrate that the application of our proposed method to these data removes racial disparities from predictions with minimal impact on predictive accuracy.

preprint, 2016, arXiv

Estimating the observable population size from biased samples: a new approach to population estimation with capture heterogeneity

Capture-recapture methods aim to estimate the size of a closed population on the basis of multiple incomplete enumerations of individuals. In many applications, the individual probability of being recorded is heterogeneous in the population. Previous studies have suggested that it is not possible to reliably estimate the total population size when capture heterogeneity exists. Here we approach population estimation in the presence of capture heterogeneity as a latent length biased nonparametric density estimation problem on the unit interval. We show that in this setting it is generally impossible to estimate the density on the entire unit interval in finite samples, and that estimators of the population size have high and sometimes unbounded risk when the density has significant mass near zero. As an alternative, we propose estimating the population of individuals with capture probability exceeding some threshold. We provide methods for selecting an appropriate threshold, and show that this approach results in estimators with substantially lower risk than estimators of the total population size, with correspondingly smaller uncertainty, even when the parameter of interest is the total population. The alternative paradigm is demonstrated in extensive simulation studies and an application to snowshoe hare multiple recapture data.
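For readers unfamiliar with capture-recapture, the sketch below shows the classical two-sample estimators (Lincoln-Petersen and Chapman's bias-adjusted variant). These are textbook baselines, not the method proposed in the abstract above: they assume every individual has the same capture probability, which is exactly the assumption the paper relaxes. The function names and the counts in the usage example are illustrative.

```python
def lincoln_petersen(n1, n2, m):
    """Two-sample population estimate N = n1 * n2 / m.

    n1: individuals recorded in the first sample
    n2: individuals recorded in the second sample
    m:  individuals recorded in both samples (must be positive)
    """
    if m <= 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's variant, defined even when m == 0 and less biased
    in small samples: N = (n1 + 1) * (n2 + 1) / (m + 1) - 1."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


if __name__ == "__main__":
    # Illustrative counts: 100 recorded on each list, 50 on both.
    print(lincoln_petersen(100, 100, 50))    # 200.0
    print(round(chapman(100, 100, 50), 2))   # 199.02
```

When capture probabilities vary, hard-to-record individuals rarely appear on either list, so estimators of this kind tend to undercount; that difficulty motivates estimating only the population above a capture-probability threshold.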

preprint, 2013, arXiv

An agent-based epidemiological model of incarceration

We build an agent-based model of incarceration based on the SIS model of infectious disease propagation. Our central hypothesis is that the observed racial disparities in incarceration rates between Black and White Americans can be explained as the result of differential sentencing between the two demographic groups. We demonstrate that if incarceration can be spread through a social influence network, then even relatively small differences in sentencing can result in the large disparities in incarceration rates. Controlling for effects of transmissibility, susceptibility, and influence network structure, our model reproduces the observed large disparities in incarceration rates given the differences in sentence lengths for White and Black drug offenders in the United States without extensive parameter tuning. We further establish the suitability of the SIS model as applied to incarceration, as the observed structural patterns of recidivism are an emergent property of the model. In fact, our model shows a remarkably close correspondence with California incarceration data, without requiring any parameter tuning. This work advances efforts to combine the theories and methods of epidemiology and criminology.
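The SIS (susceptible-infected-susceptible) dynamics that the abstract above builds on can be sketched with a deterministic mean-field recursion. This is a simplification, not the paper's agent-based network model. Mapping a longer sentence to a lower exit rate `gamma` is an assumption made here for illustration, and the parameter values are arbitrary.

```python
def sis_prevalence(beta, gamma, i0=0.01, steps=2000):
    """Iterate the mean-field SIS recursion and return the final prevalence.

    i_{t+1} = i_t + beta * i_t * (1 - i_t) - gamma * i_t

    beta:  transmission rate per step
    gamma: exit ("recovery") rate per step; 1 / gamma is the mean duration
           in the "infected" state
    i0:    initial fraction in the "infected" state
    """
    i = i0
    for _ in range(steps):
        i = i + beta * i * (1 - i) - gamma * i
    return i


if __name__ == "__main__":
    # Same transmission rate; the second group stays in the "infected"
    # state twice as long on average (gamma halved). Values are illustrative.
    short_stay = sis_prevalence(beta=0.3, gamma=0.2)
    long_stay = sis_prevalence(beta=0.3, gamma=0.1)
    print(f"short stay: {short_stay:.3f}  long stay: {long_stay:.3f}")
```

For beta greater than gamma the recursion settles at the endemic level 1 - gamma / beta, so halving gamma from 0.2 to 0.1 moves the steady state from about one third to about two thirds in this toy setting.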

preprint, 2012, arXiv

Bayesian variable selection for spatially dependent generalized linear models

Despite the abundance of methods for variable selection and accommodating spatial structure in regression models, there is little precedent for incorporating spatial dependence in covariate inclusion probabilities for regionally varying regression models. The lone existing approach is limited by difficult computation and the requirement that the spatial dependence be represented on a lattice, making this method inappropriate for areal models with irregular structures that often arise in ecology, epidemiology, and the social sciences. Here we present a novel method for spatial variable selection in areal generalized linear models that can accommodate arbitrary spatial structures and works with a broad subset of GLM likelihoods. The method uses a latent probit model with a spatial dependence structure where the binary response is taken as a covariate inclusion indicator for area-specific GLMs. The covariate inclusion indicators arise via thresholding of latent standard normals on which we place a conditionally autoregressive prior. We propose an efficient MCMC algorithm for computation that is entirely conjugate in any model with a conditionally Gaussian representation of the likelihood, thereby encompassing logistic, probit, multinomial probit and logit, Gaussian, and negative binomial regressions through the use of existing data augmentation methods. We demonstrate superior parameter recovery and prediction in simulation studies as well as in applications to geographic voting patterns and population estimation. Though the method is very broadly applicable, we note in particular that prior to this work, spatial population estimation/capture-recapture models allowing for varying list dependence structures have not been possible.
